Generate and download YouTube captions with Deepgram and Node.js
Hello there. My name is Kevin Lewis, and I’m a developer advocate here at Deepgram. And today, I’m gonna show you how to get a transcript from a YouTube video using Deepgram’s speech recognition API. This project has three parts. In the first part, we’ll be downloading a YouTube video to a local MP3 file. In the second part, we’ll take that MP3 file and send it to Deepgram to get a nice and accurate transcript. And then finally, we’ll take that transcript and store it in a text file on our computer.
I’ve just gone ahead and installed the dependencies we’re going to use today and required them in this index.js file. So we’re using the file system built-in library. We have this package called youtube-mp3-downloader, which, surprise surprise, will download a YouTube video. We have the Deepgram Node.js SDK. And finally, we have ffmpeg-static. FFmpeg is an audio processing and manipulation utility, and ffmpeg-static makes the executable available inside of our project. So let’s crack on.
We’re ready to go. The first thing we’re gonna do is download a YouTube video. So we’ve required youtube-mp3-downloader. Now we’re gonna go ahead and initialize it, and we initialize it with a few settings: linking directly to our FFmpeg executable, specifying that we want the MP3 file to be put alongside the index.js file, and finally, specifying that we want to prioritize the highest audio quality possible. Then to download it, we can just go ahead and provide a YouTube video ID. This one is of a movie trailer. So this is all we need to download a YouTube video.
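A minimal sketch of the download step described above. The option names (ffmpegPath, outputPath, youtubeVideoQuality) are the ones youtube-mp3-downloader is understood to accept; the helper names buildDownloaderOptions and startDownload are hypothetical, and the downloader class is passed in as a parameter so the sketch does not depend on the package being installed.

```javascript
// Build the settings object for the downloader.
// Assumption: these three option names match youtube-mp3-downloader's README.
function buildDownloaderOptions(ffmpegPath) {
  return {
    ffmpegPath,                          // path to the FFmpeg executable
    outputPath: './',                    // save the MP3 alongside index.js
    youtubeVideoQuality: 'highestaudio', // prioritize the best audio quality
  };
}

// Create a downloader and start downloading one video by its ID.
// `Downloader` is expected to be the class exported by youtube-mp3-downloader.
function startDownload(Downloader, ffmpegPath, videoId) {
  const downloader = new Downloader(buildDownloaderOptions(ffmpegPath));
  downloader.download(videoId);
  return downloader;
}

// In index.js this might be wired up as follows (VIDEO_ID is a placeholder):
// const YoutubeMp3Downloader = require('youtube-mp3-downloader');
// const ffmpeg = require('ffmpeg-static');
// const YD = startDownload(YoutubeMp3Downloader, ffmpeg, 'VIDEO_ID');
```

Passing the class in is only a convenience for trying the sketch in isolation; calling `new YoutubeMp3Downloader({...})` directly in index.js works the same way.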
There is something else we need to do here though. We need to know when it’s finished because we don’t wanna send it to Deepgram until we have the completed file. Fortunately, this youtube-mp3-downloader package emits an event when the download has been completed. So we can listen for that event, finished. And what is returned once it is downloaded is an object in this video parameter here, which contains some metadata about the file. We really only care about the video file name because we’ll need that later. What we’ll do here is just console.log that the video file name has been downloaded. So we’ll run this and we’ll just check that that successfully downloads. It’s only a short video, so it just took a moment there, but that is our MP3 file downloaded.
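Listening for the finished event could look roughly like this. It is a sketch under two assumptions: that the handler receives an error first and the video metadata second, and that the metadata exposes the saved path as `video.file`. The helper name onDownloaded is hypothetical, and a plain Node.js EventEmitter stands in for the real downloader so the snippet runs on its own.

```javascript
const { EventEmitter } = require('events');

// Register a handler that fires once the MP3 has been fully written.
function onDownloaded(downloader, handleFile) {
  downloader.on('finished', (err, video) => {
    if (err) {
      console.error('Download failed:', err);
      return;
    }
    // Assumption: the metadata object stores the output path in `file`.
    const videoFileName = video.file;
    console.log(`${videoFileName} has been downloaded`);
    handleFile(videoFileName);
  });
}

// Stand-in demo: emit the event by hand instead of downloading anything.
const fakeDownloader = new EventEmitter();
const downloadedFiles = [];
onDownloaded(fakeDownloader, (name) => downloadedFiles.push(name));
fakeDownloader.emit('finished', null, { file: './trailer.mp3' });
```

With the real package, the downloader created in the previous step would be passed in place of fakeDownloader.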
Now in the next step, we wanna take this file and provide it to Deepgram. So the first thing we wanna do is take this required Node.js SDK and initialize it, and we’ll initialize it with our API key, which I have stored here in an environment variable. Then we’re ready to ask for a prerecorded transcription. We need to provide the file name here, which we stored as the video file name. And you can provide any features here that you want. We have a whole list of features in our documentation. I’m gonna use punctuate, and I’m also going to use utterances, which will return phrases as well as words. Then we’re gonna go ahead here and console.log the result. And I’m just gonna use our captions generator here, toWebVTT, to make it a little easier to see inside of the terminal.
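The transcription request might be sketched as below. This assumes the older (v1-style) Deepgram Node.js SDK, where a local file is sent as a buffer plus a MIME type through `deepgram.transcription.preRecorded`, and where the response offers a `toWebVTT()` helper when utterances are enabled; newer SDK versions use a different interface. The helper names and the DEEPGRAM_API_KEY variable are hypothetical.

```javascript
const fs = require('fs');

// Describe what to send to Deepgram: the MP3 contents plus the chosen features.
function buildTranscriptionRequest(videoFileName) {
  return {
    source: { buffer: fs.readFileSync(videoFileName), mimetype: 'audio/mp3' },
    options: { punctuate: true, utterances: true },
  };
}

// Ask for a prerecorded transcription and return it as WebVTT captions.
// `deepgram` is expected to be an initialized v1 SDK client.
async function transcribeToWebVTT(deepgram, videoFileName) {
  const { source, options } = buildTranscriptionRequest(videoFileName);
  const response = await deepgram.transcription.preRecorded(source, options);
  return response.toWebVTT();
}

// In index.js this might be used inside the finished handler:
// const { Deepgram } = require('@deepgram/sdk');
// const deepgram = new Deepgram(process.env.DEEPGRAM_API_KEY);
// transcribeToWebVTT(deepgram, videoFileName).then(console.log);
```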
So that is now re-downloading the MP3 file, going off to Deepgram, coming back, and there is our brand new transcription. So the final step is to save this text right here in our terminal to a text file. So we won’t console.log it. Instead, we’re gonna go ahead and save it. All we need to do here is use the file system module: writeFileSync.
The first thing we need to provide is the file name, and the file name is going to be the video file name, and then we’ll just add .txt to the end of that. And we also want to provide the text, so we’ll just do toWebVTT. We’ll run this one final time and it’s gonna go grab a brand new MP3 for us, go off to Deepgram, and then provide our captions right here in this file.
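The saving step can be sketched as a small helper around `fs.writeFileSync`. The function name saveCaptions is hypothetical; the captions argument would be the WebVTT text produced in the transcription step.

```javascript
const fs = require('fs');

// Write the captions next to the MP3, using the MP3's name plus ".txt".
function saveCaptions(videoFileName, captions) {
  const outputPath = `${videoFileName}.txt`;
  fs.writeFileSync(outputPath, captions);
  return outputPath;
}

// Example call inside the finished handler:
// saveCaptions(videoFileName, response.toWebVTT());
```

Because writeFileSync blocks until the file is written, no callback is needed; for a short captions file in a one-off script that trade-off is usually fine.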
I hope you found that interesting. And if you have any questions at all, please feel free to reach out to us. We love answering questions and helping you build great projects with voice. Bye for now.