Transcribe YouTube Videos with Node.js

By Kevin Lewis
Published Nov 1, 2021
Updated Jun 13, 2024

In this article, we will create transcripts for YouTube videos using Deepgram's Speech Recognition API. First, we will download a video and convert it to an mp3 audio file. Then, we will use Deepgram to generate a transcript. Finally, we will store the transcript in a text file and delete the media file.

The final project code can be found at https://github.com/deepgram-devs/youtube-transcripts.

We need a sample video, so I am using a Shang-Chi and The Legend of The Ten Rings teaser trailer. If that is a spoiler for you, please go ahead and grab another video link.

Before We Start

You will need:

  • Node.js installed on your machine (available from nodejs.org).

  • A Deepgram API Key, which you can create from your Deepgram account.

  • A YouTube video ID, which is the value of the v parameter in a video's URL. The one we will be using is ir-mWUYH_uo.
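
If you only have the full link, the ID can be pulled out programmatically. This is a minimal sketch, assuming only the common https://www.youtube.com/watch?v=<id> and https://youtu.be/<id> URL shapes. getVideoId is a hypothetical helper name, not part of any package used in this tutorial.

```javascript
// Hypothetical helper: extract the video ID from a YouTube link.
// Assumes "watch?v=<id>" and "youtu.be/<id>" style URLs only.
function getVideoId(link) {
  const url = new URL(link)
  if (url.hostname === 'youtu.be') {
    // Short links carry the ID as the first path segment.
    return url.pathname.split('/')[1] || null
  }
  // Standard watch links carry the ID in the "v" query parameter.
  return url.searchParams.get('v')
}

console.log(getVideoId('https://www.youtube.com/watch?v=ir-mWUYH_uo'))
```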

Create a new directory and navigate to it with your terminal. Run npm init -y to create a package.json file, and then install the three packages this tutorial uses with npm install @deepgram/sdk ffmpeg-static youtube-mp3-downloader.

Create an index.js file, and open it in your code editor.

Preparing Dependencies

At the top of your file, require these four packages: fs, youtube-mp3-downloader, @deepgram/sdk, and ffmpeg-static.

fs is the built-in file system module for Node.js. It is used to read and write files which we will be doing a few times throughout this post. ffmpeg-static includes a version of ffmpeg in our node_modules directory, and requiring it returns the file path.

Initialize the Deepgram client with your API key, and the YouTubeMp3Downloader client with the ffmpeg path.
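
A minimal sketch of this step, assuming the v1-style Deepgram class exported by @deepgram/sdk (newer major versions of the SDK expose a different client) and the ffmpegPath, outputPath, and youtubeVideoQuality options of youtube-mp3-downloader. The createClients helper and the DEEPGRAM_API_KEY environment variable are illustrative names. The modules are passed in as arguments so the wiring is explicit.

```javascript
// Illustrative helper: builds both clients from modules required elsewhere.
function createClients({ Deepgram, YoutubeMp3Downloader, ffmpegPath, apiKey }) {
  // Assumption: v1-style SDK constructor that takes the API key directly.
  const deepgram = new Deepgram(apiKey)

  const downloader = new YoutubeMp3Downloader({
    ffmpegPath, // path returned by require('ffmpeg-static')
    outputPath: './', // save the mp3 in the current directory
    youtubeVideoQuality: 'highestaudio',
  })

  return { deepgram, downloader }
}

// In index.js, with the packages installed, this might be called as:
// const { deepgram, downloader } = createClients({
//   Deepgram: require('@deepgram/sdk').Deepgram,
//   YoutubeMp3Downloader: require('youtube-mp3-downloader'),
//   ffmpegPath: require('ffmpeg-static'),
//   apiKey: process.env.DEEPGRAM_API_KEY,
// })
```

Reading the key from an environment variable rather than hard-coding it keeps it out of version control.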

Download Video and Convert to MP3

Under the hood, the youtube-mp3-downloader package will download the video and convert it with ffmpeg on our behalf. While it does this, it emits several events. We are going to use the progress event to see how far through the download we are, and the finished event, which indicates we can move on.
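
A sketch of how those two events could be wired up, assuming the downloader emits progress with a nested progress.percentage value, finished with (err, data) where data.file is the saved path, and error on failure. downloadAudio is a hypothetical wrapper that turns the events into a promise.

```javascript
// Hypothetical helper: start a download and resolve with the mp3 file path.
function downloadAudio(downloader, videoId) {
  return new Promise((resolve, reject) => {
    // Report how far through the download we are.
    downloader.on('progress', (data) => {
      console.log(`${Math.round(data.progress.percentage)}% downloaded`)
    })

    downloader.on('error', reject)

    // Fired once the video has been downloaded and converted.
    downloader.on('finished', (err, video) => {
      if (err) return reject(err)
      console.log(`Downloaded ${video.file}`)
      resolve(video.file)
    })

    downloader.download(videoId)
  })
}
```

Calling downloadAudio(downloader, 'ir-mWUYH_uo') resolves with the mp3 file name, which the later steps need.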

Save and run the file with node index.js. You should see the download progress in your terminal, after which the mp3 file will be available in your project directory.

Get Transcript from Deepgram

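Once the download has finished and you have the file name, read the mp3 file into a buffer and send it to Deepgram as a transcription request, along with any options you want to set.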
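
A sketch of that request, assuming the v1 Node SDK, where deepgram.transcription.preRecorded takes a source object with buffer and mimetype plus an options object. transcribeFile is a hypothetical helper, and the client is passed in so that it matches however you initialized it.

```javascript
const fs = require('fs')

// Hypothetical helper: send a local mp3 file to Deepgram for transcription.
async function transcribeFile(deepgram, videoFileName) {
  const file = {
    buffer: fs.readFileSync(videoFileName), // raw audio bytes
    mimetype: 'audio/mpeg',
  }
  const options = {
    punctuate: true, // ask for punctuation in the transcript
  }
  // Assumption: v1-style SDK call; newer SDK versions use a different method.
  const result = await deepgram.transcription.preRecorded(file, options)
  console.log(result)
  return result
}
```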

There are many options that can make your transcript more useful, including diarization, which recognizes different speakers; a profanity filter, which replaces profanity with similar non-profane terms; and punctuation. We are using punctuation in this tutorial to show how setting options works.

Rerun your code and you should see a JSON object printed in your terminal.

Saving Transcript and Deleting Media

A lot of data comes back from Deepgram, but all we want is the transcript, which, with the options we provided, is a single string of text. It is located at results.channels[0].alternatives[0].transcript in the response.
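
For illustration, here is that lookup against a trimmed-down response object. The sample values are made up, and getTranscript is a hypothetical helper. Only the results.channels[0].alternatives[0].transcript path reflects the response shape.

```javascript
// Hypothetical helper: pull the transcript string out of a Deepgram response.
function getTranscript(result) {
  // First channel, first alternative.
  return result.results.channels[0].alternatives[0].transcript
}

// Made-up sample containing only the fields used here.
const sampleResult = {
  results: {
    channels: [
      { alternatives: [{ transcript: 'Hello, world.', confidence: 0.99 }] },
    ],
  },
}

console.log(getTranscript(sampleResult)) // prints "Hello, world."
```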

Now that we have the string, we can write it to a text file with fs.writeFileSync.
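
A short sketch of this step, assuming the transcript is saved next to the audio file with .txt appended to its name. saveTranscript is an illustrative helper name.

```javascript
const fs = require('fs')

// Hypothetical helper: write the transcript beside the audio file.
function saveTranscript(videoFileName, transcript) {
  const transcriptFileName = `${videoFileName}.txt`
  fs.writeFileSync(transcriptFileName, transcript, 'utf8')
  console.log(`Wrote ${transcriptFileName}`)
  return transcriptFileName
}
```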

Then, if desired, delete the mp3 file with fs.unlinkSync.
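
A sketch of the cleanup. removeMedia is a hypothetical helper. It checks that the file exists first, so running it twice does not throw.

```javascript
const fs = require('fs')

// Hypothetical helper: delete the downloaded mp3 once it is no longer needed.
function removeMedia(videoFileName) {
  if (!fs.existsSync(videoFileName)) return false
  fs.unlinkSync(videoFileName)
  console.log(`Deleted ${videoFileName}`)
  return true
}
```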

Summary

Transcribing YouTube videos has never been easier thanks to Deepgram's Speech Recognition API and the Deepgram Node SDK. You can find the final project code at https://github.com/deepgram-devs/youtube-transcripts.

Check out the other options supported by the Deepgram Node SDK, and if you have any questions, feel free to reach out to us on Twitter (we are @DeepgramAI).

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.
