Article·AI Engineering & Research·Jun 13, 2024

How to Transcribe a YouTube Video (Programmatically)

Learn how to use Deepgram's API to transcribe YouTube videos with only a few short lines of code! Our AI produces much more accurate results than YouTube’s default transcriptions.

By Jose Nicholas Francisco, Machine Learning Developer Advocate

Last updated Jun 13, 2024

Are you a podcaster or a vlogger? Or perhaps you’re simply an audio nerd, like us friendly folks over here at Deepgram. Well, if you’re looking for a simple way to transcribe any YouTube video programmatically, stick around! The code below is just for you.

You can automate transcriptions that are more accurate and better formatted than the default YouTube transcriptions. You can summarize long lectures and podcasts. And you can even translate your transcripts into different languages!

This article covers the first part: getting beautiful, accurate, and easy-to-read transcripts with timestamps. Where you take those transcripts is up to you; you’re limited only by your own creativity!

Let’s get started!

First things first: we’re going to need some way to download the audio from a YouTube video locally. Thankfully, there are libraries that already do that work for us!

Once you’ve installed one, you’re good to go! The code snippet below takes as input a list of URLs to download. The outputs are a series of .mp3 files, each named after the video it came from.
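One way to do this, sketched under the assumption that you use the third-party yt-dlp library (pip install yt-dlp) with FFmpeg available on your PATH for the mp3 conversion. The option keys (format, outtmpl, postprocessors, FFmpegExtractAudio) are yt-dlp's own; the helper names build_ydl_options and download_audio are hypothetical.

```python
# Sketch: download the audio track of each YouTube URL as an .mp3 named after
# the video title. Assumes yt-dlp is installed and FFmpeg is on the PATH.


def build_ydl_options(output_dir: str = ".") -> dict:
    """Return yt-dlp options that keep only the audio and convert it to mp3."""
    return {
        # Prefer an audio-only stream; fall back to the best combined format.
        "format": "bestaudio/best",
        # Name each file after the video title, e.g. "<title>.mp3".
        "outtmpl": f"{output_dir}/%(title)s.%(ext)s",
        # Let FFmpeg extract the audio and re-encode it as mp3.
        "postprocessors": [
            {
                "key": "FFmpegExtractAudio",
                "preferredcodec": "mp3",
                "preferredquality": "192",
            }
        ],
    }


def download_audio(urls: list[str], output_dir: str = ".") -> None:
    """Download each URL's audio as an mp3 into output_dir."""
    # Imported here so build_ydl_options works even without yt-dlp installed.
    import yt_dlp

    with yt_dlp.YoutubeDL(build_ydl_options(output_dir)) as ydl:
        ydl.download(urls)


# Usage (VIDEO_ID is a placeholder for a real video ID):
# download_audio(["https://www.youtube.com/watch?v=VIDEO_ID"])
```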

Great! We have audio files! The only thing left is to transcribe them. You just need a few lines of code:

Alright, let’s take this one chunk at a time:

The first two lines are imports. Nothing out of the ordinary there.
The next four lines are setup parameters. Fill them in as follows:
- DEEPGRAM_API_KEY should be set to the API key you created when signing up for Deepgram.
- FILENAME should be set to the path of the file you wish to transcribe, written as a string.
- PARAMS is fine as is, but if you’d like to change the look of your output (whether that’s diarizing it, filtering profanity, or using numerals), check out Deepgram’s documentation!
The main() function does the following:
- Opens the audio file
- Calls the transcription API
- Outputs the transcript to a JSON file
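Those steps can be sketched as follows. This is an assumption rather than the article’s exact code: instead of the Deepgram SDK, it POSTs the raw audio to Deepgram’s pre-recorded endpoint (https://api.deepgram.com/v1/listen) using only Python’s standard library, authenticating with a "Token" Authorization header. DEEPGRAM_API_KEY, FILENAME, and PARAMS play the roles described above; MIMETYPE, DEEPGRAM_URL, and build_request are hypothetical names added for this sketch.

```python
# Sketch of the single-file flow: open the audio, call Deepgram's pre-recorded
# REST endpoint, and save the JSON response. Standard library only (an
# assumption; an SDK would work too). Running main() needs a key and network.
import json
import urllib.parse
import urllib.request

DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"  # placeholder: paste your key
FILENAME = "huberman_podcast.mp3"  # path of the file to transcribe
MIMETYPE = "audio/mpeg"  # MIME type for .mp3 audio
PARAMS = {"punctuate": "true"}  # query-string options, as lowercase strings

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def build_request(
    api_key: str, audio: bytes, params: dict, mimetype: str = MIMETYPE
) -> urllib.request.Request:
    """Build a POST request that uploads raw audio bytes to Deepgram."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        f"{DEEPGRAM_URL}?{query}",
        data=audio,
        headers={"Authorization": f"Token {api_key}", "Content-Type": mimetype},
        method="POST",
    )


def main() -> None:
    # 1. Open the audio file as raw bytes.
    with open(FILENAME, "rb") as audio_file:
        audio = audio_file.read()
    # 2. Call the transcription API and parse the JSON reply.
    request = build_request(DEEPGRAM_API_KEY, audio, PARAMS)
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # 3. Write the transcript next to the audio, e.g. huberman_podcast.json.
    out_path = FILENAME.rsplit(".", 1)[0] + ".json"
    with open(out_path, "w", encoding="utf-8") as out_file:
        json.dump(result, out_file, indent=4)
```

To diarize the output, for example, you could add "diarize": "true" to PARAMS before calling main().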

Or, to transcribe multiple files at once, run the following block of code instead:

Here, instead of filling in a single FILENAME, you fill in the names of multiple files in the list named PREFIXES. Note that we do not include the filetype or mimetype in the prefixes list. That is, if you have an audio file named huberman_podcast.mp3, the prefix you’d enter into the PREFIXES list would be huberman_podcast.

The async calls run the transcriptions concurrently, so no request has to wait for the previous one to finish and the whole batch completes as efficiently as possible.
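A hedged sketch of that batch step: asyncio.to_thread (Python 3.9+) runs each blocking transcription call in a worker thread and asyncio.gather waits for all of them, so the network requests overlap. Strictly speaking this is concurrency rather than CPU parallelism, which is what matters for I/O-bound API calls. The names transcribe_all and save_results are hypothetical, and transcribe_one stands in for any function that takes an audio path and returns the parsed JSON, such as one built from the single-file steps described earlier.

```python
# Sketch: transcribe every <prefix>.mp3 concurrently and save <prefix>.json.
# `transcribe_one` is a hypothetical blocking callable: path -> parsed JSON.
import asyncio
import json
from typing import Callable

PREFIXES = ["huberman_podcast"]  # file names without the .mp3 extension


async def transcribe_all(
    prefixes: list[str],
    transcribe_one: Callable[[str], dict],
    extension: str = ".mp3",
) -> dict[str, dict]:
    """Run one transcription per prefix concurrently; return {prefix: json}."""
    # Each blocking call runs in a worker thread, so the requests overlap.
    tasks = [asyncio.to_thread(transcribe_one, p + extension) for p in prefixes]
    results = await asyncio.gather(*tasks)
    # gather preserves input order, so results line up with prefixes.
    return dict(zip(prefixes, results))


def save_results(results: dict[str, dict]) -> None:
    """Write each transcript to <prefix>.json."""
    for prefix, result in results.items():
        with open(prefix + ".json", "w", encoding="utf-8") as out_file:
            json.dump(result, out_file, indent=4)


# Usage, given a real transcribe_one:
# save_results(asyncio.run(transcribe_all(PREFIXES, transcribe_one)))
```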

And boom! By the end of all this transcription, you should have a bunch of JSON files that look like this:

Note that the JSON above is abridged. Even so, it includes word-level timestamps alongside the full transcript.
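To work with those timestamps in code, a small reader like the one below may help. It assumes the usual layout of a pre-recorded response, results → channels[0] → alternatives[0], where transcript holds the full text and words holds one entry per word with start and end times in seconds (plus punctuated_word when punctuation is on). The function names are hypothetical.

```python
# Sketch: read the full transcript and word-level timestamps from a saved
# Deepgram response (layout assumed: results/channels/alternatives).
import json


def load_transcript(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def _best_alternative(response: dict, channel: int = 0) -> dict:
    return response["results"]["channels"][channel]["alternatives"][0]


def full_transcript(response: dict, channel: int = 0) -> str:
    return _best_alternative(response, channel)["transcript"]


def word_timestamps(response: dict, channel: int = 0) -> list[tuple[str, float, float]]:
    """Return (word, start_seconds, end_seconds) for every recognized word."""
    words = _best_alternative(response, channel)["words"]
    # Prefer the punctuated form when punctuation was requested.
    return [(w.get("punctuated_word", w["word"]), w["start"], w["end"]) for w in words]


# Usage:
# for word, start, end in word_timestamps(load_transcript("huberman_podcast.json")):
#     print(f"{start:8.2f} {end:8.2f}  {word}")
```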

And if you’d like sentence-level or paragraph-level timestamps, those are readily available as well: just enable the corresponding option (such as paragraphs) in PARAMS. Check out Deepgram’s documentation for more details!
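As a sketch, assuming you add "paragraphs": "true" to PARAMS, the response should nest paragraph and sentence timing under alternatives[0]["paragraphs"]["paragraphs"], with each paragraph carrying start, end, and a sentences list whose entries have text, start, and end. Treat that layout as an assumption to check against your own JSON; the helper names are hypothetical.

```python
# Sketch: paragraph- and sentence-level timestamps from a response requested
# with the "paragraphs" option. Assumed nesting:
# alternatives[0]["paragraphs"]["paragraphs"][i]["sentences"][j]


def _paragraphs(response: dict, channel: int = 0) -> list[dict]:
    best = response["results"]["channels"][channel]["alternatives"][0]
    return best["paragraphs"]["paragraphs"]


def paragraph_timestamps(response: dict) -> list[tuple[float, float, str]]:
    """Return (start, end, text) per paragraph, joining its sentences."""
    return [
        (p["start"], p["end"], " ".join(s["text"] for s in p["sentences"]))
        for p in _paragraphs(response)
    ]


def sentence_timestamps(response: dict) -> list[tuple[float, float, str]]:
    """Return (start, end, text) for every sentence, in order."""
    return [
        (s["start"], s["end"], s["text"])
        for p in _paragraphs(response)
        for s in p["sentences"]
    ]
```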

And that’s it! By this point, you should have a full transcription, metadata, and word-level timestamps for each of the YouTube videos you downloaded.

So, you’ve got your videos transcribed. You’re probably wondering what to do next. Well, might I suggest:

Use our diarize tool to label who is speaking when. You could even analyze which late-night TV talk show hosts let their guests talk the most! (The diarize tool also helps you analyze podcast conversations.)
Run your transcript through the Google Translate API to create a translation of your original video.
Build your own closed-captioning tool!
Or, if you end up using our live-transcription feature, you can create closed captions live as well!
