
Do you ever wish you had more time? We can’t help with time travel (yet), but converting speech to text can be an amazing way to get some minutes back. Even listening to audio at 2x speed isn’t as fast as reading text. That’s where Deepgram comes in. In this article, we’ll show you how to create your own YouTube video downloader, and then send the audio to Deepgram for transcription.

The week I joined Deepgram also happened to be the week of our internal hackathon, Gramjam. Gramjam is a great way to meet people from all around the company, and it gave me a chance to build something cool with my new teammates. When we sat down to think about what problems we had, one recurring issue was a lack of time. We all had busy schedules and a long list of media we wanted to get through. We settled on an app that would summarize YouTube videos.


At a high level, the app flow is:

  • User inputs YouTube video link into Vue frontend

  • Vue frontend makes a call to Python backend

  • Backend downloads YouTube audio

  • Backend sends audio to Deepgram to transcribe

  • Backend sends transcript to a summarization model

  • Backend sends summary to frontend

  • Frontend displays summary to user

When we built this, Deepgram’s summarization features hadn’t been released yet, so to create the summary, we made a call to a third-party API. If we built this today, we could remove that step entirely. All we’d have to do is add summarize=true to our Deepgram request, and we’d get both a transcript and summary back.
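As a rough sketch of what that could look like, the snippet below appends summarize=true to a request URL. The base endpoint (https://api.deepgram.com/v1/listen) and the build_listen_url helper are assumptions made for illustration, not part of our hackathon code; check Deepgram’s API reference for the current endpoint and parameters.

```python
from urllib.parse import urlencode

# Assumed base endpoint for Deepgram's pre-recorded audio API;
# confirm it against the current API reference before relying on it.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(base_url=DEEPGRAM_LISTEN_URL, **params):
    """Hypothetical helper: append query parameters to the endpoint."""
    # The article writes the flag as summarize=true, so render Python
    # booleans in lowercase instead of "True"/"False".
    query = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    return f"{base_url}?{urlencode(query)}" if query else base_url


print(build_listen_url(summarize=True))
```

Keeping the parameters in one place like this makes it easy to switch features on or off without editing the URL string by hand.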

In this article, we’ll only look at the backend API that handles the YouTube video downloading.


The full code for our YouTube summarizer API is in this repo. It’s written in Python and relies on a handful of libraries. We'll walk through a few simplified snippets below.

To get started, install our dependencies. You can either use pip install -r requirements.txt or install the dependencies individually:

pip install pytube requests

To download the YouTube audio, we’re using the pytube library, a “lightweight, Pythonic, dependency-free, library (and command-line utility) for downloading YouTube Videos”. Several Python libraries can download YouTube videos, but we chose this one because it’s very straightforward to use. Downloading our video’s audio track is just a few lines of code:

from pytube import YouTube
from io import BytesIO

def get_youtube_audio(url):
    yt = YouTube(url)
    # extract only the audio
    video = yt.streams.filter(only_audio=True).first()

    # download the video’s audio to a buffer
    audio = BytesIO()
    video.stream_to_buffer(audio)

    return audio

In this code snippet, we select the video’s audio-only stream and download just that stream into an in-memory buffer. Since we won’t be using the video itself, we never download it.
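One detail about in-memory buffers is worth knowing: after data is written to a BytesIO, its cursor sits at the end of the data. The sketch below uses placeholder bytes instead of real audio to show why reading the buffer with getvalue() is the safer choice once a download has filled it.

```python
from io import BytesIO

buffer = BytesIO()
buffer.write(b"fake audio bytes")  # stand-in for the downloaded audio

# Writing leaves the cursor at the end, so a plain read() finds nothing.
leftover = buffer.read()

# getvalue() ignores the cursor and returns the whole payload.
payload = buffer.getvalue()

# Rewinding with seek(0) makes read() start from the beginning again.
buffer.seek(0)
rewound = buffer.read()

print(leftover, payload, rewound)
```

If you pass the buffer to code that calls read() on it, rewind it with seek(0) first.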

Once we have the audio, we can easily send it along to Deepgram. As mentioned above, Deepgram now has a summarization feature that will also take care of generating a summary—removing the need for us to make another API call.

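import os
import requests

# assumes the API key is stored in the DEEPGRAM_API_KEY environment variable
DEEPGRAM_API_KEY = os.getenv('DEEPGRAM_API_KEY')

def get_transcript(audio):
    headers = {
        'Authorization': f'Token {DEEPGRAM_API_KEY}',
        'content-type': 'audio/mp3'
    }

    # Deepgram transcription endpoint; include summarize=true in the query string
    url = ''
    response = requests.post(url, headers=headers, data=audio.getvalue())

    if response.ok:
        response = response.json()
        summary = response['results']['channels'][0]['alternatives'][0]['summaries'][0]['summary']
        transcript = response['results']['channels'][0]['alternatives'][0]['transcript']
        return [summary, transcript]
    else:
        # on failure, log the error; the function then returns None
        print(f"ERROR: {response.status_code} {response.text}")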
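The nested lookups in get_transcript assume every key is present in the response. As a more defensive sketch, the hypothetical helper below (extract_summary_and_transcript is our name, not part of any library) walks the same path but returns None when the body has an unexpected shape, and treats the summary as optional. The sample dictionary is hand-written to mirror the keys used above; it is not real API output.

```python
def extract_summary_and_transcript(payload):
    """Return (summary, transcript) from a decoded response, or None."""
    try:
        alternative = payload["results"]["channels"][0]["alternatives"][0]
        transcript = alternative["transcript"]
    except (KeyError, IndexError, TypeError):
        return None  # the body doesn't have the shape we expect

    # Treat the summary as optional: fall back to "" if it is missing.
    summaries = alternative.get("summaries") or [{}]
    summary = summaries[0].get("summary", "")
    return summary, transcript


# Hand-written sample that mirrors the keys used in get_transcript.
sample = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "Some toads eat other toads.",
                        "summaries": [{"summary": "Toads eat toads."}],
                    }
                ]
            }
        ]
    }
}

print(extract_summary_and_transcript(sample))
```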

Next, we’ll invoke the functions we created to tie it all together. To demonstrate, we’ll use a video about the biology of cannibal toads. (This video was created by the partner of one of DG’s software engineers—check her channel out!)

audio = get_youtube_audio("")
[summary, transcript] = get_transcript(audio)

Then, the summary and transcript can be displayed in our front end.
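As a minimal sketch of that hand-off, the function below runs the two steps and packages the result as JSON for the frontend. The JSON field names (ok, summary, transcript, error) and the summarize_video helper are assumptions for illustration; our actual API may shape its response differently. The download and transcription steps are passed in as arguments, so the example runs with stand-in functions and no network access. It also covers the case where the transcription step returns nothing because the request failed.

```python
import json


def summarize_video(video_url, download_audio, transcribe):
    """Run download -> transcribe and return a JSON string for the frontend."""
    audio = download_audio(video_url)
    result = transcribe(audio)
    if result is None:
        # The transcription step returned nothing, so report a failure.
        return json.dumps({"ok": False, "error": "transcription failed"})
    summary, transcript = result
    return json.dumps({"ok": True, "summary": summary, "transcript": transcript})


# Stand-in callables so the sketch runs without calling any real service.
def fake_download(video_url):
    return b"audio bytes"


def fake_transcribe(audio):
    return ["A short summary.", "The full transcript."]


print(summarize_video("https://example.com/video", fake_download, fake_transcribe))
```

In the real app, get_youtube_audio and get_transcript would take the place of the two stand-ins.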

Wrapping Up

In just a few lines of code, we were able to make our own YouTube video downloader and send the audio to Deepgram for transcription and summarization. The uses are endless! Speed through watching assignments for a class, catch up on the latest video thinkpieces, or just enjoy having a downloaded transcript of your favorite video.
