If you want a video version of this blog, check this out:
First things first: Grab any audio file from your computer. It doesn’t matter if it’s an .mp3 of an audiobook chapter, an .mp4 download of a podcast, or an .m4a recording of your own voice. Just grab an audio file.
That said, do be sure that the audio file has some amount of spoken words. We can’t transcribe an instrumental.
Now, let’s transcribe that audio as quickly as possible.
Open up this notebook and make a local copy of it, either by downloading it or by using your own Google Colab / Jupyter Notebooks account. Note: We recommend using Google Colab for the best experience, but whatever floats your boat.
You should see something like this:
If you’ve made your copy, let’s move on to the fun part.
Step 1: Installing dependencies
The first cell of the notebook looks like this:
All this cell does is install Deepgram and a couple of other audio-friendly packages into your Colab instance. After all, in order to run a Deepgram transcription model, you’ll need to install Deepgram itself.
To run the cell, either click the play button on the left-hand side of the cell or click the cell itself and press Shift + Enter on your keyboard. (Or Shift + return if you’re on Mac).
Wait a few moments, and boom! You should see some colorful text like this:
The little green checkmark in the top left indicates that the cell is done running. Also note that the exact look of this screen may be different if you’re in light mode. But as long as the words “Error” or “Unable” are not a part of the message, you should be golden.
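If you'd rather not squint at the install log, here's a minimal sketch of a sanity check you could run in a fresh cell. It assumes the notebook installs the deepgram-sdk package, which is imported as deepgram; the REQUIRED_MODULES list and the missing_modules helper are names made up for this example, so add the notebook's other packages as needed.

```python
import importlib.util

# Assumption: the install cell pulls in deepgram-sdk, whose import name is "deepgram".
# Add the import names of any other packages the notebook installs.
REQUIRED_MODULES = ["deepgram"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All dependencies are importable.")
```

If anything shows up as not installed, rerun the first cell before moving on.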
Let’s move on to the next step.
Step 2: Uploading an audio file
Remember that audio file you grabbed at the beginning of this blog post? This is where we’ll use it. On the left-hand side of the screen, you’ll see a place to upload files. In Google Colab, it looks like this:
Clicking that little folder icon should open up a directory that looks like this:
To upload your audio file, click that icon of a sheet of paper with an upwards-facing arrow on it. Your local file system should pop up. Find the audio you wish to upload, and double-click it.
For this example, I’m going to upload the first chapter of the audiobook of Emma by Jane Austen.
The upload may take a moment, depending on how large your file is. But rest assured, it’s coming.
And once it’s done, your folder should look like this:
As you can see, my audio file is an mp3, but you can upload any type you want. Deepgram’s support includes MP3, MP4, MP2, AAC, WAV, FLAC, PCM, M4A, Ogg, Opus, and WebM. So go nuts.
Alright, our file is ready. Time to transcribe.
Step 3: Transcribing
By default, the next cell should look like this:
There are just a couple of variables you’ll want to set the values of.
dg_key = Your personal Deepgram API key. With your new Deepgram account, create an API key and paste it here. (Or, to be safe, use environment variables).
MIMETYPE = The type of audio file you're working with, written as its file extension (mp3, mp4, m4a, etc.).
DIRECTORY = The name of the folder that contains the audio file(s) you wish to transcribe. Note: Unless you created a new folder for your audio files, the default '.' value should be fine. (Or, if you placed your audio file in the default sample_data folder, set the value of DIRECTORY to sample_data.)
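On the environment-variable option for dg_key: here's one way it might look. This is a sketch, not the notebook's own code. DEEPGRAM_API_KEY is a variable name picked for this example, and load_api_key is a hypothetical helper.

```python
import os


def load_api_key(env=os.environ, name="DEEPGRAM_API_KEY"):
    """Read the API key from the environment; return "" if it is not set.

    DEEPGRAM_API_KEY is an assumed variable name, use whichever one you set.
    """
    return env.get(name, "")


dg_key = load_api_key()
if not dg_key:
    print("No API key found. Set DEEPGRAM_API_KEY before running the transcription cell.")
```

The upside is that your key never gets saved inside the notebook, so you can share your copy without leaking it.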
Running this cell (again, by clicking the play button or by pressing Shift + Enter) should transcribe every file in the directory specified by DIRECTORY whose filename ends with the mimetype specified by MIMETYPE.
Or, if you’re following the details of this blog exactly, running this code should transcribe all the files in ’.’ (aka the current directory) that end with mp3.
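To make the file matching concrete, here's a rough sketch of how a directory scan like that could work. The find_audio_files helper is hypothetical, and the notebook's own matching may differ slightly (this version ignores case and requires a dot before the extension).

```python
import os

MIMETYPE = "mp3"
DIRECTORY = "."


def find_audio_files(directory, mimetype):
    """Return sorted paths of the files in `directory` ending in `.mimetype`."""
    suffix = "." + mimetype.lower().lstrip(".")
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(suffix)
        and os.path.isfile(os.path.join(directory, name))
    )


# Preview which files would be picked up before spending any API credits.
for path in find_audio_files(DIRECTORY, MIMETYPE):
    print(path)
```

Running this first is a cheap way to confirm that your upload landed in the folder you think it did.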
Within a few seconds, you should see a .json file whose title matches the title of your original audio file. For me, it looks like this:
Note: There’s a small delay between when the cell finishes running and when the output file appears in your folder. If it doesn’t appear right away, wait a few seconds. The file should appear in less than a minute.
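Curious what a transcription request looks like under the hood? Below is a minimal sketch that swaps the notebook's code for a direct HTTP call to Deepgram's pre-recorded endpoint using only the standard library. The helper names (build_request, output_path, transcribe_file) are invented for this example, and the "audio/" + mimetype content type is a simplifying assumption.

```python
import json
import os
import urllib.request

# Deepgram's pre-recorded transcription endpoint, with punctuation turned on.
DEEPGRAM_URL = "https://api.deepgram.com/v1/listen?punctuate=true"


def build_request(audio_bytes, api_key, mimetype):
    """Build (but do not send) a POST request carrying the raw audio bytes."""
    return urllib.request.Request(
        DEEPGRAM_URL,
        data=audio_bytes,
        headers={
            "Authorization": "Token " + api_key,
            # Assumption: the extension doubles as the audio subtype.
            "Content-Type": "audio/" + mimetype,
        },
        method="POST",
    )


def output_path(audio_path):
    """Map an audio filename to its JSON output name, e.g. emma.mp3 -> emma.json."""
    return os.path.splitext(audio_path)[0] + ".json"


def transcribe_file(audio_path, api_key, mimetype):
    """Send one file to Deepgram and save the JSON response next to it."""
    with open(audio_path, "rb") as f:
        request = build_request(f.read(), api_key, mimetype)
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    with open(output_path(audio_path), "w") as f:
        json.dump(result, f, indent=2)
    return result
```

With a valid key, calling transcribe_file("emma.mp3", dg_key, "mp3") would write emma.json alongside the audio file, which mirrors what the notebook cell does for every matching file.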
Let’s check out this transcription!
Step 4: Seeing your transcription!
You can see the results right away by opening the JSON file itself. In the case of Emma, that JSON looks (partially) like this:
Pretty colors! And upon a quick glance, you may be able to tell that we supply metadata, a full transcription, word-level timestamps, and confidence measurements.
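If you want to poke at those fields in code, here's a small sketch. The sample dictionary is a hand-written stand-in that follows the general shape of a Deepgram response; its values are made up for illustration and are not real output. The best_alternative helper is likewise a name invented for this example.

```python
# Hand-written stand-in for a response; the numbers are illustrative only.
sample = {
    "metadata": {"duration": 1.0, "channels": 1},
    "results": {"channels": [{"alternatives": [{
        "transcript": "Emma Woodhouse",
        "confidence": 0.99,
        "words": [
            {"word": "emma", "start": 0.0, "end": 0.4, "confidence": 0.99},
            {"word": "woodhouse", "start": 0.4, "end": 1.0, "confidence": 0.98},
        ],
    }]}]},
}


def best_alternative(response):
    """Return the first alternative of the first audio channel."""
    return response["results"]["channels"][0]["alternatives"][0]


alt = best_alternative(sample)
print(alt["transcript"])
for w in alt["words"]:
    # Start and end times are in seconds from the beginning of the audio.
    print(f'{w["start"]:6.2f} to {w["end"]:6.2f}  {w["word"]}  (confidence {w["confidence"]:.2f})')
```

Swap sample for the dictionary you get from json.load on your own output file, and the same lookups should apply.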
However, if you just want to see the transcription without any extra data, run the final cell of the notebook:
Just replace the OUTPUT variable with the name of the JSON file that contains the transcription you want to see. In this case, I’ll be setting OUTPUT equal to emma.json.
And the result looks like this:
Oh look, that’s Emma, Chapter I
While some of the transcription is cut off by the screenshot, you can simply scroll left and right to see the whole thing. Each line of the printed transcription from this cell is a sentence.
Aaaand that’s it! We have a fully transcribed audiobook in less than 5 minutes. Try it with your other audio files, and see the results.
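For the curious, a one-sentence-per-line printout can be sketched roughly like this. It's an approximation rather than the notebook's actual cell: split_sentences and print_transcript are hypothetical helpers, and the splitting assumes the transcript contains punctuation.

```python
import json
import os
import re

OUTPUT = "emma.json"


def split_sentences(transcript):
    """Split on whitespace that follows a period, exclamation or question mark."""
    parts = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [part for part in parts if part]


def print_transcript(json_path):
    """Load a saved response and print its transcript one sentence per line."""
    with open(json_path) as f:
        response = json.load(f)
    transcript = response["results"]["channels"][0]["alternatives"][0]["transcript"]
    for sentence in split_sentences(transcript):
        print(sentence)


# Only print if the transcription file from the previous step exists.
if os.path.exists(OUTPUT):
    print_transcript(OUTPUT)
```

One caveat: a naive split like this also breaks after abbreviations such as "Mr.", so a Jane Austen chapter will come out with a few extra line breaks.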
Happy transcribing!
If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.