If you want a video version of this blog, check this out:

First things first: Grab any audio file from your computer. It doesn’t matter if it’s an .mp3 of an audiobook chapter, an .mp4 download of a podcast, or an .m4a recording of your own voice. Just grab an audio file.

That said, do be sure that the audio file has some amount of spoken words. We can’t transcribe an instrumental.

Now, let’s transcribe that audio as quickly as possible.

Open up this notebook and make a local copy of it (either by downloading it or by using your own Google Colab / Jupyter Notebooks account). Note: We recommend using Google Colab for the best experience, but whatever floats your boat.

You should see something like this:

If you’ve made your copy, let’s move on to the fun part.

Step 1: Installing dependencies

The first cell of the notebook looks like this: 

! pip install deepgram-sdk requests ffmpeg-python

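All this cell does is install the Deepgram SDK and a couple of other audio-friendly packages into your Colab instance. After all, in order to run a Deepgram transcription model, you’ll need to install Deepgram itself. One caveat: the code later in this post imports the Deepgram class, which belongs to version 2 of the Python SDK. If that import fails or raises an error on a newer release, pinning the install to a 2.x release (for example, pip install "deepgram-sdk<3") is a likely fix.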

To run the cell, either click the play button on the left-hand side of the cell or click the cell itself and press Shift + Enter on your keyboard. (Or Shift + return if you’re on Mac).

Wait a few moments, and boom! You should see some colorful text like this:

The little green checkmark in the top left indicates that the cell is done running. Also note that the exact look of this screen may be different if you’re in light mode. But as long as the words “Error” or “Unable” are not a part of the message, you should be golden.
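If you would rather confirm the install from code than scan the output for error messages, a minimal sketch along these lines should work. It relies on Python's standard importlib.metadata module; the helper name installed_version is made up for illustration and is not part of the notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Hypothetical check for the package installed in the first cell.
# Prints a version string if the install worked, or None if it did not.
print(installed_version("deepgram-sdk"))
```

Run it in a fresh cell after the install cell finishes.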

Let’s move on to the next step.

Step 2: Uploading an audio file

Remember that audio file you grabbed at the beginning of this blog post? This is where we’ll use it. On the left-hand side of the screen, you’ll see a place to upload files. In Google Colab, it looks like this:

Clicking that little folder icon should open up a directory that looks like this:

To upload your audio file, click that icon of a sheet of paper with an upwards-facing arrow on it. Your local file system should pop up. Find the audio you wish to upload, and double-click it.

For this example, I’m going to upload the first chapter of the audiobook of Emma by Jane Austen.

The upload may take a moment, depending on how large your file is. But rest assured, it’s coming.

And once it’s done, your folder should look like this:

As you can see, my audio file is an mp3, but you can upload any type you want. Deepgram’s support includes MP3, MP4, MP2, AAC, WAV, FLAC, PCM, M4A, Ogg, Opus, and WebM. So go nuts.

Alright, our file is ready. Time to transcribe.

Step 3: Transcribing

By default, the next cell should look like this:

from deepgram import Deepgram
import asyncio, json, os


'''
Sign up at https://console.deepgram.com/signup
to get an API key and 12,000 minutes
for free!
'''
dg_key = '🔑🔑🔑 Your key here 🔑🔑🔑'
dg = Deepgram(dg_key)




'''
The most common audio formats and encodings we support
include MP3, MP4, MP2, AAC, WAV, FLAC, PCM, M4A, Ogg, Opus, and WebM,
So feel free to adjust the `MIMETYPE` variable as needed
'''
MIMETYPE = 'mp3'


# Note: You can use '.' if your audio is in the root
DIRECTORY = 'Directory name goes here!'




# Feel free to modify your model's parameters as you wish!
options = {
   "punctuate": True,
   "model": 'general',
   "tier": 'enhanced'
}


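# This function is what calls on the model to transcribe
def main():
    # Transcribe every file in DIRECTORY whose name ends with MIMETYPE
    for audio_file in os.listdir(DIRECTORY):
        if audio_file.endswith(MIMETYPE):
            # Build the path from DIRECTORY so files outside '.' are found too
            audio_path = os.path.join(DIRECTORY, audio_file)
            with open(audio_path, "rb") as f:
                source = {"buffer": f, "mimetype": 'audio/' + MIMETYPE}
                res = dg.transcription.sync_prerecorded(source, options)
            # Swap the audio extension (whatever its length) for .json
            transcript_name = os.path.splitext(audio_file)[0] + ".json"
            with open(f"./{transcript_name}", "w") as transcript:
                json.dump(res, transcript)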


main()

There are just a couple of variables you’ll want to set the values of. 

  • dg_key = Your personal Deepgram API key. With your new Deepgram account, create an API key and paste it here. (Or, to be safe, load it from an environment variable so the key never appears in the notebook.)

  • MIMETYPE = the type of audio file you're working with (mp3, mp4, m4a, etc.)

  • DIRECTORY = The name of the folder that contains the audio file(s) you wish to transcribe. Note: unless you created a new folder for your audio files, setting DIRECTORY to '.' should be fine. (Or, if you placed your audio file in the default sample_data folder, set the value of DIRECTORY to 'sample_data'.)
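For the environment-variable route, here is a minimal sketch. The variable name DEEPGRAM_API_KEY and the helper load_api_key are assumptions chosen for illustration; nothing in the notebook requires those exact names.

```python
import os


def load_api_key(env_var="DEEPGRAM_API_KEY", environ=None):
    """Read the API key from an environment variable instead of hard-coding it."""
    # `environ` lets you pass a stand-in mapping; by default the real environment is used
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Example with a stand-in mapping and a placeholder key
print(load_api_key(environ={"DEEPGRAM_API_KEY": "not-a-real-key"}))
```

In the notebook, you could then write dg_key = load_api_key() in place of the pasted string, provided the variable has been set in your session.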

Running this cell (again, by clicking the play button or by pressing Shift + Enter) should transcribe every file in the directory specified by DIRECTORY whose filename ends with the extension specified by MIMETYPE.

Or, if you’re following the details of this blog exactly, running this code should transcribe all the files in '.' (aka the current directory) that end with mp3.
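If you want to preview which files the cell would pick up before spending any API minutes, something like the sketch below may help. The helper matching_audio_files and the demo file names are hypothetical. Unlike the notebook's plain endswith check, this version ignores case and requires a dot before the extension, so treat it as a stricter variation rather than an exact copy of the notebook's logic.

```python
import os
import tempfile


def matching_audio_files(directory, extension):
    """List the files in `directory` whose names end with `.extension`, ignoring case."""
    suffix = "." + extension.lower().lstrip(".")
    return sorted(
        name
        for name in os.listdir(directory)
        if name.lower().endswith(suffix)
        and os.path.isfile(os.path.join(directory, name))
    )


# Demo on a throwaway folder with made-up file names
with tempfile.TemporaryDirectory() as folder:
    for name in ("emma.mp3", "NOTES.MP3", "cover.png"):
        open(os.path.join(folder, name), "wb").close()
    print(matching_audio_files(folder, "mp3"))
```

In the notebook itself, you would call matching_audio_files(DIRECTORY, MIMETYPE) instead of building a temporary folder.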

Within a few seconds, you should see a .json file whose title matches the title of your original audio file. For me, it looks like this:

Note: There’s a small delay between when the cell finishes running and when the output file appears in your folder. If it doesn’t appear right away, wait a few seconds. The file should appear in less than a minute.

Let’s check out this transcription!

Step 4: Seeing your transcription!

You can see the results right away by opening the JSON file itself. In the case of Emma, that JSON looks (partially) like this:

Pretty colors! And upon a quick glance, you may be able to tell that we supply metadata, a full transcription, word-level timestamps, and confidence measurements.
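To get at the word-level timestamps and confidence scores programmatically, a sketch like the following could work. The path down to the first alternative matches the one used in the notebook's final cell; the field names words, word, start, end, and confidence are assumptions based on the description above, and the sample data is made up, so check both against your own JSON file.

```python
def word_timings(response, min_confidence=0.0):
    """Collect (word, start, end, confidence) tuples from a transcription response."""
    alternative = response["results"]["channels"][0]["alternatives"][0]
    return [
        (w["word"], w["start"], w["end"], w["confidence"])
        for w in alternative.get("words", [])
        if w["confidence"] >= min_confidence
    ]


# Made-up response fragment, shaped like the JSON described above
sample = {
    "results": {"channels": [{"alternatives": [{
        "transcript": "emma woodhouse",
        "confidence": 0.98,
        "words": [
            {"word": "emma", "start": 0.5, "end": 0.9, "confidence": 0.99},
            {"word": "woodhouse", "start": 0.9, "end": 1.6, "confidence": 0.72},
        ],
    }]}]}
}

for word, start, end, confidence in word_timings(sample):
    print(f"{start:6.2f}s - {end:6.2f}s  {word}  ({confidence:.2f})")
```

Raising min_confidence is a quick way to filter out the words the model was least sure about.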

However, if you just want to see the transcription without any extra data, run the final cell of the notebook:

# Set this variable to the path of the output file you wish to read
OUTPUT = 'Pick your favorite output json file :)'




# The JSON is loaded with information, but if you just want to read the
# transcript, run the code below!
def print_transcript(transcription_file):
    with open(transcription_file, "r") as file:
        data = json.load(file)
    result = data['results']['channels'][0]['alternatives'][0]['transcript']
    # Print one sentence per line, skipping the empty piece after the final period
    for sentence in result.split('.'):
        sentence = sentence.strip()
        if sentence:
            print(sentence + '.')


print_transcript(OUTPUT)

Just replace the value of the OUTPUT variable with the name of the JSON file that contains the transcription you want to see. In this case, I’ll be setting OUTPUT equal to emma.json.

And the result looks like this:

Oh look, that’s Emma, Chapter I

While some of the transcription is cut off by the screenshot, you can simply scroll left and right to see the whole thing. Each line of the printed transcription from this cell is a sentence.
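One thing to keep in mind: the final cell splits on periods only, so questions and exclamations stay glued to the sentence that follows them. If that bothers you, a regex-based splitter is one possible alternative. This is a sketch, not the notebook's method; split_sentences and the sample text are made up, and abbreviations such as "Mr." will still be split early.

```python
import re


def split_sentences(transcript):
    """Split on whitespace that follows '.', '?' or '!', keeping the punctuation."""
    parts = re.split(r"(?<=[.?!])\s+", transcript.strip())
    return [part for part in parts if part]


# Made-up sample text standing in for a real transcript
text = "Was she happy? She was. Emma had lived nearly twenty-one years in the world."
for sentence in split_sentences(text):
    print(sentence)
```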

Aaaand that’s it! We have a fully transcribed audiobook in less than 5 minutes. Try it with your other audio files, and see the results.

Happy transcribing!

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.
