(Note: If you’re one of those tinker-first, read-later sorts of folks, dive right into the Python notebook covered in this article.)
Are you creating a podcast? Do you have multi-person Zoom calls? Or perhaps even earnings calls to get to?
Well, that’s a lot of information to keep up with. And unless you have a very good notetaker on all of those calls, it becomes extremely important to keep a record of your discussions somewhere.
That’s where speech-to-text (STT) technology comes in. But be careful. Many STT resources out there are extremely limited. And most don’t even offer speaker-labeling as a feature.
That is, a shoddy STT application will only produce one undifferentiated block of text, instead of a transcript that tells you which speaker said each line.
Luckily, Deepgram is here to help! Not only do we offer top-notch speaker labeling (aka “diarization”), but we also have a handy-dandy notebook to help you out. That way, you don’t have to worry about writing any code: just upload your audio files into the notebook and run the code that’s already written for you.
Ready? Let’s go!
The Python notebook
All the instructions you need are inside the notebook itself: here.
However, it can be helpful to break things down piece by piece, so let’s do that here. The first cell you’ll run into is the “Dependencies” cell. By clicking the cell’s play button, you’ll install all the fancy-schmancy coding packages you need for the rest of the cells to run.
After all, you can’t transcribe audio with Deepgram’s AI models without first installing Deepgram’s Python SDK.
Up next is a cell reminding you to upload the audio of your choice into the notebook. The menu on the left-hand side of the screen lets you upload any audio files you wish: simply click the icon of the paper with the upward-facing arrow on it. It will take a few moments for the audio to appear, but once it does, move on to the next cell.
And now for the fun part. In the next cell, you’ll see a handful of variables you need to modify. Specifically, you’ll need to change the following:
dg_key should be set to your Deepgram API key
MIMETYPE should be set to the file type of the audio you uploaded, whether that’s .wav, .mp3, or some other audio format
DIRECTORY should be set to the folder that contains all the audio files you uploaded. If you didn’t create a new folder in the previous step (that is, if you simply followed that step’s instructions without doing any extra work), you can leave this as it is: '.' (the current directory)
If you run this cell and wait a few moments, you should see a .json file appear in the same place you uploaded your audio files. The code transcribes every file in the directory specified by DIRECTORY whose name ends in the file extension specified by MIMETYPE.
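For the curious, here is a rough sketch of what that cell does, in plain Python. It is not the notebook’s code: instead of the Deepgram SDK, it calls Deepgram’s pre-recorded REST endpoint (https://api.deepgram.com/v1/listen) directly with urllib, and it assumes MIMETYPE holds a bare extension such as "wav" that can be turned into a Content-Type like audio/wav. The helper names (find_audio_files, build_request, transcribe_directory) are hypothetical.

```python
import json
import os
import urllib.request

# Hypothetical settings mirroring the notebook's variables; replace with your own.
dg_key = "YOUR_DEEPGRAM_API_KEY"
MIMETYPE = "wav"   # assumption: the bare extension, e.g. "wav" or "mp3"
DIRECTORY = "."    # the folder your audio files were uploaded to


def find_audio_files(directory, mimetype):
    """Return sorted paths of regular files in `directory` ending in `.<mimetype>` (case-insensitive)."""
    suffix = "." + mimetype.lower().lstrip(".")
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(suffix)
        and os.path.isfile(os.path.join(directory, name))
    )


def build_request(path, key, mimetype):
    """Build (but do not send) a POST to /v1/listen with diarization and punctuation on."""
    with open(path, "rb") as f:
        audio = f.read()
    return urllib.request.Request(
        "https://api.deepgram.com/v1/listen?diarize=true&punctuate=true",
        data=audio,
        headers={
            "Authorization": "Token " + key,
            # Assumption: "audio/" + extension is an acceptable Content-Type for your format.
            "Content-Type": "audio/" + mimetype.lstrip("."),
        },
        method="POST",
    )


def transcribe_directory(directory, key, mimetype):
    """Send every matching file and save each JSON response next to it as <name>.json."""
    for path in find_audio_files(directory, mimetype):
        with urllib.request.urlopen(build_request(path, key, mimetype)) as resp:
            result = json.load(resp)
        with open(os.path.splitext(path)[0] + ".json", "w") as out:
            json.dump(result, out, indent=2)


# Usage (makes real API calls, so it is left commented out):
# transcribe_directory(DIRECTORY, dg_key, MIMETYPE)
```

Keeping request construction separate from sending makes it easy to check the URL and headers before spending any API credits.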
Note that there will be a short delay between when the cell finishes running and when your .json file appears. This is normal. Larger files may take longer than anticipated, but it usually takes less than a minute for the .json to show up!
Those JSONs, by the way, hold the full transcript along with word-level details such as timestamps and speaker labels.
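If you’d like to poke around one of those files yourself, here is a minimal sketch that pulls out each word and its speaker. It assumes Deepgram’s pre-recorded response layout, where the words live under results, then channels[0], then alternatives[0], and each word carries a "speaker" index when diarization was requested. The function name speaker_words is hypothetical.

```python
import json


def speaker_words(json_path):
    """Return (speaker, text) pairs for every word in a saved Deepgram response.

    Assumed layout: results -> channels[0] -> alternatives[0] -> words,
    where each word dict has a "speaker" index when diarize=true was used.
    Falls back to the plain "word" when "punctuated_word" is absent.
    """
    with open(json_path) as f:
        response = json.load(f)
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    return [(w.get("speaker"), w.get("punctuated_word", w["word"])) for w in words]
```

For example, `for speaker, text in speaker_words("my_call.json")[:10]: print(speaker, text)` would print the first ten words of a (hypothetical) my_call.json along with who said them.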
Now, the JSON contains all the information you need to create a diarized transcript. But we already went ahead and wrote the code that does that for you; it’s in the next cell.
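As a rough sketch of what that conversion involves (not the notebook’s exact code), you can walk the word list and start a new line whenever the speaker changes. This assumes the same response layout as Deepgram’s pre-recorded JSON (results, channels[0], alternatives[0], words, each word with a "speaker" index), and the "Speaker N:" line format and the function names diarized_lines and json_to_txt are hypothetical choices.

```python
import json
import os
from itertools import groupby


def diarized_lines(response):
    """Collapse consecutive words from the same speaker into 'Speaker N: ...' lines.

    Assumes words live at results.channels[0].alternatives[0].words and each
    word has "speaker" (from diarize=true); "punctuated_word" is preferred
    over "word" when present (from punctuate=true).
    """
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    lines = []
    # groupby only merges *adjacent* words, so each speaker turn becomes its own line.
    for speaker, turn in groupby(words, key=lambda w: w.get("speaker")):
        text = " ".join(w.get("punctuated_word", w["word"]) for w in turn)
        lines.append(f"Speaker {speaker}: {text}")
    return lines


def json_to_txt(json_path):
    """Read a saved response and write the diarized transcript next to it as a .txt file."""
    with open(json_path) as f:
        response = json.load(f)
    txt_path = os.path.splitext(json_path)[0] + ".txt"
    with open(txt_path, "w") as out:
        out.write("\n".join(diarized_lines(response)) + "\n")
    return txt_path
```

Grouping adjacent words (rather than all words per speaker) preserves the back-and-forth order of the conversation, which is what makes a diarized transcript readable.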
Running this cell should produce a .txt file, containing your speaker-labeled transcript, in the same folder as your audio files and JSONs.
And that’s it! You now have access to code that will turn any audio file you wish into a speaker-labeled (read: diarized) transcript! And if you want to go the extra mile, you can totally summarize this transcript or translate it into a different language. Really, the sky’s the limit.
So go forth and make that podcast! Hop on the earnings call! Hold that Zoom webinar! If you want those recordings transcribed, labeled, and wrapped with a cute little bow, Deepgram is here to help.
Keep an eye out for many more notebooks to come! And if you want to check out Deepgram without having to look at any code at all, check out our Playground. There, you can see exactly what we have to offer. Trust me, it’s quite a lot.
If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.