Article·Tutorials·Dec 1, 2022

Taking Notes with Voice in Python

Tonya Sims
By Tonya Sims
PublishedDec 1, 2022
UpdatedJun 13, 2024

In this blog post tutorial, we’ll learn how to take notes in Python using our voice. This means we can take an audio file and use AI speech-to-text to transcribe it. One could imagine dozens of scenarios where this could be helpful: from capturing the content of voice memos to providing a tidy written recap of a meeting to folks who couldn't attend.

Getting transcriptions out of these recordings is a pretty straightforward process. This project builds on Deepgram's speech-to-text APIs, which deliver high-quality AI-generated transcripts from both real-time streaming and batch processing pre-recorded audio sources. The project we'll do in this tutorial works with pre-recorded audio files.

Let’s walk through step-by-step taking notes with the voice in Python.

A Learn-by-Doing Speech AI Project in Python

Here’s a list of what we’ll cover in this project:

  • Step 1 - Getting Started with Deepgram Speech-to-Text Python SDK

  • Step 2 - Useful Speech-to-Text Features for Taking Voice Notes in Python

  • Step 3 - Setup Your Python Project 

  • Step 4 - Install Your Python Libraries and Packages using pip

  • Step 5 - How to Upload the Audio File in Python with Voice 

  • Step 6 - Using Speech-to-Text Features to Enhance Notetaking with Voice in Python

  • Final Step - Run the Python Voice Note-Taking Project and Export the Results

Step 1 - Getting Started with Deepgram Speech-to-Text Python SDK

Deepgram has a Python SDK that we can tap into that’s located on Github. We’ll also need to get started with an API key which we can grab in Console, a game-like hub in Deepgram to try the different types of transcriptions in many coding languages, including Python. When you first sign up, you'll get $150 in API credits to try out Deepgram's speech AI capabilities.

Step 2 - Useful Speech-to-Text Features for Taking Voice Notes in Python

Our project, taking notes with voice in Python, will use the Deepgram speech-to-text transcription API and some of its more advanced capabilities to enhance our voice notes. Here are the following features we’ll use along with transcribing audio:

  • Diarization - Recognizes multiple people speaking and assigns a speaker to each word in the transcript.

  • Summarization - Summarize sections of the transcript so that you can quickly scan it.

We’ll see in a few sections how to easily implement these features in our Python project.

Step 3 - Setup Your Python Project

There are a few items we need to set up before we begin coding. I’m using Python3.10 for our project but any version equal to or higher than Python 3.7 will work. Create a folder directory anywhere on your computer, let’s call it voice-notes-with-python

Then, open that same directory in a code editor like Visual Studio.

Next, create a virtual environment. This ensures our Python libraries get installed in that project and not system wide. Make sure we’re in the correct project directory and run these quick commands from the terminal to create the Python virtual environment and activate it:

Finally, let’s create a Python file inside our directory called take_voice_notes.py.

Step 4 - Install Your Python Libraries and Packages using pip

Now we are ready to install Deepgram using pip. Make sure your virtual environment is activated and run the following command:

This allows us to use the Deepgram speech-to-text Python SDK for transcription, and tap into the features we mentioned earlier. 

To verify that Deepgram was installed correctly, from the terminal type:

We should see the latest version of Deepgram from PyPI is installed and ready for use.

Step 5 - How to Transcribe the Audio File in Python with Voice

We’ll use Deepgram’s prerecorded transcription for this taking notes with voice Python project. This type of transcription is used to transcribe an audio file, either locally on your drive or by hosting it online. In this tutorial, we’ll transcribe audio using a local but this AI speech recognition provider, it’s very simple to do both. Let’s see how we transcribe an audio file either as a local download or an online file.

Transcribe a Local Audio File with Python

Transcribe a Hosted Online Audio File with Python

Step 6 - Using Speech-to-Text Features to Enhance Notetaking with Voice in Python

Now that we have an idea of what our Python code looks like, let’s see an example with our diarize and summarization features. In the same function as above, we can just pass in those features to a Python dictionary as keys and set the values to True, like so:

Final Step - Run the Python Voice Note-Taking Project and Export the Results

We’ve reached the final step! In this step, we need to run the Python project so we can see our JSON response with the transcript split into multiple speakers and summaries.

From our terminal type:

This runs our project and outputs a file called notes.txt, which is now in our directory. 

Open the file and we see a JSON response that looks like the following, depending on which audio file was transcribed:


We received the transcript, and each word in the transcript gets assigned a speaker and the summaries of the transcript at the end of the response. 

Conclusion of the Python Voice Note-taking Project with Speech Recognition

We’ve learned how to transcribe audio and take notes in voice with Python and an AI speech-to-text provider. 

There are many ways to extend this project by using some of Deepgram's other features like redaction which hides sensitive information like credit card numbers or social security numbers or the search feature which searches a transcript for terms and phrases. For a full list of all the features, please visit this page

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions .

Unlock language AI at scale with an API call.

Get conversational intelligence with transcription and understanding on the world's best speech AI platform.