Article·Tutorials·Jun 11, 2025
10 min read

How to Build a Virtual Medical Scribe in Python Using Deepgram and OpenAI

Learn how to accurately transcribe doctor-patient conversations using a specialized Speech-to-Text model (Nova-3 Medical). Then, discover how to use powerful LLMs (OpenAI’s GPT-4.1) to transform those transcripts into fully structured clinical notes.
By Eteimorde Youdiowei

⏩ TL;DR

  • Clinical note-taking is a critical part of every doctor-patient interaction.

  • However, it divides the doctor's attention between documenting and actually caring for the patient.

  • This challenge can be solved by using artificial intelligence (AI).

  • We can accurately transcribe doctor-patient conversations using a specialized Speech-to-Text (STT) model like Nova-3 Medical.

  • Then, powerful large language models like OpenAI’s GPT-4.1 can transform those transcripts into fully structured clinical notes.

  • This AI-driven workflow streamlines documentation and allows doctors to focus entirely on their patients.

  • Want to skip the tutorial and head straight into the code? Check out this GitHub repository.


To ensure accurate and quality healthcare, doctors often take a medical scribe along to document their patient encounters.

In this article, we’re going to build an AI-powered medical scribe that doctors could use to make their care more efficient. Along the way, you’ll learn:

  • How to use Deepgram’s Nova-3 Medical model for medical transcription

  • How to build a scribe system using prerecorded audio

  • How to work with real-time streaming audio

  • How to generate clinical notes using OpenAI’s language models

Let’s dive in. 🚀

(Or, if you want to skip the tutorial and head straight into the code, check out this GitHub repository.)

What It Takes to Build a Medical AI Scribe

Much of software development is about replicating existing real-world processes. So, before we can build a Medical AI Scribe, we first need to understand the role of a human medical scribe.

Medical Scribes Improve Doctor-Patient Interaction


A medical scribe assists a doctor by taking notes during patient encounters. While the doctor speaks with the patient, the scribe documents important details, typically entering them into an Electronic Health Record (EHR) system using standardized formats, such as SOAP (Subjective, Objective, Assessment, Plan).

So, how do we replicate this with software?

First, we need a way to listen to and transcribe the conversation between the doctor and patient. That's where a Speech-to-Text (STT) model comes into play. Then, to structure the transcription into proper clinical documentation for an EHR, we need a large language model (LLM).

🏥 Medical Transcription Model

Since transcription is our first step, we need an STT model that can handle the complexity of medical conversations. We might consider using any standard STT model, such as Deepgram Nova-3, which is known for its speed and accuracy. 

However, general-purpose STT models often struggle with medical terminology, which can lead to misinterpretations or transcription errors.

That is why we turn to specialized medical transcription models. These models are trained specifically on medical conversations and vocabulary, making them much more reliable in clinical settings.

Deepgram’s Nova-3 Medical is exactly that kind of model. It builds on the strengths of the original Nova-3, like low latency and high speed, while offering even greater accuracy for medical transcription.

In this article, we will use Nova-3 Medical to power our AI scribe.

📝 Large Language Model for Note Generation

Once we have a clean transcription, the next step is turning it into structured clinical notes. This is where the LLM comes in. The good news is that most modern LLMs are flexible enough to handle this task with the right prompting.

In this article, we will use OpenAI’s models to generate clinical notes from transcribed text, producing outputs that are structured, readable, and ready for EHR documentation.

🎙️ Handling Audio Input: Prerecorded vs. Streaming

When it comes to capturing the conversation between the doctor and the patient, we have two main options: prerecorded audio or real-time streaming.

  • The first is the prerecorded approach, where the doctor can simply record the conversation during the patient visit. After the session, the recorded audio is uploaded to the Medical AI Scribe, which handles transcription and generates the clinical notes.

  • The second option is real-time streaming. In this setup, audio is streamed live to the transcription model as the doctor speaks with the patient, so the conversation is transcribed in real time. By the end of the session, the transcription and the clinical documentation are already complete and ready for the EHR.

Deepgram’s Nova-3 Medical model supports both prerecorded and real-time streaming audio. 

This article will explore both methods and show you how to implement each.

(🧑‍💻 Find the complete code in this repository.)

Stage 1: Build the Prerecorded AI Scribe

Now that we’ve covered the fundamentals, let’s start building our AI scribe. We’ll first focus on creating the prerecorded AI Scribe and later move on to the real-time AI Scribe.

To build the prerecorded AI Scribe, we’ll need the following dependencies:

  • deepgram-sdk: This provides access to the Deepgram API, allowing us to use the Nova-3 Medical model for transcription.

  • openai: This library provides access to OpenAI’s models, which we can use to generate clinical notes.

  • python-dotenv: This helps manage our environment variables securely.

First, install the necessary dependencies:
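```bash
pip install deepgram-sdk openai python-dotenv
```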

Next, grab your API keys from Deepgram and OpenAI and save them in a .env file:
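```bash
DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
```

(DEEPGRAM_API_KEY and OPENAI_API_KEY are the variable names our code will read; the OpenAI client picks up OPENAI_API_KEY from the environment automatically.)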

Now, let's jump into the code:
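Here’s a sketch of the starting layout (we’re assuming a single main.py that takes the audio file path as a command-line argument; the repository version may differ slightly):

```python
import os
import sys

from dotenv import load_dotenv
from deepgram import DeepgramClient, PrerecordedOptions, FileSource
from openai import OpenAI

# Load the API keys from the .env file
load_dotenv()
DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY")

# The OpenAI client reads OPENAI_API_KEY from the environment automatically
openai_client = OpenAI()


def transcribe_audio(audio_path: str) -> str:
    """Transcribe the audio file with Nova-3 Medical (implemented in Step 1)."""
    ...


def generate_note_and_save(transcript: str) -> None:
    """Generate a SOAP note from the transcript and save it (implemented in Step 2)."""
    ...


if __name__ == "__main__":
    # The path to the audio file is passed as a command-line argument
    transcript = transcribe_audio(sys.argv[1])
    generate_note_and_save(transcript)
```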

Explanation of the code:

We’ve made the necessary imports from Deepgram and OpenAI. Two key functions have yet to be implemented: transcribe_audio and generate_note_and_save. The transcribe_audio function will contain the transcription logic, while generate_note_and_save will handle the generation of clinical notes using OpenAI.

(🧑‍💻 Find the complete code in this repository.)

Step 1: Implement the Transcription Logic

Let’s implement the transcribe_audio function. This function will interact with the Deepgram API via the DeepgramClient to transcribe the audio file.

The PrerecordedOptions will allow us to configure transcription settings, such as which model to use and whether to diarize speakers.

Here’s the implementation of transcribe_audio:
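(A sketch against version 3+ of the Deepgram Python SDK; on older SDK versions the method path is deepgram.listen.prerecorded.v("1") rather than deepgram.listen.rest.v("1").)

```python
def transcribe_audio(audio_path: str) -> str:
    # 1. Initialize the Deepgram client with our API key
    deepgram = DeepgramClient(DEEPGRAM_API_KEY)

    # 2. Read the audio file in binary mode and wrap it as a payload
    with open(audio_path, "rb") as audio_file:
        payload: FileSource = {"buffer": audio_file.read()}

    # 3. Configure the transcription: Nova-3 Medical, smart formatting,
    #    and diarization to separate the speakers
    options = PrerecordedOptions(
        model="nova-3-medical",
        smart_format=True,
        diarize=True,
    )

    # 4. Send the audio to Deepgram and extract the transcript text
    response = deepgram.listen.rest.v("1").transcribe_file(payload, options)
    return response.results.channels[0].alternatives[0].transcript
```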

Key details about the transcription logic:

  1. Deepgram Client: We initialize the Deepgram client using the API key to interact with the Deepgram service.

  2. Audio File: The audio file is read in binary mode and passed as a payload for transcription.

  3. Options: We specify the Nova-3 Medical model, enable smart formatting to add punctuation, and enable diarization to identify different speakers (e.g., patient and clinician).

  4. Transcription: The transcribe_file method sends the audio data to Deepgram, and the resulting transcription is extracted and returned.

Step 2: Implement the Clinical Note Generation Logic

Next, let’s implement the generate_note_and_save function. This function takes the transcript generated in the previous step and uses OpenAI's Responses API to generate a clinical SOAP note.

Here’s how we implement it:
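(A sketch using OpenAI’s Responses API with GPT-4.1; the SOAP_INSTRUCTION shown here is an abbreviated, illustrative version of the full prompt in the repository.)

```python
# Illustrative prompt; the full instruction in the repository is more detailed
SOAP_INSTRUCTION = """You are a medical scribe. Convert the following
doctor-patient transcript into a clinical SOAP note with four sections:
Subjective, Objective, Assessment, and Plan."""


def generate_note_and_save(transcript: str, output_path: str = "generated_soap_note.txt") -> None:
    # Ask the model to structure the transcript as a SOAP note
    response = openai_client.responses.create(
        model="gpt-4.1",
        instructions=SOAP_INSTRUCTION,
        input=transcript,
    )
    # Save the generated note so it can be entered into the EHR
    with open(output_path, "w") as f:
        f.write(response.output_text)
```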

SOAP instruction for clinical note generation:

The SOAP_INSTRUCTION string provides a template that the model follows to generate the clinical note. It specifies how to structure the note into Subjective, Objective, Assessment, and Plan sections.

Step 3: Put Everything Together

To see the final implementation, which integrates both transcription and clinical note generation, check out the main.py file in this GitHub repository.

Step 4: Run and Test the Prerecorded AI Scribe

Once the script is ready, you can run it by passing the prerecorded audio file as a command-line argument:
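```bash
python main.py path/to/recording.m4a
```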

This should save the generated clinical notes in a file called generated_soap_note.txt.

To evaluate our pre-recorded scribe, we'll use a real-world video from YouTube:

We'll extract the audio from this video using pytubefix:
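(A sketch; the URL below is a placeholder, so substitute the actual video’s URL.)

```python
from pytubefix import YouTube

# Placeholder URL; substitute the video you want to transcribe
yt = YouTube("https://www.youtube.com/watch?v=VIDEO_ID")

# Download only the audio track (saved as an .m4a file)
yt.streams.get_audio_only().download()
```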

Once the audio is downloaded, rename the file to sample_audio.m4a for easier reference.

Now, we can run our pre-recorded AI scribe to generate the clinical note:
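```bash
python main.py sample_audio.m4a
```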

This will produce a detailed SOAP note. Just as we prompted, the system took the full conversation transcript and used the OpenAI model to generate a structured note, ready to be placed into an EHR.

Now that we’ve built our prerecorded AI scribe, let’s take things a step further by creating a Real-Time AI Scribe.

(🧑‍💻 Find the complete code in this repository.)

Stage 2: Build the Real-Time AI Scribe

The real-time version will listen through your microphone, transcribe the conversation in real time, and generate a clinical note when the session ends.

Unlike before, we won’t be using the Deepgram SDK. Instead, we'll connect directly to Deepgram’s live audio WebSocket endpoint, which accepts streaming audio data and returns real-time transcription results.

We'll use the websockets library to interact with the WebSocket endpoint and PyAudio to capture audio from the user's microphone.
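```bash
pip install websockets pyaudio
```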

Note: On some systems, installing pyaudio might require extra setup. If you run into issues, you may need to install portaudio separately.

Once the libraries are installed, make sure your Deepgram and OpenAI API keys are still saved in your .env file.

Step 1: Set Up the Real-Time Scribe

Here’s the starting code layout we have for the real-time scribe:
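(A sketch of the layout; the WebSocket URL parameters mirror the options we used for prerecorded audio, with linear16 mono audio at 16 kHz to match what we’ll capture from the microphone.)

```python
import asyncio
import json
import os

import pyaudio
import websockets
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY")
openai_client = OpenAI()

# Live transcription endpoint, configured like the prerecorded version
DEEPGRAM_URL = (
    "wss://api.deepgram.com/v1/listen"
    "?model=nova-3-medical&smart_format=true&diarize=true"
    "&encoding=linear16&sample_rate=16000&channels=1"
)

# Microphone chunks flow through this queue; finalized results collect here
audio_queue: asyncio.Queue = asyncio.Queue()
transcripts: list = []


def mic_callback(input_data, frame_count, time_info, status_flags):
    """Push microphone chunks into the queue (implemented in Step 2)."""
    ...


async def microphone():
    """Open the PyAudio input stream (implemented in Step 2)."""
    ...


async def sender(ws):
    """Stream queued audio chunks to Deepgram (implemented in Step 3)."""
    ...


async def receiver(ws):
    """Collect transcription results from Deepgram (implemented in Step 3)."""
    ...


def parse_speakers(response):
    """Format a diarized result as 'Speaker N: ...' lines (implemented in Step 4)."""
    ...


def generate_note_and_save(transcript):
    """Generate and save the SOAP note (same as Stage 1; see Step 6)."""
    ...
```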

Like before, we have a few functions that are not implemented or partially implemented, which we will need to complete step by step. 

Let’s walk through them.

Step 2: Implement the Microphone Stream Capture

First, let’s implement mic_callback and microphone. These two functions are responsible for:

  • Capturing audio in real-time from your microphone

  • Feeding that audio into an async queue for streaming

When the microphone is active, the script captures audio chunks and streams them into the queue asynchronously.
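(A sketch; PyAudio invokes the callback on its own thread, so we hand each chunk to the asyncio loop thread-safely. The 16 kHz, mono, 16-bit settings match the linear16 parameters in the WebSocket URL.)

```python
def mic_callback(input_data, frame_count, time_info, status_flags):
    # PyAudio calls this from its own thread, so schedule the queue put
    # on the asyncio event loop thread-safely
    event_loop.call_soon_threadsafe(audio_queue.put_nowait, input_data)
    return (input_data, pyaudio.paContinue)


async def microphone():
    global event_loop
    event_loop = asyncio.get_running_loop()

    audio = pyaudio.PyAudio()
    stream = audio.open(
        format=pyaudio.paInt16,   # 16-bit samples (linear16)
        channels=1,               # mono
        rate=16000,               # 16 kHz sample rate
        input=True,
        frames_per_buffer=8000,   # roughly half-second chunks
        stream_callback=mic_callback,
    )
    stream.start_stream()
    try:
        # Keep the coroutine alive while the stream runs
        while stream.is_active():
            await asyncio.sleep(0.1)
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()
```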

Step 3: Implement the Audio Sender and Receiver

Now let’s move on to sender and receiver.

  • sender sends captured audio chunks to Deepgram over WebSocket.

  • receiver listens for transcription responses and stores them.
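(A sketch; the field names follow the JSON that Deepgram’s live endpoint returns, and we keep only finalized results.)

```python
async def sender(ws):
    # Forward each captured chunk to Deepgram over the WebSocket
    while True:
        chunk = await audio_queue.get()
        await ws.send(chunk)


async def receiver(ws):
    # Deepgram sends JSON messages; keep only finalized transcription results
    async for message in ws:
        response = json.loads(message)
        if response.get("type") == "Results" and response.get("is_final"):
            transcripts.append(response)
```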

Step 4: Parse the Speaker Transcripts

Deepgram provides speaker diarization, meaning it can label who said what. We need to parse that properly:
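(A sketch against the diarized response shape, where each word entry carries speaker and punctuated_word fields.)

```python
def parse_speakers(response):
    # Each word carries a speaker label and a punctuated form
    words = response["channel"]["alternatives"][0]["words"]

    lines, current_speaker, current_words = [], None, []
    for word in words:
        speaker = word.get("speaker", 0)
        if speaker != current_speaker and current_words:
            # Speaker changed: flush the previous speaker's words
            lines.append(f"Speaker {current_speaker}: {' '.join(current_words)}")
            current_words = []
        current_speaker = speaker
        current_words.append(word.get("punctuated_word", word["word"]))

    if current_words:
        lines.append(f"Speaker {current_speaker}: {' '.join(current_words)}")
    return "\n".join(lines)
```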

This function organizes words by speaker and punctuates them nicely.

Step 5: Implement the Deepgram WebSocket Connection

Next, we need to open a WebSocket connection to Deepgram by providing our Deepgram API key as the authorization token:
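(A sketch; on versions of the websockets library before 13, the keyword argument is extra_headers rather than additional_headers.)

```python
async def run():
    # Authenticate with the Deepgram API key as the authorization token
    async with websockets.connect(
        DEEPGRAM_URL,
        additional_headers={"Authorization": f"Token {DEEPGRAM_API_KEY}"},
    ) as ws:
        # Run the microphone, sender, and receiver concurrently
        await asyncio.gather(microphone(), sender(ws), receiver(ws))


if __name__ == "__main__":
    try:
        asyncio.run(run())
    except KeyboardInterrupt:
        # Ctrl+C ends the session: compile the transcript and generate the note
        full_transcript = "\n".join(parse_speakers(r) for r in transcripts)
        generate_note_and_save(full_transcript)
```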

If you stop the microphone (Ctrl+C), it automatically generates a SOAP note based on everything it has transcribed so far.

Step 6: Generate and Save the SOAP Note

Finally, once we have all the transcripts, we use OpenAI to generate the clinical note:
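(A sketch mirroring the Stage 1 implementation.)

```python
# Illustrative prompt; the full instruction in the repository is more detailed
SOAP_INSTRUCTION = """You are a medical scribe. Convert the following
doctor-patient transcript into a clinical SOAP note with four sections:
Subjective, Objective, Assessment, and Plan."""


def generate_note_and_save(transcript, output_path="generated_soap_note.txt"):
    # Ask the model to structure the transcript as a SOAP note, then save it
    response = openai_client.responses.create(
        model="gpt-4.1",
        instructions=SOAP_INSTRUCTION,
        input=transcript,
    )
    with open(output_path, "w") as f:
        f.write(response.output_text)
```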

The code is the same as the code from the pre-recorded example.

With that, we have a scribe that can transcribe clinical encounters in real time. To give it a shot, save the code in a file and run it like any other Python program.

Find the full code in the main.py file in this GitHub repository.

Step 7: Test the Real-Time AI Scribe

Next, let’s generate a SOAP note for a new video, but this time we'll use the real-time AI scribe instead of the pre-recorded one.

To run the real-time demo, first start the real-time scribe script by running python main.py.

Then play the video. The script will listen to your microphone, capture the audio in real-time, and send it to Deepgram for live transcription. 

After the video finishes, press Ctrl+C once. The script will then stop streaming audio, compile the transcript, and generate the clinical SOAP note.

And there we have it! We’ve tested both AI scribe scripts and shown how they can be applied to realistic clinical conversations.

(🧑‍💻 Find the complete code in this repository.)

Conclusion: How to Build a Virtual Medical Scribe Using Deepgram and OpenAI

With a virtual medical AI scribe, physicians can give their undivided attention to patients, knowing that every critical detail of the encounter is captured with precision and care.

The AI scribes we built showcase the powerful synergy between medical speech-to-text (STT) models like Nova-3 Medical and the reasoning abilities of sophisticated LLMs.

They turn raw conversations into structured, clinical-grade documentation reliably and effortlessly.

The improvement is more than just an efficiency boost. It is a glimpse into a future where technology becomes an invisible partner in medicine, helping doctors to heal, connect, and care at an even higher level.

Resources for Further Building

Here are several helpful references and resources for learning more about building with Deepgram:
