Deepgram + AWS Voice AI Demo: Real-Time Medical Speech-to-Text in Action

In the healthcare industry, administrative burden has become one of the most pressing challenges facing clinicians today. From clinical documentation to order entry and scheduling, manual workflows slow down care, increase costs, and contribute to burnout. Voice interfaces—powered by real-time speech-to-text technology—offer a compelling way forward.
But building voice-enabled applications for clinical environments isn't trivial. Accuracy, latency, and infrastructure constraints all pose major barriers. That's where Deepgram's medical-grade speech recognition models and AWS's scalable, secure infrastructure come in. Together, they provide the foundation developers need to create healthcare-grade voice experiences.
Watch the Demo: A Prototype for Voice-First Clinical Workflows
This blog explores a demonstration created by Deepgram and AWS—not as a finished application, but as an example of what's possible when the right infrastructure is in place. It's a blueprint for developers building healthcare voice AI systems.
The video walks through a prototype application that tightly integrates speech-to-text into healthcare workflows. Built with Deepgram and AWS, the prototype performs three core tasks:
Clinical note-taking
Prescription entry
Appointment scheduling
Each task is driven by real-time voice interaction, captured by Deepgram's medical transcription model and executed via function calls handled by AWS services like Amazon Bedrock.
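To make this concrete, here is a minimal sketch of how a client might connect to Deepgram's real-time transcription endpoint. The endpoint and model name follow Deepgram's documented streaming API; the audio source and the Bedrock hand-off are assumed and omitted.

```python
# Minimal sketch: building the Deepgram real-time transcription URL.
# A WebSocket client would connect to this URL with an
# "Authorization: Token <DEEPGRAM_API_KEY>" header, stream raw audio
# frames, and receive interim/final transcripts as JSON messages.
from urllib.parse import urlencode

DEEPGRAM_LISTEN_WS = "wss://api.deepgram.com/v1/listen"

def build_listen_url(model: str = "nova-3-medical", **params) -> str:
    """Build the streaming endpoint URL with query parameters."""
    query = urlencode({"model": model, **params})
    return f"{DEEPGRAM_LISTEN_WS}?{query}"

# Example: request interim results with smart formatting enabled.
url = build_listen_url(smart_format="true", interim_results="true")
```

In the demo, each finalized transcript segment is then passed to Amazon Bedrock for function calling, which decides whether it represents a note, a prescription, or a scheduling request.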
Curious to try it out? Launch the medical assistant demo in your browser.
The Technical Foundation: Deepgram APIs + AWS Infrastructure
Deepgram's Nova-3 Medical Model
At the heart of the demo is Nova-3 Medical, Deepgram's latest speech-to-text model optimized for healthcare environments. Unlike general-purpose speech recognition models, Nova-3 Medical is trained on:
Medical terminology (including rare and Latin-derived terms)
Specialized keywords like drug names, dosages, abbreviations, and shorthand
Real-world clinical audio, including varying accents, speaking styles, and dictation formats
For developers, the key advantage is that Nova-3 requires minimal fine-tuning to deliver strong performance in healthcare-specific contexts. In real-time environments where latency and accuracy are both critical for medical transcription, this kind of domain-specific optimization makes a significant difference.
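When a clinic's vocabulary goes beyond even a medical model's training data (local shorthand, uncommon brand names), Deepgram supports biasing recognition toward specific terms. The sketch below uses the "keyterm" query parameter from Nova-3's keyterm prompting feature; verify the exact parameter name against the current API reference before relying on it.

```python
# Sketch: biasing a stream toward clinic-specific vocabulary.
# The "keyterm" parameter name is taken from Deepgram's Nova-3 keyterm
# prompting feature and should be confirmed against the API docs.
from urllib.parse import urlencode

def listen_url_with_terms(terms, model="nova-3-medical"):
    """URL for a stream biased toward rare drug names or local shorthand."""
    pairs = [("model", model)] + [("keyterm", t) for t in terms]
    return "wss://api.deepgram.com/v1/listen?" + urlencode(pairs)

url = listen_url_with_terms(["Klonopin", "Clonidine", "albuterol"])
```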
Hosted in a High-Performance AWS Stack
The application in the demo runs entirely on AWS infrastructure, designed for responsiveness, availability, and secure integration with healthcare systems:
Amazon EC2: C5a instances handle request orchestration; G5 GPU instances power model inference for Deepgram APIs
Amazon EKS: Manages containerized workloads for scalable, fault-tolerant deployment
Amazon EFS: Provides persistent storage for AI models, enabling fast access and updates
Amazon Route 53: Routes traffic to Deepgram's APIs securely and reliably
Amazon VPC: Segregates traffic and hosts services in an isolated, HIPAA-ready environment
Amazon Bedrock: Adds LLM-driven capabilities like function calling, contextual handling, and dynamic validation
These components form the foundation for building production-grade speech recognition applications where real-time processing, context management, and healthcare compliance are non-negotiable.
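Of these components, Bedrock's function calling does the most application-level work: it maps a free-form transcript onto a structured action. A sketch of what that looks like with the Converse API is below. The tool name and its fields are hypothetical examples; the request shape follows boto3's bedrock-runtime converse() call.

```python
# Sketch of a Bedrock tool ("function") definition for a scheduling
# action, using the Converse API's toolSpec shape. The tool name and
# fields below are hypothetical.
SCHEDULE_TOOL = {
    "toolSpec": {
        "name": "schedule_appointment",
        "description": "Create an appointment from a transcribed utterance.",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "patient_name": {"type": "string"},
                    "appointment_type": {"type": "string"},
                    "duration_minutes": {"type": "integer"},
                },
                "required": ["patient_name", "duration_minutes"],
            }
        },
    }
}

def extract_fields(transcript: str, model_id: str):
    """Ask the model to map a transcript onto the tool's input schema."""
    import boto3  # AWS SDK; requires configured credentials to run

    client = boto3.client("bedrock-runtime")
    return client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": transcript}]}],
        toolConfig={"tools": [SCHEDULE_TOOL]},
    )
```

The model's response indicates which tool it chose and supplies the extracted arguments, which the application then validates before executing.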
Use Case Walkthroughs: Modular Components for Tailored Applications
The demo presented by Deepgram and AWS doesn't prescribe a one-size-fits-all workflow. Instead, it showcases modular building blocks that developers can repurpose, extend, and integrate into their own healthcare applications. Below, we walk through the three use cases featured in the prototype—clinical note-taking, prescription entry, and appointment scheduling—highlighting their architectures, the constraints they solve, and opportunities for further development.
1. Clinical Note-Taking
In this scenario, the system guides the user through structured prompts—capturing patient name, date of birth, MRN, and other key details—using voice interaction. Clinicians respond naturally, and the system transcribes, parses, and organizes the data for downstream integration with electronic health records (EHRs).
The core components include Deepgram's streaming speech-to-text API, which handles low-latency voice transcription, and Amazon Bedrock, which powers function calling for recognizing and mapping fields. Integration with EHR systems can be achieved through FHIR APIs or HL7 via lightweight adapters.
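For the EHR hand-off, the captured fields ultimately need to land in a standard resource shape. The sketch below maps voice-captured fields into a minimal FHIR R4 Patient resource; the identifier system URL is a placeholder your organization would replace, and the "Family, Given" dictation convention is an assumption.

```python
# Sketch: mapping voice-captured fields into a minimal FHIR R4 Patient
# resource for EHR hand-off. The identifier "system" is a placeholder;
# field names follow the FHIR Patient specification.
def to_fhir_patient(name: str, birth_date: str, mrn: str) -> dict:
    # Assumes the clinician dictates names as "Family, Given".
    family, _, given = name.partition(",")
    return {
        "resourceType": "Patient",
        "identifier": [{
            "system": "urn:example:mrn",  # placeholder identifier system
            "value": mrn,
        }],
        "name": [{
            "family": family.strip(),
            "given": [given.strip()] if given.strip() else [],
        }],
        "birthDate": birth_date,  # FHIR expects YYYY-MM-DD
    }

patient = to_fhir_patient("Smith, John", "1980-04-02", "123456")
```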
For developers, this foundation can be extended in multiple directions. You might enhance it with ambient scribe capabilities—automatically summarizing conversations and inserting relevant information into structured fields—or enable intelligent tagging of clinical conditions using large language models. ICD-10 suggestion engines could also be layered on top, giving clinicians real-time coding support during documentation.
2. Prescription Entry
This use case centers on capturing prescription instructions via a single, unstructured voice input. A clinician might say: "The patient is John Smith, MRN 123456, prescribe albuterol one puff as needed before exercise, send to CVS at Market Street." The system listens, transcribes, and transforms that utterance into a structured prescription.
This is where Deepgram's medical-tuned speech recognition model shines. It distinguishes between sound-alike medications—like "Klonopin" and "Clonidine"—and retains dosage fidelity even in complex phrasing. Parsing logic and validation rules can be handled via AWS Lambda or LLMs through Bedrock.
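As a naive illustration of that parsing step, the sketch below pulls the patient, MRN, and pharmacy out of the demo utterance with regular expressions. A production system would use Bedrock function calling or a Lambda with proper validation; this only shows the target structure.

```python
# Naive sketch of turning the transcribed utterance into a structured
# prescription. Real systems would use Bedrock function calling or a
# validating Lambda; the regexes here only illustrate the idea.
import re

def parse_prescription(transcript: str) -> dict:
    patient = re.search(r"patient is ([A-Z][a-z]+ [A-Z][a-z]+)", transcript)
    mrn = re.search(r"MRN\s*(\d+)", transcript)
    pharmacy = re.search(r"send to (.+?)\.?$", transcript)
    return {
        "patient": patient.group(1) if patient else None,
        "mrn": mrn.group(1) if mrn else None,
        "pharmacy": pharmacy.group(1) if pharmacy else None,
    }

rx = parse_prescription(
    "The patient is John Smith, MRN 123456, prescribe albuterol one puff "
    "as needed before exercise, send to CVS at Market Street."
)
```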
Because prescription workflows differ across organizations, this component is highly extensible. Developers could add modules to check for drug interactions, flag allergy conflicts, or automate refill scheduling based on prescription metadata. Integration with e-prescribing systems or pharmacy APIs ensures a seamless handoff from dictation to fulfillment.
3. Appointment Scheduling with Built-In Validation
In the final workflow, the system enables clinicians or staff to schedule appointments by voice. Details like appointment type, patient name, duration, and notes are captured, and built-in logic checks inputs against clinic policies. For instance, if an appointment duration is below a set threshold, the system prompts for correction before proceeding.
This use case combines real-time transcription from Deepgram with validation logic powered by custom code or LLM-based reasoning. The result is a voice-enabled front end that helps prevent common scheduling errors—like double bookings or insufficient appointment lengths—at the moment of input.
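The duration check described above reduces to a small policy function that runs before the appointment is created. The 15-minute minimum below is an assumed policy value, not something prescribed by the demo.

```python
# Sketch of pre-creation validation for voice-captured appointments.
# MIN_DURATION_MINUTES is an assumed clinic policy value.
MIN_DURATION_MINUTES = 15

def validate_appointment(appointment: dict) -> list:
    """Return a list of problems; an empty list means proceed."""
    problems = []
    if appointment.get("duration_minutes", 0) < MIN_DURATION_MINUTES:
        problems.append(
            f"Duration below the {MIN_DURATION_MINUTES}-minute minimum; "
            "please restate the appointment length."
        )
    if not appointment.get("patient_name"):
        problems.append("No patient name captured; please repeat the name.")
    return problems

issues = validate_appointment(
    {"patient_name": "John Smith", "duration_minutes": 10}
)
```

When validation fails, the system voices the problem back to the user and re-prompts, rather than silently creating a bad booking.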
Developers might extend this further by enabling patient-driven scheduling through voice IVRs or by layering in schedule optimization features that consider provider availability, procedure type, or location. Integration points can span in-house scheduling platforms or third-party systems via REST APIs or healthcare middleware.
Case Study: When This Stack Goes Into Production
While the demo illustrates potential workflows, the underlying architecture—Deepgram's speech-to-text technology and AWS's cloud services—is already deployed in real-world settings. One example is a leading medical transcription platform that recently implemented a solution using this exact stack to power its clinical documentation engine.
The challenge was familiar: previous transcription services were not only slow and expensive, but often inaccurate—especially in recognizing sound-alike drug names and complex medical terminology. Latency was a deal-breaker for real-time workflows, and high costs limited scalability.
By adopting Deepgram's Nova-3 Medical model on the same performance-optimized AWS infrastructure described in this demo, the platform rebuilt its transcription pipeline for speed, precision, and cost-efficiency.
The results were significant:
30% reduction in word error rate, especially on clinically important terms
Real-time processing capability, with latency low enough for live clinical workflows
90% cost reduction, with processing fees falling from 7.4¢ to under 0.5¢ per audio minute
Rapid implementation due to Nova-3's out-of-the-box accuracy with minimal tuning requirements
For engineering teams building healthcare applications, this case study demonstrates that the stack showcased in the demo isn't theoretical—it's already proven at scale. And because the underlying components are modular and extensible, teams can start with a single workflow and expand incrementally as they validate results.
A Foundation for What's Next
The Deepgram + AWS demo isn't a product—it's a starting point. It shows the art of the possible for healthcare tech teams looking to bring voice into their clinical workflows. With domain-specific speech-to-text models, real-time performance, and scalable infrastructure, the building blocks are in place.
What you build on top is up to you.
Want to see it in action? Launch the medical assistant demo in your browser.
Or jump straight into the tech: Get started with Nova-3 Medical in our API Playground.
Whether it's ambient scribes, virtual front desk assistants, or dynamic care coordination tools, this infrastructure enables a new generation of voice-first healthcare applications—faster, safer, and more human-centered.