Article·Dec 1, 2025

Deepgram Brings Real-Time Speech Intelligence to Amazon SageMaker

We're launching Deepgram's STT, TTS, and Voice Agent API integration with Amazon SageMaker—the first solution to deliver streaming, real-time transcription directly through the SageMaker API.

5 min read

By Pippin Bongiovanni

Senior Product Marketing Manager, Partner Marketing


Enterprise developers have been wrestling with a frustrating reality for years: you could build sophisticated ML workflows in Amazon SageMaker, but the moment you needed real-time speech processing, you hit a wall. Today, that changes.

We're launching Deepgram's STT, TTS, and Voice Agent API integration with Amazon SageMaker—the first solution to deliver streaming, real-time transcription directly through the SageMaker API. After months of engineering work in close collaboration with AWS, we've solved one of the most persistent pain points in enterprise voice AI.

The Problem Every Developer Knows

Ask any engineer who's tried to build real-time speech applications on AWS, and they'll tell you the same story. SageMaker is brilliant for batch ML workloads, but it has never supported native streaming for speech. So teams got creative—and by creative, we mean they cobbled together complex architectures with Lambda functions, Kinesis streams, and custom pipelines just to handle audio in real time.

The result was predictable: higher latency, operational headaches, and solutions that were brittle at scale. For industries where every millisecond counts—think call centers handling thousands of concurrent conversations, or trading floors where voice commands trigger million-dollar transactions—these workarounds were deal-breakers.

How We Fixed It

Our integration does exactly what you'd hope: it makes real-time speech processing work like any other SageMaker endpoint. No custom pipelines. No Lambda gymnastics. Just clean, streaming STT, TTS, and Voice Agent API integrations that scale with your infrastructure and play nicely with your existing ML workflows.

The real-time integration with SageMaker gives you everything you'd expect from enterprise-grade speech AI: support for HTTP/2 or WebSockets, sub-second latency, automatic scaling, and the security and compliance benefits that come with staying inside the AWS ecosystem. More importantly, it means your teams can focus on building great voice experiences instead of wrestling with infrastructure.
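To make the streaming model concrete, here's a minimal client-side sketch of how audio gets framed for a persistent streaming connection. The chunk size, sample format, and `send` callable are illustrative assumptions, not part of Deepgram's or SageMaker's documented API; in a real client, each frame would be written to the HTTP/2 or WebSocket connection as it's produced, with interim transcripts read back on the same connection.

```python
# Sketch: split raw PCM audio into fixed-size frames for streaming.
# Assumes 16 kHz, 16-bit mono PCM, so 3200 bytes is roughly 100 ms
# of audio per frame. These values are illustrative, not required.

def chunk_audio(pcm_bytes: bytes, chunk_size: int = 3200):
    """Yield fixed-size audio frames suitable for a streaming connection."""
    for offset in range(0, len(pcm_bytes), chunk_size):
        yield pcm_bytes[offset:offset + chunk_size]


def stream_audio(pcm_bytes: bytes, send) -> int:
    """Push each frame through a caller-supplied send() callable.

    In practice, send() would write to an open HTTP/2 or WebSocket
    stream against the SageMaker endpoint; here it's just a hook.
    """
    frames_sent = 0
    for frame in chunk_audio(pcm_bytes):
        send(frame)
        frames_sent += 1
    return frames_sent
```

The point of the design is that the client never buffers a whole recording: frames go out as they're captured, which is what keeps end-to-end latency sub-second.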

Watch the Workflow in Practice

To show what this unlocks in a real workflow, here is a short demo of a pharmacy voice agent built on Deepgram and running on SageMaker. In the video, the agent handles an end-to-end customer inquiry: authenticating a caller with a Member ID, pulling the correct order, identifying the medication, checking refill availability, and giving a precise pickup time. Each step is powered by real-time streaming STT, TTS, and agent logic running natively on SageMaker, so the interaction feels natural and responsive while retrieving accurate, structured data from backend systems.

How the Demo Works

The diagram illustrates the workflow behind the pharmacy demo. Audio from the user is streamed into Deepgram STT through the new SageMaker BiDirectional Streaming API for transcription, which is then passed to an LLM hosted on Amazon Bedrock along with the relevant pharmacy data. The model generates a structured text response, which is returned to Deepgram TTS through the new SageMaker BiDirectional Streaming API to synthesize natural-sounding speech. Pipecat provides the orchestration layer that manages each step of the pipeline, making it easy to coordinate audio streaming, model calls, and database lookups inside an AWS VPC. The result is a fluid, low-latency voice interaction that feels like speaking with a real assistant, while every component stays inside your AWS environment.
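The pipeline above can be sketched as a simple async chain. This is a conceptual outline only: the function names, the pharmacy record shape, and the canned reply strings are illustrative assumptions, and the placeholder bodies stand in for the actual Deepgram SageMaker endpoints, the Bedrock model call, and Pipecat's real orchestration API.

```python
import asyncio

# Conceptual sketch of one conversational turn in the demo pipeline:
# STT -> LLM (with pharmacy data) -> TTS. Each placeholder below
# stands in for a real service call in the production pipeline.

async def transcribe(audio_chunk: bytes) -> str:
    # Placeholder for Deepgram STT via the SageMaker streaming API.
    return "refill my blood pressure medication"

async def generate_reply(transcript: str, pharmacy_record: dict) -> str:
    # Placeholder for an Amazon Bedrock LLM call grounded in
    # structured pharmacy data. Reply strings are illustrative.
    if pharmacy_record.get("refills_remaining", 0) > 0:
        return "Your refill is ready for pickup after 2 PM."
    return "No refills remain; I can request one from your doctor."

async def synthesize(text: str) -> bytes:
    # Placeholder for Deepgram TTS via the SageMaker streaming API.
    return text.encode("utf-8")

async def handle_turn(audio_chunk: bytes, pharmacy_record: dict) -> bytes:
    """Run one user turn through the STT -> LLM -> TTS chain."""
    transcript = await transcribe(audio_chunk)
    reply = await generate_reply(transcript, pharmacy_record)
    return await synthesize(reply)

# One turn for a member with refills available.
audio_out = asyncio.run(handle_turn(b"\x00\x00", {"refills_remaining": 2}))
```

In the actual demo, Pipecat manages this chain as a streaming pipeline rather than discrete awaited calls, so TTS audio can start playing before the LLM has finished generating.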

What This Means for You

The applications are pretty much everywhere you'd expect:

Contact centers can finally do real-time sentiment analysis and live agent coaching without the infrastructure complexity. Conversational AI applications get more responsive and can handle the kind of natural, flowing conversations that users actually want to have. Analytics teams can process voice data as it comes in rather than waiting for batch jobs to complete.

And compliance teams, who often have the most stringent requirements around data handling, get all the benefits of AWS's security model without having to worry about data leaving their VPC for external speech processing.

Go Deeper: AWS Engineering Walkthrough

If you want to explore how SageMaker’s new bidirectional streaming works under the hood, the AWS team has published an in-depth engineering walkthrough. It covers the runtime architecture, WebSocket flow, container requirements, and how to deploy Deepgram models using the new streaming APIs. Read the full guide on the AWS Machine Learning Blog.

What Comes Next

This launch represents months of joint engineering work with AWS, and we're not slowing down. We'll be at re:Invent demonstrating the real-time implementation of Deepgram on SageMaker. We're also planning a series of technical deep-dives for teams who want to understand exactly how to architect these solutions.

The broader goal here isn't just solving a technical problem—though we've definitely done that. It's about removing the barriers that have kept voice AI on the sidelines for too many enterprise applications. When building real-time speech processing is as straightforward as deploying any other ML model, we think you'll be surprised by what teams start building.

Ready to Get Started?

If you've been waiting for native streaming Voice AI in SageMaker, your wait is over. The integration is live, and our team is ready to help you implement it. Whether you're processing customer calls, building voice assistants, or analyzing meeting recordings in real time, we've built this to handle your use case.

For developers who want early access to our SageMaker SDK for Python and JavaScript, you can request entry to the beta program here.

Ready to transform your voice AI capabilities? Connect with our team to explore how Deepgram's SageMaker integration can accelerate your journey from speech data to actionable insights—instantly.

Learn more about our AWS partnership and technical implementation on our AWS partner page.
