Article · Dec 23, 2025

A Year of Voice AI at Scale: How Deepgram Turned AI Speech Into Infrastructure

Voice AI crossed a threshold this year—and Deepgram was at the center of it. From trillion-word scale and enterprise-grade agents to real-time speech-to-speech and global expansion, this is the story of how voice stopped being a feature and became infrastructure.

10 min read

By Jose Nicholas Francisco

Machine Learning Developer Advocate


Introduction: A Year Where Voice Became Infrastructure

Every year has its own texture. Some years feel like exploration—testing edges, learning limits, asking what’s possible. Others are about consolidation—turning ideas into systems that scale. For us at Deepgram, this year was something else entirely: a year where voice stopped being an experiment and became infrastructure.

Looking back over the past twelve months, what stands out isn’t any single launch or announcement. It’s the throughline. Again and again, we focused on the same idea: voice AI shouldn’t be stitched together, fragile, or theatrical. It should be fast, accurate, controllable, and ready for real work—across industries, languages, and geographies.

This year, we shipped relentlessly. We expanded globally. We raised the bar on accuracy, latency, and enterprise readiness. And we watched customers—from solo developers to global enterprises—build systems that simply wouldn’t have been possible a year ago.

Here’s how it unfolded, month by month.

January: Momentum, Measured in Scale

We started the year with clarity—and proof.

In January, we shared that Deepgram had entered 2025 cash-flow positive, serving more than 400 enterprise customers. Over the past four years, annual usage had grown 3.3×. Our models had processed over 50,000 years of audio and transcribed more than one trillion words.

Those numbers mattered, not as vanity metrics, but as validation. Voice AI is notoriously hard to scale: accuracy degrades, latency creeps in, costs spiral. The fact that we could grow usage at that pace—while improving performance and unit economics—confirmed something we’d believed for a long time. When voice AI is built as infrastructure, not a demo, it compounds.

January set the tone: this wasn’t a year of promises. It was a year of delivery.

February: Nova-3 and Experimenting in Public with Vibe Coder

In February, we did something deliberately small and something deliberately big.

The big thing: we introduced Nova-3, setting a new standard for AI-driven speech-to-text across domains. If you follow Deepgram at all, you already know how impactful this launch was. If Nova-3 is new to you, the rest of this recap should make clear just how big it became.

The small announcement was that we released Vibe Coder, an open-source VS Code extension designed to explore voice-based “vibe coding” inside AI-powered IDEs like Cursor and Windsurf. We were clear from the start: this wasn’t a fully baked product. It was an experiment.

But it mattered. Vibe Coder represented a belief that voice will increasingly live inside developer workflows—not as dictation, but as a control surface. We wanted to see how speech could shape intent, iteration, and flow. We wanted feedback. We wanted to learn alongside the community.

Whether Vibe Coder grows into something bigger or simply informs our next move, February reminded us of the value of curiosity—and of shipping early.

March: Healthcare and the Enterprise, Front and Center

March was about focus.

We introduced Nova-3 Medical, our most advanced medical speech-to-text model to date. Built specifically for clinical environments, it delivered unmatched accuracy on medication names, diagnostic terms, and procedure details—while filtering out irrelevant noise that plagues generic models. Just as importantly, it was designed with HIPAA-compliant architecture and enterprise-grade security from day one.
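To make that concrete, here's a minimal sketch of a pre-recorded transcription request against Deepgram's /v1/listen endpoint in Python. The model identifier nova-3-medical and the parameter choices here are assumptions to verify against the current docs:

```python
import os
import requests  # third-party: pip install requests

# Assumptions: the pre-recorded /v1/listen endpoint and the model id
# "nova-3-medical"; confirm both in Deepgram's current documentation.
LISTEN_URL = "https://api.deepgram.com/v1/listen"

def transcribe_dictation(path: str) -> str:
    """Send a local audio file to Nova-3 Medical and return the transcript."""
    with open(path, "rb") as f:
        audio = f.read()
    response = requests.post(
        LISTEN_URL,
        params={"model": "nova-3-medical", "smart_format": "true"},
        headers={
            "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
            "Content-Type": "audio/wav",
        },
        data=audio,
        timeout=60,
    )
    response.raise_for_status()
    body = response.json()
    # The transcript lives under results -> channels -> alternatives.
    return body["results"]["channels"][0]["alternatives"][0]["transcript"]

if __name__ == "__main__":
    print(transcribe_dictation("clinical_dictation.wav"))
```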

That same month, we announced a partnership with Genesys, launching the Deepgram Genesys Transcription Connector. Together, we enabled more accurate, real-time voice automation inside one of the world’s leading customer experience platforms.

We also published the State of Voice AI 2025, offering a data-driven look at how enterprises were actually deploying voice systems.

Healthcare and contact centers may look different on the surface, but they share the same requirement: voice AI that works under pressure. March was about meeting that standard.

April: A Breakout Month for the Platform

April was, simply put, huge.

We crossed a major technical milestone: the development of a speech-to-speech model that operates without converting speech to text at any stage. This was a pivotal step toward fully contextual, end-to-end voice systems that preserve nuance, intonation, and emotional tone in real time.

And we introduced Aura-2, our most professional, cost-effective, and enterprise-grade text-to-speech model yet—built not for entertainment, but for real conversations.
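Here's what driving Aura-2 could look like over Deepgram's /v1/speak REST endpoint, again as a sketch. The voice identifier aura-2-thalia-en is an assumption; check the docs for the Aura-2 voices actually available to you:

```python
import os
import requests  # pip install requests

# Assumptions: the /v1/speak endpoint and the illustrative voice id
# "aura-2-thalia-en"; consult Deepgram's TTS docs for real voice names.
SPEAK_URL = "https://api.deepgram.com/v1/speak"

def synthesize(text: str, out_path: str = "reply.mp3") -> None:
    """Turn agent text into spoken audio with an Aura-2 voice."""
    response = requests.post(
        SPEAK_URL,
        params={"model": "aura-2-thalia-en"},
        headers={
            "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
            "Content-Type": "application/json",
        },
        json={"text": text},
        timeout=60,
    )
    response.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(response.content)  # response body is raw audio bytes

if __name__ == "__main__":
    synthesize("Your appointment is confirmed for Tuesday at three PM.")
```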

April wasn’t about one launch. It was about the platform coming into view.

May: Voice in the Real World

In May, the story shifted from capability to impact.

We announced a partnership with Think41, a full-stack GenAI consulting firm building secure, enterprise-ready AI agents. Together, we showed what’s possible when low-latency speech recognition meets real-time agent assist: faster resolutions, better customer experiences, and systems that support humans instead of slowing them down.

It was a reminder that the value of voice AI isn’t theoretical. It shows up in conversations—live ones.

June: One API, Real Conversations

June marked a turning point.

We launched the Deepgram Voice Agent API—the industry’s only enterprise-ready, real-time, cost-effective conversational AI API. For developers, it meant one streaming API instead of stitching together STT, TTS, and orchestration layers. For enterprises, it meant control: no black boxes, no hidden constraints.
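To make "one streaming API" concrete, here's a minimal sketch of an agent session over a single WebSocket. The endpoint URL, the shape of the Settings message, and the model ids are assumptions for illustration, not the exact wire protocol; the point is that listening, thinking, and speaking are configured on one connection:

```python
import asyncio
import json
import os
import websockets  # pip install websockets

# Assumptions: the endpoint URL and the shape of the Settings message are
# illustrative, not the exact wire protocol; see the Voice Agent API docs.
AGENT_URL = "wss://agent.deepgram.com/v1/agent/converse"

async def run_agent() -> None:
    headers = {"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"}
    # On websockets < 14, pass extra_headers= instead of additional_headers=.
    async with websockets.connect(AGENT_URL, additional_headers=headers) as ws:
        # One streaming socket configures listening, thinking, and speaking
        # together, instead of stitching STT, an LLM, and TTS behind your
        # own orchestration layer.
        await ws.send(json.dumps({
            "type": "Settings",
            "audio": {"input": {"encoding": "linear16", "sample_rate": 16000}},
            "agent": {
                "listen": {"provider": {"model": "nova-3"}},
                "think": {"provider": {"model": "gpt-4o-mini"}},
                "speak": {"provider": {"model": "aura-2-thalia-en"}},
            },
        }))
        # A real client would now stream microphone audio in as binary
        # frames and play back the binary frames the agent sends in return.
        async for message in ws:
            if isinstance(message, bytes):
                continue  # agent speech audio: queue to your audio output
            print(json.loads(message).get("type"))  # agent lifecycle events

if __name__ == "__main__":
    asyncio.run(run_agent())
```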

We also published the Voice Agent Quality Index (VAQI), offering a new benchmark for conversational performance.

We also introduced Nova-3 Medical Streaming, bringing clinical-grade accuracy to real-time transcription without sacrificing ultra-low latency.

Taken together, this was the culmination of years of work. Voice agents finally felt cohesive—fast, controllable, and production-ready.

July: Recognition and Global Reach

In July, we introduced Saga, our Voice OS for developers.

Saga lets developers control their workflows with natural speech—across tools like Cursor, MCP, and Slack—eliminating context switching and friction. It wasn’t about novelty. It was about flow.

That same month, external validation caught up with internal momentum.

Deepgram received the 2025 Voice AI Technology Excellence Award from CUSTOMER Magazine, recognizing Nova-3 for its accuracy, real-time multilingual transcription, and instant customization.

We also expanded our infrastructure globally, announcing the general availability of Deepgram Dedicated, a fully managed single-tenant runtime, alongside early access to our EU-hosted API endpoint. For European customers, this unlocked true in-region inference without compromise.
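Part of the appeal of in-region inference is that, for developers, switching regions should be nothing more than a base-URL change. A minimal sketch, with a hypothetical EU hostname (confirm the actual endpoint in Deepgram's docs):

```python
import os
import requests  # pip install requests

# Assumption: the EU hostname below is illustrative; confirm the actual
# in-region endpoint in Deepgram's docs before relying on it.
BASE_URL = "https://api.eu.deepgram.com"  # instead of https://api.deepgram.com

response = requests.post(
    f"{BASE_URL}/v1/listen",
    params={"model": "nova-3", "language": "de"},
    headers={
        "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={"url": "https://example.com/call-recording.wav"},  # remote audio
    timeout=60,
)
response.raise_for_status()
print(response.json()["results"]["channels"][0]["alternatives"][0]["transcript"])
```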

July reinforced a theme: enterprise voice AI is global—or it’s incomplete.

August: Raising the Bar Again

August was relentless.

We expanded Nova-3 with support for German, Dutch, Swedish, and Danish.

We saw Aura-2 recognized with the 2025 Contact Center Technology Award, validating its impact on both customer and employee experience.

We also leveled up the Voice Agent API with GPT-5 and GPT-OSS-20B, giving developers new choices across latency, reasoning depth, and open-source flexibility.

And we signed a strategic collaboration agreement with AWS, accelerating global deployment of voice AI across STT, TTS, and speech-to-speech.

August felt like acceleration squared.

September: Language as a First-Class Feature

In September, we expanded Nova-3 to support Spanish, French, and Portuguese.

Each language expansion wasn’t just a checkbox. It represented work on accents, code-switching, morphology, and real-world audio conditions. Voice AI only works when it works everywhere.

We were also featured on Fast Company’s seventh annual list of “The 100 Best Workplaces for Innovators.”

October: Solving the Hardest Problem in Voice Agents

October was about mitigating interruptions—the bane of conversational systems.

First, we introduced Flux, the first real-time conversational speech recognition model built specifically for voice agents. Flux handled interruptions without trading off latency, and independent validation from Coval confirmed it: 50% lower latency to first token, faster turn detection, and accuracy on par with Nova-3.

Then we announced our Voice Agent API’s integration with Amazon Bedrock. For enterprise users in contact centers, healthcare, and customer experience, Deepgram’s Voice Agent API on Bedrock unlocked ultra-accurate, real-time speech AI, backed by AWS’s security, scalability, and compliance.

Partners like Lindy, whose Gaia assistant runs on Flux, showed what natural phone conversations could finally feel like.

Later that month, we expanded Nova-3 again with Italian, Turkish, Norwegian, and Indonesian support, continuing our steady global march.

November: Broadening Language Coverage

In November, Nova-3 added eleven new languages, broadening its European and Asian coverage: Bulgarian, Czech, Hungarian, Polish, Russian, Ukrainian, Finnish, Hindi, Japanese, Korean, and Vietnamese.

December: Closing the Loop

December brought the year full circle.

We launched Deepgram’s voice AI integrations with Amazon Connect, Amazon Lex, and Amazon SageMaker, bringing real-time speech intelligence directly into platforms enterprises already trust. Our EU Endpoint became generally available.

Aura-2 learned to speak Dutch, French, German, Italian, and Japanese. And Nova-3 got upgraded yet again with keyterm prompting and ten new languages: Greek, Romanian, Slovak, Catalan, Lithuanian, Latvian, Estonian, Flemish, Swiss German, and Malay.
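Keyterm prompting is worth pausing on: it lets you boost recall on rare, domain-specific terms at request time, with no retraining. A minimal sketch, assuming keyterms are passed as repeated keyterm query parameters (check the Nova-3 docs for exact syntax, limits, and language support):

```python
import os
import requests  # pip install requests

# Assumption: keyterms are passed as repeated "keyterm" query parameters;
# check the Nova-3 docs for exact syntax, limits, and language support.
params = [
    ("model", "nova-3"),
    ("smart_format", "true"),
    ("keyterm", "Deepgram"),       # proper nouns a generic model may miss
    ("keyterm", "keyterm prompting"),
    ("keyterm", "Aura-2"),
]

with open("call.wav", "rb") as f:
    response = requests.post(
        "https://api.deepgram.com/v1/listen",
        params=params,  # a list of tuples lets the same key repeat
        headers={
            "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
            "Content-Type": "audio/wav",
        },
        data=f.read(),
        timeout=60,
    )
response.raise_for_status()
print(response.json()["results"]["channels"][0]["alternatives"][0]["transcript"])
```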

It was a fitting close: deeper integrations, broader reach, and voice AI that’s ready for wherever the conversation happens next.

Looking Ahead

If this year proved anything, it’s that voice is no longer the interface of the future. It’s the infrastructure of the present.

And we’re just getting started.
