Deepgram Launches Voice Agent API for Real-Time, Enterprise-Ready Conversational AI

Developer Simplicity and Faster Time to Market
Maximum Control and Flexibility
Cost-Effectiveness at Scale
Start Building with the Voice Agent API

SAN FRANCISCO, June 16, 2025 – Deepgram, the leading voice AI platform for enterprise use cases, today announced the general availability of its Voice Agent API, a single, unified voice-to-voice interface that gives developers full control to build context-aware voice agents that power natural, responsive conversations. Combining speech-to-text, text-to-speech, LLM orchestration, and contextualized conversational logic in a unified architecture, the Voice Agent API gives developers the choice of using Deepgram’s fully integrated stack (leveraging industry-leading Nova-3 STT and Aura-2 TTS models) or bringing their own LLM and TTS models. It delivers the simplicity developers love and the controllability enterprises need to deploy real-time, intelligent voice agents at scale. Today, companies like Aircall, Jack in the Box, StreamIt, and OpenPhone are building voice agents with Deepgram to save costs, reduce wait times, and increase customer loyalty.

In today’s market, teams building voice agents are often forced to choose between two extremes: rigid, low-code platforms that lack customization, or DIY toolchains that require stitching together STT, TTS, and LLMs with significant engineering effort. Deepgram’s Voice Agent API eliminates this tradeoff by providing a unified API that simplifies development without sacrificing control. Developers can build faster with less complexity, while enterprises retain full control over orchestration, deployment, and model behavior, without compromising on performance or reliability.

“The future of customer engagement is voice-first,” said Scott Stephenson, CEO of Deepgram. “But most voice systems today are rigid, fragmented, or too slow. With our Voice Agent API, we’re giving developers a powerful yet simple interface to build conversational agents that feel natural, respond instantly, and scale across use cases without compromise.”

“We believe the future of customer communication is intelligent, seamless, and deeply human—and that’s the vision behind Aircall’s AI Voice Agent,” said Scott Chancellor, Chief Executive Officer of Aircall. “To bring it to life, we needed a partner who could match our ambition, and Deepgram delivered. Their advanced Voice Agent API enabled us to build fast without compromising accuracy or reliability. From managing mid-sentence interruptions to enabling natural, human-like conversations, their service performed with precision. Just as importantly, their collaborative approach helped us iterate quickly and push the boundaries of what voice intelligence can deliver in modern business communications.”

“We believe that integrating AI voice agents will be one of the most impactful initiatives for our business operations over the next five years, driving unparalleled efficiency and elevating the quality of our service,” said Doug Cook, CTO of Jack in the Box. “Deepgram is a leader in the industry and will be a strategic partner as we embark on this transformative journey.”

Developer Simplicity and Faster Time to Market

For teams taking the DIY route, the challenge isn’t just connecting models but also building and operating the entire runtime layer that makes real-time conversations work. Teams must manage live audio streaming, accurately detect when a user has finished speaking, coordinate model responses, handle mid-sentence interruptions, and maintain a natural conversational cadence. While some platforms offer partial orchestration features, most APIs do not provide a fully integrated runtime. As a result, developers are often left to manage streaming, session state, and coordination logic across fragmented services, which adds complexity and delays time to production.

Deepgram’s Voice Agent API removes this burden by providing a single, unified API that integrates speech-to-text, LLM reasoning, and text-to-speech with built-in support for real-time conversational dynamics. Capabilities such as barge-in handling and turn-taking prediction are model-driven and managed natively within the platform. This eliminates the need to stitch together multiple vendors or maintain custom orchestration, enabling faster prototyping, reduced complexity, and more time focused on building high-quality experiences.

In addition to the Voice Agent API, organizations seeking broader integrations can leverage Deepgram’s extensive partner ecosystem, including Kore.ai, OneReach.ai, Twilio, and others, to access comprehensive conversational AI solutions and services powered by Deepgram APIs.

Maximum Control and Flexibility

While the Voice Agent API streamlines development, it also gives teams deep control over performance, behavior, and scalability in production. Built on Deepgram’s Enterprise Runtime and full model ownership across the entire voice AI stack, the platform enables model-level optimization at every layer of the interaction loop. This allows for precise tuning of latency, barge-in handling, turn-taking, and domain-specific behavior in ways not possible with disconnected components.

Key capabilities include:

Flexible Deployment: Run the complete voice stack in cloud, VPC, or on-prem environments to meet enterprise requirements for security, compliance, and performance.
Runtime-Level Orchestration: Deepgram’s runtime supports mid-session control, real-time prompt updates, model switching, and event-driven signaling to adapt agent behavior dynamically.
Bring-Your-Own Models: Teams can integrate their own LLMs or TTS systems while retaining Deepgram’s orchestration, streaming pipeline, and real-time responsiveness.

“Deepgram gives us the flexibility to bring our own models, voices, and customize behavior while controlling how we build and orchestrate our voice agents,” said Harshal Jethwa, Engineering Manager at OpenPhone. “Their system seamlessly handles the complexity of real-time voice coordination, letting us focus on creating exactly the experience we want.”

This tightly coordinated design translates directly into measurable performance gains. In recent benchmark testing using the Voice Agent Quality Index (VAQI), Deepgram achieved the highest overall score among all evaluated providers (see Figure 1). VAQI is a composite benchmark that measures the core elements of voice agent quality: latency (how quickly the agent responds), interruption rate (how often it cuts users off), and response coverage (how often it misses valid input).

Figure 1: VAQI scores for real-time voice agent APIs (horizontal bar chart). Deepgram ranks highest with a score of 71.5, followed by OpenAI at 67.2, ElevenLabs at 55.3, and Azure at 50.9.

Deepgram outperformed OpenAI by 6.4% and ElevenLabs by 29.3%, reflecting the advantage of its integrated architecture and model-driven turn-taking. The result is smooth, responsive conversations without missed inputs, premature responses, or unnatural delays.
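
For readers who want to check the percentages, the short sketch below recomputes them from the VAQI scores reported in Figure 1. It assumes the quoted figures are relative differences measured against the other provider's score; the function name `relative_lead` is illustrative and not part of any Deepgram tooling. The Azure comparison is derived here the same way and is not quoted in the release.

```python
# VAQI scores as reported in Figure 1 of this release.
vaqi_scores = {"Deepgram": 71.5, "OpenAI": 67.2, "ElevenLabs": 55.3, "Azure": 50.9}


def relative_lead(leader: float, other: float) -> float:
    """Percentage by which `leader` exceeds `other`, relative to `other`."""
    return (leader - other) / other * 100


# Compare Deepgram's score against each other provider.
for provider in ("OpenAI", "ElevenLabs", "Azure"):
    lead = relative_lead(vaqi_scores["Deepgram"], vaqi_scores[provider])
    print(f"Deepgram vs {provider}: +{lead:.1f}%")
```

Rounded to one decimal place, this reproduces the 6.4% and 29.3% figures quoted above.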

Cost-Effectiveness at Scale

In addition to control and performance, the Voice Agent API is built for cost efficiency across large-scale deployments. When teams run entirely on Deepgram’s vertically integrated stack, pricing is fully consolidated at a flat rate of $4.50 per hour (see Figure 2). This provides predictable, all-in-one billing that simplifies planning and scales with usage. Deepgram’s vertically integrated runtime also delivers unmatched compute efficiency, optimizing every stage of the speech pipeline to minimize infrastructure costs while maintaining real-time responsiveness.

For teams that bring their own LLM or TTS models, Deepgram offers built-in rate reductions, enabling even lower total cost of ownership for production-scale deployments.

“Deepgram’s Voice Agent API stands out for its technical prowess, affordability, and flexibility, making it the smart bet for customer service voice AI,” said Bill French, Senior Solutions Engineer at StreamIt.

Figure 2: Estimated hourly cost of real-time voice agent APIs (bar chart), showing Deepgram at $4.50, ElevenLabs at $5.79, and OpenAI at $18.03.
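
To illustrate how the estimated hourly rates translate into spend at scale, the sketch below multiplies each rate by a workload size and shows how many agent-hours a fixed budget covers. The 1,000-hour monthly workload is a hypothetical example, and the function names are illustrative; the rates are the estimates from Figure 2, and the $200 budget matches the free credit described in this release. Actual costs depend on configuration and any bring-your-own-model rate reductions.

```python
# Estimated hourly rates (USD) as reported in Figure 2 of this release.
hourly_rate_usd = {"Deepgram": 4.50, "ElevenLabs": 5.79, "OpenAI": 18.03}


def workload_cost(rate_per_hour: float, agent_hours: float) -> float:
    """Total cost in USD for a given number of agent-hours at a flat rate."""
    return rate_per_hour * agent_hours


def hours_covered(budget_usd: float, rate_per_hour: float) -> float:
    """Number of agent-hours a fixed budget covers at a flat hourly rate."""
    return budget_usd / rate_per_hour


# Hypothetical workload: 1,000 agent-hours per month (an assumption for illustration).
monthly_hours = 1_000
for provider, rate in hourly_rate_usd.items():
    cost = workload_cost(rate, monthly_hours)
    hours = hours_covered(200, rate)
    print(f"{provider}: ${cost:,.2f} per {monthly_hours:,} hours; "
          f"$200 covers {hours:.1f} hours")
```

At the $4.50 flat rate, $200 covers roughly 44 hours, consistent with the "over 40 hours" stated for new-user credits.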

Start Building with the Voice Agent API

Experience how fast and flexible voice agents can be with Deepgram’s unified voice-to-voice API. Explore the API in our interactive playground, review documentation, or integrate in minutes using our SDK. New users receive $200 in free credits, enough to process over 40 hours of real-time voice agent usage. Start building natural, responsive conversations with infrastructure built for real-time performance and enterprise scale.

Additional Resources:

Explore the blog for an in-depth breakdown of Voice Agent API’s capabilities
Watch a fun demo of Deepgram’s Voice Agent API
Try Deepgram’s interactive demo
Get $200 in free credits and try Deepgram for yourself

About Deepgram

Deepgram is the leading voice AI platform for enterprise use cases, offering speech-to-text (STT), text-to-speech (TTS), and full speech-to-speech (STS) capabilities, all powered by our enterprise-grade runtime. More than 200,000 developers build with Deepgram’s voice-native foundational models, accessed through cloud APIs or as self-hosted / on-premises APIs, because of our unmatched accuracy, low latency, and pricing. Customers include technology ISVs building voice products or platforms, co-sell partners working with large enterprises, and enterprises solving internal use cases. Having processed over 50,000 years of audio and transcribed over 1 trillion words, Deepgram understands voice better than any other organization in the world. To learn more, visit www.deepgram.com, read our developer docs, or follow @DeepgramAI on X and LinkedIn.
