tl;dr:

  • Today we’ve officially launched Deepgram Aura, the latest component of our Voice AI Platform and the first text-to-speech model built for responsive, conversational AI agents and applications.

  • Aura includes a dozen natural, human-like voices with lower latency than any comparable voice AI alternative and is already being used in production by several of our customers.

  • Experience Aura yourself with our open source tech demo or be among the first to build the Aura API into your product today! Sign up now to get started with Aura and receive $200 in credits absolutely free!

Announcing Deepgram Aura Text-to-Speech API

In the three months since our initial preview announcement for Aura, our inaugural text-to-speech model, we’ve been amazed by the progress our early adopters have made. In fact, several are already running Aura in production, with many more nearly ready to deploy. That’s why we’re so excited to officially announce the general availability of Aura, an important expansion of the Deepgram Voice AI Platform.

With the large language model (LLM) gold rush in full swing (and gaining momentum by the day), the race is on among upstart innovators and tech titans alike to redefine the world of computing and the way we’ll interact with technology using nothing more than our voice. Before such a world can be built, however, significant limitations in speed, cost, quality, and scale must be overcome. With Deepgram Aura, especially when paired with our industry-leading Nova-2 speech-to-text API, developers now have the tools they need to quickly and easily exchange real-time information between humans and LLMs, and to build responsive, high-throughput AI agents and conversational AI applications.

But don’t take our word for it. We built an early proof-of-concept prototype and open-sourced it (bugs and all!). Experience Aura firsthand with this interactive demo, or try converting your own text to audio on our product page.


Try our interactive demo

Deepgram Aura TTS for real-time AI agents
Launch Demo

The Deepgram Voice AI Platform

With the addition of Aura, Deepgram now provides a comprehensive voice AI platform, offering a complete set of APIs developers need to create powerful voice AI experiences across a vast array of use cases. The platform consists of three main components, corresponding to the primary phases of a conversational interaction:

  1. Listen: Using perceptive AI models like speech-to-text to accurately transcribe conversational audio into text.

  2. Think: Using abstractive AI models (LLMs and task-specific language models) to perform natural language understanding, information retrieval, and other intelligent tasks at the heart of an AI agent system.

  3. Speak: Using generative AI models like text-to-speech and LLMs to interact with human speakers just as they would with another person.

Fig. 1: The Deepgram Voice AI Platform


With today’s announcement, developers now have a one-stop shop in Deepgram for both speech recognition and voice generation APIs, enabling the fastest response times and the most natural-sounding conversational flow in real-time voicebot and AI agent applications.

“Along with life-like voices with latency low enough to keep up with the conversation, Deepgram’s transcription is more accurate and faster than other solutions as well. Dealing with one company for both transcription and text-to-speech is HUGE.”

Leandro Torres, Co-founder at Voxity AI


Real-time performance for real-time voice agents

As we’ve previously discussed, text-to-speech use cases can be classified into two discrete categories: high production and high throughput. High production usage primarily focuses on best-in-class voice quality for niche content production use cases like audiobook narration and monologue-driven radio and TV ads. The main opportunity is leveraging AI to perfectly emulate a human speaker and avoid the cost of a highly paid voice actor, and these use cases are primarily served with UI-centric production tools.

In contrast, high throughput use cases are API-driven and all about scale and speed: handling many short, real-time conversations between humans and AI agents. From booking an appointment with your doctor to ordering fast food, these tasks are prevalent throughout our daily lives. Voice quality is certainly important, but the key to achieving positive outcomes is the naturalness of the conversational flow. Speed and efficiency are the name of the game for transcription and voice synthesis, as every millisecond counts, especially when an LLM serves as the brains of an AI agent system.
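If you want to verify those milliseconds yourself, time-to-first-byte is the number to watch. Below is a minimal sketch that measures it with Python's `requests` library; the `https://api.deepgram.com/v1/speak` endpoint path and the `aura-asteria-en` voice name are illustrative assumptions here, so check the API documentation for the exact interface.

```python
# Minimal sketch: measure time-to-first-byte for an Aura TTS request.
# Endpoint and voice name are assumptions; consult the API docs.
import os
import time

import requests

DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]


def time_to_first_byte(text: str, voice: str = "aura-asteria-en") -> float:
    """Send text to Aura and return seconds until the first audio bytes arrive."""
    start = time.perf_counter()
    response = requests.post(
        "https://api.deepgram.com/v1/speak",
        params={"model": voice},
        headers={"Authorization": f"Token {DEEPGRAM_API_KEY}"},
        json={"text": text},
        stream=True,  # don't wait for the full audio file before timing
    )
    response.raise_for_status()
    next(response.iter_content(chunk_size=1024))  # block until the first chunk lands
    return time.perf_counter() - start


if __name__ == "__main__":
    ttfb = time_to_first_byte("Thanks for calling. How can I help you today?")
    print(f"Time to first byte: {ttfb * 1000:.0f} ms")
```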

Recent demos built using Deepgram and the Groq® LPU™ Inference Engine are a powerful proof point of the experience a lightning-fast voice agent can provide. Groq offers the fastest language processing accelerator on the market, and when paired with Deepgram’s STT and TTS APIs, it enables the type of realistic, naturally flowing conversation between human and AI agent that previously existed only in the realm of science fiction. Just as importantly, these capabilities are available through powerful APIs that make it easy to build for high throughput use cases, as shown in the video below.


“Deepgram and Groq share the belief that speed and efficiency are the missing ingredients in unlocking natural AI for daily use by everyone, as evidenced by the recent viral reception to ultra-fast LLMs when made available for the first time. Their voice AI models are prime examples of what can be achieved with the Groq API.”

Jonathan Ross, Founder and CEO at Groq


Build with Aura today

Aura is engineered to offer the optimal mix of speed, quality, and efficiency: it's the fastest of the premium options and the highest quality of the fast ones. When integrated alongside Deepgram Nova-2 STT and your chosen LLM, it enables real-time agents that can listen, act, and communicate with natural-sounding voices, revolutionizing customer interactions.
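To make that pipeline concrete, here is a minimal sketch of a single listen, think, speak turn. The `/v1/listen` and `/v1/speak` endpoints, query parameters, and response shape are assumptions based on Deepgram's documented REST API at the time of writing, and `think()` is a hypothetical placeholder for whatever LLM you choose; a production agent would stream audio in both directions rather than working file-to-file, which is where the latency characteristics above start to matter.

```python
# A sketch of one conversational turn (not production streaming code).
# Endpoints and the voice name are assumptions; think() stands in for your LLM.
import os

import requests

DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]
HEADERS = {"Authorization": f"Token {DEEPGRAM_API_KEY}"}


def listen(audio_path: str) -> str:
    """Listen: transcribe a short audio clip with Nova-2."""
    with open(audio_path, "rb") as f:
        response = requests.post(
            "https://api.deepgram.com/v1/listen",
            params={"model": "nova-2"},
            headers={**HEADERS, "Content-Type": "audio/wav"},
            data=f,
        )
    response.raise_for_status()
    return response.json()["results"]["channels"][0]["alternatives"][0]["transcript"]


def think(transcript: str) -> str:
    """Think: hypothetical LLM step -- swap in your provider of choice."""
    return f"You said: {transcript}. How else can I help?"


def speak(text: str, out_path: str = "reply.mp3") -> str:
    """Speak: synthesize the reply with Aura and save it to a file."""
    response = requests.post(
        "https://api.deepgram.com/v1/speak",
        params={"model": "aura-asteria-en"},  # assumed voice name; see the docs
        headers=HEADERS,
        json={"text": text},
    )
    response.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(response.content)
    return out_path


if __name__ == "__main__":
    reply = think(listen("caller.wav"))
    print("Agent reply saved to", speak(reply))
```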

Aura equips AI agents with lifelike voices and was developed with capabilities that replicate authentic human dialogue: prompt replies; natural cadences that include pauses, audible breaths, and hesitation sounds like 'uh' and 'um'; and dynamic adjustments in tone and emotion to suit the conversation's context.

Customers initially have a choice among 12 English-speaking voices (7 male, 5 female), with additional voices planned for future releases. All of our voices are trained on high-quality conversational datasets and have average response times below 250 ms for typical dialogue sequences. Aura follows Deepgram’s standard usage-based pricing and starts at just $0.015 per 1K characters.

Aura is available today to anyone with a valid Deepgram API key. That means all current customers can start building with it now, and all new signups can take advantage of our free $200 in credits, good for more than 13M characters' worth of voice synthesis using any of our available voices. 

Learn more from our API Documentation or visit our product page to explore the details of Aura, our new text-to-speech API. 


Customer Spotlight: Aura in production

A number of our early build partners have already successfully integrated Deepgram Aura into their applications, and we’re excited to showcase a couple of examples of their early work.

Humach

Humach {humans + machines}, a digital and human contact center solutions company established in 1988, provides innovative Customer Experience (CX) solutions to leading Fortune 500 brands. The company specializes in developing conversational AI agents for various sales and support use cases, including retail, utility, travel, and healthcare, aiming to enhance the customer experience.

Humach uses Deepgram for transcription services and is integrating Aura Text-to-Speech (TTS) into its conversational flows to help retail clients manage the ordering process. The company selected Aura for its low latency and superior voice quality after comparisons with other vendors.

"When we switched from a cloud vendor’s transcription service to Deepgram’s Nova-2, we saw a notable leap in transcription accuracy and responsiveness. Now, with Aura’s text-to-speech, we’re achieving speeds 2-5 times faster than competitors, while delivering the voice quality and latency needed for low handle times and first call resolutions. Deepgram’s robust infrastructure serves high-quality, reliable models that excel in supporting our seasonal traffic for retail, utility, travel, and healthcare use cases."

Tim Houlne, CEO at Humach

Vapi

Companies like Vapi.ai are democratizing voicebots and enabling the development of smarter and more engaging voice AI chatbots for everyone to use. Vapi offers an end-to-end platform for building voice assistants, and they are integrating Deepgram Aura because of its speed and human-like voice quality. Vapi has extensive experience working across the AI agent stack and understands how important low-latency performance is to drive user engagement and provide a compelling customer experience. 

“Deepgram has pushed the bounds of ASR for the last 8 years, defining the standard in price, quality and low-latency. They are bringing the same intensity to TTS and it's really exciting. The voices are fast, human, and well-priced. We're here for it.”


Nikhil Gupta, Co-founder at Vapi

Daily

Daily has been a long-time partner of Deepgram, integrating our speech-to-text APIs into its leading WebRTC API platform for audio and video. Daily gives developers everything they need to add audio and video call features to real-time applications, including AI-powered tools for conversational AI and voicebot use cases. Applications powered by Daily can be found across a number of domains, including telehealth, edtech, and virtual collaboration. Daily recently integrated Deepgram Aura into its production toolkits for LLM-powered interactive voice use cases, as showcased in the video below.


“We chose Deepgram as our original partner for speech-to-text several years ago, after rigorously evaluating all of the options. Deepgram has continued to deliver both the most accurate and lowest latency transcription tech as the audio intelligence space has evolved. We were excited to beta test the Aura voices, because text-to-speech is increasingly an important feature to our customers. Aura's very fast time-to-first-byte and natural voice quality make it a perfect fit for conversational AI agents.”

Kwindla Hultman Kramer, CEO at Daily

What’s Next

At Deepgram, we are committed to releasing early and often, and we will continue to expand Aura’s capabilities throughout the year and beyond. This includes making our voices even more lifelike and conversational, adding additional voices for new use cases, expanding language support for global applications, and more. As we’ve discussed, scaled voice agents are a high throughput use case, and their success will ultimately depend on a voice AI solution that strikes the right balance between natural voice quality, responsiveness, and cost-efficiency. We’re looking forward to continuing to work with innovative companies like Humach, Vapi, and Daily across speech-to-text AND text-to-speech as they leverage the Deepgram Voice AI platform to create the AI agent future.

To learn more, please visit our API Documentation or visit our product page to explore the details of Aura, our new text-to-speech API. Sign up now to get started and receive $200 in credits (good for more than 13M characters’ worth of voice generation) absolutely free!


Deepgram Aura TTS - Now Available



If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions or contact us to talk to one of our product experts for more information today.
