Article·Announcements·Oct 3, 2024
3 min read

Now Available: Deepgram Aura’s WebSocket Interface for Faster Text-to-Speech Input Streaming

By Josh Fox
Published Oct 3, 2024
Updated Oct 3, 2024

tl;dr:

  • Deepgram Aura now offers a WebSocket interface to support fast input streaming.

  • Optimized for AI voice agents and real-time conversational AI applications, Aura WebSocket TTS generates speech 3 times faster than ElevenLabs Turbo v2.5.

  • Be among the first to build it into your product today! Sign up now to get started and receive $200 in credits absolutely free!

Announcing Deepgram Aura's WebSocket Text to Speech: Optimized for Real-Time Conversational AI  

At Deepgram, we’ve heard from users building conversational AI agents about the challenges of assembling an end-to-end solution from discrete modules for streaming speech-to-text (STT) input, large language model (LLM) processing, and text-to-speech (TTS) output. These voice-powered AI agents handle customer service, sales calls, appointment booking, food ordering, and more over the phone, the web, and other devices. The pain points our customers surfaced inspired our newly released WebSocket interface for the Deepgram Aura Text-to-Speech API, which addresses the following:

  • Minimizing latency: Waiting for the entire sentence to be generated before sending it to a batch API leads to frustrating delays. With our WebSocket TTS, you can send tokens from the LLM to the TTS as soon as they’re generated, reducing latency and creating a smoother conversation experience.

  • Taking inputs from any LLM: Our WebSocket TTS accepts any partial text or token, so you can forget about building additional logic to manage sentence chunking. Eliminating this step simplifies your workflow and improves efficiency.

  • Seamlessly handling interruptions: With real-time interruption handling, you can stop the TTS as soon as a human interrupts. This ensures your conversational bot can immediately process new input and generate a relevant response without missing a beat.

  • Scaling simultaneous conversations: Handling multiple conversations simultaneously? Our WebSocket TTS supports 40+ concurrent WebSocket connections, meaning you can scale without worrying about hitting concurrency limits for individual TTS requests.
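The token forwarding and interruption handling described above can be sketched in a few lines of Python. This is a minimal illustration, not Deepgram's reference implementation: the JSON message shapes (`Speak`, `Flush`, `Clear`) are assumptions that should be checked against the Aura WebSocket documentation, and `stream_tokens`, `send`, and `interrupted` are hypothetical names. The `send` callable stands in for the send method of whichever WebSocket client you use, so the sketch runs without a network connection.

```python
import json
from typing import Callable, Iterable


def speak_message(text: str) -> str:
    # Assumed message shape for sending partial text to the TTS socket.
    return json.dumps({"type": "Speak", "text": text})


def control_message(kind: str) -> str:
    # Assumed control messages, e.g. "Flush" (synthesize what is buffered)
    # or "Clear" (drop buffered text after an interruption).
    return json.dumps({"type": kind})


def stream_tokens(
    tokens: Iterable[str],
    send: Callable[[str], None],
    interrupted: Callable[[], bool],
) -> bool:
    """Forward LLM tokens to TTS as they arrive.

    Returns True if the whole reply was sent, False if the user interrupted.
    """
    for token in tokens:
        if interrupted():
            # Stop speaking right away so the agent can handle the new input.
            send(control_message("Clear"))
            return False
        # No sentence chunking: each token is forwarded as soon as it exists.
        send(speak_message(token))
    # End of the LLM reply: ask the server to synthesize any remaining text.
    send(control_message("Flush"))
    return True


# Demo with a list standing in for the socket.
sent: list[str] = []
stream_tokens(["Hello", ", ", "world", "!"], sent.append, lambda: False)
```

In a real agent, `tokens` would be the LLM's streaming output and `interrupted` would be driven by the STT side detecting that the caller has started speaking.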

Our new WebSocket API offers the following advantages over a custom-built solution:

  • Speed: On average, save 70% in LLM to TTS latency with token-by-token transmission, ensuring your conversational agents are more responsive than ever.

  • Naturalness: Enjoy consistent, low-latency, and natural-sounding voice outputs without the hassle of managing tokens.

  • Flexibility: Whether it’s handling interruptions or scaling conversations, our WebSocket TTS adapts to your needs, supporting all voices in Aura.

  • Simplicity: Easy integration with a straightforward setup process.

Compared to using our REST API without sentence chunking, our WebSocket interface saves users 70% or more in latency for conversations averaging 50-150 characters by transmitting tokens from the LLM immediately. Most of the time saved comes from eliminating the need to wait for the LLM to fully generate text before sending it to the TTS API. The longer the text, the greater the time savings. Fig. 1 plots benchmark results of the time savings for different conversation lengths when using our new WebSocket interface with GPT-4o compared to our original REST API without sentence chunking.
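To see why the savings grow with text length, consider a simplified timing model. It is a rough sketch under stated assumptions, not a reproduction of the benchmark: it assumes the REST path must wait for every LLM token before the TTS request starts, that the WebSocket path can start once the first token arrives, and that TTS time to first byte is the same in both cases. All function names, parameters, and numbers are hypothetical.

```python
def rest_time_to_first_audio(
    n_tokens: int, first_token_s: float, per_token_s: float, tts_ttfb_s: float
) -> float:
    # Without sentence chunking, the full LLM reply is generated first.
    llm_total_s = first_token_s + (n_tokens - 1) * per_token_s
    return llm_total_s + tts_ttfb_s


def ws_time_to_first_audio(first_token_s: float, tts_ttfb_s: float) -> float:
    # Streaming input: TTS can begin once the first token has arrived.
    return first_token_s + tts_ttfb_s


def savings_fraction(
    n_tokens: int, first_token_s: float, per_token_s: float, tts_ttfb_s: float
) -> float:
    rest = rest_time_to_first_audio(n_tokens, first_token_s, per_token_s, tts_ttfb_s)
    ws = ws_time_to_first_audio(first_token_s, tts_ttfb_s)
    return (rest - ws) / rest


# Illustrative (made-up) timings: 0.3 s to first token, 20 ms per later
# token, 0.2 s TTS time to first byte.
for n in (10, 20, 40):
    print(n, round(savings_fraction(n, 0.3, 0.02, 0.2), 2))
```

In this model the WebSocket time to first audio is constant, while the REST time grows linearly with the number of tokens, so the fraction of time saved rises as replies get longer.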

We also benchmarked the performance of Aura’s WebSocket interface against the ElevenLabs Turbo v2.5 WebSocket interface, using GPT-4o to provide the text input. As seen in Fig. 2, Deepgram’s WebSocket interface provides spoken output 3 times faster than ElevenLabs Turbo v2.5.

Getting started


Ready to build with Deepgram Aura’s WebSocket TTS? Dive into our Getting Started Guide and see how easy it is to revolutionize your conversational AI.

Plus, check out the documentation and sample code of our Twilio Example with STT + TTS Streaming WS to create your own end-to-end conversational demo using Deepgram and Twilio.

If you have any feedback about this post, or anything else regarding Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions, Discord community, or contact us to talk to one of our product experts for more information today.
