Deepgram is proud to announce the release of Aura-2, our text-to-speech model purpose-built for real-time enterprise use cases.
Performance
Sub-200ms time-to-first-byte (TTFB) latency for real-time conversational interactions
0.111x Real-Time Factor (RTF): one second of audio is synthesized in about 111 milliseconds
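The two performance figures measure different things. TTFB is the delay before the first audio byte arrives. RTF is total synthesis time divided by the duration of the audio produced, so an RTF of 0.111 means 10 seconds of speech takes roughly 1.11 seconds to generate. The Python sketch below shows one way a client could check both figures. It is not a Deepgram tool: all function names are hypothetical, and the 24 kHz, 16-bit mono PCM defaults are assumptions used only to turn a byte count into a duration.

```python
import time
from typing import Iterable, Optional, Tuple


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock synthesis time / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def pcm_duration_seconds(num_bytes: int, sample_rate: int = 24000,
                         sample_width: int = 2, channels: int = 1) -> float:
    """Playback length of raw linear PCM.

    The defaults (24 kHz, 16-bit, mono) are assumptions for illustration.
    """
    return num_bytes / (sample_rate * sample_width * channels)


def measure_stream(chunks: Iterable[bytes]) -> Tuple[Optional[float], float, int]:
    """Return (TTFB seconds, total elapsed seconds, total bytes) for an audio stream.

    The clock starts when iteration begins, so `chunks` should be a lazy
    generator that issues the request on first iteration; otherwise the
    measured TTFB will understate the real latency.
    """
    start = time.perf_counter()
    ttfb: Optional[float] = None
    total_bytes = 0
    for chunk in chunks:
        # Record the time at which the first non-empty chunk arrives.
        if ttfb is None and chunk:
            ttfb = time.perf_counter() - start
        total_bytes += len(chunk)
    return ttfb, time.perf_counter() - start, total_bytes
```

With a real stream, you could combine the helpers: `ttfb, elapsed, n = measure_stream(stream)` followed by `real_time_factor(elapsed, pcm_duration_seconds(n))`. Network time is included in `elapsed`, so a client-side RTF will read somewhat higher than the server-side figure.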
Voice Quality & Features
Enterprise-optimized voice catalog with 40+ distinct voices, each designed for specific business contexts
Tuned for professional and transactional interactions with appropriate tone, pacing, and emphasis
Superior pronunciation accuracy for domain-specific content:
Currency and numerals
Dates and timestamps in varied formats
Email addresses, passwords, and URLs
Complex addresses and location references
Industry-leading voice clarity, rated higher than competing models in customer service scenarios
Availability
Aura-2 is available now via REST and WebSocket APIs
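The Python sketch below builds a single REST synthesis request using only the standard library. Treat these details as assumptions and confirm each against the Developer Documentation before relying on them: the endpoint (`https://api.deepgram.com/v1/speak`), the `model` query parameter, the example voice ID `aura-2-thalia-en`, the `Authorization: Token …` header, the `{"text": …}` JSON body, and MP3 as the default output. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# ASSUMED values: verify against Deepgram's Developer Documentation.
SPEAK_URL = "https://api.deepgram.com/v1/speak"  # assumed REST endpoint
DEFAULT_MODEL = "aura-2-thalia-en"               # assumed Aura-2 voice/model ID


def build_speak_request(text: str, api_key: str,
                        model: str = DEFAULT_MODEL) -> urllib.request.Request:
    """Build (but do not send) a POST request for synthesized speech."""
    query = urllib.parse.urlencode({"model": model})
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{SPEAK_URL}?{query}",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text: str, path: str, api_key: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    req = build_speak_request(text, api_key)
    with urllib.request.urlopen(req, timeout=30) as resp, open(path, "wb") as out:
        # Copy the response in chunks as it arrives rather than buffering it all.
        while chunk := resp.read(4096):
            out.write(chunk)
```

For example, `synthesize_to_file("Your order total is $42.50.", "out.mp3", api_key)` would exercise the currency handling described above. The `.mp3` extension assumes the default output encoding. Low-latency conversational agents would more likely use the WebSocket API, which streams text in and audio out over one connection. That protocol is not sketched here.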
Currently available for use through our hosted offering
For detailed information about Aura-2, please refer to our Developer Documentation.