Article · Jan 22, 2026

Aura-2 Leads Coval’s Real-Time TTS Benchmarks

Coval recently added Aura-2 to its public text-to-speech (TTS) benchmarks. The results: Aura-2 leads on latency, consistency, and cost efficiency.

8 min read

By Hasan Jilani

Director of Product Marketing

Real-time text-to-speech is unforgiving. A 200-millisecond delay changes how users perceive an entire interaction. Independent benchmarks reveal how systems behave under pressure, beyond polished demos.

Coval recently added Aura-2 to its public TTS benchmarks. The results: Aura-2 leads on latency, consistency, and cost efficiency.

Why Coval’s benchmarks matter

Coval is a simulation and evaluation platform for voice agents, automating real-world scenarios and measuring latency, accuracy, and interruption handling across providers.

Last year, Coval benchmarked Deepgram Flux and showed it set a new baseline for conversational speech recognition, with the fastest time to first token and the tightest latency distribution. Now Coval has added Aura-2 to its TTS benchmark suite, providing independent data on both recognition and synthesis performance.

What the benchmarks show

Latency leadership

Aura-2 delivers the lowest effective end-to-end TTS latency among models tested. This held across repeated runs.

Figure 1: End-to-end TTS latency across providers. Aura-2 leads on median latency.

In Coval’s rankings, Aura-2 consistently starts playback first:

  • Shorter gaps between user speech and agent response
  • Less dead air
  • Room to overlap processing and playback without cutting off users

For contact centers handling 10,000+ daily calls, shaving 100ms per response adds up to hours of reduced wait time.
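
To make that concrete, here is the back-of-the-envelope arithmetic. The 10,000 calls/day figure comes from the article; the average of ten agent responses per call is an illustrative assumption, not a benchmark number.

    # Rough arithmetic for latency savings at contact-center scale.
    calls_per_day = 10_000
    responses_per_call = 10          # assumed average turns where the agent speaks
    savings_per_response_s = 0.100   # 100 ms shaved off each response

    total_savings_s = calls_per_day * responses_per_call * savings_per_response_s
    print(f"Daily wait time saved: {total_savings_s / 3600:.1f} hours")
    # -> Daily wait time saved: 2.8 hours at these assumed numbers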

Lowest latency variation

Average latency is only part of the story. Coval highlights the full distribution, including long-tail spikes.

Aura-2 combines low median latency with the tightest spread. Long-tail spikes are minimal compared to other models.

Figure 2: Latency distribution across providers. Aura-2 shows minimal long-tail spikes.

Variability drives the awkward pauses users notice, especially under load. Aura-2’s narrow distribution means fewer surprise delays and more predictable behavior for SLAs and concurrency planning.
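
As a rough illustration of why the full distribution matters, here is a minimal Python sketch that summarizes a latency sample the way Coval's charts do, with a median plus tail percentiles. The sample values are invented for illustration, not benchmark data.

    # The median describes typical behavior; p95/p99 expose the long tail
    # that users actually notice as awkward pauses.
    import statistics

    ttfb_ms = [92, 88, 95, 90, 101, 87, 94, 180, 91, 89, 93, 97, 250, 90, 96]

    def percentile(samples, pct):
        """Nearest-rank percentile; good enough for quick benchmark summaries."""
        ranked = sorted(samples)
        k = max(0, int(round(pct / 100 * len(ranked))) - 1)
        return ranked[k]

    print(f"median: {statistics.median(ttfb_ms):.0f} ms")
    print(f"p95:    {percentile(ttfb_ms, 95)} ms")
    print(f"p99:    {percentile(ttfb_ms, 99)} ms")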

Accuracy under real-time constraints

Coval’s benchmarks show Aura-2 maintains accuracy suitable for customer-facing deployments while hitting real-time latency targets.

Figure 3: Accuracy under real-time constraints. Aura-2 scores well without sacrificing speed.

In Deepgram’s blinded preference tests, evaluators consistently rated Aura-2 highest for customer service scenarios when voices were heard under realistic real-time conditions rather than as isolated clips.

Figure 4: Blinded preference results. Aura-2 preferred for clarity and naturalness in enterprise scenarios.

Cost efficiency at scale

For always-on voice agents and high-volume deployments, TTS costs compound with concurrency and uptime. Coval’s benchmarks show Aura-2 operates in one of the lowest effective cost tiers among comparable models while maintaining top-tier latency.

Figure 5: Effective cost at production scale. Aura-2 keeps cost per unit low.

Figure 6: Latency vs. price. Aura-2 sits in the lower-left quadrant: fast responses, competitive pricing.

Few models combine responsiveness, consistency, and cost efficiency in the same region of the curve.
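
To see why per-unit cost compounds, here is a simple cost-model sketch. The per-character rate and traffic profile below are placeholder assumptions, not published pricing or benchmark data; check provider pricing pages for real rates.

    # TTS spend scales with how much text you synthesize, which in turn
    # scales with concurrency and uptime. All numbers are placeholders.
    price_per_1k_chars = 0.030        # assumed USD rate
    concurrent_agents = 50            # simultaneous active conversations
    chars_per_agent_per_hour = 9_000  # assumed synthesized text per busy agent-hour
    hours_per_month = 24 * 30

    monthly_chars = concurrent_agents * chars_per_agent_per_hour * hours_per_month
    monthly_cost = monthly_chars / 1_000 * price_per_1k_chars
    print(f"Estimated monthly TTS spend: ${monthly_cost:,.0f}")
    # -> roughly $9,720 at these assumed numbers; the point is that small
    #    per-unit price differences multiply across concurrency and uptime.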

How Aura-2 achieves these results

The engineering details are covered in a separate post from Deepgram’s CTO, Adam Sypniewski.

At launch, Aura-2 delivered sub-200ms time to first byte (TTFB). Since then, the team has:

  • Cut steady-state TTFB to around 90ms (95th percentile under 200ms)
  • Increased concurrent streams per GPU
  • Tightened latency distributions through improved scheduling and batching

The runtime is built in Rust, with engineering focused on separating prompt processing from synthesis and optimizing GPU orchestration.
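
As a conceptual illustration of the overlap idea only (not Deepgram's Rust implementation), here is a minimal asyncio sketch in which playback starts as soon as the first audio chunk is ready, while later chunks are still being synthesized:

    # Generic producer/consumer overlap: playback begins on the first chunk
    # instead of waiting for the full utterance to be synthesized.
    import asyncio

    async def synthesize(text: str, queue: asyncio.Queue):
        """Stand-in synthesizer: emits audio chunks one sentence at a time."""
        for sentence in text.split(". "):
            await asyncio.sleep(0.09)          # pretend ~90 ms to produce a chunk
            await queue.put(f"<audio for {sentence!r}>")
        await queue.put(None)                  # end-of-stream marker

    async def play(queue: asyncio.Queue):
        """Consumer: plays each chunk as it arrives."""
        while (chunk := await queue.get()) is not None:
            print("playing", chunk)

    async def main():
        queue: asyncio.Queue = asyncio.Queue(maxsize=4)
        await asyncio.gather(
            synthesize("Hello. Your order shipped today. Anything else?", queue),
            play(queue),
        )

    asyncio.run(main())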

Full technical breakdown: How We Took Aura-2’s TTFB from <200 ms to 90 ms.

Explore the benchmarks

Coval’s data confirms it: Aura-2 delivers the latency, consistency, and cost profile that production voice systems require.

Coval’s benchmark explorer is public. Examine Aura-2’s performance directly and compare against other models.

New to Aura-2? Try it in the Deepgram Playground, or sign up for $200 in free credits.
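
If you would rather hear Aura-2 from code than the Playground, a minimal request against Deepgram's Speak REST endpoint looks roughly like the sketch below. Confirm the endpoint, parameters, model name, and default audio format against the current API reference before relying on it.

    # Minimal Aura-2 request sketch against Deepgram's Speak REST endpoint.
    import os
    import requests

    DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]

    resp = requests.post(
        "https://api.deepgram.com/v1/speak",
        params={"model": "aura-2-thalia-en"},   # example Aura-2 voice name
        headers={
            "Authorization": f"Token {DEEPGRAM_API_KEY}",
            "Content-Type": "application/json",
        },
        json={"text": "Thanks for calling. How can I help you today?"},
        timeout=30,
    )
    resp.raise_for_status()

    with open("reply.mp3", "wb") as f:   # adjust extension if you request another encoding
        f.write(resp.content)
    print("wrote", len(resp.content), "bytes of audio")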
