Coval validates Flux: no tradeoff between latency and interruption

What’s different
What Coval Benchmarks Showed
What’s next
Get started with Flux today

At Deepgram, we have spent the last two years rethinking how transcription should work for real-time voice agents. The result is Flux, the first conversational speech recognition model purpose-built for natural turn-taking, which we launched yesterday at VapiCon.

What’s different

Traditional speech-to-text models were designed for captioning or meeting notes, not conversation. When used in voice agents, they force developers to stitch together transcription, voice activity detection, and turn-taking logic, which leads to awkward pauses, premature cut-offs, and clumsy handoffs.

Flux eliminates this complexity by embedding turn-taking intelligence directly into recognition. It is trained to understand when a speaker has truly finished, when it is time to respond, and how to keep dialogue flowing naturally. The result is faster handoffs in conversation, more natural AI agent interactions, and no extra detectors or manual tuning required.
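To see why the stitched approach forces a tradeoff, consider a minimal sketch of silence-timeout endpointing, one common way such pipelines decide that a turn is over. It assumes a hypothetical VAD that labels each 100 ms frame as speech or silence. `Frame`, `make_frames`, and `silence_timeout_endpoints` are illustrative names, not part of any Deepgram or third-party API, and this is not how Flux works internally.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    t_ms: int        # start time of the frame
    is_speech: bool  # hypothetical VAD decision for this frame


def make_frames(pattern: str, frame_ms: int = 100) -> list[Frame]:
    """Build frames from a pattern string: 'S' = speech, '.' = silence."""
    return [Frame(i * frame_ms, ch == "S") for i, ch in enumerate(pattern)]


def silence_timeout_endpoints(frames: list[Frame], timeout_ms: int) -> list[int]:
    """Times (ms) at which a fixed silence timeout would declare end of turn."""
    endpoints: list[int] = []
    in_turn = False
    silence_start = None  # start time of the current run of silent frames
    for f in frames:
        if f.is_speech:
            in_turn = True
            silence_start = None
        elif in_turn:
            if silence_start is None:
                silence_start = f.t_ms
            # End the turn once timeout_ms has elapsed since the silent run began.
            if f.t_ms - silence_start >= timeout_ms:
                endpoints.append(f.t_ms)
                in_turn = False
                silence_start = None
    return endpoints


# 1 s of speech, a 700 ms mid-sentence pause, 0.8 s more speech, then the speaker is done.
utterance = make_frames("S" * 10 + "." * 7 + "S" * 8 + "." * 11)
print(silence_timeout_endpoints(utterance, timeout_ms=500))   # [1500, 3000]
print(silence_timeout_endpoints(utterance, timeout_ms=1000))  # [3500]
```

With a 500 ms timeout, the pause starting at 1000 ms triggers a premature end of turn at 1500 ms, cutting the speaker off mid-sentence. Doubling the timeout avoids the cut-off, but the real handoff now lands 1000 ms after the speaker stops at 2500 ms instead of 500 ms. A fixed silence rule trades interruptions for latency, which is the tradeoff turn-aware recognition is meant to remove.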

What Coval Benchmarks Showed

Coval is a simulation and evaluation platform built specifically for testing AI voice agents at scale. It automates real-world scenarios, measuring key metrics like latency, interruption handling, and accuracy, so teams can understand how their systems perform under production-like conditions.

Independent benchmarking by Coval validated Flux’s performance:

50% lower latency to first token vs Nova-3
Faster, more reliable turn detection
Accuracy preserved under real-time constraints (equivalent WER to Nova-3)
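WER (word error rate), the accuracy metric cited above, is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. A minimal sketch of that standard definition follows; it is not Coval's evaluation code, and production evaluations typically normalize casing and punctuation first, while this version only splits on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One deleted word ("a") and one substitution ("seven" -> "eleven") over 7 reference words.
print(word_error_rate("book a table for two at seven",
                      "book table for two at eleven"))  # 2/7, about 0.286
```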

Here’s a breakdown of how it leads the entire field:

Fastest time to first token: Flux consistently delivers the quickest first transcription, beating other providers by anywhere from hundreds of milliseconds to multiple seconds. In Coval’s latency rankings, Flux holds the top spot:

Performance Delta Analysis — *Flux is consistently the fastest model to return a first token, far ahead of competitors.*

Coval’s percentile analysis makes the gap even clearer. With Flux as the baseline at 0 ms, competitors like AssemblyAI and Speechmatics trail by roughly 0.5 to 1.5 seconds:

AssemblyAI Universal Streaming: +0.91s at P50, up to +1.26s at P75
Speechmatics Default: +1.04s at P50, up to +1.17s at P75
Speechmatics Enhanced: +1.09s at P50, up to +1.53s at P75

Performance Rankings — *Flux sets the latency baseline. Competing providers add between half a second and one and a half seconds of delay, which users notice instantly in live conversation.*
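Figures like “+0.91s at P50” are percentile deltas: compute a given percentile of each provider's time-to-first-token samples, then subtract the baseline's value at the same percentile. A minimal sketch follows, assuming linear-interpolation percentiles (the post does not state Coval's exact method). The sample values and the `provider_a` name are made up for illustration and are not Coval's measurements.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Linear-interpolation percentile (same convention as NumPy's default)."""
    xs = sorted(samples)
    pos = (p / 100) * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def deltas_vs_baseline(baseline: list[float],
                       providers: dict[str, list[float]],
                       ps: tuple[int, ...] = (50, 75)) -> dict[str, dict[int, float]]:
    """For each provider, the extra latency over the baseline at each percentile."""
    base = {p: percentile(baseline, p) for p in ps}
    return {
        name: {p: percentile(samples, p) - base[p] for p in ps}
        for name, samples in providers.items()
    }


# Synthetic time-to-first-token samples in ms, for illustration only.
baseline_ms = [200, 250, 300, 350, 400]
providers_ms = {"provider_a": [1100, 1200, 1300, 1500, 1700]}
print(deltas_vs_baseline(baseline_ms, providers_ms))
# {'provider_a': {50: 1000.0, 75: 1150.0}}
```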

Most consistent performance: Flux does not just win on averages; it also shows the tightest, most predictable latency distribution. Competitors often spike unpredictably, but Flux stays stable across runs, which translates directly into smoother user experiences.

Performance Consistency — *Flux delivers steady low latency across runs, avoiding the spikes competitors exhibit.*

Lowest latency variation: In Coval’s distribution plots, Flux is the only model that combines the lowest median latency with the narrowest spread. The others not only run slower but also suffer from erratic performance that makes conversations feel laggy.
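“Lowest median with the narrowest spread” can be made concrete with two summary numbers per model. A minimal sketch follows, assuming the interquartile range (P75 minus P25) as the spread measure; the post does not say which statistic Coval’s plots use. The samples are synthetic and the `latency_profile` helper is hypothetical.

```python
import statistics


def latency_profile(samples_ms: list[float]) -> dict[str, float]:
    """Median and interquartile range (P75 - P25) of a latency sample.

    method="inclusive" interpolates linearly between the sorted samples.
    """
    q1, median, q3 = statistics.quantiles(samples_ms, n=4, method="inclusive")
    return {"median": median, "iqr": q3 - q1}


# Synthetic samples in ms, for illustration only.
steady = [300, 310, 320, 330, 340]
spiky = [400, 500, 700, 1200, 2500]
print(latency_profile(steady))  # {'median': 320.0, 'iqr': 20.0}
print(latency_profile(spiky))   # {'median': 700.0, 'iqr': 700.0}
```

The IQR ignores the most extreme runs by design; to capture rare spikes like the 2500 ms sample, a tail statistic such as P95 minus P50 is a natural complement.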

Best balance of speed and accuracy: In Coval’s latency-versus-accuracy analysis, Flux is the only model that pairs ultra-fast latency with accuracy on par with the top providers, at a competitive cost.
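One common way to read a latency-versus-accuracy plot is to look for models that no other model beats on both axes at once, known as the Pareto frontier. A minimal sketch follows with hypothetical model names and made-up numbers; Coval’s own analysis may weigh the axes differently, and cost is ignored here.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    latency_ms: float  # lower is better
    wer: float         # lower is better


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a.latency_ms <= b.latency_ms and a.wer <= b.wer
    better = a.latency_ms < b.latency_ms or a.wer < b.wer
    return no_worse and better


def pareto_front(results: list[Result]) -> list[str]:
    """Names of models that no other model dominates."""
    return [r.name for r in results
            if not any(dominates(o, r) for o in results if o is not r)]


# Hypothetical results, for illustration only.
results = [
    Result("model_a", 300, 0.080),
    Result("model_b", 1200, 0.078),
    Result("model_c", 1300, 0.095),
]
print(pareto_front(results))  # ['model_a', 'model_b']
```

Here `model_c` drops out because `model_a` is both faster and more accurate. A model sitting alone in the low-latency, low-WER corner is the situation the paragraph above describes; adding cost would turn this into a three-axis comparison.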

Want to see Flux in action against other providers? Head to Coval’s interactive playground and test it live.

Coval’s Founder and CEO, Brooke Hopkins, put it best:

“We evaluate voice agents every day at Coval, and Flux is the first model that hasn’t gone to either extreme, where it’s either caption-like transcription that needs to be stitched together with turn detection, or a full end-to-end system that gives you no controllability. It directly tackles one of the hardest problems in the voice agent space, with an approach that truly makes progress without tradeoffs and gives teams exactly what they need.”

These benchmarks underscore what we set out to achieve: Flux is not an incremental update but a new baseline for voice-first applications.

What’s next

Flux also marks the beginning of our broader Neuroplex architecture, a new way of connecting speech-to-text, LLMs, and text-to-speech with shared context signals. Just as the human brain relies on white matter to connect specialized regions, Neuroplex enables context to flow across every stage of the voice stack.

Today’s systems often flatten speech into text, losing cues like tone, empathy, and intent. Neuroplex restores those signals, paving the way for agents that adapt in more lifelike and multidimensional ways. This is the next frontier: conversations with AI that feel fully human.

Get started with Flux today

Flux is available now. For developers building conversational agents, this means a new foundation: faster, smarter, and more human.

👉 Be one of the first to try it out
🎧 Listen to the full conversation between Scott and Brooke
