Article·AI Engineering & Research·Oct 4, 2025

Coval validates Flux: no tradeoff between latency and interruption

Independent benchmarks from Coval confirm Flux sets a new baseline for conversational AI. With 50% lower latency to first token, faster turn detection, and accuracy on par with Nova-3, Flux ends the tradeoff between latency and interruption.

4 min read

By Hasan Jilani

Director of Product Marketing

At Deepgram, we have spent the last two years rethinking how transcription should work for real-time voice agents. The result is Flux, the first conversational speech recognition model, purpose-built for natural turn-taking, launched yesterday at VapiCon.

What’s different

Traditional speech-to-text models were designed for captioning or meeting notes, not conversation. When used in voice agents, they force developers to stitch together transcription, voice activity detection, and turn-taking logic, which leads to awkward pauses, premature cut-offs, and clumsy handoffs.

Flux eliminates this complexity by embedding turn-taking intelligence directly into recognition. It is trained to understand when a speaker has truly finished, when it is time to respond, and how to keep dialogue flowing naturally. The result is faster handoffs in conversation, more natural AI agent interactions, and no extra detectors or manual tuning required.
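To make the contrast concrete, here is a toy sketch of the two approaches (the event shapes and the `silence_timeout` value are illustrative assumptions, not Flux's actual API):

```python
# Hand-rolled turn detection: the stitched-together pattern described above.
# `partials` is a list of (timestamp_seconds, text) transcript fragments;
# a gap longer than `silence_timeout` is treated as the end of a turn.
def detect_turns_by_silence(partials, silence_timeout=0.7):
    turns, current, last_t = [], [], None
    for t, text in partials:
        if last_t is not None and t - last_t > silence_timeout and current:
            turns.append(" ".join(current))
            current = []
        current.append(text)
        last_t = t
    if current:
        turns.append(" ".join(current))
    return turns


# Turn-aware recognition: the model itself emits the boundary, so the
# application code just reacts. `events` is a list of (text, end_of_turn).
def detect_turns_by_events(events):
    turns, current = [], []
    for text, end_of_turn in events:
        current.append(text)
        if end_of_turn:
            turns.append(" ".join(current))
            current = []
    return turns
```

The first function is the kind of fragile heuristic developers must tune per deployment; the second shows why moving the decision into the recognizer removes both the extra detector and the tuning.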

What Coval Benchmarks Showed

Coval is a simulation and evaluation platform built specifically for testing AI voice agents at scale. It automates real-world scenarios, measuring key metrics like latency, interruption handling, and accuracy, so teams can understand how their systems perform under production-like conditions.

Independent benchmarking by Coval validated Flux’s performance:

  • 50% lower latency to first token vs Nova-3
  • Faster, more reliable turn detection
  • Accuracy preserved under real-time constraints (equivalent WER to Nova-3)
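Percentile deltas like those below are derived by comparing each provider's latency distribution against the baseline model at the same percentile. A minimal sketch using nearest-rank percentiles (illustrative numbers and method only, not Coval's actual harness or data):

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile (0 < p <= 100) of a list of latencies."""
    s = sorted(samples)
    k = math.ceil(p / 100 * len(s)) - 1
    return s[max(0, k)]


def delta_vs_baseline(provider_ms, baseline_ms, p):
    """How far a provider trails the baseline at percentile p, in ms."""
    return percentile(provider_ms, p) - percentile(baseline_ms, p)
```

Running this over per-request time-to-first-token samples yields figures like "+910 ms at P50" for a provider whose median response arrives 0.91 s after the baseline's.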

Here’s a breakdown of how it leads the entire field:

Fastest time to first token
Flux consistently delivers the quickest first transcription, shaving anywhere from hundreds of milliseconds to multiple seconds off other providers' times. In Coval’s latency rankings, Flux holds the top spot:

Flux is consistently the fastest model to return a first token, far ahead of competitors.

Coval’s percentile analysis makes the gap even clearer. Flux is the baseline at 0 ms, while competitors like AssemblyAI and Speechmatics trail by 0.5 to 1.5 seconds.

  • AssemblyAI Universal Streaming: +0.91s at P50, up to +1.26s at P75
  • Speechmatics Default: +1.04s at P50, up to +1.17s at P75
  • Speechmatics Enhanced: +1.09s at P50, up to +1.53s at P75

Flux sets the latency baseline. Competing providers add between half a second and one and a half seconds of delay, which users notice instantly in live conversation.

Most consistent performance
Flux does not just win on averages; it shows the tightest and most predictable latency distribution. Competitors often spike unpredictably, while Flux stays stable across runs, which translates directly into smoother user experiences.

Flux delivers steady low latency across runs, avoiding the spikes competitors exhibit.

Lowest latency variation
In Coval’s distribution plots, Flux is the only model that combines the lowest median latency with the narrowest spread. Others not only run slower; they also suffer from erratic performance that makes conversations feel laggy.

Flux has the lowest median latency and the tightest distribution, meaning it is both fast and predictable.
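One simple way to quantify "fast and predictable" is to summarize each model's latency samples by their median (speed) and interquartile range (consistency). A sketch of that summary, not Coval's actual metric:

```python
import statistics


def latency_profile(samples_ms):
    """Summarize a latency distribution: median (speed) and IQR (spread)."""
    q1, q2, q3 = statistics.quantiles(samples_ms, n=4)
    return {"median_ms": q2, "iqr_ms": q3 - q1}
```

A model that leads on both numbers at once, as Flux does in Coval's plots, is fast on the typical request and rarely far from that typical value.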

Best balance of speed and accuracy
In Coval’s latency vs accuracy analysis, Flux is the only model that pairs ultra-fast latency with accuracy on par with the top providers at a competitive cost structure.

Flux combines the fastest latency with accuracy on par with the best providers at competitive cost.

Want to see Flux in action against other providers? Head to Coval's interactive playground and test it live.

Coval’s Founder and CEO, Brooke Hopkins, put it best:

“We evaluate voice agents every day at Coval, and Flux is the first model that hasn’t gone to either extreme, where it’s either caption-like transcription that needs to be stitched together with turn detection, or a full end-to-end system that gives you no controllability. It directly tackles one of the hardest problems in the voice agent space, with an approach that truly makes progress without tradeoffs and gives teams exactly what they need.”

These benchmarks underscore what we set out to achieve: Flux is not an incremental update but a new baseline for voice-first applications.

What’s next

Flux also marks the beginning of our broader Neuroplex architecture, a new way of connecting speech-to-text, LLMs, and text-to-speech with shared context signals. Just as the human brain relies on white matter to connect specialized regions, Neuroplex enables context to flow across every stage of the voice stack.

Today’s systems often flatten speech into text, losing cues like tone, empathy, and intent. Neuroplex restores those signals, paving the way for agents that adapt in more lifelike and multidimensional ways. This is the next frontier: conversations with AI that feel fully human.

Get started with Flux today

Flux is available now. For developers building conversational agents, this means a new foundation: faster, smarter, and more human.

👉 Be one of the first to try it out
🎧 Listen to the full conversation between Scott and Brooke
