Coval validates Flux: no tradeoff between latency and interruption


At Deepgram, we have spent the last two years rethinking how transcription should work for real-time voice agents. The result is Flux, the first conversational speech recognition model purpose-built for natural turn-taking, which we launched yesterday at VapiCon.
What’s different
Traditional speech-to-text models were designed for captioning or meeting notes, not conversation. When used in voice agents, they force developers to stitch together transcription, voice activity detection, and turn-taking logic, which leads to awkward pauses, premature cut-offs, and clumsy handoffs.
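The "stitched together" turn-taking logic often comes down to a fixed silence timeout layered on top of a voice activity detector (VAD). The sketch below shows that generic heuristic. It is not Deepgram's code or the Flux API. The `Frame` type, the simulated VAD output, and the timeout values are all hypothetical. It shows why a single timeout forces a tradeoff between cutting users off and leaving dead air.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    t_ms: int        # timestamp of the audio frame, in milliseconds
    is_speech: bool  # output of a hypothetical voice activity detector


def detect_end_of_turn(frames, silence_timeout_ms):
    """Return the timestamp (ms) at which a fixed silence timeout declares
    the user's turn over, or None if it never fires in this window.

    Naive heuristic: once speech has started, the turn ends after the VAD
    has reported continuous silence for `silence_timeout_ms`.
    """
    heard_speech = False
    silence_start = None
    for frame in frames:
        if frame.is_speech:
            heard_speech = True
            silence_start = None  # any speech resets the silence clock
        elif heard_speech:
            if silence_start is None:
                silence_start = frame.t_ms
            if frame.t_ms - silence_start >= silence_timeout_ms:
                return frame.t_ms
    return None


def make_frames(speech_spans, total_ms, step_ms=20):
    """Simulate VAD output: speech inside the given [start, end) spans."""
    return [Frame(t, any(a <= t < b for a, b in speech_spans))
            for t in range(0, total_ms, step_ms)]


# The user talks, pauses 600 ms mid-thought, then finishes at 2400 ms.
frames = make_frames([(0, 1000), (1600, 2400)], total_ms=4000)
print(detect_end_of_turn(frames, 500))   # 1500: cuts the user off mid-pause
print(detect_end_of_turn(frames, 1000))  # 3400: 1 s of dead air after they finish
```

A short timeout interrupts the speaker during a natural pause. A long one adds a noticeable lag before the agent responds. No single value fixes both, and Flux is meant to remove this kind of hand-tuning.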
Flux eliminates this complexity by embedding turn-taking intelligence directly into recognition. It is trained to understand when a speaker has truly finished, when it is time to respond, and how to keep dialogue flowing naturally. The result is faster handoffs in conversation, more natural AI agent interactions, and no extra detectors or manual tuning required.
What Coval’s benchmarks showed
Coval is a simulation and evaluation platform built specifically for testing AI voice agents at scale. It automates real-world scenarios, measuring key metrics like latency, interruption handling, and accuracy, so teams can understand how their systems perform under production-like conditions.
Independent benchmarking by Coval validated Flux’s performance:
50% lower latency to first token vs Nova-3
Faster, more reliable turn detection
Accuracy preserved under real-time constraints (WER equivalent to Nova-3)
Here’s how Flux compared with other streaming providers in Coval’s tests:
Fastest time to first token
Flux consistently delivers the quickest first transcription, arriving anywhere from hundreds of milliseconds to multiple seconds sooner than other providers. In Coval’s latency rankings, Flux holds the top spot.


Coval’s percentile analysis makes the gap even clearer. With Flux as the 0 ms baseline, competitors such as AssemblyAI and Speechmatics trail by roughly 0.5 to 1.5 seconds:
AssemblyAI Universal Streaming: +0.91s at P50, +1.26s at P75
Speechmatics Default: +1.04s at P50, +1.17s at P75
Speechmatics Enhanced: +1.09s at P50, +1.53s at P75
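The figures above express each provider’s P50 and P75 time to first token as an offset from Flux. Here is a minimal sketch of that kind of calculation. The sample latencies and model names are invented placeholders, not Coval’s data. The percentile method (`statistics.quantiles` with `method="inclusive"`) is also an assumption, because the post does not say how Coval computes percentiles.

```python
import statistics


def p50_p75(samples):
    """Return (P50, P75) of latency samples in seconds.

    Assumes the 'inclusive' quartile method; Coval's exact method is unknown.
    """
    _q1, q2, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return q2, q3


def deltas_vs_baseline(latencies, baseline):
    """Offset of every other model's P50/P75 from the baseline's, in seconds."""
    base_p50, base_p75 = p50_p75(latencies[baseline])
    deltas = {}
    for name, samples in latencies.items():
        if name == baseline:
            continue
        p50, p75 = p50_p75(samples)
        deltas[name] = (round(p50 - base_p50, 2), round(p75 - base_p75, 2))
    return deltas


# Hypothetical time-to-first-token samples (seconds), one list per model.
latencies = {
    "baseline": [0.2, 0.3, 0.4, 0.5, 0.6],
    "model_a": [1.1, 1.2, 1.3, 1.6, 1.8],
    "model_b": [0.9, 1.0, 1.5, 1.9, 2.5],
}
print(deltas_vs_baseline(latencies, "baseline"))
# {'model_a': (0.9, 1.1), 'model_b': (1.1, 1.4)}
```

Reporting percentiles rather than a single mean shows whether a model is slow across the board or only in its tail. The next two sections on consistency and spread make exactly this distinction.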


Most consistent performance
Flux does not just win on averages; it also shows the tightest and most predictable latency distribution. Where competitors often spike unpredictably, Flux stays stable across runs, which translates directly into smoother user experiences.


Lowest latency variation
In Coval’s distribution plots, Flux is the only model that combines the lowest median latency with the narrowest spread. The other models not only run slower but also show erratic latency that makes conversations feel laggy.


Best balance of speed and accuracy
In Coval’s latency-versus-accuracy analysis, Flux is the only model that pairs ultra-fast latency with accuracy on par with the top providers, and it does so at a competitive cost.


Want to see Flux in action against other providers? Head to Coval's interactive playground and test it live.


Coval’s Founder and CEO, Brooke Hopkins, put it best:
“These benchmarks underscore what we set out to achieve: Flux is not an incremental update but a new baseline for voice-first applications.”
What’s next
Flux also marks the beginning of our broader Neuroplex architecture, a new way of connecting speech-to-text, LLMs, and text-to-speech with shared context signals. Just as the human brain relies on white matter to connect specialized regions, Neuroplex enables context to flow across every stage of the voice stack.
Today’s systems often flatten speech into text, losing cues like tone, empathy, and intent. Neuroplex restores those signals, paving the way for agents that adapt in more lifelike and multidimensional ways. This is the next frontier: conversations with AI that feel fully human.
Get started with Flux today
Flux is available now. For developers building conversational agents, this means a new foundation: faster, smarter, and more human.
👉 Be one of the first to try it out
🎧 Listen to the full conversation between Scott and Brooke