Accurate speaker attribution is essential for turning conversations into usable data. Whether you're analyzing contact center calls, generating clinical documentation, or building voice AI applications, understanding who said what is just as important as understanding what was said.
Today, we're introducing Batch Diarization V2, a major upgrade to speaker labeling for pre-recorded audio. Diarization V2 improves speaker attribution accuracy, reduces the speaker labeling errors that make transcripts difficult to use, and was preferred more than 3.3X as often as V1 in side-by-side human evaluation.
TL;DR
- New batch diarization model available today
- Preferred more than 3.3X as often as V1 in human evaluation
- Improved speaker attribution accuracy across Voice Agent, Contact Center, Medical, and other audio
- Available via the new diarize_model parameter
- No breaking changes and no price changes
Why Speaker Attribution Matters
Diarization determines who spoke when in a conversation. When speaker labels are incorrect, downstream workflows break down.
In contact centers, inaccurate speaker attribution can impact QA review, coaching, and analytics. In healthcare, it can result in clinician comments being attributed to patients (or vice versa), creating errors in medical documentation.
In each case, the value of the transcript depends on accurately identifying the speaker behind every utterance.
Diarization V2 was built to improve speaker attribution across real-world audio and reduce the types of errors that matter most to customers.
What's New in Diarization V2
Diarization V2 introduces a new architecture that includes:
- Expanded training data
- A new speaker embedding model
- Improved segmentation and clustering
The result is more accurate speaker attribution, better turn boundaries, and fewer cases where speakers are incorrectly merged or split.
Performance Improvements Across Real-World Audio
We evaluated Diarization V2 against our existing V1 diarization model across multiple production-oriented workloads. Below are representative results from three common use cases: Voice Agent, Contact Center, and Medical audio.
We measure performance using Confusion Error Rate (CER), which represents the percentage of speech time attributed to the wrong speaker. Lower CER indicates more accurate speaker labeling.
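As a rough illustration of the metric, the sketch below computes a confusion rate from labeled segments. It is a simplified approximation, not the evaluation code behind the results in this post: the `confusion_error_rate` function and the `Segment` tuple are hypothetical names, and the sketch assumes no overlapping speech and that hypothesis speaker labels have already been mapped onto the reference labels.

```python
from typing import List, Tuple

# (start_sec, end_sec, speaker_label); hypothetical layout for illustration.
Segment = Tuple[float, float, str]


def confusion_error_rate(reference: List[Segment], hypothesis: List[Segment]) -> float:
    """Percent of reference speech time whose hypothesis label differs.

    Assumes no overlapping speech and that hypothesis labels are already
    mapped onto the reference label space.
    """
    total_speech = sum(end - start for start, end, _ in reference)
    if total_speech <= 0:
        raise ValueError("reference contains no speech")

    confused = 0.0
    for r_start, r_end, r_spk in reference:
        for h_start, h_end, h_spk in hypothesis:
            # Length of the time span shared by the two segments, if any.
            overlap = min(r_end, h_end) - max(r_start, h_start)
            if overlap > 0 and r_spk != h_spk:
                confused += overlap
    return 100.0 * confused / total_speech


# The hypothesis hands the turn to speaker B one second late.
reference = [(0.0, 4.0, "A"), (4.0, 10.0, "B")]
hypothesis = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
print(f"CER: {confusion_error_rate(reference, hypothesis):.1f}%")  # prints CER: 10.0%
```

In this toy example, one second of a ten-second conversation is attributed to the wrong speaker, so the rate is 10%.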
[Charts: Confusion Error Rate for V1 vs. V2 on Voice Agent, Contact Center, and Medical audio]
Across all three use cases, Diarization V2 consistently reduces speaker attribution errors compared to V1, with particularly strong improvements on the most challenging audio. Similar improvements were observed across broader evaluations.
Human Evaluators Preferred Diarization V2
Benchmark metrics are important, but we also wanted to understand how the outputs were perceived in human testing.
We compared V1 and V2 outputs side by side and asked evaluators which they preferred.
Across 158 human evaluation votes:
- 63.3% preferred V2
- 19.0% preferred V1
- 17.7% reported no preference
Overall, evaluators preferred Diarization V2 more than 3.3X as often as V1.
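The 3.3X figure follows directly from the two preference shares. A quick check of the arithmetic, using only the percentages reported above (the variable names are illustrative):

```python
# Preference shares reported above, as a percentage of the 158 votes.
preferred_v2 = 63.3
preferred_v1 = 19.0

# How many times as often V2 was chosen compared with V1.
ratio = preferred_v2 / preferred_v1
print(f"V2 preferred {ratio:.2f}x as often as V1")  # prints V2 preferred 3.33x as often as V1
```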
How to Use Diarization V2
Diarization V2 is available through the new diarize_model parameter.
The new parameter gives customers explicit control over which diarization model version they use, while allowing existing integrations to continue using V1 unchanged.
POST /v1/listen?diarize_model=latest
Use latest to automatically receive the newest generally available diarization model. You can also explicitly select a version:
POST /v1/listen?diarize_model=v2
POST /v1/listen?diarize_model=v1
Version Options
- latest: Always uses the newest GA diarization model
- v2: Diarization V2
- v1: Existing diarization model
Existing customers using diarize=true will continue using V1 with no behavior changes.
To enable V2, update your requests to use diarize_model=latest or diarize_model=v2.
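For reference, here is a minimal Python sketch of building such a request with the standard library. Only the /v1/listen path and the diarize_model values come from this post. The host name, the Token authorization header, the JSON body with a url field, and the `build_listen_request` helper are assumptions for illustration, so check them against the API reference for your deployment.

```python
import json
import urllib.parse
import urllib.request

# Assumed hosted endpoint; self-hosted deployments will use a different base URL.
API_BASE = "https://api.deepgram.com"
VALID_DIARIZE_MODELS = {"latest", "v2", "v1"}


def build_listen_request(audio_url: str, api_key: str,
                         diarize_model: str = "latest") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/listen request with diarize_model set."""
    if diarize_model not in VALID_DIARIZE_MODELS:
        raise ValueError(f"unsupported diarize_model: {diarize_model!r}")

    query = urllib.parse.urlencode({"diarize_model": diarize_model})
    # Assumed request body: a JSON object pointing at hosted audio.
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/listen?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


request = build_listen_request("https://example.com/call.wav", "YOUR_API_KEY", "v2")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Pinning v2 keeps behavior stable across future releases, while latest trades that stability for automatic upgrades.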
Availability and Compatibility
Diarization V2 is available today across Deepgram's batch Speech-to-Text offerings, including:
- Nova-1
- Nova-2
- Nova-3
- Base and Enhanced models
Diarization V2 supports all languages, including multilingual audio.
Supported in:
- Self-hosted deployments
- Deepgram SDKs
- US and EU deployments
All existing batch features continue to work unchanged, including smart formatting, redaction, word-level timestamps, keyterm prompting, language detection, and Audio Intelligence.
There are no pricing changes.
Get Started Today
Have feedback or questions? Reach us in GitHub discussions or contact our team.









