Introducing Batch Diarization V2

Accurate speaker attribution is essential for turning conversations into usable data. Whether you're analyzing contact center calls, generating clinical documentation, or building voice AI applications, understanding who said what is just as important as understanding what was said.

Today, we're introducing Batch Diarization V2, a major upgrade to speaker labeling for pre-recorded audio. Diarization V2 improves speaker attribution accuracy, reduces the speaker labeling errors that make transcripts difficult to use, and was preferred 3.3X as often as V1 in side-by-side human evaluation.

TL;DR

New batch diarization model available today
Preferred 3.3X as often as V1 in human evaluation
Improved speaker attribution accuracy across Voice Agent, Contact Center, Medical, and other audio
Available via the new diarize_model parameter
No breaking changes and no price changes

Why Speaker Attribution Matters

Diarization determines who spoke when in a conversation. When speaker labels are incorrect, downstream workflows break down.

In contact centers, inaccurate speaker attribution can impact QA review, coaching, and analytics. In healthcare, it can result in clinician comments being attributed to patients (or vice versa), creating errors in medical documentation.

In each case, the value of the transcript depends on accurately identifying the speaker behind every utterance.

Diarization V2 was built to improve speaker attribution across real-world audio and reduce the types of errors that matter most to customers.

What's New in Diarization V2

Diarization V2 introduces a new architecture that includes:

Expanded training data
A new speaker embedding model
Improved segmentation and clustering

The result is more accurate speaker attribution, better turn boundaries, and fewer cases where speakers are incorrectly merged or split.

Performance Improvements Across Real-World Audio

We evaluated Diarization V2 against our existing V1 diarization model across multiple production-oriented workloads. Below are representative results from three common use cases: Voice Agent, Contact Center, and Medical audio.

We measure performance using Confusion Error Rate (CER), which represents the percentage of speech time attributed to the wrong speaker. Lower CER indicates more accurate speaker labeling.
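To make the metric concrete, here is a minimal frame-level sketch of how a confusion error rate could be computed. The exact scoring procedure is not described in this post, so the function name, the fixed-length frame representation, and the assumption that hypothesis labels have already been mapped onto reference labels are all illustrative choices rather than the actual evaluation code.

```python
from typing import Optional, Sequence


def confusion_error_rate(
    reference: Sequence[Optional[str]],
    hypothesis: Sequence[Optional[str]],
) -> float:
    """Return the percentage of reference speech frames given the wrong speaker.

    Each sequence holds one speaker label per fixed-length frame; None marks
    non-speech. Hypothetical simplification: hypothesis labels are assumed to
    be already mapped onto the reference label set.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must cover the same frames")

    speech_frames = 0
    confused_frames = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is None:
            continue  # only frames containing reference speech are scored
        speech_frames += 1
        # A frame is confused when a speaker was assigned, but the wrong one.
        # Frames with no hypothesis label are missed speech, not confusion.
        if hyp is not None and hyp != ref:
            confused_frames += 1

    if speech_frames == 0:
        return 0.0
    return 100.0 * confused_frames / speech_frames


# Ten frames of speech; the hypothesis switches to speaker B one frame early.
reference = ["A"] * 6 + ["B"] * 4
hypothesis = ["A"] * 5 + ["B"] * 5
print(confusion_error_rate(reference, hypothesis))
```

In this toy example one of the ten speech frames carries the wrong label, so the rate is 10%.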

[Figures: CER for V1 vs. V2 on Voice Agent, Contact Center, and Medical audio]

Across all three use cases, Diarization V2 consistently reduces speaker attribution errors compared to V1, with particularly strong improvements on the most challenging audio. Similar improvements were observed across broader evaluations.

Human Evaluators Preferred Diarization V2

Benchmark metrics are important, but we also wanted to know how human reviewers judged the outputs.

We compared V1 and V2 outputs side by side and asked evaluators which they preferred.

Across 158 human evaluation votes:

63.3% preferred V2
19.0% preferred V1
17.7% reported no preference

Overall, evaluators preferred Diarization V2 more than 3.3X as often as V1.
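The 3.3X figure follows directly from the vote shares above. The short check below reproduces the arithmetic using only the published percentages; the variable names are illustrative.

```python
# Vote shares reported in the evaluation (percent of 158 votes).
vote_share = {"v2": 63.3, "v1": 19.0, "no_preference": 17.7}

# The three shares should account for every vote.
total = sum(vote_share.values())
assert abs(total - 100.0) < 1e-9

# How many times as often V2 was preferred compared with V1.
ratio = vote_share["v2"] / vote_share["v1"]
print(f"V2 preferred {ratio:.2f}x as often as V1")
```

The ratio comes out slightly above 3.3, which matches the "more than 3.3X" claim.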

How to Use Diarization V2

Diarization V2 is available through the new diarize_model parameter.

The new parameter gives customers explicit control over which diarization model version they use, while allowing existing integrations to continue using V1 unchanged.

POST /v1/listen?diarize_model=latest

Use latest to automatically receive the newest generally available diarization model. You can also explicitly select a version:

POST /v1/listen?diarize_model=v2
POST /v1/listen?diarize_model=v1

Version Options

latest — Always uses the newest GA diarization model
v2 — Diarization V2
v1 — Existing diarization model

Existing customers using diarize=true will continue using V1 with no behavior changes.

To enable V2, update your requests to use diarize_model=latest or diarize_model=v2.
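As a rough illustration of the request shape, the sketch below builds (but does not send) a batch request with the diarize_model parameter, using only the Python standard library. The host name, the Authorization header format, and the JSON body with a url field are assumptions that do not appear in this post, and build_request is a hypothetical helper; check the API reference before relying on any of them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed host; this post shows only the /v1/listen path.
BASE_URL = "https://api.deepgram.com/v1/listen"

# Values documented in this post for the diarize_model parameter.
DIARIZE_MODELS = {"latest", "v2", "v1"}


def build_request(audio_url: str, api_key: str, diarize_model: str = "latest") -> Request:
    """Build a POST request that selects a diarization model version."""
    if diarize_model not in DIARIZE_MODELS:
        raise ValueError(f"unsupported diarize_model: {diarize_model!r}")

    query = urlencode({"diarize_model": diarize_model})
    # Assumed body format: a JSON object pointing at hosted audio.
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return Request(
        f"{BASE_URL}?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


req = build_request("https://example.com/call.wav", "YOUR_API_KEY", diarize_model="v2")
print(req.get_method(), req.full_url)
```

Pinning v2 keeps behavior stable across future releases, while latest trades that stability for automatic upgrades to each new generally available model.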

Availability and Compatibility

Diarization V2 is available today across Deepgram's batch Speech-to-Text offerings, including:

Nova-1
Nova-2
Nova-3
Base and Enhanced models

It supports all languages, including multilingual audio.

Supported in:

Self-hosted deployments
Deepgram SDKs
US and EU deployments

All existing batch features continue to work unchanged, including smart formatting, redaction, word-level timestamps, keyterm prompting, language detection, and Audio Intelligence.

There are no pricing changes.

Get Started Today

Have feedback or questions? Reach us in GitHub discussions or contact our team.
