Speech to Text API for next-level apps

Convert speech to text with unmatched accuracy, ultra-low latency, and enterprise scalability. Deepgram’s speech-to-text API powers everything from transcription and analytics to real-time, human-like voice agents.

Trusted by the world’s top enterprises and startups

Twilio
Daily
Granola
Vapi
LiveKit
Cloudflare

MEET FLUX

The conversational Speech to Text model

Flux is the first speech-to-text model designed for conversation, not just transcription. With built-in turn detection, ultra-low latency, and natural interruption handling, Flux enables real-time, human-like voice agents.

  • Integrated turn detection for natural flow
  • Sub-300ms end-of-turn latency
  • Conversational cues for agents to act on
  • Nova-3-level transcription accuracy

Model Overview

Deepgram models power everything from real-time conversations to domain-specific transcription, with options for speed, accuracy, and full customization.

Flux

Conversational speech recognition for real-time voice agents with built-in turn detection, natural interruption handling, and ultra-low latency.

Nova-3

High-performance speech-to-text for production transcription with top accuracy, multilingual support, and noise robustness.

Industry-tuned

Specialized speech-to-text models optimized for the vocabulary and structure of specific industries, such as healthcare, legal, and finance.

Custom

Custom speech-to-text models trained on proprietary or novel datasets for maximum accuracy in edge-case scenarios.

Built for the real world

Deepgram models maintain high transcription accuracy even in noisy, accented, or overlapping speech, making them ideal for real conversations.

Speech to Text in 36+ languages

Build global applications with Deepgram’s speech-to-text API, which supports transcription of real-time and recorded audio in 36+ languages and dialects.

Supported languages include Japanese, Korean, English, and German, among others.

Ultra-low latency for real-time apps

Deepgram delivers transcripts in under 300 milliseconds, enabling voice agents and conversational AI to respond instantly and naturally.

Discover Speech to Text capabilities

Deepgram’s speech-to-text features give developers everything they need to produce accurate, readable, and secure transcripts out of the box.

Keyterm prompting

Improve recognition of critical words or phrases with up to 90% higher keyword recall rate (KRR).
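As a rough sketch of how keyterm prompting is typically requested, the snippet below builds a pre-recorded transcription URL for Deepgram's `/v1/listen` REST endpoint, with one `keyterm` query parameter per term. The endpoint, the `model=nova-3` value, and the `keyterm` parameter name follow Deepgram's public documentation as we understand it; treat them as assumptions and confirm them in the current API reference. The helper `build_keyterm_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed base URL of Deepgram's pre-recorded transcription endpoint.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_keyterm_url(terms: list[str], model: str = "nova-3") -> str:
    """Hypothetical helper: return a /v1/listen URL that boosts each term.

    Each non-blank term becomes its own `keyterm` query parameter (assumed
    name), so multi-word phrases stay intact and are URL-encoded safely.
    """
    params = [("model", model)]
    params += [("keyterm", term) for term in terms if term.strip()]
    return f"{LISTEN_URL}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_keyterm_url(["Creovai", "end of turn"])
    print(url)
    # Decode the query string again to show the terms survive the round trip.
    print(parse_qs(urlsplit(url).query)["keyterm"])
```

Sending each term as a separate parameter, rather than joining them into one comma-separated string, avoids ambiguity when a phrase itself contains punctuation.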

Filler words

Transcribe filler words such as “uh” and “um” to produce a more natural, verbatim transcript.

Smart formatting

Enhance readability with automatic punctuation, capitalization, and paragraphing.

Diarization

Detect speaker changes and label who said what in multi-speaker audio.
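With diarization enabled (Deepgram's docs describe a `diarize=true` query parameter), each recognized word in the JSON response carries a speaker index. The sketch below collapses such a word list into "who said what" turns. The response shape it reads (`results.channels[0].alternatives[0].words`, each word with `word`, `speaker`, `start`, `end`, and optionally `punctuated_word`) is an assumption based on Deepgram's documented output; the sample payload is invented for illustration, and `speaker_turns` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: int
    start: float
    end: float
    text: str


def speaker_turns(response: dict) -> list[Turn]:
    """Hypothetical helper: merge consecutive words from the same speaker.

    Assumes a diarized response in which every word dict has `speaker`,
    `start`, `end`, and `word` (plus optional `punctuated_word`).
    """
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    turns: list[Turn] = []
    for w in words:
        token = w.get("punctuated_word", w["word"])
        if turns and turns[-1].speaker == w["speaker"]:
            # Same speaker as the previous word: extend the current turn.
            turns[-1].text += " " + token
            turns[-1].end = w["end"]
        else:
            # Speaker changed (or this is the first word): start a new turn.
            turns.append(Turn(w["speaker"], w["start"], w["end"], token))
    return turns


if __name__ == "__main__":
    # Invented sample payload, shaped like an assumed diarized response.
    sample = {"results": {"channels": [{"alternatives": [{"words": [
        {"word": "hi", "punctuated_word": "Hi,", "speaker": 0, "start": 0.0, "end": 0.3},
        {"word": "there", "punctuated_word": "there.", "speaker": 0, "start": 0.3, "end": 0.6},
        {"word": "hello", "punctuated_word": "Hello!", "speaker": 1, "start": 0.9, "end": 1.2},
    ]}]}]}}
    for turn in speaker_turns(sample):
        print(f"Speaker {turn.speaker} [{turn.start:.1f}-{turn.end:.1f}s]: {turn.text}")
```

Grouping on the client keeps the raw word timings available, so the same response can drive both a readable transcript and per-speaker analytics.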

Numerals

Convert spelled-out numbers into digits (e.g., “one hundred” → “100”) for consistency.

Redaction

Automatically remove sensitive or personal information from transcripts.
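The formatting and privacy features above are usually switched on per request. The sketch below assembles a `/v1/listen` URL that enables several of them at once. The parameter names (`smart_format`, `filler_words`, `diarize`, `numerals`, `redact`) and the redaction categories shown (`pci`, `ssn`) reflect Deepgram's documentation as we recall it; they are assumptions to check against the current API reference, and `listen_query` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed base URL of Deepgram's pre-recorded transcription endpoint.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def listen_query(
    model: str = "nova-3",
    smart_format: bool = True,
    filler_words: bool = False,
    diarize: bool = False,
    numerals: bool = False,
    redact: tuple[str, ...] = (),
) -> str:
    """Hypothetical helper: build a /v1/listen URL from feature toggles.

    Enabled features are sent as `name=true`; disabled ones are omitted so
    the API default applies. Each redaction category becomes its own
    `redact` parameter (assumed names throughout).
    """
    params: list[tuple[str, str]] = [("model", model)]
    flags = {
        "smart_format": smart_format,
        "filler_words": filler_words,
        "diarize": diarize,
        "numerals": numerals,
    }
    params += [(name, "true") for name, enabled in flags.items() if enabled]
    params += [("redact", category) for category in redact]
    return f"{LISTEN_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(listen_query(filler_words=True, numerals=True, redact=("pci", "ssn")))
```

Omitting disabled flags, instead of sending `false`, keeps URLs short and leaves any server-side defaults in charge.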

Power real-world solutions with Speech to Text

Deepgram’s speech-to-text API enables accurate and scalable transcription across industries, including customer support, healthcare, media, and conversational AI.

Contact Center

Accurate speech-to-text for call transcription, real-time analytics, and improved customer support. Flexible deployment and custom models scale across industries.

"As we’ve begun to roll out Deepgram to our customers, we’ve noticed the platform’s distinct ability to quickly and accurately transcribe product and company names."
Adam Larsen
CTO, Creovai

Medical Transcription

Healthcare-ready speech-to-text that captures medical terms and specialized keywords at scale. Ensure compliance with HIPAA and industry standards while reducing documentation time. Real-time transcription supports faster clinical workflows and improves patient care outcomes.

Up to 40x faster transcription of pre-recorded audio than alternatives.

Conversational AI

Real-time speech-to-text with ultra-low latency and turn detection for human-like voice agents. Built for understanding complex conversations.

  • High-accuracy transcription
  • Custom vocabulary injection
  • Speaker diarization
  • Topic & language detection
  • End-of-thought detection

Convert audio into text to analyze conversations, detect intent, and generate actionable insights.

"Being able to rely on Deepgram transcription, both on the front and back end of the call is paramount to accurate emotion detection for our Call Center Customers."
Adam Settle
VP of Product, Sharpen

Media Transcription

Fast, affordable transcription for podcasts, videos, and broadcasts with accurate captions and summaries.

  • Rich content captioning
  • SEO and audience expansion
  • Content moderation & analytics
  • Searchability & user experience
  • Streamlined workflows

Trusted by startups and enterprises

Discover the power of our product through real stories.

Ready to get started?

Start building voice-first applications today with Deepgram’s speech-to-text API. It is fast, accurate, scalable, and easy to integrate.
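A minimal getting-started sketch using only the Python standard library: it prepares, but does not send, a POST to the `/v1/listen` endpoint asking Deepgram to transcribe audio hosted at a public URL. The `Authorization: Token <key>` header, the JSON body `{"url": ...}`, and the `DEEPGRAM_API_KEY` environment variable name follow Deepgram's documented conventions as we understand them, but treat them as assumptions. `make_transcription_request` is a hypothetical helper and the audio URL is a placeholder.

```python
import json
import os
import urllib.request

# Assumed endpoint plus query options; verify against the current API reference.
LISTEN_URL = "https://api.deepgram.com/v1/listen?model=nova-3&smart_format=true"


def make_transcription_request(audio_url: str, api_key: str) -> urllib.request.Request:
    """Hypothetical helper: prepare (not send) a POST for hosted pre-recorded audio.

    Assumes token auth via an `Authorization: Token <key>` header and a JSON
    body of the form {"url": "<audio location>"}.
    """
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        LISTEN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # DEEPGRAM_API_KEY is an assumed variable name; a placeholder is used if unset.
    key = os.environ.get("DEEPGRAM_API_KEY", "YOUR_API_KEY")
    req = make_transcription_request("https://example.com/audio.wav", key)
    print(req.get_method(), req.full_url)
```

To actually send it, pass `req` to `urllib.request.urlopen` with a real key and parse the JSON reply. In Deepgram's documented response the transcript text sits at `results.channels[0].alternatives[0].transcript`, though that path is worth confirming against a live response.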