Use Case Models for AI Speech

AI speech transcription models to fit your use case

When it comes to your customer base, a generic one-size-fits-all speech recognition model isn’t going to cut it. Deepgram provides the most accurate speech models tailored to a variety of use cases to suit your needs.

Speech Recognition Model Tiers

Base

When scalability is paramount


Base models are built on our signature end-to-end deep learning speech model architecture and offer a solid combination of accuracy and cost-effectiveness.


Enhanced

For the best out-of-the-box accuracy


Enhanced models are generally more accurate than our Base models, with better word recognition, and they handle uncommon words significantly better.


Trained

For top accuracy on the words you need


Unlike custom vocabularies, custom training changes the parameters of the speech model itself, so thousands of keywords and phrases can become part of the model.


Use case models for speech recognition

Our end-to-end deep learning architecture and AutoML™ training allow Deepgram to create highly accurate, use-case-specific speech recognition models. Pair the right model tier with your use case for unparalleled accuracy on your audio.


General

Optimized for everyday audio processing. Generally more accurate than any region-specific Base model for the language for which it is enabled. If you aren’t sure which model to select, start here.

• Base ⭐️
• Enhanced ✨
• Trained 💫

OpenAI Whisper

Deepgram-hosted and self-hosted API access to Whisper, OpenAI’s open-source project.

Conversational AI

Optimized to allow artificial intelligence technologies, such as chatbots, to interact with people in a human-like way.

• Base ⭐️
• Trained 💫

Phone call

Optimized for low-bandwidth phone call audio.

• Base ⭐️
• Enhanced ✨
• Trained 💫

Video

Optimized for audio sourced from videos.

• Base ⭐️
• Trained 💫

Meetings

Optimized for conference room settings, which include multiple speakers with a single microphone.

• Base ⭐️
• Enhanced ✨
• Trained 💫

Voicemail

Optimized for low-bandwidth audio clips with a single speaker. Built by transfer learning from our phone call model.

• Base ⭐️
• Trained 💫

Earnings

Optimized for multiple speakers with varying audio quality, such as might be found on a typical earnings call. Vocabulary is heavily finance-oriented.

• Base ⭐️
• Enhanced ✨
• Trained 💫
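
The tier availability listed for the use-case models above can be collected into a small lookup table. The following Python sketch is illustrative only: the table is transcribed from this page, while the `pick_model` helper, its `has_trained_model` flag, and the "prefer Enhanced, then Base" ordering are assumptions made for the example, not part of any Deepgram SDK or API. OpenAI Whisper is left out because no tiers are listed for it here.

```python
from typing import Tuple

# Tier availability per use-case model, transcribed from the listing on this page.
TIERS_BY_USE_CASE = {
    "General": ("Base", "Enhanced", "Trained"),
    "Conversational AI": ("Base", "Trained"),
    "Phone call": ("Base", "Enhanced", "Trained"),
    "Video": ("Base", "Trained"),
    "Meetings": ("Base", "Enhanced", "Trained"),
    "Voicemail": ("Base", "Trained"),
    "Earnings": ("Base", "Enhanced", "Trained"),
}

# Assumed preference order for out-of-the-box tiers (hypothetical policy):
# Enhanced where it exists, otherwise Base.
OUT_OF_THE_BOX_PREFERENCE = ("Enhanced", "Base")


def pick_model(use_case: str, has_trained_model: bool = False) -> Tuple[str, str]:
    """Return a (use_case, tier) pair for the given use case.

    Hypothetical helper: unknown use cases fall back to "General",
    following the page's advice to start there when unsure.
    """
    if use_case not in TIERS_BY_USE_CASE:
        use_case = "General"
    tiers = TIERS_BY_USE_CASE[use_case]

    # A Trained model only applies if one has actually been trained for you.
    if has_trained_model and "Trained" in tiers:
        return use_case, "Trained"

    for tier in OUT_OF_THE_BOX_PREFERENCE:
        if tier in tiers:
            return use_case, tier

    raise ValueError(f"No out-of-the-box tier available for {use_case!r}")


if __name__ == "__main__":
    for name in ("Phone call", "Voicemail", "Podcast"):
        print(name, "->", pick_model(name))
```

Keeping the table as plain data makes it easy to update if tier availability changes; the selection policy lives entirely in `pick_model` and can be swapped for one that weighs cost over accuracy.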

Trained Model

Train a model on the words most important for your use case. Thousands of keywords and phrases can be part of your model.

White paper: How Speech Models Work >