

Which Speech Recognition Model is Best for My Business?

A funny thing happened when Deepgram first decided to use end-to-end deep learning (E2EDL) to design our next-generation speech-to-text (STT) solution. We found that this approach was hugely flexible and easier to optimize than traditional STT. We didn’t have to reconnect and optimize multiple models (acoustic, pronunciation, and language) every time we wanted to make a change. And we could retrain and enhance our speech models without starting from scratch.

With transfer learning, we could build new speech models faster. This trait of our technology has allowed us to build different base speech models for different use cases and needs. It also allows us to tailor models in cases where a customer needs something specific that we don’t currently offer.

Let’s take a look at the three types of models that we offer here at Deepgram and what each is good for.

1. Language-by-Use Case Models

All of our use-case-specific models are available in various English dialects. As we continue to train and optimize our speech models for specific settings, such as call centers or meeting transcription, we are adding more language-by-use-case combinations and expanding the spoken languages we offer. Our customers have found that a speech model built for their specific combination of spoken language and use case is more accurate than Big Tech’s out-of-the-box, one-size-fits-none models.

These targeted models are our fastest and are optimized for scalability: they can transcribe one hour of pre-recorded audio in 30 seconds. They work well for any application, and especially for ones that need very high speed or lower costs in on-prem deployments. You also don’t have to trade speed or scalability for accuracy. Because we offer multiple models for different use cases, unlike Big Tech, our models tend to be more accurate as well.
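Transcribing one hour of audio in 30 seconds works out to a 120x speedup over real time. The minimal sketch below turns that figure into a rough batch estimate. It assumes a single stream processed at a constant rate, with no concurrency, network overhead, or queueing, so real-world totals will differ; the function names are illustrative only.

```python
def speedup_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    return audio_seconds / processing_seconds


def estimated_processing_seconds(audio_hours: float, speedup: float) -> float:
    """Rough wall-clock time for a batch, assuming a constant speedup and no parallelism."""
    return audio_hours * 3600 / speedup


factor = speedup_factor(3600, 30)  # one hour of audio in 30 seconds
print(factor)  # 120.0

# 1,000 hours of recordings at that rate, processed one after another:
hours = estimated_processing_seconds(1000, factor) / 3600
print(f"{hours:.1f} hours")  # 8.3 hours
```

In practice, sending files concurrently would shorten the wall-clock total, which is why this estimate is best read as an upper bound for sequential processing.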


2. Higher Accuracy Enhanced Models

We also built a next-generation architecture with the highest English-language accuracy on long-tail words, that is, words that are uncommon in everyday conversation. We rebuilt this architecture from our existing one to improve accuracy across a broader vocabulary.

This enhanced speech model architecture is best suited to audio containing keywords and terms that you must transcribe correctly but that rarely come up in everyday conversation, such as fiduciary, biodiversity, or formulae. Example use cases include conversational AI for B2B, technical support contact centers, and technical meetings or seminars.
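One simple way to check whether a model handles your must-have terms is to measure what fraction of them show up in its transcript. The sketch below assumes single-word terms and an existing transcript string; `keyword_recall` is a hypothetical helper of our own, not part of any Deepgram API, and it ignores multi-word phrases and word positions.

```python
import re


def keyword_recall(keywords: list[str], transcript: str) -> tuple[float, list[str]]:
    """Return (fraction of keywords found, keywords that were missed).

    Hypothetical helper: matches single words case-insensitively and
    ignores multi-word phrases and word order.
    """
    if not keywords:
        raise ValueError("keywords must not be empty")
    # Tokenize the transcript into a set of lowercase words.
    words = set(re.findall(r"[a-z0-9']+", transcript.lower()))
    missed = [k for k in keywords if k.lower() not in words]
    return 1 - len(missed) / len(keywords), missed


recall, missed = keyword_recall(
    ["fiduciary", "biodiversity", "formulae"],
    "The fiduciary reviewed the bio diversity formulae.",
)
print(f"{recall:.2f}", missed)  # 0.67 ['biodiversity']
```

Running the same term list against transcripts from different models on the same audio gives a quick side-by-side comparison focused on the words that matter to you.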

3. Models Tailored for Your Business

But what if we don’t have a use case model specifically for your needs? Maybe your audio has a lot of background noise, accents, jargon, or product and company names, any of which can cause problems for off-the-shelf models. If that’s the case for you, here at Deepgram we can customize a model for your specific use case. These tailored models can be trained and deployed within weeks and specifically target the characteristics of your audio that make it hard for an off-the-shelf model to handle.
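The guidance on the three model types can be condensed into a rough rule of thumb. The sketch below is purely illustrative: `AudioProfile`, `suggest_model_type`, and the returned labels are hypothetical names, not Deepgram model identifiers or API parameters, and the order of the checks is our own reading of the post rather than an official selection procedure.

```python
from dataclasses import dataclass


@dataclass
class AudioProfile:
    """Hypothetical yes/no summary of your audio and transcription needs."""
    # Rare, must-get-right terms such as "fiduciary" or "formulae".
    has_long_tail_terms: bool
    # Heavy background noise, accents, jargon, or product and company names
    # that off-the-shelf models already struggle with on your audio.
    stock_models_struggle: bool


def suggest_model_type(profile: AudioProfile) -> str:
    """Map a profile to one of the three model types described in the post."""
    if profile.stock_models_struggle:
        return "tailored"  # custom-trained on your own audio
    if profile.has_long_tail_terms:
        return "enhanced"  # higher accuracy on uncommon words
    return "use-case"  # fastest and most scalable option


print(suggest_model_type(AudioProfile(has_long_tail_terms=True, stock_models_struggle=False)))
# enhanced
```

Real decisions also weigh cost, latency, and deployment constraints, so treat the output as a starting point for a conversation rather than a final answer.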

To make sure a tailored model really does address your specific issues, we train it on audio from your business. The more real-world audio you provide, the better the accuracy: a recorded conversation between an employee and a customer makes much better training data than an employee reading a script or a list of terms. And while more real-world audio is always better, we’ve seen good accuracy improvements with less than 10 hours of audio.
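Before sharing audio for a tailored model, it can help to know how much you actually have. This minimal sketch assumes your recordings are uncompressed PCM `.wav` files (lowercase extension) in one directory tree; it totals their duration with Python’s standard `wave` module. The helper names are our own illustration, not a Deepgram tool, and the script only counts hours; it cannot tell whether the audio is real conversation or scripted reading.

```python
import wave
from pathlib import Path

# The post reports good gains with less than 10 hours of audio; this is a
# reference point for comparison, not a required minimum.
REFERENCE_HOURS = 10.0


def wav_duration_seconds(path: Path) -> float:
    """Duration of one uncompressed PCM WAV file, in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def total_audio_hours(directory: str) -> float:
    """Sum the durations of every .wav file under `directory`, in hours."""
    seconds = sum(wav_duration_seconds(p) for p in Path(directory).rglob("*.wav"))
    return seconds / 3600
```

For example, `total_audio_hours("recordings")` (a hypothetical folder name) returns the total in hours, which you can compare against `REFERENCE_HOURS`. Compressed formats such as MP3 would need a different library to read durations.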

Deciding Which ASR Platform is Best for You

There are obviously a lot of factors that go into deciding which ASR system will work best for you, beyond the ability to tailor models. If you’d like to read more about the factors to consider when shopping for an ASR platform, check out How to Evaluate an ASR Platform, or fill out our free Speech-to-Text Self Assessment.

Still have questions? Contact us to talk through your use case and see which of our models is best for you.
