A funny thing happened when Deepgram first decided to use end-to-end deep learning (E2EDL) to design our next-generation speech-to-text (STT) solution. We found that this approach was far more flexible and easier to optimize than traditional STT. We didn't have to reconnect and re-optimize multiple models (acoustic, pronunciation, and language) every time we wanted to make a change, and we could retrain and enhance our speech models without starting from scratch.
With transfer learning, we could also build new speech models faster. This capability has allowed us to build different base speech models for different use cases and needs. It also lets us tailor models when a customer needs something specific that we don't currently offer.
Let's take a look at the three types of models that we offer here at Deepgram and what each is good for.
1. Language-by-Use Case Models
All of our use-case-specific models are available in various English dialects. As we continue to train and optimize our speech models for specific circumstances, such as call centers or meeting transcription, we are adding new language-by-use-case combinations and expanding the spoken languages we offer. Our customers have found that a speech model built for their specific combination of spoken language and use case is more accurate than Big Tech's out-of-the-box, one-size-fits-none models.
These targeted models are our fastest and are optimized for scalability: they can transcribe one hour of pre-recorded audio in 30 seconds. They work well for all applications, especially those that need very high speed or cost savings for on-prem use. You also don't need to trade speed or scalability for accuracy: because we offer multiple models for different use cases (unlike Big Tech), our models tend to be more accurate as well.
2. Higher Accuracy Enhanced Models
We also built our next-generation architecture to deliver our highest English-language accuracy on long-tail words, that is, words that rarely come up in everyday conversation. This new architecture was rebuilt from our current architecture to improve accuracy across a wider vocabulary.
This enhanced model architecture is best suited to situations where you have keywords and terms that must be transcribed correctly but rarely appear in everyday conversation, such as "fiduciary," "biodiversity," or "formulae." Example use cases include conversational AI for B2B, technical support contact centers, and technical meetings or seminars.
3. Models Tailored for Your Business
But what if we don't have a use case model specifically for your needs? Maybe your audio has a lot of background noise, accents, jargon, or product and company names, all of which can cause problems for off-the-shelf models. If that's the case for you, here at Deepgram we can customize a model for your specific use case. These tailored models can be trained and deployed within weeks and specifically target the characteristics of your audio that make it hard for an off-the-shelf model.
To make sure the tailored model really does address your specific issues, training it requires audio from your own business. The more real-world audio from your business, the better the accuracy. An employee reading from a script or a list of terms produces much poorer training data than a recording of an employee and a customer having an actual conversation. Although we like to say that the more real-world audio you can provide, the better, we've seen good accuracy improvements with less than 10 hours of audio.
Deciding Which ASR Platform is Best for You
There are obviously many factors beyond the ability to tailor models that go into deciding which ASR system will work best for you. If you'd like to read more about the factors you should consider when shopping for an ASR platform, check out How to Evaluate an ASR Platform, or fill out our free Speech-to-Text Self Assessment.
Still have questions? Contact us to talk through your use case and see which of our models is best for you.