Case Study

Elerian AI builds human-like AI-powered voicebots with Deepgram

Our fast, accurate Automatic Speech Recognition (ASR) engine allows Elerian AI to create a seamless, natural-sounding conversation with a digital agent.

The Landscape

Delivering Great Caller Experiences

No caller enjoys the experience of trying to talk to an automated customer support line and being told again and again by an eerily mechanical voice “I’m sorry, I didn’t catch that, could you repeat it?”

Customers of call centers that use IVR-based call routing or outdated AI are regularly frustrated by bots that get stuck in a loop or send them through the same cycle of limited options over and over, because the bot misunderstands them and cannot ask clarifying questions. If an AI is going to respond in a human way, it has to hear, remember, and understand the way a human would. That means building better speech recognition and language understanding AI.

AI can increase the productivity of human agents by automating simple request calls like “What is my balance?” or “How do I access the app on my laptop?” An added bonus is that an AI also does not lose its patience or act rudely. With sentiment detection, an AI-powered bot can detect the customer’s emotional state and respond in an appropriate and personalized manner.

So if Elerian AI could build a Digital Agent that sounds and feels totally natural, they could provide a consistently excellent customer experience on par with (and in some circumstances ahead of) what a human agent can provide. This frees up human agents to focus on the more challenging aspects of support or sales.

The Challenge

Without Domain Accuracy, There Is No Product

A machine can’t be taught to understand a word it doesn’t know, right? To truly understand humans, a machine first has to hear and transcribe the words they say, so the first step is building an accurate automatic speech recognition (ASR) engine. This is in and of itself a monumental task. The ASR engine that powers a Natural Language Understanding AI has to be accurate, and it has to be fast so the Digital Agent can respond to the caller in real time. It also has to handle a wide array of accents and have a domain-specific vocabulary and context. For example, when it hears “bite” on a call to a computing company, it should know that the word is “byte”.
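As a rough illustration of what domain-specific vocabulary means in practice, the Python sketch below rewrites a generic transcript using a per-domain homophone table. It is a hypothetical post-processing step written for this example, not a description of how Deepgram or Elerian AI adapt their models; a trained domain model would resolve these words inside the recognizer itself. The names `DOMAIN_HOMOPHONES` and `apply_domain_vocabulary` are assumptions.

```python
import re

# Hypothetical per-domain homophone table (illustrative only; not from the case study).
DOMAIN_HOMOPHONES = {
    "computing": {"bite": "byte", "bites": "bytes"},
}


def apply_domain_vocabulary(transcript, domain):
    """Replace whole-word homophones with the spelling the domain expects."""
    replacements = DOMAIN_HOMOPHONES.get(domain, {})
    if not replacements:
        # Unknown domain: leave the transcript untouched.
        return transcript

    # Match any listed homophone as a whole word, ignoring case.
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, replacements)) + r")\b",
        re.IGNORECASE,
    )

    def swap(match):
        word = match.group(0)
        fixed = replacements[word.lower()]
        # Keep a leading capital if the original word had one.
        return fixed.capitalize() if word[0].isupper() else fixed

    return pattern.sub(swap, transcript)
```

A lookup table like this only covers words someone thought to list, which is one reason a model trained on domain audio is the more robust approach.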

Elerian AI recognized the complexity of building an ASR engine for customers whose callers speak in a variety of dialects and accents. This was particularly evident in South Africa, where there are 11 official spoken languages. General ASR engines available in the market could neither handle this complexity nor recognize the domain-specific speech entities critical to contact center use cases, such as social security numbers and other identifiers. According to Craig Akal, Co-founder of Elerian AI, a suitable ASR solution is critical for the Elerian voicebot.

The Solution

Tailored, Trained Speech Models Enable Support of Accents and Dialects of All Callers

Deepgram’s trained and tailored speech models, along with Elerian AI’s entity recognition models, allow for precise speech-to-text transcription at greater than 90% accuracy.

For Elerian AI, a general ASR speech model could not handle the various dialects, accents, jargon, and slang that they heard on phone calls; they needed a custom, domain-specific model. Deepgram was the only solution that could provide a trained domain-specific model that would meet their stringent requirements. Deepgram also produces real-time transcripts with less than 300 milliseconds of lag, another mission-critical factor for a human-like conversational AI bot. Lastly, Deepgram can deploy its ASR solution on-premises for better security, which means sensitive customer data stays out of the cloud. For Elerian’s banking and financial services customers, this is crucial.

Machines that are Fluent in Human

The partnership between Deepgram and Elerian AI has made Elerian’s Conversational AI digital agent more human-like and has increased the productivity of the human agents in their clients’ call centers. Elerian AI CEO Dion Millson says, “For Conversational AI voicebots, it all starts off with speech recognition. If you don’t understand what the person said and transcribe it to text accurately, you are not in the game. Unfortunately, the general ASR models standardize around 70% accuracy, and it is just not good enough to respond to a caller with real-time accuracy and relevance. Our partnership with Deepgram and their models in conjunction with our internal models that are trained on case-specific data get well over 90% accuracy.” Elerian’s NLU-driven digital agents not only understand what is being said but also come with sentiment and emotion detection tools that provide rich analytics on caller interactions and enable more personalized interaction. Combine that with their simple, low-code configuration process, and you have a breakthrough solution unmatched in the Conversational AI industry.
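The case study does not say how the 70% and 90% accuracy figures were measured. One common convention for ASR, assumed here, is word accuracy: one minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the engine's output divided by the number of reference words. The Python sketch below shows that calculation; the function names are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def word_accuracy(reference, hypothesis):
    """Assumed convention: accuracy = 1 - WER, floored at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))
```

Under this assumed convention, a transcript at 70% accuracy gets roughly three words wrong in every ten, while one above 90% gets fewer than one wrong in ten, which is the difference between a bot that keeps asking callers to repeat themselves and one that can follow the conversation.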

Technology You Can Talk to About Anything – Empowering Human Decision-Making

Elerian AI not only set out to dominate the Call Center Conversational AI space with their NLU technology; they also sought to create AI that could understand people and engage in natural conversations of a more general kind. This has always been their mission.

According to Craig Akal, that’s what has given Elerian AI the flexibility and the vision to break free from the rut that has limited the advancement of voicebots for so long. “Our competitors have come out of the call center space, and really they’ve built on top of rudimentary products to try and get to something that can interact naturally. We’ve come at it the other way. Elerian AI is striving to achieve genuine open-domain conversation capability, which is the ability to talk to technology about any topic. We have built AI that develops human models to understand people individually and support their decision-making,” noted Akal. Call centers are simply the first point through which Elerian AI has chosen to commercialize. Their technology has near limitless application across a wide array of use cases and industries.

Elerian AI is passionate, confident, and determined to push their technological solutions as far as they can go. “In a call center context, we don’t see ourselves as benchmarking with other technologies but rather with the experience offered by a great human operator. That is the level of sophistication we want to deliver and the benchmark against which we compare ourselves. We’ve got to provide a great caller experience, every time.” In short, Elerian AI is building a machine that can talk to you in a way that feels human. It may seem like the stuff of futuristic sci-fi, but for Elerian AI’s customers, the future has already arrived.