
Generic ASR will never be accurate enough for Conversational AI

By Keith Lam
Published Apr 7, 2021
Updated Jun 13, 2024

The human brain is remarkably good at processing speech and understanding what is said. If we are talking about a baseball game, your brain understands that when I say "pitcher" and "batter", I don't mean a large vessel for pouring drinks and a mix for cooking pancakes. Your brain matches the words to the context and the intent of the conversation. It also has an amazing noise filter that lets you focus on the important parts of a conversation. At a baseball game there is constant noise around you, but when your buddy talks to you, you can focus on his voice, hear him, and understand him clearly.

Intent Matters

How does a Conversational AI system determine the intent of a conversation and focus on the words that matter? Consider a possible future Conversational AI example: a robot waiter at a local pub. Three conversations are going on around it. The booth to its left is talking about a weird internet video. A table behind it is complaining about the last place the group ate and how bad the chicken was. And finally, the table in front of it has delegated the task of ordering appetizers to the person at the back of the table, with everyone calling out their requests to that person.

Even with a one hundred percent accurate transcript of the audible conversation at that table, it would be really hard for the robot to understand what should be happening. Did they just order chicken tenders, or was that the other table? Was that two orders of the appetizer, or was the first person asking someone else to order it? Was that 'mh-uh' a no to the biggie-sized version, or was it just someone clearing their throat?
