You are not alone if you have ever experienced a poor interaction with a voicebot. Perhaps you heard, "Sorry, I didn't get that. Can you repeat your request?" or were transferred to a human agent after a simple question. Unfortunately, these experiences are common across various industries, resulting in a negative customer experience and ultimately causing churn. When you look at the leading edge of Conversational AI, what is currently possible? Well, we are getting much closer to having voicebots that converse like humans, but only on specific subjects or use cases. There is still no voicebot that can converse universally on any topic or act as a true personal assistant.
The need to be persistent
So what are the obstacles to human-like conversational AI? Many technical and data obstacles stand between us and this goal. Kevin Fredrick, Managing Partner of OneReach.ai, expressed this best when he said, "Building a Conversational AI voicebot is like planning to summit a mountain. Those who are looking for an 'easy button' get frustrated and quit. The ones who think it will be too hard, don't ever start. It is the ones who know the challenge is worth it and have the right partners and use the right tools who make the summit." There are still technical challenges we continue to overcome, including transcription speed and accuracy optimization, better Natural Language Processing (NLP) and Natural Language Understanding (NLU), improved human-like text-to-speech engines, and tighter integrations between all the parts of this workflow, but we see the path to reaching this summit.
Lack of training data
On the data side, Antonio Valderrabanos, CEO of Bitext, indicated that the availability of data for AI model training and evaluation is one of the main challenges in creating a voicebot for all languages, accents, dialects, and use cases. Do we have the audio and text data in these accents, dialects, and use cases to train that AI model? This data currently does not exist in the public domain, or even in the private domain, so it must be built in a scalable way. Valderrabanos believes we can get there, but automated methods for data generation are needed, for both training and evaluation.
Why is it harder to create voicebots vs. chatbots?
As Adam Sypniewski, CTO of Deepgram, noted, there is no plug-and-play with voicebots. You can't just unplug the chat, install automatic speech recognition into the Conversational AI workflow, and expect it to work. Texting and chatting can be viewed as one-dimensional, while speech is multidimensional. Different tones of voice mean different things. Pauses in a conversation may or may not mean you are done speaking. Different words all mean the same thing, like "Yes", "Yeah", "Sure", or "Uh-huh". Wait, did he say "Uh-huh" as noncommittal or as affirmative? And this is just English. What about the accents of speakers of English as a second language, or different languages with different expressions? Simple transcripts from an automatic speech recognition (ASR) system will not pick up these differences, and they may not even get the words correct. Yeah, there is no easy button, but these cutting-edge companies are climbing that mountain.
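To see why a plain transcript loses so much, here is a minimal sketch of a keyword-based confirmation classifier operating on ASR text alone. All names and the word lists are illustrative assumptions, not a real product's API; the point is that the same string "Uh-huh" arrives identically whether it was enthusiastic or noncommittal, because the prosody never survives transcription.

```python
# Minimal sketch: map a transcribed utterance to a coarse yes/no intent
# using only the text. Word lists and function names are hypothetical.

AFFIRMATIVES = {"yes", "yeah", "sure", "uh-huh"}
NEGATIVES = {"no", "nope", "nah"}

def classify_confirmation(transcript: str) -> str:
    """Return 'affirmative', 'negative', or 'unclear' from text alone."""
    words = {w.strip(".,!?").lower() for w in transcript.split()}
    if words & AFFIRMATIVES:
        return "affirmative"
    if words & NEGATIVES:
        return "negative"
    return "unclear"

# Tone is gone by the time text arrives: a distracted "Uh-huh" and an
# eager "Uh-huh!" are the same string, so both classify the same way.
print(classify_confirmation("Uh-huh"))      # affirmative
print(classify_confirmation("Hmm, maybe"))  # unclear
```

A production voicebot would need acoustic cues (pitch, energy, pause length) alongside the transcript to resolve cases like this, which is exactly the multidimensionality the text-only chatbot pipeline never had to handle.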
Light at the end of the tunnel
Yes, there is no easy button for a great general-purpose Conversational AI voicebot, but very good single-use-case voicebots are available now. Our experts all agree that we are going to see a big evolution in this space over the next two to three years, moving toward that universal voicebot or personal assistant you can speak with like a friend.
Want to hear from the experts?
Our on-demand webinar with Bitext, OneReach.ai, and Deepgram discusses why customers are rejecting simple IVRs, chatbots, and menu-driven voicebots to embrace more human-like bots that understand the customer intent and respond correctly to meet customer needs. They also discuss what you need to consider in creating that great voicebot. View the on-demand webinar.