It’s never been easier to build a conversational AI bot for whatever use case you have. Handling basic customer support questions? Doable. Scheduling appointments for a doctor’s office? Sure thing. Checking in on the shipping status of e-commerce orders? Easy-peasy.
Basically, if you can imagine a routine interaction between a customer and an agent that can be mapped out in a flow chart, there’s an opportunity to automate those situations with the help of conversational AI. If nothing else, it’ll free up the customer-facing people on your team to handle the issues which require human input.
Let’s say you want to build a conversational AI of your own but are not sure where to start. In this guide, we’ll lay out the basics of building conversational AI experiences and how to launch them to the world.
Let's discuss this using a simple and universal use case: Pizza! Here’s a voice assistant to order a pizza from DG Pizza.
Let's get familiar with some of the terminology that’s commonly used to help us start thinking about the structure of the sentences:
Wake word - what the bot listens for (E.g. ‘Hello DG’)
Utterance - what the user says (E.g. ‘I want a veggie pizza’)
Intent - what the user wants to accomplish (E.g. Order pizza)
Slot - information needed to complete the intent (E.g. small, veggie)
Prompt - what the bot says (E.g. ‘Your order for a small veggie pizza is placed’)
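To make these terms concrete, here is a minimal sketch of how they might map onto a data structure. The class name, field names, and slot names (`variety`, `size`) are our own assumptions for illustration, not part of any particular framework.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Intent:
    """What the user wants to accomplish, plus the slots needed to complete it."""
    name: str
    required_slots: List[str]
    filled_slots: Dict[str, str] = field(default_factory=dict)


# Hypothetical values taken from the DG Pizza example.
WAKE_WORD = "hello dg"                 # what the bot listens for
utterance = "I want a veggie pizza"    # what the user says

order = Intent(name="order_pizza", required_slots=["variety", "size"])
order.filled_slots["variety"] = "veggie"  # picked out of the utterance by hand here
order.filled_slots["size"] = "small"      # supplied by the user in a later turn

# The prompt is what the bot says back once every slot is filled.
prompt = (
    f"Your order for a {order.filled_slots['size']} "
    f"{order.filled_slots['variety']} pizza is placed"
)
print(prompt)
```

In a real bot the slots would be extracted automatically from the utterance; they are set by hand here only to show where each term fits.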
The Building Blocks of Bot Design
As we alluded to above, the design of conversational AI bots is really in the flow of conversation, but there are some basic things to consider before you start laying out your bot’s interaction model.
Modality: To Speak, or Not to Speak?
This is probably the most important design decision you’ll have to make right at the outset: How are people going to interact with your conversational AI bot? Is this a text-based agent, or do you expect users to use their voice?
If you want your bot to interact with voice, you’re going to need to make some decisions about which automatic speech recognition (ASR) model to use, and you’ll need to figure out how you’d like the bot to “speak” with the user. We’re kind of biased here, but Deepgram is a great choice for fast, accurate ASR for conversational AI, which is what you need for delivering a seamless voice AI experience. And as for decent text-to-speech systems to drive the bot’s “voice,” there are several options, including Murf, Speechify, and Notevibes, all of which have free versions to get you started.
If you’re looking to build a text-only experience, there are several other conversational AI platforms which could be suited to your needs. You might want to check out the documentation for Intercom, Drift, or Boost.ai to see what works best for you.
Your bot’s personality includes the words that it says, the voice itself, and its behavior. It needs to match the bot’s purpose and the expectations of the user. If the bot is going to be casual and make jokes, you must ensure that your audience can interpret that appropriately.
I have an Indian accent. When voice bots initially came out, it was a frustrating experience interacting with them because I had to modify my accent so that the bot could understand me. This actually led me to avoid using them for a long time. It’s important to ensure voice bots understand your target audience’s accent so that they can feel heard and understood.
Start with a script of a successful interaction to act as a base to build on top of. A script is a familiar and easy enough mode to help us just get something started:
Customer: Hello DG
Bot: Hello! Which pizza would you like today? Veggie, Pepperoni, or Cheese?
Customer: Veggie
Bot: What size would you like, 8", 12", or 16"?
Customer: 12"
Bot: Thank you, would you like anything else?
Customer: No, that’s all
Bot: Your order for one 12" Veggie Pizza has been placed. You will receive it in 1 hour.
Psst… Pro tip from user experience leader Corbet Fawcett: Skip lengthy welcomes. People know what they’re here for, and getting them interacting sooner keeps them in the flow!
Design Your Conversational Flow
In this section, we will discuss the steps involved in building a conversational AI experience for ordering pizza at Deepgram’s newest spinoff, DG Pizza. For our purposes here we’ll assume that DG Pizza carries three varieties of pizza, available in three sizes of pie.
Flow Chart Options
You could use prototyping tools that you are already comfortable with, like Figma or draw.io, to build your flow chart. Once the flows start to get more complicated, you might need more specialized tools like Voiceflow or Fabble.io.
Choose Your Words
People don’t always say the same words to accomplish the same task. The same intent could have multiple paths and multiple utterances.
For example, somebody who wants to order a pizza could start by saying “Hello DG,” or could directly say “Hello DG, can I order an 8-inch Veggie pizza?” They could also say “Hey DG, cook me some Za’s!”
Always test your flows with real users to uncover paths that you might not have thought of.
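As a rough illustration of several utterances leading to one intent, here is a minimal sketch that uses hand-written patterns. The pattern list, the `order_pizza` intent name, and the `detect_intent` helper are all assumptions made up for this example; a production bot would more likely rely on a trained language-understanding model.

```python
import re

# Hypothetical phrase patterns for a single intent.
INTENT_PATTERNS = {
    "order_pizza": [r"\bpizzas?\b", r"\bza'?s?\b", r"\border\b"],
}

# Strip the wake word ("Hello DG" or "Hey DG") before looking for an intent.
WAKE_WORD = re.compile(r"^\s*(hello|hey)\s+dg\b[\s,!.]*", re.IGNORECASE)


def detect_intent(utterance: str):
    """Return the matching intent name, or None if no intent was expressed."""
    text = WAKE_WORD.sub("", utterance).lower().replace("’", "'")
    for intent, patterns in INTENT_PATTERNS.items():
        if any(re.search(pattern, text) for pattern in patterns):
            return intent
    return None


for said in ["Hello DG",
             "Hello DG, can I order an 8-inch Veggie pizza?",
             "Hey DG, cook me some Za's!"]:
    print(repr(said), "->", detect_intent(said))
```

Patterns like these are brittle, which is exactly why testing with real users matters: every new phrasing they come up with is a candidate for the list.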
The bot needs to be able to figure out which slots it has already filled and which are still missing, and direct the conversation based on that. So, if the user says “I want a pizza,” the bot still needs the size (for example, “8-inch”) and the variety (for example, “veggie”) before it can place the order.
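Slot tracking of this kind can be sketched in a few lines. The `MENU` and `PROMPTS` tables and the two helper functions are hypothetical names for this example, and the matching is a plain substring check rather than real language understanding.

```python
# Assumed menu for DG Pizza: three varieties in three sizes.
MENU = {
    "variety": ["veggie", "pepperoni", "cheese"],
    "size": ["8-inch", "12-inch", "16-inch"],
}
PROMPTS = {
    "variety": "Which pizza would you like today? Veggie, Pepperoni, or Cheese?",
    "size": "What size would you like? 8-inch, 12-inch, or 16-inch?",
}


def fill_slots(utterance: str, slots: dict) -> dict:
    """Copy any menu value mentioned in the utterance into its slot."""
    text = utterance.lower()
    updated = dict(slots)
    for slot, options in MENU.items():
        for option in options:
            if option in text:
                updated[slot] = option
    return updated


def next_prompt(slots: dict) -> str:
    """Ask for the first slot that is still empty, or confirm the order."""
    for slot in MENU:
        if slot not in slots:
            return PROMPTS[slot]
    return f"Your order for 1 {slots['size']} {slots['variety']} pizza has been placed."


slots = fill_slots("I want a pizza", {})
print(next_prompt(slots))  # nothing filled yet, so the bot asks for the variety
```

A user who says everything at once skips straight to the confirmation, while a user who only says “I want a pizza” is walked through one slot at a time.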
Lastly, keep prompts to small units. Take these two as an example, where the first, shorter prompt is easier to follow than the second:
“Hello! Would you like to order a pizza? We have veggie, cheese, and pepperoni.”
“Hello! Welcome to DG! Would you like to order a pizza? We have veggie, cheese, and pepperoni and they’re available in 3 sizes.”
In design, an “affordance” is a feature or aspect of an object which indicates how the object is to be used. You might have seen that video from Vox featuring design luminary Don Norman discussing the design of physical doors: push plates are an unambiguous affordance indicating that the door is to be pushed outward. Voice experiences perform best when they provide affordances—in the form of an explicit, constrained vocabulary and clear paths through (and out of) the workflow—and it’s up to you to build those into your conversational AI bot.
Make it easy for the user to know what to say. In the words of UX consultant and author Steve Krug, “Don’t make me think!” Every business has its own terminology, so try to stick with a) what’s commonly used in the industry, b) what most people are familiar with, and c) what’s simplest to understand in the medium that you have chosen. Something that is easy to understand in a visual medium might not translate the same way in a voice medium.
Applied to our example: You'd want to specify the available pizza sizes instead of letting the user guess whether they should be saying them in inches or in t-shirt sizes (S, M, L). To whatever extent possible, your dialog flow should take the guesswork out of interactions. In our pizza-ordering example, our bot responding with “Of course, what size pizza would you like? 8-inch, 12-inch, or 16-inch?” is way more explicit than asking “What size pizza would you like?”
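One way to keep prompts explicit is to generate them from the same list of options the bot accepts, so the two never drift apart. The `options_prompt` helper below is a hypothetical sketch of that idea, not a feature of any specific platform.

```python
def options_prompt(question: str, options: list) -> str:
    """Append the allowed answers to a question so the user never has to guess."""
    if not options:
        return question
    if len(options) == 1:
        listed = options[0]
    elif len(options) == 2:
        listed = f"{options[0]} or {options[1]}"
    else:
        listed = ", ".join(options[:-1]) + f", or {options[-1]}"
    return f"{question} {listed}?"


SIZES = ["8-inch", "12-inch", "16-inch"]
print(options_prompt("Of course, what size pizza would you like?", SIZES))
```

If the menu later gains a fourth size, the prompt updates along with it.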
Cover Your Corner Cases
The easiest part of most projects is building the happy path. It can be daunting to try to figure out all of the error scenarios, but make sure to cover the basic ones to start with. Conduct user testing to uncover scenarios that might not have been thought of and incorporate them as you go along.
In the case of our pizza-ordering interface, you'd want to consider: What happens if a user asks for pizzas that aren't on the menu? What happens if a user doesn’t ask about pizza at all?
If you keep getting inputs the bot can’t handle, add context or rephrase the prompt rather than repeating it word for word. That avoids infinite loops and keeps directing the user toward a valid answer. In the best-case scenario, the user won’t even realize that they’re in an error state.
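Here is a minimal sketch of that idea: each retry uses a differently worded prompt with a little more context, and the number of retries is capped so the conversation cannot loop forever. The reprompt wording and the `ask_variety` helper are illustrative assumptions.

```python
VARIETIES = ["veggie", "pepperoni", "cheese"]

# Each retry adds more context; the wording here is only illustrative.
REPROMPTS = [
    "Which pizza would you like? Veggie, Pepperoni, or Cheese?",
    "Sorry, we only have Veggie, Pepperoni, or Cheese today. Which one would you like?",
    "You can say 'Veggie', 'Pepperoni', or 'Cheese'. Which pizza should I order?",
]


def ask_variety(answers):
    """Walk through the user's answers and return (variety, prompts_spoken).

    variety is None if every attempt failed, which is the caller's cue
    to stop reprompting and try something else.
    """
    spoken = []
    for prompt, answer in zip(REPROMPTS, answers):
        spoken.append(prompt)
        match = next((v for v in VARIETIES if v in answer.lower()), None)
        if match:
            return match, spoken
    return None, spoken


variety, spoken = ask_variety(["Hawaiian", "veggie please"])
print(variety, "after", len(spoken), "prompts")
```

Because the loop is bounded by the length of `REPROMPTS`, a user who never gives a valid answer exits after three attempts instead of hearing the same question indefinitely.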
Design for Real People and Conversations
In a voice medium, it is important to understand the user's environment. Try to map out the situation in which the user would be using this bot and design around that. We’ve all been to those drive-thrus where we’re yelling from the car or we can’t understand a word of what is being said through the speaker. In these cases, the hardware is just as important as the software.
In our example of ordering pizza, you might have to design for the user taking time to make a decision as they consider their options. They could also be discussing the options with their friends. Will the bot know how to distinguish that from the conversation with the bot?
Take Your Chatbot to the Next Level
As the use of your bot grows, you can do a lot more with your chatbot experience to take it to the next level. Leverage other technologies to provide a more rounded experience to the user.
A text-only, visual-only, or voice-only bot will only take you so far. Make use of the advantages that each of these modes has, and combine them in a way that makes it as easy as possible for the user to complete their task.
Back to our pizza example: The DG pizzeria has a very limited menu for our bot to handle... for now. Once it expands, you must find creative ways to break up the menu or use a combination of voice and screen interfaces. Provide an option to text the user a link with the full menu so they can look at it to simulate a restaurant ordering experience.
As much as we would like to automate all processes, if a user is frustrated, their instinct is to want to talk to a human. They might also abandon the journey at this point. Identify scenarios when the user is unable to get what they want from the bot and have a strategy where you can transfer them to a person to help out with their needs.
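A handoff strategy can start out very simple. The sketch below hands off when the user explicitly asks for a person or after a fixed number of failed turns. The phrase list, the threshold, and the `should_hand_off` helper are assumptions for illustration; you would tune them against real conversations.

```python
# Hypothetical phrases that signal the user wants a person.
HANDOFF_PHRASES = ("human", "agent", "representative", "real person")
MAX_FAILED_TURNS = 2  # assumed threshold; tune with real conversation data


def should_hand_off(utterance: str, failed_turns: int) -> bool:
    """Return True when the conversation should be transferred to a person."""
    text = utterance.lower()
    asked_for_person = any(phrase in text for phrase in HANDOFF_PHRASES)
    return asked_for_person or failed_turns >= MAX_FAILED_TURNS


print(should_hand_off("Can I talk to a human?", failed_turns=0))
print(should_hand_off("Veggie, please", failed_turns=0))
```

Checking this on every turn means a frustrated user gets routed to a person before they abandon the order.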
Analyze Conversations and Create Feedback Loops
Studying real conversations will help you further shape your bot. There are several lenses that make a large volume of conversation data easier to study.
Sentiment analysis: Gauge the sentiment of the user while using your bot. Narrow down conversations that left the user with a negative sentiment and identify patterns to see if you can make improvements on those conversations. You can also study conversations with positive sentiment to understand what is causing that emotion and try to implement that in other places as well.
Topic Detection: Identify topics that come up often in your conversations. These could be common complaints that need to be addressed, or popular requests from users. Either way, study what comes up most in these conversations.
Summaries: Summarize your calls and get a quick overview of what the conversation was about. This will save you precious time as opposed to going through the full conversation and will help you identify any issues quickly.
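To give a feel for what this kind of analysis produces, here is a deliberately crude sketch that counts topics across transcripts using keyword lists. The topics, keywords, and sample transcripts are invented for this example, and keyword matching is only a stand-in for the trained models a real topic detection or sentiment analysis system would use.

```python
from collections import Counter

# Hypothetical topics and keywords, made up for the DG Pizza example.
TOPIC_KEYWORDS = {
    "late_delivery": ["late", "still waiting", "hour"],
    "wrong_order": ["wrong", "didn't order", "mistake"],
    "menu_request": ["gluten", "vegan", "hawaiian"],
}


def count_topics(transcripts):
    """Count how many transcripts mention each topic at least once."""
    counts = Counter()
    for transcript in transcripts:
        text = transcript.lower()
        for topic, keywords in TOPIC_KEYWORDS.items():
            if any(keyword in text for keyword in keywords):
                counts[topic] += 1
    return counts


# Invented sample transcripts.
transcripts = [
    "My pizza is late",
    "Do you have a vegan option?",
    "It's been an hour and I'm still waiting",
    "You sent the wrong pizza and it was late",
]
print(count_topics(transcripts).most_common())
```

Even a rough count like this points you at which conversations to read in full first.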
If this guide was helpful to you, we hope you’ll consider applying some of these lessons to your own chatbot-building experience.
If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.