What would you say if I told you that you could detect spoken conversational language using AI in a speech-to-text transcript with Python?
Would you spit your beer out?
Ok, maybe your water, but the point is I built a cool conversational AI project with an Interactive Voice Response (IVR) using Twilio, a speech recognition provider, and Python. The best part about it is that it was reasonably easy to build using Flask 2.0. The purpose was to identify the best virtual customer support agent to respond to a call.
I would love to walk you through the project, but if you want to skip ahead to the code, scroll to the bottom of this blog post.
Create Voice Recognition Phone IVR With Speech Recognition Using Twilio and Python
This project was my first attempt at building an IVR with AI in Python, so I researched how these interactive voice response systems work. Simply put, you can think of them as a tree with many branches. They allow you to interact with a system, like an automated phone customer support agent, before being connected or transferred to a representative.
For example, you may be prompted to press “2” on your phone to connect to a department and then “1” to speak to a live customer support agent. I’m sure we’ve all been in that situation.
I chose Twilio to build the IVR because its dashboard is easy to navigate and it’s simple to set up. Also, since I’m using Python, it helps that Twilio has plenty of tutorials on implementing IVR systems, including with Flask, which I’m using for this tutorial.
I also needed a speech-to-text API and leveraged Deepgram. We have a Python SDK I tapped into that made it super quick and easy to get up and running with the voice recognition transcription.
Deepgram also offers language detection for prerecorded audio, which can identify over 30 supported languages, such as Hindi, Spanish, and Ukrainian, to name a few.
Let’s get to the meat of the project: the code.
Code Breakdown for Creating IVR Speech-to-Text With Language Detection Using Python
Imagine you had to build a Python application that detects different conversational languages, so that an IVR system can reroute customers’ phone calls to the virtual customer agent who speaks their language.
The following Python code breakdown demonstrates how to do so. There are just a few things I had to set up before the coding started. It’s painless, I promise.
Grab a Deepgram API Key. I needed this to tap into the speech-to-text Python SDK.
Create a Twilio account and voice phone number here. This allowed me to make an outgoing call and navigate the IVR with dial prompts.
Install ngrok to test my webhooks locally.
Next, I made a new directory to hold all my Python files and activated a virtual environment to pip install all of my Python packages.
These are the packages I installed:
After creating my directory, I downloaded three audio files with different spoken languages from this website and added them to my project in a folder called languages.
I created a file called views.py that contains most of my Flask 2.0 Python code. You’ll see the entirety of this code at the bottom of this post, but I’ll walk through the most critical parts of it.
This code is where the Deepgram Python speech-to-text transcription magic happens. I’m transcribing the audio MP3 file and returning the transcript and detected language. The API detected the conversational language and provided a language code like es for Spanish.
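To give a rough idea of what gets pulled out of the transcription result, here is a minimal sketch that reads the transcript and the detected language from a response-shaped dictionary. It doesn’t call the API at all. The sample data is made up, the function name extract_transcript_and_language is hypothetical, and the field layout (results, channels, alternatives, detected_language) is my assumption about how the response looks when language detection is turned on.

```python
from typing import Tuple


def extract_transcript_and_language(response: dict) -> Tuple[str, str]:
    """Return (transcript, language_code) from a response-shaped dict.

    Assumes the layout results -> channels[0] -> alternatives[0],
    with detected_language stored on the channel.
    """
    channel = response["results"]["channels"][0]
    transcript = channel["alternatives"][0]["transcript"]
    # Fall back to "unknown" if no language was reported.
    language = channel.get("detected_language", "unknown")
    return transcript, language


# Made-up sample data, shaped like the assumed response layout.
sample_response = {
    "results": {
        "channels": [
            {
                "detected_language": "es",
                "alternatives": [
                    {"transcript": "Hola, necesito ayuda con mi cuenta."}
                ],
            }
        ]
    }
}

transcript, language = extract_transcript_and_language(sample_response)
print(language, "->", transcript)
```

Keeping this parsing in its own small function means the rest of the IVR only ever deals with a plain transcript string and a language code.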
At the top of the file, I created a Python dictionary that acts as a lookup. This dictionary contains the language code as a key and the name of the customer support agent that speaks that language as the value.
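A lookup like that could be sketched as follows. Only the French and Sally pairing comes from this post. The other agent names, the AGENTS_BY_LANGUAGE and DEFAULT_AGENT names, and the agent_for helper are placeholders I made up for illustration.

```python
# Language code -> name of the agent who speaks that language.
# Only "fr" -> "Sally" is from the post; the rest are placeholder names.
AGENTS_BY_LANGUAGE = {
    "es": "Maria",
    "fr": "Sally",
    "hi": "Arjun",
}

# Hypothetical fallback for a language code that has no agent.
DEFAULT_AGENT = "the next available agent"


def agent_for(language_code: str) -> str:
    """Look up the agent for a detected language code, e.g. 'fr'."""
    return AGENTS_BY_LANGUAGE.get(language_code.lower(), DEFAULT_AGENT)


print(agent_for("fr"))
print(agent_for("uk"))
```

Using dict.get with a default keeps the call from failing if the detected language isn’t one you have an agent for.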
I created a POST route and prompted the user to press 1, 2, or 3, each for a different language. For example, if a customer presses 2 when they call in, they’ll get routed to the agent who speaks French.
Whichever option is selected will invoke a private function, as noted in the menu function. When option 2 is pressed, the function _french_recording is called.
I created a private function for each spoken language, and when they’re selected, that method will get called, and a phone response will say the message. For French, the automated IVR response will be “This is the French response and Sally will help you.”
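Stripped of the Flask and Twilio pieces, the option-to-function dispatch might look something like this sketch. It returns plain strings instead of building a spoken phone response, so it runs on its own. The _french_recording name and the French message come from this post, while the other function names, the OPTION_ACTIONS dictionary, the non-French agent names, and the invalid-option message are assumptions for illustration.

```python
def _spanish_recording() -> str:
    # Hypothetical handler; the agent name is a placeholder.
    return "This is the Spanish response and Maria will help you."


def _french_recording() -> str:
    # Message taken from the post.
    return "This is the French response and Sally will help you."


def _hindi_recording() -> str:
    # Hypothetical handler; the agent name is a placeholder.
    return "This is the Hindi response and Arjun will help you."


# Map each keypad digit to the private function that handles it.
OPTION_ACTIONS = {
    "1": _spanish_recording,
    "2": _french_recording,
    "3": _hindi_recording,
}


def menu(selected_option: str) -> str:
    """Return the message for the pressed digit, or a retry prompt."""
    action = OPTION_ACTIONS.get(selected_option)
    if action is None:
        return "Sorry, that is not a valid option. Please press 1, 2, or 3."
    return action()


print(menu("2"))
```

A dictionary of functions keeps the menu flat: adding a fourth language means adding one handler and one entry, with no extra if/elif branch.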
I also created a templates folder in the main Python Flask project directory with a blank index.html file. We don’t need anything in this file but feel free to add any HTML or Jinja.
To run the application, I opened two terminals side by side in Visual Studio Code, one to run my Flask application and another for ngrok. Both are important, and you’ll need the ngrok URL to add to your Twilio dashboard.
To run the Flask application, I used this command from the terminal:
FLASK_APP=views.py FLASK_DEBUG=1 flask run
Setting FLASK_DEBUG=1 runs my application in debug mode, so when I change my code, the server reloads automatically and there’s no need for me to keep stopping and starting it from the terminal.
In the other terminal window, I ran this command:
ngrok http 5000
Make sure to grab the ngrok URL, which is different from the one in the Flask terminal. It looks something like this: https://3afb-104-6-9-133.ngrok.io
In the Twilio dashboard, click on Manage -> Active Numbers, then click on the purchased number. In the webhook field, enter the unique ngrok URL followed by the Flask route from the Python application, /ivr/welcome. The full address looks like this: https://3afb-104-6-9-133.ngrok.io/ivr/welcome
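Since the ngrok URL changes between sessions, a tiny helper can rebuild the webhook address for you. This is only a convenience sketch using the standard library. The webhook_url function name is hypothetical, and the base URL is the example one from above.

```python
from urllib.parse import urljoin


def webhook_url(ngrok_base: str, route: str = "/ivr/welcome") -> str:
    """Join the ngrok base URL and the Flask route into one webhook URL."""
    # Normalize slashes so the result never has a doubled or missing "/".
    return urljoin(ngrok_base.rstrip("/") + "/", route.lstrip("/"))


print(webhook_url("https://3afb-104-6-9-133.ngrok.io"))
```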
Now, dial the Twilio number and follow the prompts, and you’ll get routed to the best customer agent to handle your call based on speech-to-text language detection!
Conclusion
Please let me know if you followed this tutorial or built your project using Python with Deepgram’s language detection. Please hop over to our Deepgram GitHub Discussions and send us a message.
The Python Flask Code for the IVR Speech-To-Text Application
My project structure:
views.py
view_helpers.py
If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.