Imagine being able to transcribe voice calls. In this article, we'll learn how to do exactly that by combining Vonage with Deepgram.
With Vonage, we can use one of its phone numbers to receive and record incoming calls, then get a transcript of each recording with the Deepgram Speech Recognition API. We'll use the Deepgram Python SDK in this example.
Here’s a snapshot of what we’ll see in the browser after making the phone call and using Deepgram voice-to-text.
Getting Started
Before we start, we need to generate a Deepgram API key for the project, which we can do in the Deepgram console. We'll copy the key and keep it somewhere safe: it can't be retrieved again later, so if it's lost we'll have to create a new one. In this tutorial, we'll use Python 3.10, but Deepgram also supports some earlier versions of Python.
Next, we'll go to Vonage and sign up for an account. We'll need to purchase a mobile phone number with voice capabilities.
We'll also need two phones: one to make the outgoing call and another to receive it.
In this project, we'll use ngrok, which provides a temporary public URL that acts as the application's webhook address. ngrok forwards incoming requests to the application running locally. We can download it here.
Next, we'll make a directory anywhere we’d like.
Then we'll change into that directory so we can start adding things to it.
We'll also need to set up a virtual environment to hold the project and its dependencies. We can read more about virtual environments and how to create one here. Using a virtual environment is recommended in Python because it keeps the project's dependencies isolated instead of installing them system-wide.
We need to make sure the virtual environment is activated, because we'll install the dependencies inside it. If the virtual environment is named venv, we'll need to activate it.
Next, we'll install the project's dependencies by running the following pip install commands in the terminal, inside the virtual environment.
Now we can open an editor and create a file called deepgram-vonage-call.py.
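Once the installs finish, a quick sanity check can confirm that the virtual environment is active and that the packages landed in it. This sketch is our own addition, not part of the original project: the list of import names (flask, vonage, deepgram, pysondb, and dotenv for python-dotenv) is an assumption based on the libraries this article mentions, so adjust it to match what was actually installed.

```python
import importlib.util
import sys

# Assumed import names for the packages this tutorial relies on; edit this
# list to match the packages actually installed in the virtual environment.
EXPECTED_MODULES = ["flask", "vonage", "deepgram", "pysondb", "dotenv"]


def in_virtualenv() -> bool:
    """Return True when running inside a venv (its prefix differs from the base)."""
    return sys.prefix != sys.base_prefix


def missing_modules(names):
    """Return the module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    missing = missing_modules(EXPECTED_MODULES)
    print("missing packages:", ", ".join(missing) if missing else "none")
```

If the check reports that no virtual environment is active, the packages were most likely installed system-wide instead.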
The Code
Now to the fun part! We'll open deepgram-vonage-call.py and add the following code to make sure the Flask application runs without errors:
We'll run the Flask application by typing python deepgram-vonage-call.py into the terminal.
Then we'll open a browser window at http://127.0.0.1:5000/, where we should see the text Hello World.
While the application is running, we'll open a new terminal window and type:
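A minimal sketch of such starter code, assuming only Flask, could look like this (the view function name index is our own choice):

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def index():
    # A plain-text response is enough to confirm Flask is wired up correctly.
    return "Hello World"


if __name__ == "__main__":
    # Flask's development server listens on http://127.0.0.1:5000/ by default.
    app.run(debug=True)
```

The if __name__ == "__main__" guard means the development server only starts when we run the file directly, not when another module imports it.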
Here's a snapshot of the terminal running with ngrok:
We'll create a Vonage application in the Vonage API Dashboard by going to Applications -> Create a new application.
We'll give the application a friendly name that's meaningful and easy to remember. We'll call it Deepgram Vonage.
We'll also need to generate a key pair by clicking the Generate public and private key button, then save the private key in the same directory as the deepgram-vonage-call.py file.
Next, under Capabilities, we'll toggle on the Voice option. For the Answer URL and the Event URL webhooks, we'll enter our ngrok URL followed by the endpoint paths /webhooks/answer and /webhooks/event. Note that everyone's ngrok URL is different.
We'll implement these endpoints shortly.
We'll leave both terminals running, since they're needed to run the application and receive the phone call.
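Because each ngrok URL is different, it can help to build the two webhook URLs from whatever forwarding URL ngrok prints. This helper is a hypothetical convenience, not something Vonage or ngrok requires; the paths match the endpoints this article implements, and https://example.ngrok.io is only a placeholder.

```python
from urllib.parse import urljoin

# Endpoint paths implemented later in the app; these are the values that go
# after the ngrok URL in the Vonage dashboard.
ANSWER_PATH = "/webhooks/answer"
EVENT_PATH = "/webhooks/event"


def webhook_urls(ngrok_base_url: str) -> dict:
    """Build the Answer and Event URLs from an ngrok forwarding URL."""
    # urljoin swaps in our absolute path, so a trailing slash on the base
    # URL does not produce a double slash.
    return {
        "answer_url": urljoin(ngrok_base_url, ANSWER_PATH),
        "event_url": urljoin(ngrok_base_url, EVENT_PATH),
    }


if __name__ == "__main__":
    # Placeholder forwarding URL; substitute the one ngrok prints.
    for name, url in webhook_urls("https://example.ngrok.io").items():
        print(f"{name}: {url}")
```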
Then we'll store the environment variables in a .env file with the following:
We'll replace the DEEPGRAM_API_KEY value with the API key we created in the Deepgram console, and set RECIPIENT_NUMBER to the phone number that should receive the call.
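To fail early when a value is missing from .env, the app can check for the required variables at startup. This is a minimal sketch: the variable names come from this article, the function name load_settings is hypothetical, and it assumes something like python-dotenv's load_dotenv() has already copied the .env values into the environment.

```python
import os

# Variable names used in this article; the real .env may hold more values
# (for example, Vonage application credentials) that are not checked here.
REQUIRED_VARS = ("DEEPGRAM_API_KEY", "VONAGE_NUMBER", "RECIPIENT_NUMBER")


def load_settings(environ=os.environ) -> dict:
    """Collect the required settings, raising if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in REQUIRED_VARS}


if __name__ == "__main__":
    # Print only the names, never the secret values themselves.
    print(sorted(load_settings()))
```

Accepting the environment as a parameter keeps the check easy to exercise with a plain dictionary.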
We'll replace the code in deepgram-vonage-call.py with the following:
Here we import the libraries and create a new instance of a Flask application. Then we create a new database named calls using PysonDB, a lightweight JSON database.
We create the /webhooks/answer endpoint, which Vonage calls when someone dials our Vonage number. It tells Vonage to record the call and connect it to the recipient's phone.
Next, in the /webhooks/recordings route, we tap into Deepgram's speech-to-text feature: we fetch the call recording and use speech recognition to transcribe the audio. We check whether results is in the response, format the utterances with a list comprehension, and store them in utterances. We then add the utterances to the calls database. Finally, the /webhooks/event endpoint returns an empty dictionary.
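PysonDB keeps records in a plain JSON file. To illustrate that idea without relying on PysonDB's own API, here is a hypothetical stand-in class (JsonStore, with add and get_all methods) that appends records to a JSON file and reads them back. It is not PysonDB; in the real project we'd use PysonDB itself.

```python
import json
from pathlib import Path


class JsonStore:
    """Hypothetical JSON-file store illustrating the idea behind PysonDB.

    This is NOT PysonDB's API; it only mimics "append a record, read them all".
    """

    def __init__(self, path):
        self.path = Path(path)
        if not self.path.exists():
            # Start the file with an empty list of records.
            self.path.write_text("[]", encoding="utf-8")

    def add(self, record: dict) -> None:
        # Read everything, append, and rewrite the whole file.
        records = self.get_all()
        records.append(record)
        self.path.write_text(json.dumps(records, indent=2), encoding="utf-8")

    def get_all(self) -> list:
        return json.loads(self.path.read_text(encoding="utf-8"))
```

Rewriting the whole file on every add keeps the sketch short, but it is not safe for concurrent writers, which is fine for a single-user demo like this one.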
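Vonage expects the answer webhook to respond with an NCCO, a JSON list of call actions. A sketch of an NCCO that records the call and then connects it to the recipient could look like the following. The function name and parameters are our own; in a Flask route we would return this list as JSON (for example with jsonify), passing in the recordings webhook URL, VONAGE_NUMBER, and RECIPIENT_NUMBER.

```python
def build_answer_ncco(recordings_url: str, vonage_number: str, recipient_number: str) -> list:
    """Return an NCCO that records the call and connects it to the recipient."""
    return [
        {
            # Record the conversation; Vonage posts a webhook containing the
            # recording_url to eventUrl once the recording is ready.
            "action": "record",
            "eventUrl": [recordings_url],
        },
        {
            # Forward the inbound call to the recipient's phone, showing our
            # Vonage number as the caller ID.
            "action": "connect",
            "from": vonage_number,
            "endpoint": [{"type": "phone", "number": recipient_number}],
        },
    ]
```

The record action comes first so that recording is already running when the connect action bridges the two phones.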
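The recording flow can be sketched with the network calls injected as plain functions, which keeps the formatting logic easy to test. The names handle_recording_event, fetch_recording, and transcribe are hypothetical: in the real app, the first hook would download the file at recording_url with the Vonage SDK, and the second would send the audio to Deepgram with utterances (and diarization, for speaker labels) enabled. The list comprehension in format_utterances is one plausible way to keep only each utterance's speaker and text.

```python
from typing import Callable, Optional


def format_utterances(response: dict) -> list:
    """Keep just the speaker and text of each utterance in a Deepgram response."""
    # Utterances only appear when they were requested; the speaker field
    # additionally requires diarization, so it may be absent (None here).
    return [
        {"speaker": utterance.get("speaker"), "transcript": utterance["transcript"]}
        for utterance in response["results"].get("utterances", [])
    ]


def handle_recording_event(
    payload: dict,
    fetch_recording: Callable[[str], bytes],
    transcribe: Callable[[bytes], dict],
) -> Optional[list]:
    """Turn a Vonage recording webhook payload into formatted utterances.

    fetch_recording and transcribe are hypothetical hooks standing in for the
    Vonage recording download and the Deepgram transcription request.
    """
    audio = fetch_recording(payload["recording_url"])
    response = transcribe(audio)
    if "results" not in response:
        # Nothing to store when Deepgram returned no results.
        return None
    return format_utterances(response)
```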
We can see how the utterances will look after they’re formatted:
Lastly, we'll add the /transcribe route and a templates folder with an index.html file that will display the call transcripts.
In the Python file, we'll add the following code to get the transcripts from the database and render them in the HTML template.
We'll create a folder in the project directory called templates and add an index.html file. In that file, we'll add the following HTML and Jinja code:
Then we'll loop through every transcript and display it on the screen.
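Putting the route and the template loop together, here is a self-contained sketch that renders transcripts with Flask's render_template_string and an inline template, so it runs without a templates folder. CALLS is a hypothetical in-memory stand-in for the calls database, and the inline markup only approximates what index.html might contain; the article's version uses render_template with the separate file.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Hypothetical stand-in for the calls database: each record holds utterances.
CALLS = [
    {"utterances": [
        {"speaker": 0, "transcript": "Hello?"},
        {"speaker": 1, "transcript": "Hi, can you hear me?"},
    ]},
]

# Inline approximation of templates/index.html: an outer loop over calls and
# an inner loop over each call's utterances.
TEMPLATE = """
<!doctype html>
<title>Transcripts</title>
{% for call in calls %}
  <ul>
  {% for utterance in call.utterances %}
    <li>Speaker {{ utterance.speaker }}: {{ utterance.transcript }}</li>
  {% endfor %}
  </ul>
{% endfor %}
"""


@app.route("/transcribe")
def transcribe():
    # Pass the stored calls to the template, which renders one list per call.
    return render_template_string(TEMPLATE, calls=CALLS)
```

Flask enables Jinja autoescaping for render_template_string, so transcribed text is displayed as text rather than interpreted as HTML.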
Finally, we'll test it out. From the phone that isn't the recipient, we'll call the Vonage number we provided in the VONAGE_NUMBER environment variable. The call should be connected to the recipient's phone, and we can have a conversation. After we hang up, the transcript will appear in the browser when we navigate to http://127.0.0.1:5000/transcribe.
Congratulations on building a speech-to-text Python project with Vonage and Deepgram! If you have any questions, please feel free to reach out to us on Twitter at @DeepgramAI.
If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.