Imagine being able to search phrases in every call you've had to find exactly what was said. That's the dream, right? Well - it's my dream, and it's what we're building today.
Users can call a Twilio phone number, which will be forwarded to your agent while recording. Once the call is completed, we will get a transcript using Deepgram's Speech Recognition API and make it available through a searchable dashboard we'll put together with Vue.js. The front-end is super light, so you could build it with any framework (or none at all).
If you want to look at the final project code, you can find it at https://github.com/deepgram-devs/twilio-voice-searchable-log.
Before We Start
You will need:
Node.js installed on your machine - download it here.
A Deepgram API Key - get one here.
A Twilio Account SID and Auth Token and a Twilio phone number - get one here.
Use of two phones to test your project - one to make the call and one to receive.
Create a new directory and navigate to it with your terminal. Run npm init -y to create a package.json file and then install the following packages:
Create a .env file, open it in your code editor, and populate it with your credentials and settings:
You should not share this .env file as it contains sensitive credentials. If you are using git version control, make sure to ignore this file.
Create an index.js file, and open it in your code editor.
Preparing Dependencies
At the top of your file require these packages:
Create a new nedb database. This will either load an existing file, or create one if it doesn't exist:
Initialize the Twilio Helper library and Deepgram Node SDK:
Creating An Express Server
Initialize an express instance, configure body-parser and a public directory, and start it at port 3000:
Project Overview
When a user rings our Twilio number, our application will be sent a POST request with a bunch of information - we will grab their phone number and the call identifier and add it to our database. We will instruct our application to forward the call to our real number and begin recording. Once the call is completed (or after the default one-hour limit), the recording will be saved, and, once ready, Twilio will send recording information back to our application.
We send the recording URL to Deepgram and receive a transcript in return. The recording details also contain the call identifier, so we can add the recording URL and the transcript to the existing database entry.
Once we have data in the database, the final step is to build the web dashboard with client-side search, which we will do towards the end of this blog post.
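To make the data flow in this overview concrete, here is a minimal sketch that uses an in-memory array as a stand-in for the database. It is not the project's storage code. `CallSid` and `From` are the field names Twilio sends with a call, while `recordInboundCall`, `attachRecording`, and the `created` timestamp are names assumed for illustration.

```javascript
// In-memory array standing in for the database (an assumption for illustration).
const calls = [];

// Step 1: an inbound call arrives; store the caller's number and the call identifier.
function recordInboundCall(body, now = Date.now()) {
  const entry = { CallSid: body.CallSid, From: body.From, created: now };
  calls.push(entry);
  return entry;
}

// Step 2: the recording is ready; attach its URL and transcript to the same entry.
function attachRecording(CallSid, RecordingUrl, utterances) {
  const entry = calls.find((call) => call.CallSid === CallSid);
  if (!entry) return null; // No matching call was stored.
  Object.assign(entry, { RecordingUrl, utterances });
  return entry;
}

// Hypothetical values, used only to show the two steps in order.
recordInboundCall({ CallSid: 'CA123', From: '+15550001111' });
attachRecording('CA123', 'https://example.com/recording', [
  { channel: 0, transcript: 'Hello?' }
]);
console.log(calls);
```

Because both requests carry the same call identifier, the second step only needs to look up the existing entry rather than create a new one.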
Configure Your Twilio Number
When Twilio sends data to an application, it expects a publicly-accessible URL, but our application is only on our local machine. Fortunately, we can use ngrok to provide a temporary public URL that will forward requests to our locally-running application. Open a new terminal and run the following:
Once running, you will see an ngrok URL in the terminal dashboard, which we can provide to Twilio. The URL changes whenever you restart your terminal or ngrok, or after 24 hours, and you will then need to complete the next step again.
Navigate to your phone number's settings in the Twilio Console, and configure incoming calls to send an HTTP POST request to your ngrok URL followed by /inbound.
Handle Inbound Calls
In your index.js file add a new route handler for receiving Twilio calls:
The dial variable is the most critical bit here - we tell Twilio to record the call once it is answered and to record each speaker in their own channel. This isn't required but will lead to higher-quality transcripts later. Once the recording is ready, we tell Twilio to send a POST request with data to /recordings. Finally, we tell Twilio to forward this call to the number in our .env file - perhaps a reception or sales rep.
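For reference, the instructions described above end up as a small TwiML document returned to Twilio. The Twilio helper library generates this for you, so the hand-built string below is only a dependency-free sketch of the equivalent output. `buildForwardingTwiml`, `escapeXml`, and the sample phone number are hypothetical.

```javascript
// Escape the characters that are special in XML text and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build TwiML that forwards the call, records from answer with one channel
// per speaker, and asks Twilio to POST recording details to callbackPath.
function buildForwardingTwiml(forwardTo, callbackPath = '/recordings') {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    `  <Dial record="record-from-answer-dual" recordingStatusCallback="${escapeXml(callbackPath)}">`,
    `    <Number>${escapeXml(forwardTo)}</Number>`,
    '  </Dial>',
    '</Response>'
  ].join('\n');
}

// Hypothetical forwarding number; in the project this comes from .env.
console.log(buildForwardingTwiml('+15551234567'));
```

The `record-from-answer-dual` value is what produces the two-channel recording that the transcription step relies on later.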
Before we test this, create an empty route handler for recordings:
Test your code! Run with node index.js while making sure ngrok is still running. Call your Twilio number from a phone number that is different from the one in .env and your forwarding number should ring. Answer, speak, hang up, and you should see 'Recording received' in your terminal a few seconds later.
Generate and Save Transcripts
Replace your /recordings route handler:
Twilio sends a body of data along with this request - we only care about the call identifier and the URL of the recording, so we pull those out by destructuring the object.
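As a small illustration of that destructuring step, the sketch below picks the two fields out of a sample body. `CallSid` and `RecordingUrl` are the parameter names Twilio uses, but the sample values and the `pickRecordingFields` helper are hypothetical.

```javascript
// Pull only the two fields we need out of the request body.
function pickRecordingFields(body) {
  const { CallSid, RecordingUrl } = body;
  return { CallSid, RecordingUrl };
}

// Hypothetical subset of the fields Twilio posts when a recording is ready.
const sampleRecordingBody = {
  CallSid: 'CAxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
  RecordingSid: 'RExxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
  RecordingUrl: 'https://example.com/recording',
  RecordingStatus: 'completed'
};

console.log(pickRecordingFields(sampleRecordingBody));
```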
Then, we ask Deepgram for a transcript. Let's recap the features we use:
The punctuate feature adds punctuation for easier reading.
The utterances feature returns spoken phrases rather than just individual words, and identifies who spoke each one.
The model feature lets us specify a model to use. We have a few, and the default model would probably perform quite well, but we specifically have one trained for phone calls, so we specify it here.
The multichannel feature tells Deepgram that we are sending an audio file with multiple audio channels (one for each speaker). It improves transcription quality as we can accurately identify who is speaking at any point in time.
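The four features above can be written as a plain options object. The sketch below also shows how they would map to query parameters on Deepgram's pre-recorded transcription endpoint if you called the REST API directly instead of using the SDK. The model name `phonecall` is an assumption here, and `buildListenUrl` is a hypothetical helper.

```javascript
// Options matching the four features above. The model name is an assumption.
const transcriptionOptions = {
  punctuate: true,
  utterances: true,
  model: 'phonecall',
  multichannel: true
};

// Turn the options into a query string for a direct REST request.
function buildListenUrl(options) {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(options)) {
    params.set(key, String(value));
  }
  return `https://api.deepgram.com/v1/listen?${params.toString()}`;
}

console.log(buildListenUrl(transcriptionOptions));
```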
Once we provide the recording URL and receive a response from Deepgram, we format the response. The output is an array that looks like this:
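A minimal sketch of that formatting step, assuming the response exposes a `results.utterances` array whose items carry `channel` and `transcript` fields. The sample response is hypothetical and heavily trimmed, and `formatUtterances` is an illustrative name.

```javascript
// Reduce a transcription response to the channel and text of each utterance.
function formatUtterances(response) {
  const utterances = (response.results && response.results.utterances) || [];
  return utterances.map(({ channel, transcript }) => ({ channel, transcript }));
}

// Hypothetical, trimmed-down response used only for illustration.
const sampleResponse = {
  results: {
    utterances: [
      { channel: 0, transcript: 'Hello?', start: 0.5, end: 0.9 },
      { channel: 1, transcript: 'Hi, I am calling about my order.', start: 1.2, end: 3.1 }
    ]
  }
};

console.log(formatUtterances(sampleResponse));
```

Keeping only these two fields keeps each database entry small while preserving everything the dashboard needs to search and display.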
Finally, we update the existing database entry for this call to include both the recording URL and the utterances spoken.
Test your code! Repeat the previous test steps, and you should see the database file (calls.db) now containing the new data.
Get All Recordings
The final step to the server-side of this project is to build a route handler that will return all calls which have a transcript included:
This route handler also sorts the results so that the latest call comes first.
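The filtering and ordering can be sketched in plain JavaScript, independent of the database. This assumes each entry has a numeric `created` timestamp and gains an `utterances` array once transcribed; both field names and `listTranscribedCalls` are assumptions for illustration.

```javascript
// Keep only calls that have a transcript, ordered latest-first.
function listTranscribedCalls(calls) {
  return calls
    .filter((call) => Array.isArray(call.utterances)) // filter() returns a new array
    .sort((a, b) => b.created - a.created); // so sorting leaves the input untouched
}

// Hypothetical entries: the second call has no transcript yet.
const sampleCalls = [
  { CallSid: 'CA1', created: 100, utterances: [{ channel: 0, transcript: 'Hello?' }] },
  { CallSid: 'CA2', created: 300 },
  { CallSid: 'CA3', created: 200, utterances: [{ channel: 1, transcript: 'Hi there.' }] }
];

console.log(listTranscribedCalls(sampleCalls).map((call) => call.CallSid));
```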
Receive Data In The Browser
Our search will consist of three parts:
A search box which, when typed in, will fuzzy search all call transcripts fetched from the database. We will use Fuse.js for the fuzzy search.
An updating list of results which will show the phrases that matched the search term.
A full transcript of the call, shown when a result is clicked.
Create a public directory and an index.html file inside of it. Open the file and scaffold a page:
Our first goal is to populate data.calls.all with the items from the database returned from the GET /recordings route handler, and then create a Fuse search index with those items. Inside the created() method:
Restart your server, navigate to http://localhost:3000 in your browser. Open the DevTools Console and type app.calls.all to see if the value was populated correctly:
Search Transcripts
Inside of your <div id="app"> add a text input:
Then, inside of the empty search() method conduct a search with Fuse and assign the result to data.filter.results:
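Fuse.js handles the fuzzy matching in the project itself. To show the shape of the results without any dependency, the sketch below swaps in plain case-insensitive substring matching, which is not fuzzy. `searchCalls` and the `{ call, phrase }` result shape are assumptions for illustration, not Fuse's output format.

```javascript
// Dependency-free stand-in for the search step: substring match, not fuzzy.
function searchCalls(calls, term) {
  const needle = term.trim().toLowerCase();
  if (!needle) return []; // An empty search box yields no results.
  const results = [];
  for (const call of calls) {
    for (const utterance of call.utterances || []) {
      if (utterance.transcript.toLowerCase().includes(needle)) {
        // Keep the matching phrase and the call it came from.
        results.push({ call, phrase: utterance.transcript });
      }
    }
  }
  return results;
}

// Hypothetical data in the shape stored by the server.
const searchableCalls = [
  { CallSid: 'CA1', utterances: [{ channel: 1, transcript: 'I am calling about my order.' }] },
  { CallSid: 'CA2', utterances: [{ channel: 0, transcript: 'Thanks for calling.' }] }
];

console.log(searchCalls(searchableCalls, 'order'));
```

Returning the parent call with each matching phrase is what lets a click on a result open the full transcript.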
Now that results are being computed, let's display them right under the <input>:
Show Full Transcript
When a search result is clicked, let's show the whole transcript. Add this attribute to the end of the <li> element - it will set data.calls.selected to the clicked item:
Then, below the <ol> add another <ol> to show the utterances in the transcript:
With the predictability of multichannel audio from Twilio, the receiver of the forwarded call is always on channel 0, so we can display in nicer terms who is speaking.
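That channel-to-speaker mapping can be expressed as a small helper. This is a sketch that takes the channel assignment stated above at face value. The labels 'Receiver' and 'Caller' and the function names are assumptions.

```javascript
// Map a channel number to a friendlier speaker label.
// Channel 0 is treated as the receiver, as described in the text.
function speakerLabel(channel) {
  return channel === 0 ? 'Receiver' : 'Caller';
}

// Turn stored utterances into display lines for the transcript list.
function renderTranscript(utterances) {
  return utterances.map((u) => `${speakerLabel(u.channel)}: ${u.transcript}`);
}

// Hypothetical utterances in the stored shape.
console.log(renderTranscript([
  { channel: 0, transcript: 'Hello?' },
  { channel: 1, transcript: 'Hi, I am calling about my order.' }
]));
```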
Finally, add the following CSS to the <style> tag in the <head>:
Refresh and you should see the finished project:
Wrapping Up
We've covered quite a lot here: how to forward and record calls with Twilio, how to handle recording payloads to get transcripts, and how to implement basic client-side fuzzy search. There's a lot more we could do to expand on this project:
Implement outbound call recording
Group calls under specific callers
Allow users to make notes for calls or callers
Use a more permanent database like Supabase - Brian wrote a getting started post here
The complete project is available at https://github.com/deepgram-devs/twilio-voice-searchable-log and if you have any questions please feel free to reach out on Twitter - we're @DeepgramAI.
If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.