Your recordings may not always be great quality: they might be grainy, or have background noise that interferes with what the listener is trying to focus on. Deepgram may still perform fairly well on them, but better source audio always gives a higher chance of an accurate transcript. For this tutorial, I'll use a low-quality audio file from the Library of Congress.
An excellent tool for improving the quality of audio is Dolby.io's Media Enhance API. With this API, all I have to do is make a POST request with the audio file, and Dolby.io can analyze it to remove the interfering sounds and the crackling or static you often hear with these types of recordings. I can even specify what type of content the audio is, such as an interview, podcast, or voice recording, and Dolby.io can enhance it even further for that type of content.
Before We Start
Before jumping into coding, I'll grab an API key from each of the services I'll be using today: one from Dolby.io and one from Deepgram.com.
I'll also install the dependencies with the following command in my project directory:
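In my case, the dependencies are axios (for the Dolby.io requests) and the Deepgram Node SDK, so a command along these lines should do it:

```bash
# Assumed dependencies for this tutorial: axios for the Dolby.io requests
# and the Deepgram Node SDK for transcription later on.
npm install axios @deepgram/sdk
```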
Create a Node.js Project
I'll create an index.js file and require axios to help with making API requests:
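Something along these lines at the top of index.js will do; the two key constants are placeholders I'll swap for my own values:

```js
// index.js
const axios = require("axios");

// Placeholders -- replace with your own keys from Dolby.io and Deepgram
const DOLBY_API_KEY = "YOUR_DOLBY_API_KEY";
const DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY";
```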
I intend to send an audio file to Dolby.io for enhancement, wait for it to be processed, and then retrieve the enhanced file. Since there will be an unknown amount of time involved in each step of the process, I need to write an asynchronous function for each step. Here are the steps:
Start the Enhance Job
The first asynchronous function will be called startEnhanceJob.
I need to make the audio file available to Dolby.io by putting it in cloud storage. Dolby offers the option of me putting it in their temporary cloud storage, but I have to use the URL format they expect, which will start with dlb://. I'll write some JavaScript to create that Dolby.io URL format:
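A line like the following works; the path after dlb:// is just a name I'm making up for the output file:

```js
// Dolby.io temporary cloud storage locations use the dlb:// scheme.
// The path after dlb:// is arbitrary -- this is just an example name.
const enhancedFilePath = "dlb://out/enhanced-interview.wav";
```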
Then I will make the POST request with the audio file to Dolby.io and receive a job ID for that enhance job (which I'll need in the next step).
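Here's a sketch of what startEnhanceJob can look like, assuming Dolby.io's x-api-key header for authentication (check the Dolby.io docs if your account uses a different auth scheme):

```js
// Sketch of startEnhanceJob: POST the enhance request to Dolby.io
// and return the job ID from the response.
const startEnhanceJob = async (audioUrl) => {
  const response = await axios.post(
    "https://api.dolby.com/media/enhance",
    {
      input: audioUrl,                // the original audio file
      output: enhancedFilePath,       // the dlb:// location created above
      content: { type: "interview" }, // tell Dolby.io this is an interview
    },
    { headers: { "x-api-key": DOLBY_API_KEY } }
  );
  return response.data.job_id;
};
```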
Notice that I added content: { type: "interview" } since the audio file I'm sending is an interview.
Check the Enhance Job and Report Progress
It will take some amount of time for the enhancement job to run. I need to track the progress so that I know when the file is ready. I'll write two functions in this step: checkEnhanceJob and waitUntilJobCompletes.
For checkEnhanceJob, I'll take the job ID that was returned from the startEnhanceJob function, and I'll use it to make a GET request to the Dolby.io Enhance API to get progress on the enhancement job:
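A sketch of checkEnhanceJob, using the same assumed x-api-key authentication:

```js
// Sketch of checkEnhanceJob: ask Dolby.io for the status and progress of a job.
const checkEnhanceJob = async (jobId) => {
  const response = await axios.get("https://api.dolby.com/media/enhance", {
    params: { job_id: jobId },
    headers: { "x-api-key": DOLBY_API_KEY },
  });
  return response.data; // e.g. { status: "Running", progress: 42, ... }
};
```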
Then I'll write a function that calls checkEnhanceJob in a loop and prints the progress as the enhance job runs. It will wait 2000ms (2 seconds) between each check:
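Here's one way waitUntilJobCompletes could look; the status strings are my assumption about what the Dolby.io job reports:

```js
// Sketch of waitUntilJobCompletes: poll checkEnhanceJob every 2000ms,
// logging progress, until the job is no longer pending or running.
const waitUntilJobCompletes = async (jobId) => {
  let job = await checkEnhanceJob(jobId);
  while (job.status === "Pending" || job.status === "Running") {
    console.log(`Enhance job progress: ${job.progress}%`);
    await new Promise((resolve) => setTimeout(resolve, 2000)); // wait 2 seconds
    job = await checkEnhanceJob(jobId);
  }
  console.log(`Enhance job finished with status: ${job.status}`);
};
```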
Get Enhanced File URL
Once the enhancement of the file is complete, I need to output that new file to a URL that I can use (in this project, I'll be using it to pass on to Deepgram for transcription).
Once the job finishes, the enhanced audio is sitting at the dlb:// location I created, so I'll write a function that requests that file from Dolby.io's temporary storage and gives me back a regular URL I can use. I'll also console.log the file URL so I can test it now and see how it sounds.
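A sketch of that function, assuming Dolby.io's Media Output endpoint redirects to a downloadable, pre-signed URL:

```js
// Sketch of getEnhancedFileUrl: ask the Dolby.io Media Output endpoint
// where the enhanced file in temporary storage can be downloaded from.
const getEnhancedFileUrl = async () => {
  const response = await axios.get("https://api.dolby.com/media/output", {
    params: { url: enhancedFilePath },
    headers: { "x-api-key": DOLBY_API_KEY },
  });
  // axios follows the redirect, so the final (pre-signed) URL ends up
  // on the underlying request object
  const fileUrl = response.request.res.responseUrl;
  console.log(fileUrl);
  return fileUrl;
};
```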
Run the Enhance Logic
I've written each step of the enhancement job; now I need a main function that runs them all in order.
I also need to add the URL of the audio file I want to enhance. I've chosen a file from the Library of Congress called "Interview with Lillie Haws, New York, New York, November 12, 2001".
Here is the main function:
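This is a sketch; the Library of Congress URL below is a placeholder I'd replace with the actual link to the recording:

```js
// Sketch of the main function that runs each step of the enhancement.
const main = async () => {
  // Placeholder URL for "Interview with Lillie Haws, New York, New York,
  // November 12, 2001" -- swap in the real Library of Congress file URL.
  const audioUrl = "https://www.loc.gov/path/to/lillie-haws-interview.mp3";

  const jobId = await startEnhanceJob(audioUrl);
  await waitUntilJobCompletes(jobId);
  const url = await getEnhancedFileUrl(); // we'll pass this to Deepgram next
};

main();
```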
When it runs, I'll see the progress values printed by my loop counting up until the enhancement job completes. When it finishes, I'll see a very long URL that I can use to listen to my file.
If I click on the link, I'm taken to the hosted audio file. It sounds so much better than the original! Now I'm ready to transcribe it with Deepgram.
Transcribe With Deepgram
I'll be using Deepgram's API for transcribing Pre-Recorded Audio. Deepgram has a Node.js SDK, so I'll require it in my index.js file. I'll also create a new instance of Deepgram by giving it my Deepgram API key:
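Assuming the version of the SDK where you construct a Deepgram instance directly (newer releases of @deepgram/sdk use createClient instead), that looks like:

```js
// Pull in the Deepgram SDK and create a client with the API key.
const { Deepgram } = require("@deepgram/sdk");
const deepgram = new Deepgram(DEEPGRAM_API_KEY);
```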
I will take the file URL that I received from Dolby.io and send that to Deepgram for transcription. It is the temporarily stored file that I assigned to the url variable in the main function (in the last section).
I'll also specify that I would like Deepgram to add punctuation. I can do this by adding { punctuate: true } to the request:
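Inside the main function, after the enhanced file's URL comes back from Dolby.io, the request can look something like this (using the transcription.preRecorded method from that same SDK version):

```js
// Send the enhanced, hosted audio file to Deepgram for transcription,
// asking it to add punctuation to the transcript.
const response = await deepgram.transcription.preRecorded(
  { url },              // the enhanced file URL from getEnhancedFileUrl
  { punctuate: true }   // ask Deepgram to add punctuation
);
```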
Now I can run the whole function, and I'll see that Deepgram transcribes the enhanced file. I'll console.log the response from Deepgram so I can actually see the transcription now:
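For example, I can log the whole response object, or drill straight into the transcript text (the path below follows Deepgram's standard pre-recorded response shape):

```js
// Log the full response, then just the transcript itself.
console.dir(response, { depth: null });
console.log(response.results.channels[0].alternatives[0].transcript);
```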
And now I have a full transcript of the audio file from the Library of Congress.
Conclusion
Today I used Dolby.io and Deepgram to enhance an audio file and transcribe the speech of the interview into text. These two APIs seem like a great combination for many future projects!
If you enjoyed my post, follow me on Twitter to continue the conversation.
If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub Discussions.