Article·Tutorials·Nov 22, 2021

Get Live Speech Transcriptions In Your Browser

Kevin Lewis
By Kevin Lewis
PublishedNov 22, 2021
UpdatedJun 13, 2024

There are so many projects you can build with Deepgram's streaming audio transcriptions. Today, we are going to get live transcriptions from a user's mic inside of your browser.

Watch this tutorial as a video:

Before We Start

For this project, you will need a Deepgram API Key - get one here. That's it in terms of dependencies - this project is entirely browser-based.

Create a new index.html file, open it in a code editor, and add the following boilerplate code:

Get User Microphone

You can request access to a user's media input devices (microphones and cameras) using a built in getUserMedia() method. If allowed by the user, it will return a MediaStream which we can then prepare to send to Deepgram. Inside of your <script> add the following:

Load your index.html file in your browser, and you should immediately receive a prompt to access your microphone. Grant it, and then look at the console in your developer tools.

Now we have a MediaStream we must provide it to a MediaRecorder which will prepare the data and, once available, emit it with a datavailable event:

We now have everything we need to send Deepgram.

Connect to Deepgram

To stream audio to Deepgram's Speech Recognition service, we must open a WebSocket connection and send data via it. First, establish the connection:

A reminder that this key is client-side and, therefore, your users can see it. Any user with access to your key can access the Deepgram APIs, which, in turn, may provide full account access. Refer to our post on protecting your API key with browser live transcription.

Then, log when socket onopenonmessageonclose, and onerror events are triggered:

Refresh your browser and watch the console. You should see the socket connection is opened and then closed. To keep the connection open, we must swiftly send some data once the connection is opened.

Sending Data to Deepgram

Inside of the socket.onopen function send data to Deepgram in 250ms increments:

Deepgram isn't fussy about the timeslice you provide (here it's 250ms), but bear in mind that the bigger this number is, the longer between words being spoken and it being sent, slowing down your transcription. 100-250 is ideal.

Take a look at your console now while speaking into your mic - you should be seeing data come back from Deepgram!

Handling the Deepgram Response

Inside of the socket.onmessage function parse the data sent from Deepgram, pull out the transcript only, and determine if it's the final transcript for that phrase ("utterance"):

You may have noticed that for each phrase, you have received several messages from Deepgram - each growing by a word (for example "hello", "hello how", "hello how are", etc). Deepgram will send you back data as each word is transcribed, which is great for getting a speedy response. For this simple project, we will only show the final version of each utterance which is denoted by an is_final property in the response.

To neaten this up, remove the console.log({ event: 'onmessage', message }) from this function, and then test your code again.

That's it! That's the project. Before we wrap up, let's give the user some indication of progress in the web page itself.

Showing Status & Progress In Browser

Change the text inside of <p id="status"> to 'Not Connected'. Then, at the top of your socket.onopen function add this line:

Remove the text inside of <p id="transcript">. Where you are logging the transcript in your socket.onmessage function add this line:

Try your project once more, and your web page should show you when you're connected and what words you have spoken, thanks to Deepgram's Speech Recognition.

The full code is here:

If you have any questions, please feel free to reach out on Twitter - we're @DeepgramAI.

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions .

Unlock language AI at scale with an API call.

Get conversational intelligence with transcription and understanding on the world's best speech AI platform.