Get live speech transcriptions in your browser with Deepgram

In this video, Senior Developer Advocate Kevin Lewis shows you how to use Deepgram’s Speech Recognition API to get live captions directly in your browser.


Hello there. My name is Kevin Lewis, and I’m a developer advocate here at Deepgram. And today, I’m gonna show you how to get started with live transcriptions directly in your browser using Deepgram’s speech recognition API.

This project has four steps. First of all, we’re going to request access and get data from the user’s microphone. Second, we are going to create a persistent two-way connection with Deepgram that allows us to send and receive data in real time. Third, we’re going to get that data from our mic and send it to Deepgram as soon as it’s available. And then finally, we’re going to listen out for live transcriptions being returned from Deepgram and show those to you in the browser console. So let’s get started.

The first thing we’re gonna do is ask for access to the user’s microphone. To do that, we’re gonna use an API that’s built into most browsers. We’re going to ask for access to a user’s media device, specifically an audio device, so a microphone. And this will return a promise, which in turn will resolve to what is known as a media stream. So let’s just console log that and see what a media stream looks like.
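The microphone request described above can be sketched roughly as follows. This is a minimal sketch, not the exact code from the video: the helper name requestMicrophone and its injectable mediaDevices parameter are assumptions, added so the snippet can also be exercised outside a browser.

```javascript
// Hypothetical helper: asks the browser for an audio-only media stream.
// `mediaDevices` defaults to the browser's navigator.mediaDevices; it is a
// parameter only so the function can be tried with a stand-in object.
function requestMicrophone(mediaDevices = globalThis.navigator?.mediaDevices) {
  if (!mediaDevices || typeof mediaDevices.getUserMedia !== "function") {
    return Promise.reject(new Error("getUserMedia is not available here"));
  }
  // The returned promise resolves to a MediaStream once the user grants access.
  return mediaDevices.getUserMedia({ audio: true });
}

// In a browser page:
// requestMicrophone().then((stream) => console.log(stream));
```

The browser shows its own permission prompt when getUserMedia is called, so no extra UI is needed for that step.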

So here’s the page open in a browser. I’m gonna refresh, and the first thing we see is that the browser handles requesting access to the microphone for us. And once we allow that, we see a media stream logged here. Now this is great, but in order to get raw data from the microphone, we need to plug this into what is known as a media recorder. So we’ll create a media recorder here: new MediaRecorder. And in there, we’re gonna plug in our stream, and we’re gonna specify the output format that we desire. So that’s step one.
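As a rough illustration of plugging the stream into a media recorder, the sketch below wraps the constructor in a small helper. The helper name createRecorder is hypothetical, and "audio/webm" is an assumed output format, since the video does not say which one is used.

```javascript
// Hypothetical helper: wraps a MediaStream in a MediaRecorder.
// "audio/webm" is an assumed output format; the video does not name one.
function createRecorder(stream, Recorder = globalThis.MediaRecorder, mimeType = "audio/webm") {
  if (typeof Recorder !== "function") {
    throw new Error("MediaRecorder is not available here");
  }
  // Not every browser supports every container, so check before constructing.
  if (typeof Recorder.isTypeSupported === "function" && !Recorder.isTypeSupported(mimeType)) {
    throw new Error(`Unsupported recording format: ${mimeType}`);
  }
  return new Recorder(stream, { mimeType });
}

// In a browser page, with `stream` coming from getUserMedia:
// const mediaRecorder = createRecorder(stream);
```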

Next, we’re going to create a persistent two-way connection with Deepgram. We’ll create a new WebSocket here, and we’ll connect directly to Deepgram’s live transcription endpoint. In here, we’re also gonna wanna provide our authentication details. There are a few ways of doing it, but we are going to provide our API key directly here.

Now, as soon as that connection is opened, we’re going to start preparing and sending data from our mic. To do that, we’re gonna hook into the socket.onopen event, like so. And in order to do this, we’re gonna add an event listener to the media recorder. So we’re gonna go mediaRecorder.addEventListener, and the event we’re listening for is called dataavailable, all lowercase, all one word. That will return the data from our mic, and we’re gonna go ahead and send that data.

So this is great, but how do we make data available? We actually have to start the media recorder. That’s just one final line here, mediaRecorder.start. And then here we specify a time slice. This is the increment of time in which data will be packaged up and made available via the dataavailable event. It’s in milliseconds, so that’s thousandths of a second, and I’ll do this every quarter of a second. So that’s everything we need to send data to Deepgram.
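The connection and sending steps above might look something like the sketch below. Several things here are assumptions rather than details stated in the video: the endpoint URL, passing the API key as a "token" WebSocket subprotocol, and the helper names connectToDeepgram and streamRecorderToSocket. The 250 millisecond time slice matches the quarter of a second mentioned above.

```javascript
// Assumed endpoint; the video only calls it the live transcription endpoint.
const DEEPGRAM_LIVE_URL = "wss://api.deepgram.com/v1/listen";
const WS_OPEN = 1; // numeric value of WebSocket.OPEN

// Hypothetical helper: opens the socket, passing the key as a subprotocol
// (an assumed way of providing the API key directly from the browser).
function connectToDeepgram(apiKey, Socket = globalThis.WebSocket) {
  return new Socket(DEEPGRAM_LIVE_URL, ["token", apiKey]);
}

// Hypothetical helper: once the socket opens, forward each recorded chunk.
function streamRecorderToSocket(mediaRecorder, socket, timesliceMs = 250) {
  socket.onopen = () => {
    mediaRecorder.addEventListener("dataavailable", (event) => {
      // Skip empty chunks and avoid sending on a socket that is not open.
      if (event.data.size > 0 && socket.readyState === WS_OPEN) {
        socket.send(event.data);
      }
    });
    // Emit a chunk every `timesliceMs` milliseconds (250 = a quarter second).
    mediaRecorder.start(timesliceMs);
  };
}

// In a browser page, with `mediaRecorder` already created:
// const socket = connectToDeepgram("YOUR_DEEPGRAM_API_KEY");
// streamRecorderToSocket(mediaRecorder, socket);
```

Starting the recorder inside the open handler means no audio is produced before there is somewhere to send it.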

The other side of that is to listen for messages that are being sent from Deepgram to us in the other direction. To do that, we’re gonna listen to the onmessage event. There’s loads of useful data that comes back in the returned payload. So here we parse it, and instead of logging it all, we’re just extracting the transcript. And now we’re gonna go ahead and console log the transcript. At this point, you may show it to users or do something else with it, but that is actually all we need in order to do live transcription in the browser.
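Parsing the returned payload and extracting the transcript could be sketched like this. The payload shape used here (channel.alternatives[0].transcript) is an assumption about Deepgram’s streaming responses, since the video does not spell it out, and extractTranscript is a hypothetical helper name.

```javascript
// Hypothetical helper: pulls the transcript text out of one socket message.
// The payload shape (channel.alternatives[0].transcript) is an assumption;
// messages without it, or with no speech, yield an empty string.
function extractTranscript(message) {
  const received = JSON.parse(message.data);
  return received.channel?.alternatives?.[0]?.transcript ?? "";
}

// In a browser page, with `socket` being the open WebSocket:
// socket.onmessage = (message) => {
//   const transcript = extractTranscript(message);
//   if (transcript) console.log(transcript);
// };
```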

So let me refresh, give access to our microphone, and we should see any minute now that transcripts are appearing right there in our console. How cool is that? And you’ll see there are multiple phrases coming for everything I’m saying. There is an additional property in the returned payload that indicates when a given phrase is in its final form.
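If only the final version of each phrase is wanted, the flag mentioned above can be used to filter out interim results. In the sketch below, the property name is_final and the payload shape are assumptions, because the video does not name them, and finalTranscriptOrNull is a hypothetical helper.

```javascript
// Hypothetical helper: returns the transcript only once the phrase is final.
// `is_final` is the assumed name of the flag mentioned above.
function finalTranscriptOrNull(message) {
  const received = JSON.parse(message.data);
  const transcript = received.channel?.alternatives?.[0]?.transcript ?? "";
  // Interim results and empty transcripts are ignored.
  return received.is_final && transcript ? transcript : null;
}

// In a browser page, with `socket` being the open WebSocket:
// socket.onmessage = (message) => {
//   const finalText = finalTranscriptOrNull(message);
//   if (finalText !== null) console.log(finalText);
// };
```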

So hopefully, you found that interesting. That’s how you do browser live transcription. Before we part ways, I just wanted to mention a blog post that we published not long before this video, which talks about best practices for handling your API key. So check the description out for that. And if you are gonna use this in the real world, make sure that you are doing something to protect your API key from being accessible to users and from having too wide-reaching permissions. If you have any questions at all, reach out. We love to help people. We love to see what you’re gonna build with our speech recognition API. Have a wonderful day. Bye for now.