TRANSCRIPT

Add live transcriptions to a Daily video call with Deepgram

Hello there. My name is Kevin Lewis, and I’m a developer advocate at Deepgram. And today, I’m gonna show you how to get started with adding live transcriptions to a Daily video call using the Daily Prebuilt UI. Now for those of you who might not know Daily, they allow developers to easily integrate video and audio calls into their applications, and we’ve recently partnered with them to bring fast and accurate live transcription to video calls in just a few lines of code. Now this project broadly has three parts. In the first part, we’re gonna set up a Daily video call using Daily Prebuilt. Then in the second part, we’re going to enable live transcriptions at a domain level, that is, an account level, with Daily. And then finally, we’re gonna take our video call, start live transcribing that call, and put the output in the web page. So let’s get started.

The first thing we’re gonna do is set up an HTML page with Daily JS. Also note this div with an ID of call. We’re going to set this up so the generated visual elements of the call go here, inside of the div. So the first thing we’re gonna do is get access to this element using JavaScript. Then we’re gonna go ahead and create a call frame. And in there, we’re going to pass this element and optionally provide some styling for the generated iframe. Then we’re going to go ahead and join it. And here we’re going to provide the URL for a Daily video call.
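As a rough sketch of the step just described, the call frame can be created and joined like this. It assumes daily-js has been loaded with a script tag so that the DailyIframe global exists; the function name createAndJoinCall, the styling values, and the room URL in the usage comment are placeholders for illustration, not something shown in the video.

```javascript
// Sketch: create a Daily Prebuilt call frame inside a container and join a room.
// `daily` is the DailyIframe global exposed by daily-js; it is passed in as an
// argument so the function does not depend on hidden globals.
function createAndJoinCall(daily, container, roomUrl) {
  const callFrame = daily.createFrame(container, {
    // Optional styling for the generated iframe (illustrative values).
    iframeStyle: { width: '100%', height: '500px', border: '0' },
  });
  callFrame.join({ url: roomUrl });
  return callFrame;
}

// In the page (the room URL below is a placeholder):
// const callFrame = createAndJoinCall(
//   window.DailyIframe,
//   document.getElementById('call'),
//   'https://your-domain.daily.co/your-room'
// );
```

The returned call frame is the object that later steps use to start and stop transcription.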

Now we don’t currently have this URL, so let’s go ahead and create it. We can create it over here in the Daily dashboard. We’re going to create a room. Now, you can think of a room as an empty shell in which a call will take place. So first, we create the room, and then users join that room in order to participate in the video call. So we’ll create the room with all the default settings. And if we go into the room here, we see there is a URL, and that’s all we need to copy and paste. This is actually all we need to set up a multi-person video call using Daily Prebuilt. So if I load this in the browser here, we see the element loads in. It asks us for access to our media devices. And once we have granted those, we can join the meeting. And if another user uses this exact same code with the same URL, they will also join this video call, and it will become a multi-person video call.

Now, the second part of this tutorial is to actually enable live transcriptions at a Daily domain level. Now in order to do this, we’re going to need to make an API request. You can make that API request any way you want: directly from the terminal using something like curl or HTTPie, by writing and running a script in a programming language, or with a visual tool like Postman. And that’s what I’m gonna do today.

Now there are two things we need for this to be successful. The first is our Daily API key, which you can get over here in your developer settings. So there is your API key. You also need a Deepgram API key, which you can get in the Deepgram console. So you can go to your project and generate a new API key with these settings. Hit create key, and here is the API key you’re going to need. So going ahead to create that API request, it’s going to be a POST request to this URL. We’re going to pass in our Daily API key inside of an authorization header. And then inside of the body, there’s a property called enable_transcription, and the value is deepgram, then a colon, and then your Deepgram API key. So if you hit send here, a successful response means we have successfully enabled live transcriptions at an account level.
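For anyone scripting this instead of using Postman, here is a minimal sketch of how that request might be built. The transcript does not spell out the URL or the exact body shape, so the endpoint https://api.daily.co/v1/ and the nesting of enable_transcription under a properties object are assumptions to check against Daily’s documentation; the function name is hypothetical too.

```javascript
// Sketch: build the request that enables transcription for a Daily domain.
// ASSUMPTIONS: the endpoint URL and the `properties` wrapper are not stated
// in the video; verify both against Daily's REST API reference.
function buildEnableTranscriptionRequest(dailyApiKey, deepgramApiKey) {
  return {
    url: 'https://api.daily.co/v1/',
    options: {
      method: 'POST',
      headers: {
        // The Daily API key goes in the authorization header.
        Authorization: `Bearer ${dailyApiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        // Value is "deepgram:" followed by the Deepgram API key.
        properties: { enable_transcription: `deepgram:${deepgramApiKey}` },
      }),
    },
  };
}

// Usage from a server-side script (never expose these keys in the browser):
// const { url, options } = buildEnableTranscriptionRequest(
//   process.env.DAILY_API_KEY,
//   process.env.DEEPGRAM_API_KEY
// );
// const response = await fetch(url, options);
```

Keeping the request builder separate from the fetch call makes it easy to inspect what will be sent before any keys leave your machine.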

Now there is one more thing you need to be aware of here. Only meeting owners can start transcription. And right now, we have not indicated when we are joining this call that we are a meeting owner. We’re just joining it as a non-owner. And in order to tell Daily, hey, we are a meeting owner, we are going to also provide what is known as a token. Once again, we’re going to need to make an API request in order to generate a token. You could write a script, you could do this on the fly as part of a wider application, or you could do this from your terminal. But as we’ve got Postman open, we’re going to use this. So once again, there’s that authorization bearer token here in the header. We’re doing a POST request to /meeting-tokens, and we’re specifying that is_owner is true. So we’re gonna hit send. We’re going to copy this token and pop it in here. And now when we join the call, we will be a meeting owner, and that means we are able to start transcription.
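The token step could look roughly like the sketch below. The full endpoint URL and the properties wrapper around is_owner are assumptions based on Daily’s REST API rather than something spelled out in the video, and both function names are made up for illustration.

```javascript
// Sketch: build the request that creates an owner meeting token.
// ASSUMPTIONS: full endpoint URL and the `properties` wrapper; check them
// against Daily's meeting token documentation.
function buildMeetingTokenRequest(dailyApiKey) {
  return {
    url: 'https://api.daily.co/v1/meeting-tokens',
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${dailyApiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ properties: { is_owner: true } }),
    },
  };
}

// Join the room as an owner by passing the token alongside the room URL.
function joinAsOwner(callFrame, roomUrl, token) {
  return callFrame.join({ url: roomUrl, token });
}

// Usage (server side for the token, browser side for the join):
// const { url, options } = buildMeetingTokenRequest(process.env.DAILY_API_KEY);
// const { token } = await (await fetch(url, options)).json();
// joinAsOwner(callFrame, 'https://your-domain.daily.co/your-room', token);
```

In a real application the token would normally be generated by your own backend and handed to the page, so the Daily API key never reaches the browser.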

So let’s go ahead and move on to the final part of this tutorial, which is actually enabling transcription for an individual call and displaying the results. We’re gonna go ahead and create just a couple of buttons here, one called start and one called stop, which will start and stop transcriptions, and we’ll go ahead and create a div with an ID of transcript. And this is where our output is going to go. So when we click start, we want to start transcription. So we’re gonna go ahead and say whenever we click on the start element, the button, we’re gonna go ahead and call startTranscription on the call frame, like so. And that’s actually all we need to begin the transcription. We also need to listen for when transcriptions are returned from Deepgram. And we can do that with callFrame.on and the app-message event. So the app-message event will be emitted whenever there is a new transcript returned.
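Put together, the wiring described here might look something like this sketch. The function name and the start element ID in the usage comment are assumptions; startTranscription and the app-message event are the daily-js calls named in the walkthrough.

```javascript
// Sketch: start transcription on click and log every app-message event.
// `log` defaults to console.log and is only a parameter to ease testing.
function wireStartTranscription(callFrame, startButton, log = console.log) {
  startButton.addEventListener('click', () => {
    callFrame.startTranscription();
  });

  // app-message fires for several kinds of messages; transcripts are one.
  callFrame.on('app-message', (msg) => {
    log(msg);
  });
}

// In the page (assumes a button with an ID of start):
// wireStartTranscription(callFrame, document.getElementById('start'));
```

At this stage every message is simply logged, which is a handy way to inspect the payload before deciding what to show on the page.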

So let’s go ahead and save this, go back to our web page, and hit refresh. We see those buttons are there. We’re gonna allow access to our media devices. We’re gonna join the meeting. And now that we’re in the meeting, we’re going to click start transcription. And in just a moment, we should see new logs here in the console. Wonderful. So if I open one of those and we take a little look at this, we see here inside of data, we have a payload being returned from Deepgram. It also has the username that I’ve set inside of Daily. The first time you load this, it’ll ask you for your username. So there’s the username. We see the text that I spoke and a few other properties. We also see fromId is set to transcription. And that’s because app-message as an event can be triggered in a number of ways, and transcription is just one of those.

So we’re gonna wanna make sure that we only display transcripts on the web page when the fromId is transcription, and also only when the version of the transcript that is returned from Deepgram is the final version. So what we’ll do inside of here, instead of just console logging the message, is write an if statement, and we’re gonna say if fromId is transcription and if is_final is set to true, that’s a Deepgram property, we want to do something with this transcript that has been returned. And all we wanna do here is create a new paragraph element. We want to put some text inside of it. In this case, it will say, Kev, colon, and then the text that I spoke. And then we’re going to append the p tag, with the text inside of it, to this transcript div. So let’s hit save. Let’s once again refresh. We will allow access to our media devices. We will join the meeting, and then we will click start transcription. And if we’ve done this correctly, we should now start seeing transcripts display on the web page.
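Here is one way the filtering and display logic could be written, as a sketch. The field names user_name, text, and is_final follow the payload described in the walkthrough, but their exact spelling is an assumption to confirm against your own console output; the helper names are hypothetical.

```javascript
// Sketch: only show messages that are final transcripts.
function isFinalTranscript(msg) {
  return msg?.fromId === 'transcription' && msg?.data?.is_final === true;
}

// Build the "Name: spoken text" line for one transcript payload.
function formatTranscript(data) {
  return `${data.user_name}: ${data.text}`;
}

// Append a <p> with the transcript line to the container; returns whether
// anything was appended. `doc` is the page's document object.
function appendTranscript(doc, container, msg) {
  if (!isFinalTranscript(msg)) return false;
  const p = doc.createElement('p');
  p.textContent = formatTranscript(msg.data);
  container.appendChild(p);
  return true;
}

// In the page (assumes a div with an ID of transcript):
// callFrame.on('app-message', (msg) => {
//   appendTranscript(document, document.getElementById('transcript'), msg);
// });
```

Setting textContent rather than innerHTML keeps spoken text from ever being interpreted as markup.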

There we are. Making that a bit bigger there. So that is how you do live transcriptions with Daily Prebuilt. Now the final thing we’re gonna do before we say goodbye today is just get this stop transcription button to work, just so you see how it works. And it’s really, really similar to this. In fact, we can copy and paste this here. We’ll change it for the stop button. And instead of start transcription, it will be stop transcription.
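The stop button mirrors the start button, so a sketch of it is short. The function name and the stop element ID in the usage comment are assumptions.

```javascript
// Sketch: stop transcription when the stop button is clicked.
function wireStopTranscription(callFrame, stopButton) {
  stopButton.addEventListener('click', () => {
    callFrame.stopTranscription();
  });
}

// In the page (assumes a button with an ID of stop):
// wireStopTranscription(callFrame, document.getElementById('stop'));
```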

And there you go. That is a Daily Prebuilt video call with live transcriptions from Deepgram. I hope you found this interesting and have a wonderful rest of your day. Bye for now.
