
Generate WebVTT and SRT Captions Automatically with Node.js

By Kevin Lewis
Published Nov 15, 2021 · Updated Jun 13, 2024

Providing captions for audio and video isn't just a nice-to-have - it's critical for accessibility. While this isn't specifically an accessibility post, I wanted to start by sharing Microsoft's Inclusive Toolkit. Something I hadn't considered before reading this was the impact of situational limitations. To learn more, jump to Section 3 of the toolkit - "Solve for one, extend to many". Having a young (read "loud") child, I've become even more aware of where captions are available, and if they aren't, I simply can't watch something with her around.

There are two common and similar caption formats we are going to generate today - WebVTT and SRT. A WebVTT file looks like this:
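(The cue text and timings below are illustrative, not taken from the sample audio.)

```
WEBVTT

1
00:00:00.000 --> 00:00:04.200
- Welcome back to the deep learning podcast.

2
00:00:04.200 --> 00:00:08.700
- Today we're talking about speech recognition.
```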

And an SRT file looks like this:
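(The same illustrative content, in SRT form.)

```
1
00:00:00,000 --> 00:00:04,200
Welcome back to the deep learning podcast.

2
00:00:04,200 --> 00:00:08,700
Today we're talking about speech recognition.
```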

Both are very similar in their basic forms, except for the millisecond separator being . in WebVTT and , in SRT. In this post, we will generate them manually from a Deepgram transcription result to see the technique, and then use the brand new Node.js SDK methods (available from v1.1.0) to make it even easier.

Before We Start

You will need:

  • Node.js installed on your machine - download it here.

  • A Deepgram API Key - get one here.

  • A hosted audio file URL to transcribe - you can use https://static.deepgram.com/examples/deep-learning-podcast-clip.wav if you don't have one.

Create a new directory and navigate to it with your terminal. Run npm init -y to create a package.json file and then install the Deepgram Node.js SDK with npm install @deepgram/sdk.

Set Up Dependencies

Create an index.js file, open it in your code editor, and require, then initialize, the dependencies:
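A minimal sketch of that setup, assuming the v1 SDK used in this post and a placeholder API key (swap in your own). The fs module is required here because we will use it for write streams later:

```javascript
// index.js
const fs = require('fs')
const { Deepgram } = require('@deepgram/sdk')

// Replace with your own Deepgram API Key
const deepgramApiKey = 'YOUR_DEEPGRAM_API_KEY'
const deepgram = new Deepgram(deepgramApiKey)
```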

Get Transcript

To get timestamps for the phrases to include in our caption files, you need to ask Deepgram to include utterances (a chain of words or, more simply, a phrase).
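A sketch of that request, assuming the v1 SDK's deepgram.transcription.preRecorded() method and the sample file URL from above (replace it with your own hosted audio):

```javascript
deepgram.transcription
  .preRecorded(
    { url: 'https://static.deepgram.com/examples/deep-learning-podcast-clip.wav' },
    { punctuate: true, utterances: true }
  )
  .then((response) => {
    // Each utterance has start and end times (in seconds) and a transcript
    console.log(response.results.utterances)
  })
  .catch((error) => console.error(error))
```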

Create a Write Stream

Once you open a writable stream, you can insert text directly into your file. When you do this, pass in the a (append) flag so that any time you write data to the stream, it is appended to the end. Inside the .then() block:
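A sketch of opening two append-mode streams - the captions.vtt and captions.srt filenames are just examples:

```javascript
const vttStream = fs.createWriteStream('captions.vtt', { flags: 'a' })
const srtStream = fs.createWriteStream('captions.srt', { flags: 'a' })
```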

Write Captions

The WebVTT and SRT formats are very similar, and each requires a block of text per utterance.

WebVTT

Deepgram provides seconds back as a number (15.4 means 15.4 seconds), but both formats require times as HH:MM:SS.milliseconds. Taking the end of a new Date().toISOString() string will achieve this for us.
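Putting that together, a sketch that writes one WebVTT block per utterance (vttStream and response are the names used in the earlier snippets):

```javascript
vttStream.write('WEBVTT\n\n')

response.results.utterances.forEach((utterance, index) => {
  // 15.4 -> "00:00:15.400": take the time portion of an ISO date string
  const start = new Date(utterance.start * 1000).toISOString().substring(11, 23)
  const end = new Date(utterance.end * 1000).toISOString().substring(11, 23)
  vttStream.write(`${index + 1}\n${start} --> ${end}\n- ${utterance.transcript}\n\n`)
})
```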

Using the SDK

Replace the above code with this single line:
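Assuming the caption helpers live on the transcription response returned by preRecorded(), that looks something like:

```javascript
vttStream.write(response.toWebVTT())
```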

SRT

Differences? There is no WEBVTT line at the top, the millisecond separator is , instead of ., and there is no - before the utterance.
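A sketch of the SRT version, reflecting those three differences:

```javascript
response.results.utterances.forEach((utterance, index) => {
  // Same conversion as before, but swap the millisecond separator to a comma
  const start = new Date(utterance.start * 1000).toISOString().substring(11, 23).replace('.', ',')
  const end = new Date(utterance.end * 1000).toISOString().substring(11, 23).replace('.', ',')
  srtStream.write(`${index + 1}\n${start} --> ${end}\n${utterance.transcript}\n\n`)
})
```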

Using the SDK

Replace the above code with this single line:
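Again assuming the helper is exposed on the transcription response:

```javascript
srtStream.write(response.toSRT())
```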

One Line to Captions

We actually implemented .toWebVTT() and .toSRT() straight into the Node.js SDK while writing this post. Now, it's easier than ever to create valid caption files automatically with Deepgram. If you have any questions, please feel free to reach out on Twitter - we're @DeepgramAI.

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.
