We're excited to announce two open-source captioning packages from Deepgram: deepgram-captions for Python and @deepgram/captions for JavaScript. Both packages generate SRT and WebVTT captions from the output of several speech-to-text APIs, including Deepgram.

Demo

Check out our Streamlit demo of the package being used to create captions for YouTube videos: https://deepgram-captions.streamlit.app/

Key Features

SRT and WebVTT Caption Generation:

  • The packages create both SRT (SubRip Subtitle) and WebVTT (Web Video Text Tracks) captions. SRT captions are widely supported and commonly used for video subtitles, while WebVTT provides a standardized format for captioning web videos. This dual capability ensures compatibility with a broad range of multimedia platforms and applications.

API Agnostic Design:

  • The packages accept JSON responses from a variety of speech-to-text APIs, not just Deepgram. This flexibility allows users to seamlessly integrate the solution with their preferred API. We will continue to add support for other APIs, so if your preferred API is not supported, please reach out to us.

  • Users can also write their own converter: a class that supplies the formatters with transcription data in the expected format.

Speaker Labeling and Diarization Compatibility:

  • When used with the diarization feature of APIs like Deepgram, these packages label each caption with its speaker. Viewers can then see who is talking, which makes the captions more accessible.

No Deepgram Dependency:

  • While designed to complement Deepgram's capabilities, the packages don't rely on Deepgram as a mandatory dependency. Users can utilize these tools with the API of their choice, broadening the scope of application.

Whisper Compatibility:

  • Because OpenAI’s API does not include timestamps with its speech-to-text output, Whisper cannot be used directly. The Python package offers two ways around this: create captions from the output of the popular whisper-timestamped package, or use Deepgram’s hosted Whisper with the Deepgram converter.

User-Friendly Caption Creation:

  • Aimed at simplifying the caption creation process, these open-source packages cater to users seeking an easy and efficient way to generate captions for their audio and video content.

A Closer Look

Installation

Python:

pip install deepgram-captions

JavaScript:

npm install @deepgram/captions

How it works

Each package takes the JSON response from a transcription request, and responses from several APIs are supported. This is the shape of a Deepgram JSON response:

{
  "metadata": {
    ...
  },
  "results": {
    "channels": [
      {
        "alternatives": [
          {
            "transcript": ...,
            "confidence": ...,
            "words": [...]
          }
        ]
      }
    ]
  }
}

A converter class turns the JSON object into the needed shape for the functions that create the captions. The DeepgramConverter can handle a Deepgram response, while other converters can handle responses that come from other transcription APIs.
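
To make the first step of that conversion concrete, here is a minimal sketch of pulling the word objects out of a response with the shape shown above. It does not use the package; the function name extract_words and the sample values are invented for illustration, and the real DeepgramConverter may work differently.

```python
def extract_words(response, channel=0):
    """Return the word objects from the first alternative of one channel."""
    alternatives = response["results"]["channels"][channel]["alternatives"]
    if not alternatives:
        return []
    return alternatives[0].get("words", [])


# Invented sample data that follows the response shape shown above.
sample_response = {
    "metadata": {},
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "hello world",
                        "confidence": 0.99,
                        "words": [
                            {"word": "hello", "start": 0.08, "end": 0.4},
                            {"word": "world", "start": 0.45, "end": 0.9},
                        ],
                    }
                ]
            }
        ]
    },
}

words = extract_words(sample_response)
print([w["word"] for w in words])
```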

Users can create their own class to handle a JSON object, converting it into the expected format so that the captions functions can create SRT or WebVTT captions:

transcriptionData = [
  [
    {
      word: string;
      start: number;
      end: number;
      punctuated_word: string; // optional
    }
  ]
]
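
As a rough sketch of such a class, the converter below adapts a made-up API response into the nested list of word objects shown above. The input shape (text, start_ms, end_ms) is hypothetical, and the method name get_lines with its line_length parameter is an assumption about what the formatters call; check the package README for the exact interface before relying on it.

```python
class SimpleConverter:
    """Hypothetical converter for an API that returns a flat list of timed words."""

    def __init__(self, api_response):
        # Assumed input: {"words": [{"text": str, "start_ms": int, "end_ms": int}]}
        self.words = [
            {
                "word": w["text"],
                "start": w["start_ms"] / 1000,  # milliseconds to seconds
                "end": w["end_ms"] / 1000,
            }
            for w in api_response["words"]
        ]

    def get_lines(self, line_length=8):
        """Group words into caption lines of at most `line_length` words."""
        return [
            self.words[i:i + line_length]
            for i in range(0, len(self.words), line_length)
        ]


converter = SimpleConverter(
    {
        "words": [
            {"text": "hi", "start_ms": 500, "end_ms": 750},
            {"text": "there", "start_ms": 800, "end_ms": 1250},
            {"text": "friend", "start_ms": 1300, "end_ms": 2000},
        ]
    }
)
lines = converter.get_lines(line_length=2)
```

Each inner list is one caption line, so a smaller line_length yields shorter, more frequent captions.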

Example usage WebVTT

WebVTT from Deepgram Transcriptions

Python:

from deepgram_captions import DeepgramConverter, webvtt
transcription = DeepgramConverter(dg_response)
captions = webvtt(transcription)

JavaScript:

import { webvtt } from "@deepgram/captions";
const result = webvtt(transcription);

Output WebVTT

WEBVTT

NOTE
Transcription provided by Deepgram
Request Id: 686278aa-d315-4aeb-b2a9-713615544366
Created: 2023-10-27T15:35:56.637Z
Duration: 25.933313
Channels: 1

00:00:00.080 --> 00:00:03.220
Yeah. As as much as, it's worth celebrating,

00:00:04.400 --> 00:00:05.779
the first, spacewalk,

00:00:06.319 --> 00:00:07.859
with an all female team,

00:00:08.475 --> 00:00:10.715
I think many of us are looking forward

With speakers:

00:00:08.625 --> 00:00:10.465
<v Speaker 0>You're gonna do great today. We'll be waiting

00:00:10.465 --> 00:00:11.825
<v Speaker 0>for you here in a couple hours when

00:00:11.825 --> 00:00:13.585
<v Speaker 0>you get home. I'm gonna hand you over

00:00:13.585 --> 00:00:14.725
<v Speaker 0>to Stephanie now.

00:00:15.740 --> 00:00:17.280
<v Speaker 0>Have a great, great EVA.

00:00:17.660 --> 00:00:19.580
<v Speaker 1>Drew, thank you so much. It's been our

00:00:19.580 --> 00:00:21.420
<v Speaker 1>pleasure working with you this morning. And I'm

00:00:21.420 --> 00:00:23.360
<v Speaker 1>working on getting that EV hatch open,
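
For readers curious how a cue like the ones above is assembled, here is a small self-contained sketch. It is not the package's implementation: the helper names are invented, and the speaker key on each word is an assumption based on diarized output.

```python
def vtt_timestamp(seconds):
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def vtt_cue(line):
    """Render one caption line (a list of word objects) as a WebVTT cue."""
    start = vtt_timestamp(line[0]["start"])
    end = vtt_timestamp(line[-1]["end"])
    # Prefer the punctuated form of each word when it is present.
    text = " ".join(w.get("punctuated_word", w["word"]) for w in line)
    speaker = line[0].get("speaker")  # assumed key, present only with diarization
    if speaker is not None:
        text = f"<v Speaker {speaker}>{text}"
    return f"{start} --> {end}\n{text}"


cue = vtt_cue(
    [
        {"word": "drew", "punctuated_word": "Drew,", "start": 17.66, "end": 18.0, "speaker": 1},
        {"word": "thank", "start": 18.0, "end": 19.58, "speaker": 1},
    ]
)
print(cue)
```

A complete file would start with the WEBVTT header line and separate cues with blank lines.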

Example usage SRT

SRT from Deepgram Transcriptions

Python:

from deepgram_captions import DeepgramConverter, srt
transcription = DeepgramConverter(dg_response)
captions = srt(transcription)

JavaScript:

import { srt } from "@deepgram/captions";
const result = srt(transcription);

Output SRT

1
00:00:00,080 --> 00:00:03,220
Yeah. As as much as, it's worth celebrating,

2
00:00:04,400 --> 00:00:07,859
the first, spacewalk, with an all female team,

3
00:00:08,475 --> 00:00:10,715
I think many of us are looking forward

With speakers:

1
00:00:03,040 --> 00:00:03,540
[speaker 0]
And,

2
00:00:04,640 --> 00:00:07,585
Jessica, Christina, we are so proud of you.

3
00:00:08,625 --> 00:00:10,465
You're gonna do great today. We'll be waiting

4
00:00:10,465 --> 00:00:11,825
for you here in a couple hours when

5
00:00:11,825 --> 00:00:13,585
you get home. I'm gonna hand you over

6
00:00:13,585 --> 00:00:14,725
to Stephanie now.

7
00:00:15,740 --> 00:00:17,280
Have a great, great EVA.

8
00:00:17,660 --> 00:00:19,580
[speaker 1]
Drew, thank you so much. It's been our

9
00:00:19,580 --> 00:00:21,420
pleasure working with you this morning. And I'm

10
00:00:21,420 --> 00:00:23,360
working on getting that EV hatch open,
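
The SRT output differs from WebVTT in three visible ways: entries are numbered, the timestamp uses a comma before the milliseconds, and the speaker label appears only when the speaker changes. The sketch below reproduces that layout under those assumptions. It is an illustration written for this post rather than the package's code; render_srt and the speaker key are hypothetical names.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm (comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def render_srt(lines):
    """Render caption lines (lists of word objects) as numbered SRT entries."""
    entries = []
    previous_speaker = None
    for index, line in enumerate(lines, start=1):
        start = srt_timestamp(line[0]["start"])
        end = srt_timestamp(line[-1]["end"])
        parts = [str(index), f"{start} --> {end}"]
        speaker = line[0].get("speaker")  # assumed key, present only with diarization
        # Emit the label only when the speaker differs from the previous entry.
        if speaker is not None and speaker != previous_speaker:
            parts.append(f"[speaker {speaker}]")
            previous_speaker = speaker
        parts.append(" ".join(w.get("punctuated_word", w["word"]) for w in line))
        entries.append("\n".join(parts))
    return "\n\n".join(entries) + "\n"


srt_text = render_srt(
    [
        [{"word": "And,", "start": 3.04, "end": 3.54, "speaker": 0}],
        [{"word": "Jessica,", "start": 4.64, "end": 5.0, "speaker": 0}],
        [{"word": "Drew,", "start": 17.66, "end": 19.58, "speaker": 1}],
    ]
)
print(srt_text)
```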

Conclusion

With the release of these open-source captioning packages, Deepgram continues to empower developers, providing a versatile and accessible solution for caption generation. Whether you're using Deepgram or another speech-to-text API, these packages offer a developer-friendly experience, making caption creation a breeze.

Sign up for Deepgram

Sign up for a Deepgram account and get $200 in free credit (up to 45,000 minutes). No credit card needed!
