Subtitles Made Easy: Deepgram's New Open Source Captioning Packages

Sandra Rodgers

We're excited to announce two open-source captioning packages from Deepgram: deepgram-captions for Python and @deepgram/captions for JavaScript. Both packages generate SRT and WebVTT captions from the JSON responses of several speech-to-text APIs, including Deepgram.
Demo
Check out our Streamlit demo of the package being used to create captions for YouTube videos: https://deepgram-captions.streamlit.app/
Key Features
SRT and WebVTT Caption Generation:
The packages create both SRT (SubRip Subtitle) and WebVTT (Web Video Text Tracks) captions. SRT captions are widely supported and commonly used for video subtitles, while WebVTT provides a standardized format for captioning web videos. This dual capability ensures compatibility with a broad range of multimedia platforms and applications.
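One visible difference between the two formats is the timestamp: SRT separates seconds from milliseconds with a comma, while WebVTT uses a period. The sketch below illustrates that difference only. The helper names are made up for this example and are not part of either package.

```python
def format_timestamp(seconds: float, separator: str) -> str:
    """Format a time in seconds as HH:MM:SS<separator>mmm."""
    # Work in whole milliseconds to avoid floating-point drift.
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def srt_timestamp(seconds: float) -> str:
    """SRT uses a comma before the milliseconds."""
    return format_timestamp(seconds, ",")


def webvtt_timestamp(seconds: float) -> str:
    """WebVTT uses a period before the milliseconds."""
    return format_timestamp(seconds, ".")


print(srt_timestamp(0.08), "-->", srt_timestamp(3.22))
print(webvtt_timestamp(0.08), "-->", webvtt_timestamp(3.22))
```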
API Agnostic Design:
The packages accept JSON responses from a variety of speech-to-text APIs, not just Deepgram. This flexibility allows users to seamlessly integrate the solution with their preferred API. We will continue to add support for other APIs, so if your preferred API is not supported, please reach out to us.
Users can also write their own converter: a class that hands the formatters the transcription data in the format they expect.
Speaker Labeling and Diarization Compatibility:
When you use the diarization feature of an API such as Deepgram, these packages label captions with the speaker who said them, which makes the generated captions more accessible and easier to follow.
No Deepgram Dependency:
While designed to complement Deepgram's capabilities, the packages don't rely on Deepgram as a mandatory dependency. Users can utilize these tools with the API of their choice, broadening the scope of application.
Whisper Compatibility:
Because OpenAI's API does not include timestamps with its speech-to-text output, Whisper cannot be used directly. We have included two ways to use Whisper with the Python package: you can create captions from the output of the popular whisper-timestamped package, or use Deepgram's hosted Whisper with the Deepgram converter.
User-Friendly Caption Creation:
Aimed at simplifying the caption creation process, these open-source packages cater to users seeking an easy and efficient way to generate captions for their audio and video content.
A Closer Look
Installation
Python:
pip install deepgram-captions
JavaScript:
npm install @deepgram/captions
How it works
The package takes the JSON response returned by a transcription request. Responses from several APIs are supported. This is the shape of a Deepgram JSON response:
{
"metadata": {
...
},
"results": {
"channels": [
{
"alternatives": [
{
"transcript": ...,
"confidence": ...,
"words": [...]
}
]
}
]
}
}
A converter class turns the JSON object into the needed shape for the functions that create the captions. The DeepgramConverter can handle a Deepgram response, while other converters can handle responses that come from other transcription APIs.
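To make the conversion step concrete, here is a minimal sketch of pulling word-level data out of a response with the shape shown above. It is not the DeepgramConverter implementation. The function name extract_words and the sample values are invented for illustration, and the word fields (word, start, end, and an optional punctuated_word) are the ones the formatters expect.

```python
def extract_words(dg_response: dict, channel: int = 0) -> list[dict]:
    """Return the word entries of the first alternative for one channel."""
    alternatives = dg_response["results"]["channels"][channel]["alternatives"]
    words = []
    for entry in alternatives[0]["words"]:
        word = {"word": entry["word"], "start": entry["start"], "end": entry["end"]}
        # punctuated_word is optional, so copy it only when present.
        if "punctuated_word" in entry:
            word["punctuated_word"] = entry["punctuated_word"]
        words.append(word)
    return words


# Made-up sample response that follows the shape shown above.
sample_response = {
    "metadata": {},
    "results": {"channels": [{"alternatives": [{
        "transcript": "yeah as",
        "confidence": 0.99,
        "words": [
            {"word": "yeah", "start": 0.08, "end": 0.32, "punctuated_word": "Yeah."},
            {"word": "as", "start": 0.32, "end": 0.56},
        ],
    }]}]},
}

print(extract_words(sample_response))
```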
Users can create their own class to handle a JSON object, converting it into the expected format so that the captions functions can create SRT or WebVTT captions:
transcriptionData = [
[
{
word: string;
start: number;
end: number;
punctuated_word: string; // optional
}
]
]
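As a rough sketch of what a custom converter could look like, the class below turns the response of an imaginary API into the nested list shown above, where each inner list is one caption line. Everything here is hypothetical: the response layout (segments, text, from_ms, to_ms), the class name, and the get_lines method with its line_length parameter are assumptions made for this example, so check the package README for the interface the formatters actually call.

```python
class ExampleConverter:
    """Hypothetical converter for an imaginary API whose response looks like
    {"segments": [{"text": "hello", "from_ms": 0, "to_ms": 400}, ...]}."""

    def __init__(self, response: dict):
        self.response = response

    def get_lines(self, line_length: int = 8) -> list[list[dict]]:
        # Assumed method name: returns caption lines, each a list of words.
        words = [
            {
                "word": segment["text"],
                "start": segment["from_ms"] / 1000,  # milliseconds -> seconds
                "end": segment["to_ms"] / 1000,
            }
            for segment in self.response["segments"]
        ]
        # Split the flat word list into lines of at most line_length words.
        return [
            words[i:i + line_length]
            for i in range(0, len(words), line_length)
        ]


# Made-up response from the imaginary API.
response = {
    "segments": [
        {"text": "hello", "from_ms": 0, "to_ms": 400},
        {"text": "there", "from_ms": 400, "to_ms": 800},
        {"text": "how", "from_ms": 900, "to_ms": 1100},
        {"text": "are", "from_ms": 1100, "to_ms": 1300},
        {"text": "you", "from_ms": 1300, "to_ms": 1600},
    ]
}

for line in ExampleConverter(response).get_lines(line_length=2):
    print([word["word"] for word in line])
```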
Example Usage WebVTT
WebVTT from Deepgram Transcriptions
Python:
from deepgram_captions import DeepgramConverter, webvtt
# dg_response is the JSON response from a Deepgram transcription request
transcription = DeepgramConverter(dg_response)
captions = webvtt(transcription)
JavaScript:
import { webvtt } from "@deepgram/captions";
const result = webvtt(transcription);
Output WebVTT
WEBVTT

NOTE
Transcription provided by Deepgram
Request Id: 686278aa-d315-4aeb-b2a9-713615544366
Created: 2023-10-27T15:35:56.637Z
Duration: 25.933313
Channels: 1

00:00:00.080 --> 00:00:03.220
Yeah. As as much as, it's worth celebrating,

00:00:04.400 --> 00:00:05.779
the first, spacewalk,

00:00:06.319 --> 00:00:07.859
with an all female team,

00:00:08.475 --> 00:00:10.715
I think many of us are looking forward
With speakers:
00:00:08.625 --> 00:00:10.465
<v Speaker 0>You're gonna do great today. We'll be waiting

00:00:10.465 --> 00:00:11.825
<v Speaker 0>for you here in a couple hours when

00:00:11.825 --> 00:00:13.585
<v Speaker 0>you get home. I'm gonna hand you over

00:00:13.585 --> 00:00:14.725
<v Speaker 0>to Stephanie now.

00:00:15.740 --> 00:00:17.280
<v Speaker 0>Have a great, great EVA.

00:00:17.660 --> 00:00:19.580
<v Speaker 1>Drew, thank you so much. It's been our

00:00:19.580 --> 00:00:21.420
<v Speaker 1>pleasure working with you this morning. And I'm

00:00:21.420 --> 00:00:23.360
<v Speaker 1>working on getting that EV hatch open,
Example Usage SRT
SRT from Deepgram Transcriptions
Python:
from deepgram_captions import DeepgramConverter, srt
# dg_response is the JSON response from a Deepgram transcription request
transcription = DeepgramConverter(dg_response)
captions = srt(transcription)
JavaScript:
import { srt } from "@deepgram/captions";
const result = srt(transcription);
Output SRT
1
00:00:00,080 --> 00:00:03,220
Yeah. As as much as, it's worth celebrating,

2
00:00:04,400 --> 00:00:07,859
the first, spacewalk, with an all female team,

3
00:00:08,475 --> 00:00:10,715
I think many of us are looking forward
With speakers:
1
00:00:03,040 --> 00:00:03,540
[speaker 0]
And,

2
00:00:04,640 --> 00:00:07,585
Jessica, Christina, we are so proud of you.

3
00:00:08,625 --> 00:00:10,465
You're gonna do great today. We'll be waiting

4
00:00:10,465 --> 00:00:11,825
for you here in a couple hours when

5
00:00:11,825 --> 00:00:13,585
you get home. I'm gonna hand you over

6
00:00:13,585 --> 00:00:14,725
to Stephanie now.

7
00:00:15,740 --> 00:00:17,280
Have a great, great EVA.

8
00:00:17,660 --> 00:00:19,580
[speaker 1]
Drew, thank you so much. It's been our

9
00:00:19,580 --> 00:00:21,420
pleasure working with you this morning. And I'm

10
00:00:21,420 --> 00:00:23,360
working on getting that EV hatch open,
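In the SRT sample above, the [speaker N] label appears only on the cue where the speaker changes. The sketch below reproduces that layout from lists of words. It is an illustration inferred from the sample output, not the package's source code. The function names are invented here, and the speaker field on each word is an assumption about what a diarized response provides.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def render_srt(lines: list[list[dict]]) -> str:
    """Render caption lines as SRT, labelling a cue when the speaker changes."""
    blocks = []
    previous_speaker = None
    for index, words in enumerate(lines, start=1):
        body = []
        speaker = words[0].get("speaker")  # assumed field, absent without diarization
        if speaker is not None and speaker != previous_speaker:
            body.append(f"[speaker {speaker}]")
            previous_speaker = speaker
        # Prefer the punctuated form of each word when it is available.
        body.append(" ".join(w.get("punctuated_word", w["word"]) for w in words))
        start = srt_time(words[0]["start"])
        end = srt_time(words[-1]["end"])
        blocks.append(f"{index}\n{start} --> {end}\n" + "\n".join(body))
    # Cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


# Made-up word data loosely based on the sample output above.
lines = [
    [{"word": "and", "punctuated_word": "And,", "start": 3.04, "end": 3.54, "speaker": 0}],
    [
        {"word": "have", "punctuated_word": "Have", "start": 15.74, "end": 16.0, "speaker": 0},
        {"word": "fun", "punctuated_word": "fun.", "start": 16.0, "end": 17.28, "speaker": 0},
    ],
    [{"word": "drew", "punctuated_word": "Drew,", "start": 17.66, "end": 18.1, "speaker": 1}],
]

print(render_srt(lines))
```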
Conclusion
With the release of these open-source captioning packages, Deepgram continues to empower developers, providing a versatile and accessible solution for caption generation. Whether you're using Deepgram or another speech-to-text API, these packages offer a developer-friendly experience, making caption creation a breeze.
Sign up for Deepgram
Sign up for a Deepgram account and get $200 in free credit (up to 45,000 minutes). No credit card needed!
Learn more about Deepgram
We encourage you to explore Deepgram by checking out the following resources: