Article·AI & Engineering·Nov 17, 2023

Subtitles Made Easy: Deepgram's New Open Source Captioning Packages

By Sandra Rodgers
Published Nov 17, 2023
Updated Jun 13, 2024

We're excited to announce the release of two open-source captioning packages by Deepgram: deepgram-captions for Python and @deepgram/captions for JavaScript. These packages generate SRT and WebVTT captions from the JSON responses of various speech-to-text APIs, including Deepgram.

Demo

Check out our Streamlit demo of the package being used to create captions for YouTube videos: https://deepgram-captions.streamlit.app/

Key Features

SRT and WebVTT Caption Generation:

  • The packages create both SRT (SubRip Subtitle) and WebVTT (Web Video Text Tracks) captions. SRT captions are widely supported and commonly used for video subtitles, while WebVTT provides a standardized format for captioning web videos. This dual capability ensures compatibility with a broad range of multimedia platforms and applications.

API Agnostic Design:

  • The packages accept JSON responses from a variety of speech-to-text APIs, not just Deepgram. This flexibility allows users to seamlessly integrate the solution with their preferred API. We will continue to add support for other APIs, so if your preferred API is not supported, please reach out to us.

  • Users can also write their own converter: a converter class only has to supply the formatters with the transcription data in the expected format.

Speaker Labeling and Diarization Compatibility:

  • When a transcript was produced with the diarization feature of an API like Deepgram, these packages label each caption with its speaker. This makes multi-speaker captions easier to follow and more accessible.

No Deepgram Dependency:

  • While designed to complement Deepgram's capabilities, the packages don't rely on Deepgram as a mandatory dependency. Users can utilize these tools with the API of their choice, broadening the scope of application.

Whisper Compatibility:

  • Because OpenAI’s API does not include timestamps with its speech-to-text output, Whisper cannot be used directly. The Python package includes two ways to work around this: you can create captions from the output of the popular package whisper-timestamped, or use Deepgram’s hosted Whisper with the Deepgram converter.

User-Friendly Caption Creation:

  • Aimed at simplifying the caption creation process, these open-source packages cater to users seeking an easy and efficient way to generate captions for their audio and video content.

A Closer Look

Installation

Python: pip install deepgram-captions

JavaScript: npm install @deepgram/captions

How it works

The package takes a JSON object response from a transcription request, supporting various APIs. This is the shape of a Deepgram JSON response:
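As an abbreviated sketch, the word-level data in a Deepgram response sits under `results.channels[].alternatives[].words[]`. The values below are invented for illustration, most fields are omitted, and `extract_words` is a hypothetical helper rather than part of either package.

```python
# Abbreviated, illustrative Deepgram-style response. All values are made up.
response = {
    "metadata": {"request_id": "example-request-id", "duration": 2.5, "channels": 1},
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "Hello world. Welcome back.",
                        "confidence": 0.98,
                        "words": [
                            {"word": "hello", "start": 0.08, "end": 0.48,
                             "confidence": 0.99, "punctuated_word": "Hello"},
                            {"word": "world", "start": 0.56, "end": 1.12,
                             "confidence": 0.98, "punctuated_word": "world."},
                            {"word": "welcome", "start": 1.6, "end": 2.0,
                             "confidence": 0.97, "punctuated_word": "Welcome"},
                            {"word": "back", "start": 2.08, "end": 2.5,
                             "confidence": 0.99, "punctuated_word": "back."},
                        ],
                    }
                ]
            }
        ]
    },
}


def extract_words(resp, channel=0):
    """Hypothetical helper: word entries of the first alternative of one channel."""
    return resp["results"]["channels"][channel]["alternatives"][0]["words"]


words = extract_words(response)
print([w["punctuated_word"] for w in words])
```

Each word carries its own `start` and `end` time in seconds, which is what makes it possible to cut the transcript into timed caption lines.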

A converter class turns the JSON object into the needed shape for the functions that create the captions. The DeepgramConverter can handle a Deepgram response, while other converters can handle responses that come from other transcription APIs.

Users can create their own class to handle a JSON object, converting it into the expected format so that the captions functions can create SRT or WebVTT captions:
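A minimal sketch of such a class follows. It assumes the formatters call a `get_lines(line_length)` method and expect a list of caption lines, each a list of word dictionaries with `punctuated_word`, `start`, and `end` keys; confirm the exact method name and keys against the package README before relying on them. The input format (`segments`, `from_ms`, `to_ms`) belongs to a made-up API and is purely hypothetical.

```python
class SimpleConverter:
    """Hypothetical converter for a made-up API whose response looks like
    {"segments": [{"text": str, "from_ms": int, "to_ms": int}, ...]}."""

    def __init__(self, transcription_data):
        self.transcription_data = transcription_data

    def get_lines(self, line_length=8):
        # Normalize each segment into the word shape assumed above (times in seconds).
        words = [
            {
                "punctuated_word": seg["text"],
                "start": seg["from_ms"] / 1000,
                "end": seg["to_ms"] / 1000,
            }
            for seg in self.transcription_data["segments"]
        ]
        # Chunk the words into caption lines of at most `line_length` words.
        return [words[i:i + line_length] for i in range(0, len(words), line_length)]


data = {
    "segments": [
        {"text": "Hello", "from_ms": 0, "to_ms": 500},
        {"text": "world.", "from_ms": 500, "to_ms": 1000},
        {"text": "Welcome", "from_ms": 1500, "to_ms": 2000},
    ]
}
lines = SimpleConverter(data).get_lines(line_length=2)
print(len(lines))
```

Keeping the API-specific parsing inside the converter means the caption formatters never need to know which service produced the transcript.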

Example Usage WebVTT

WebVTT from Deepgram Transcriptions

Output WebVTT
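To make the output format concrete, here is a small self-contained sketch that renders caption lines as WebVTT. It is not the package's implementation: `vtt_timestamp` and `to_webvtt` are hypothetical names, the sample words are invented, and the package's own output may include extra metadata that is omitted here.

```python
def vtt_timestamp(seconds):
    """Format seconds as HH:MM:SS.mmm, the timestamp form WebVTT uses."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(lines):
    """Render caption lines (lists of word dicts) as a WebVTT document."""
    out = ["WEBVTT", ""]  # the file must begin with the WEBVTT signature
    for line in lines:
        if not line:
            continue
        start = vtt_timestamp(line[0]["start"])
        end = vtt_timestamp(line[-1]["end"])
        text = " ".join(w["punctuated_word"] for w in line)
        out += [f"{start} --> {end}", text, ""]
    return "\n".join(out)


caption_lines = [
    [{"punctuated_word": "Hello", "start": 0.08, "end": 0.48},
     {"punctuated_word": "world.", "start": 0.56, "end": 1.12}],
    [{"punctuated_word": "Welcome", "start": 1.6, "end": 2.0},
     {"punctuated_word": "back.", "start": 2.08, "end": 2.5}],
]
print(to_webvtt(caption_lines))
```

Each cue spans from the start of its first word to the end of its last word, and cues are separated by a blank line.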

With speakers:
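When the transcript is diarized, each word carries a speaker index, and WebVTT can mark who is talking with a voice span (`<v Name>`). The sketch below shows one way to split words into per-speaker runs and label them. The helper names and the `Speaker N` label are assumptions made for illustration, not confirmed details of the package's output.

```python
from itertools import groupby


def split_by_speaker(words):
    """Group consecutive words that share a speaker into (speaker, words) runs."""
    return [
        (speaker, list(run))
        for speaker, run in groupby(words, key=lambda w: w.get("speaker"))
    ]


def voice_cue_text(speaker, run):
    """Build the cue text, prefixed with a WebVTT voice span when a speaker is known."""
    text = " ".join(w["punctuated_word"] for w in run)
    return f"<v Speaker {speaker}>{text}" if speaker is not None else text


diarized_words = [
    {"punctuated_word": "Hello", "start": 0.08, "end": 0.48, "speaker": 0},
    {"punctuated_word": "there.", "start": 0.56, "end": 1.12, "speaker": 0},
    {"punctuated_word": "Hi!", "start": 1.6, "end": 2.0, "speaker": 1},
]
for speaker, run in split_by_speaker(diarized_words):
    print(voice_cue_text(speaker, run))
```

Starting a new cue whenever the speaker changes keeps every caption attributable to a single voice.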

Example Usage SRT

SRT from Deepgram Transcriptions

Output SRT
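SRT differs from WebVTT in two visible ways: every cue is numbered, and the timestamp uses a comma before the milliseconds. The following self-contained sketch illustrates that layout. As before, `srt_timestamp` and `to_srt` are hypothetical names and the sample data is invented; this is not the package's implementation.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before the milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(lines):
    """Render caption lines (lists of word dicts) as numbered SRT cues."""
    blocks = []
    non_empty = (line for line in lines if line)
    for index, line in enumerate(non_empty, start=1):
        start = srt_timestamp(line[0]["start"])
        end = srt_timestamp(line[-1]["end"])
        text = " ".join(w["punctuated_word"] for w in line)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


srt_lines = [
    [{"punctuated_word": "Hello", "start": 0.08, "end": 0.48},
     {"punctuated_word": "world.", "start": 0.56, "end": 1.12}],
    [{"punctuated_word": "Welcome", "start": 1.6, "end": 2.0},
     {"punctuated_word": "back.", "start": 2.08, "end": 2.5}],
]
print(to_srt(srt_lines))
```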

With speakers:

Conclusion

With the release of these open-source captioning packages, Deepgram continues to empower developers, providing a versatile and accessible solution for caption generation. Whether you're using Deepgram or another speech-to-text API, these packages offer a developer-friendly experience, making caption creation a breeze.

