Continuing the series of SDK announcements from the past several weeks (JavaScript and Go), we are excited to release the highly anticipated version 3.0 of our Python SDK. This release is significant because it aims to lower the barrier to entry further and make the Deepgram Platform easier to use.
The SDK has undergone many changes since the last release. This blog post highlights them and serves as a guide to getting started with the new features.
Release v3.0 Highlights
This release aligns the SDK architectures across Python, JavaScript, and Go. Users will see many structural similarities between the projects, especially between Python and JavaScript, in how clients are instantiated and API methods are called.
Thanks to Python's flexibility, there are now both asynchronous and synchronous classes and methods for accessing the API. The entry point to the SDK is a client, created in either asynchronous or synchronous form, for Pre-recorded transcription, Live transcription, or Project Management. From there, the various APIs are accessed through standard methods.
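To make the synchronous/asynchronous distinction concrete, here is a minimal sketch of the general pattern. The class names `SyncTranscriber` and `AsyncTranscriber` and the `fake_transcribe` helper are hypothetical stand-ins invented for illustration. They are not the SDK's classes, and no network call is made.

```python
import asyncio


def fake_transcribe(url: str) -> str:
    """Stand-in for a network call; returns a canned transcript."""
    return f"transcript of {url}"


class SyncTranscriber:
    """Hypothetical blocking client: the call returns the result directly."""

    def transcribe_url(self, url: str) -> str:
        return fake_transcribe(url)


class AsyncTranscriber:
    """Hypothetical non-blocking client: the call must be awaited."""

    async def transcribe_url(self, url: str) -> str:
        # Yield to the event loop, as a real network request would.
        await asyncio.sleep(0)
        return fake_transcribe(url)


async def transcribe_many(urls: list[str]) -> list[str]:
    """Run several async requests concurrently, preserving input order."""
    client = AsyncTranscriber()
    return await asyncio.gather(*(client.transcribe_url(u) for u in urls))
```

The synchronous form suits scripts and notebooks, while the asynchronous form pays off when many requests are in flight at once, for example `asyncio.run(transcribe_many([...]))`.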
We've enriched our repository with a plethora of Python code examples. Whether you are a beginner or an experienced developer, these examples provide a practical guide to using every aspect of the SDK, from simple transcriptions to complex use cases such as streaming from the microphone. We aim to put our best foot forward with a "lead by example" approach.
Pre-recorded Transcription
The ability to transcribe pre-recorded audio is a cornerstone feature of Deepgram. It's an ideal solution for transforming recorded meetings into readable minutes and generating closed captions in formats like VTT and SRT.
Our repository includes a variety of examples, such as transcribing a local audio file. In particular, we'll delve into one example from the repo: obtaining a transcript directly from a URL.
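A minimal sketch of URL transcription with the synchronous client might look like the following. It assumes the `deepgram-sdk` v3 package is installed and that the API key lives in a `DEEPGRAM_API_KEY` environment variable. The `model` and `smart_format` options are illustrative choices, and `extract_transcript` is a hypothetical helper rather than part of the SDK. Treat the repository example as the authoritative version.

```python
import json
import os


def extract_transcript(response: dict) -> str:
    """Return the top alternative of the first channel in a pre-recorded response."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


def transcribe_url(audio_url: str) -> str:
    """Send a hosted audio file to Deepgram and return its transcript."""
    # Imported here so extract_transcript stays usable without the SDK installed.
    from deepgram import DeepgramClient, PrerecordedOptions

    # Assumption: the key is provided via the DEEPGRAM_API_KEY environment variable.
    deepgram = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

    # Illustrative options; adjust the model and features to your needs.
    options = PrerecordedOptions(model="nova-2", smart_format=True)

    response = deepgram.listen.prerecorded.v("1").transcribe_url(
        {"url": audio_url}, options
    )
    return extract_transcript(json.loads(response.to_json()))
```

Calling `transcribe_url("https://example.com/audio.wav")` with a publicly reachable audio URL (the address shown is a placeholder) should return the transcript as a single string.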
Live Streaming Transcription
One prominent enhancement in this latest release is the refined Live/Streaming Client implementation. Our focus has been on simplifying the complexities associated with the underlying websocket interface. In our previous version, users faced the challenge of partially managing the websocket. This approach often proved daunting for newcomers and placed extra maintenance and development burdens on the users.
To better understand this improvement, let's explore a practical code example from what's available in the repo. We'll be examining an example where your local microphone is used to stream audio directly to the Deepgram platform.
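With the websocket handled by the SDK, application code mostly reacts to transcript events. The sketch below shows what such a handler might do. It assumes each message has already been converted to a plain dict shaped like Deepgram's streaming results (`channel.alternatives[0].transcript` plus an `is_final` flag). `TranscriptCollector` is a hypothetical helper, not an SDK class, and wiring it to the SDK's event callbacks and microphone is left to the repository example.

```python
class TranscriptCollector:
    """Accumulates finalized text from streaming transcript messages."""

    def __init__(self) -> None:
        self.final_segments: list[str] = []
        self.interim: str = ""

    def handle(self, message: dict) -> None:
        """Process one streaming result message."""
        alternatives = message.get("channel", {}).get("alternatives", [])
        if not alternatives:
            return
        text = alternatives[0].get("transcript", "").strip()
        if not text:
            return  # results for silence carry an empty transcript
        if message.get("is_final"):
            # A final result replaces any interim guess for this stretch of audio.
            self.final_segments.append(text)
            self.interim = ""
        else:
            self.interim = text

    def transcript(self) -> str:
        """Return all finalized text so far."""
        return " ".join(self.final_segments)
```

Keeping interim and final text separate lets a UI show low-latency guesses while only committing text that will no longer change.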
Project Management
Finally, let's delve into the API for overseeing your Deepgram projects. The examples we've crafted for the Management APIs are designed to demonstrate the full spectrum of CRUD operations. We aim to encompass an object's creation, retrieval, updating, and deletion with a comprehensive "hello world" example.
To give you a clearer picture, let's examine one specific case – the invitations example, which exemplifies what these "hello world" examples are all about.
The goal of each example will be to exercise CRUD and bring your environment or configuration back to where it was before running the code.
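The "exercise the operations, then restore the starting state" idea can be sketched as follows. `InMemoryInvites` is a hypothetical in-memory stand-in for a project's invitations, used so the sketch runs without an account. Its method names are not the SDK's Management API.

```python
class InMemoryInvites:
    """Hypothetical stand-in for a project's invitations; not the SDK's API."""

    def __init__(self) -> None:
        self._invites: dict[str, str] = {}

    def create(self, email: str, scope: str) -> None:
        self._invites[email] = scope

    def list_all(self) -> dict[str, str]:
        return dict(self._invites)

    def delete(self, email: str) -> None:
        self._invites.pop(email, None)


def invitation_hello_world(invites: InMemoryInvites, email: str) -> list[str]:
    """Create, retrieve, and delete an invite for an address not yet invited."""
    before = invites.list_all()
    steps = []
    invites.create(email, scope="member")
    steps.append("create")
    try:
        if email in invites.list_all():
            steps.append("retrieve")
    finally:
        # Clean up even if a step above fails, so reruns start from the same state.
        invites.delete(email)
        steps.append("delete")
    if invites.list_all() != before:
        raise RuntimeError("project state was not restored")
    return steps
```

Putting the cleanup in `finally` is the design choice that matters here: the example can be run repeatedly without leaving stray invitations behind.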
Demo: Tour of the Examples
I'm a visual learner. So, instead of words, you can view a tour of the examples in the YouTube video below.
Example Application: Uploading YouTube Subtitles
We have covered the APIs and where to find resources for using them, but it might be interesting to look at a more complex example.
This example implements a (mostly) automated subtitling utility for YouTube called YouTube Captioner. I created this application because I produce a fair amount of YouTube content. Adding subtitles (i.e., captions) to your YouTube videos provides:
individuals who are hard of hearing with the ability to enjoy your content
accurate subtitles, which improve how your content is indexed
Creating these subtitles is time-consuming and typically involves this manual process:
find the mp4 or video that I want to make subtitles for
generate subtitles/captioning by uploading the mp4 to a Speech-to-Text service like Deepgram
navigate to YouTube, find the video, and then upload the subtitles to the video
There is a fair amount of setup involved in this project, so the complexity and time are front-loaded. Once configured, this utility will:
download your video from YouTube
convert your video to mp3 (audio only) to reduce the upload time to Deepgram
submit the mp3 to Deepgram to obtain the transcription
convert the transcription to SRT subtitles
upload and publish the SRT subtitles to your video
Let's take a look at this quick demo.
If you are a content creator posting videos to YouTube, this is a great way to add subtitles to your videos. Special shoutout to Sandra on the Deepgram team for implementing the Python Captions project, which transforms Deepgram transcriptions into VTT and SRT captions.
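To illustrate the "convert the transcription to SRT subtitles" step, here is a simplified sketch. It is not the Python Captions project's implementation. It assumes a list of word dicts with `word`, `start`, and `end` fields (and optionally `punctuated_word`), as found in Deepgram transcription results, and it groups a fixed number of words per cue, which is a naive strategy chosen for brevity.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], words_per_cue: int = 8) -> str:
    """Group word timings into numbered SRT cues of at most words_per_cue words."""
    cues = []
    for index, start in enumerate(range(0, len(words), words_per_cue), start=1):
        chunk = words[start:start + words_per_cue]
        # Prefer the punctuated form when the transcription provides one.
        text = " ".join(w.get("punctuated_word", w["word"]) for w in chunk)
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(chunk[0]['start'])} --> {srt_timestamp(chunk[-1]['end'])}\n"
            f"{text}\n"
        )
    # SRT separates cues with a blank line.
    return "\n".join(cues)
```

A production converter would also break cues at sentence boundaries and limit line length, which is the kind of detail a dedicated captions library handles.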
The Road Ahead
An immediate enhancement coming very soon is Text-to-Speech via Aura. Medium- to longer-term initiatives include OSS project management, GitHub Actions, linting, conversation understanding (for example, summarization), and more.
I encourage those interested in using the Python SDK to give it a try, file any bugs you see, and provide any feedback to help make this project even better. If you build any applications using the SDK, drop us a line in Discord and let us know about it. Happy coding!
Sign up for Deepgram
Sign up for a Deepgram account and get $200 in Free Credit (up to 45,000 minutes), absolutely free. No credit card needed!