Continuing our series of SDK announcements from the past several weeks (JavaScript and Go), we are excited to release the highly anticipated version 3.0 of our Python SDK. This release is significant because it further lowers the barrier to entry and improves ease of use on the Deepgram Platform.

Deepgram Python v3.0 Release

The SDK has undergone many changes since the last release; this blog post highlights them and serves as a guide to getting started with the new features.

Release v3.0 Highlights

This release aligns the SDK architectures between Python, JavaScript, and Go. Users will see a lot of structural similarities between the projects, especially between Python and JavaScript, in terms of instantiating clients and calling API methods.


Thanks to Python's language flexibility, there are now both asynchronous and synchronous classes and methods for accessing the API. The entry point to the SDK is creating a client, either asynchronous or synchronous, for Pre-recorded or Live transcription, or for Project Management. From there, the various APIs are accessed through standard methods.
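
To illustrate the sync/async split, here is a minimal sketch of the pattern; the class and method names below are purely hypothetical and are not the SDK's real API:

```python
import asyncio

# Hypothetical sketch of the sync/async pattern the SDK follows;
# these names are illustrative, not the SDK's actual classes.
class SketchClient:
    def transcribe(self, url: str) -> str:
        # Synchronous call: blocks until the result is available.
        return f"transcript of {url}"

class AsyncSketchClient:
    async def transcribe(self, url: str) -> str:
        # Asynchronous call: awaitable, suitable for asyncio applications.
        await asyncio.sleep(0)  # stand-in for non-blocking I/O
        return f"transcript of {url}"

sync_result = SketchClient().transcribe("https://example.com/audio.wav")
async_result = asyncio.run(AsyncSketchClient().transcribe("https://example.com/audio.wav"))
```

Which flavor you pick depends on your application: synchronous clients fit scripts and simple services, while the asynchronous variants slot into asyncio-based applications.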

Python Threads Matrix Style


We've enriched our repository with a plethora of Python code examples. Whether you are a beginner or an experienced developer, these examples provide a practical guide to utilizing every aspect of the SDK, from simple transcriptions to complex use cases, such as streaming from the microphone. We aim to put our best foot forward with a "lead by example" approach.

Pre-recorded Transcription

The ability to transcribe pre-recorded audio is a cornerstone feature of Deepgram. It's an ideal solution for transforming recorded meetings into readable minutes and generating closed captions in formats like VTT and SRT.
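
As a taste of what caption generation involves, here is a minimal sketch (not the SDK's caption tooling) of how an SRT cue is built: a sequence number, a time range in `HH:MM:SS,mmm` form, and the text:

```python
def srt_timestamp(seconds: float) -> str:
    # Format a time in seconds as the SRT timestamp form HH:MM:SS,mmm.
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    # One SRT cue: index line, time range, text, trailing newline.
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"

print(srt_cue(1, 0.0, 2.5, "Life moves pretty fast."))
```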


Our repository offers varied examples, from transcribing a local audio file to working with remote media. In particular, we'll delve into one example from the repo: obtaining a transcript directly from a URL.

from deepgram import DeepgramClient, PrerecordedOptions

# Replace with the URL of the audio you want to transcribe
AUDIO_URL = {
    "url": "https://static.deepgram.com/examples/Bueller-Life-moves-pretty-fast.wav"
}

# STEP 1: Create a Deepgram client using the API key from environment variables
deepgram = DeepgramClient()

# STEP 2: Call the transcribe_url method on the prerecorded class
options = PrerecordedOptions(
    model="nova",
    smart_format=True,
    summarize="v2",
)

url_response = deepgram.listen.prerecorded.v("1").transcribe_url(AUDIO_URL, options)
print(url_response)
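
The response is a rich object; as a rough illustration of its shape, the transcript and v2 summary can be pulled out as shown below. The mock dictionary here is a simplified stand-in following the JSON the API returns; consult the docs for the full structure:

```python
# Simplified mock of the JSON shape a pre-recorded response follows;
# this illustrates navigating the result, not a live API call.
mock_response = {
    "results": {
        "channels": [
            {"alternatives": [{"transcript": "Life moves pretty fast."}]}
        ],
        "summary": {"short": "A reflection on how quickly life moves."},
    }
}

transcript = mock_response["results"]["channels"][0]["alternatives"][0]["transcript"]
summary = mock_response["results"]["summary"]["short"]
print(transcript)
print(summary)
```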

Live Streaming Transcription

One prominent enhancement in this latest release is the refined Live/Streaming Client implementation. Our focus has been on simplifying the complexities associated with the underlying websocket interface. In our previous version, users faced the challenge of partially managing the websocket. This approach often proved daunting for newcomers and placed extra maintenance and development burdens on the users.


To better understand this improvement, let's explore a practical code example from the repo. We'll examine an example where your local microphone streams audio directly to the Deepgram platform.

from deepgram import (
    DeepgramClient,
    LiveTranscriptionEvents,
    LiveOptions,
    Microphone,
)

# STEP 1: Create a Deepgram client using the API key from environment variables
deepgram = DeepgramClient()
dg_connection = deepgram.listen.live.v("1")

# STEP 2: Implement the callback interfaces for various Deepgram events/messages
def on_message(self, result, **kwargs):
    sentence = result.channel.alternatives[0].transcript
    if len(sentence) == 0:
        return
    print(f"speaker: {sentence}")

def on_metadata(self, metadata, **kwargs):
    print(f"\n\n{metadata}\n\n")

def on_error(self, error, **kwargs):
    print(f"\n\n{error}\n\n")

# STEP 3: Hook the event notifications to the callbacks
dg_connection.on(LiveTranscriptionEvents.Transcript, on_message)
dg_connection.on(LiveTranscriptionEvents.Metadata, on_metadata)
dg_connection.on(LiveTranscriptionEvents.Error, on_error)

# STEP 4: start the websocket/live client using the selected transcription options
options = LiveOptions(
    punctuate=True,
    language="en-US",
    encoding="linear16",
    channels=1,
    sample_rate=16000,
)
dg_connection.start(options)

# STEP 5: create a Microphone using your local mic and pass in the send() function on your Deepgram client
microphone = Microphone(dg_connection.send)

# start microphone
microphone.start()

# wait until finished
input("Press Enter to stop recording...\n\n")

# STEP 6: close the microphone and the websocket connection
microphone.finish()
dg_connection.finish()
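
Under the hood, the `on(...)` registration follows a familiar event-emitter pattern. A minimal sketch of that pattern (illustrative only; these names do not mirror the SDK's internals):

```python
# Minimal event-emitter sketch illustrating the on()/emit() callback pattern;
# names here are hypothetical, not the SDK's internal implementation.
class EventEmitter:
    def __init__(self):
        self._handlers = {}

    def on(self, event: str, handler) -> None:
        # Register a callback for a named event.
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event: str, *args, **kwargs) -> None:
        # Invoke every callback registered for the event.
        for handler in self._handlers.get(event, []):
            handler(*args, **kwargs)

received = []
emitter = EventEmitter()
emitter.on("Transcript", lambda text: received.append(text))
emitter.emit("Transcript", "hello world")
print(received)  # ['hello world']
```

This is what lets your application react to transcripts, metadata, and errors as they arrive, without touching the websocket directly.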

Project Management

Finally, let's delve into the API for overseeing your Deepgram projects. The examples we've crafted for the Management APIs are designed to demonstrate the full spectrum of CRUD operations. We aim to encompass an object's creation, retrieval, updating, and deletion with a comprehensive "hello world" example.


To give you a clearer picture, let's examine one specific case – the invitations example, which exemplifies what these "hello world" examples are all about.

from deepgram import DeepgramClient, InviteOptions

# Create a Deepgram client using the API key
deepgram = DeepgramClient()

# get projects
projectResp = deepgram.manage.v("1").get_projects()
if projectResp is None:
    print("ListProjects failed.")

myId = None
myName = None
for project in projectResp.projects:
    myId = project.project_id
    myName = project.name
    print(f"ListProjects() - ID: {myId}, Name: {myName}")
    break

# list invites for a given project
listResp = deepgram.manage.v("1").get_invites(myId)
if len(listResp.invites) == 0:
    print("No invites found")
else:
    for invite in listResp.invites:
        print(f"GetInvites() - Email: {invite.email}, Scope: {invite.scope}")

# send an invite
options = InviteOptions(email="spam@spam.com", scope="member")

getResp = deepgram.manage.v("1").send_invite_options(myId, options)
print(f"SendInvite() - Msg: {getResp.message}")

# list all the invites
listResp = deepgram.manage.v("1").get_invites(myId)
if listResp is None:
    print("No invites found")
else:
    for invite in listResp.invites:
        print(f"GetInvites() - Email: {invite.email}, Scope: {invite.scope}")

# delete the invite we created
delResp = deepgram.manage.v("1").delete_invite(myId, "spam@spam.com")
print(f"DeleteInvite() - Msg: {delResp.message}")

The goal of each example is to exercise full CRUD and to return your environment or configuration to the state it was in before running the code.
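
That round-trip pattern can be sketched with a toy in-memory store (purely illustrative, no Deepgram API involved): snapshot the starting state, create a resource, verify it, delete it, and confirm the state matches what you started with:

```python
# Toy in-memory invite store illustrating the CRUD round-trip pattern;
# this is a hypothetical stand-in, not the Deepgram Management API.
class InviteStore:
    def __init__(self):
        self._invites = {}

    def send_invite(self, email: str, scope: str) -> None:
        self._invites[email] = scope

    def get_invites(self) -> dict:
        return dict(self._invites)

    def delete_invite(self, email: str) -> None:
        self._invites.pop(email, None)

store = InviteStore()
before = store.get_invites()                   # snapshot the starting state

store.send_invite("spam@spam.com", "member")   # create
assert "spam@spam.com" in store.get_invites()  # read/verify

store.delete_invite("spam@spam.com")           # delete
assert store.get_invites() == before           # environment restored
```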

Demo: Tour of the Examples

I'm a visual learner. So, instead of words, you can view a tour of the examples in the YouTube video below.

Example Application: Uploading YouTube Subtitles

We have covered the APIs and where to find resources for using them, but it might be interesting to look at a more complex example.

This example implements a (mostly) automated subtitling utility for YouTube called YouTube Captioner. I created this application because I produce a fair amount of YouTube content. Adding subtitles (i.e., captions) to your YouTube videos provides:

  • individuals who are hard of hearing the ability to enjoy your content

  • accurate subtitles, thereby improving the indexing of your content

Creating these subtitles is time-consuming and typically involves this manual process:

  • find the mp4 or video that I want to make subtitles for

  • generate subtitles/captioning by uploading the mp4 to a Speech-to-Text service like Deepgram

  • navigate to YouTube, find the video, and then upload the subtitles to the video

There is a fair amount of setup involved in this project. So, the complexity and time are front-loaded, but when configured, this utility will:

  • download your video from YouTube

  • convert your video to mp3 (audio only) to reduce the upload time to Deepgram

  • submit the mp3 to Deepgram to obtain the transcription

  • convert the transcription to SRT subtitles

  • upload and publish the SRT subtitles to your video
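
The steps above can be sketched as a simple pipeline. The function names here are hypothetical, and each stage is a stub standing in for the real download, conversion, transcription, and upload logic:

```python
# Hypothetical pipeline sketch of the YouTube Captioner flow; each stage is a
# stub standing in for real downloader/ffmpeg/Deepgram/YouTube API calls.
def download_video(video_id: str) -> str:
    return f"{video_id}.mp4"

def convert_to_mp3(mp4_path: str) -> str:
    # Audio-only upload shrinks the payload sent to Deepgram.
    return mp4_path.replace(".mp4", ".mp3")

def transcribe(mp3_path: str) -> dict:
    return {"transcript": f"transcript of {mp3_path}"}

def to_srt(transcription: dict) -> str:
    return "1\n00:00:00,000 --> 00:00:02,500\n" + transcription["transcript"] + "\n"

def upload_subtitles(video_id: str, srt: str) -> None:
    print(f"uploaded {len(srt)} bytes of subtitles to {video_id}")

def caption_video(video_id: str) -> str:
    mp4 = download_video(video_id)
    mp3 = convert_to_mp3(mp4)
    srt = to_srt(transcribe(mp3))
    upload_subtitles(video_id, srt)
    return srt

srt = caption_video("my-video")
```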

Let's take a look at this quick demo.

If you are a content creator posting videos to YouTube, this is a great way to add subtitles to your videos. Special shoutout to Sandra on the Deepgram team for implementing the Python Captions project, which transforms Deepgram transcriptions into VTT and SRT captions.

The Road Ahead

Some immediate enhancements coming soon include Text-to-Speech via Aura; medium- to longer-term initiatives include OSS project management, GitHub Actions, linting, conversation understanding (for example, summarization), and more.

I encourage those interested in using the Python SDK to give it a try, file any bugs you see, and provide any feedback to help make this project even better. If you build any applications using the SDK, drop us a line in Discord and let us know about it. Happy coding!


Sign up for Deepgram

Sign up for a Deepgram account and get $200 in Free Credit (up to 45,000 minutes), absolutely free. No credit card needed!

Learn more about Deepgram

We encourage you to explore Deepgram by checking out the following resources:

  1. Deepgram API Playground 

  2. Deepgram Documentation

  3. Deepgram Starter Apps

