Article·AI & Engineering·May 22, 2024

Enhance Your Voice AI Windows Applications with the Deepgram .NET 4.0 SDK

By David vonThenen
Published May 22, 2024
Updated Jun 13, 2024

It's been a couple of weeks since we announced the highly anticipated release of the Deepgram .NET SDK v4.0, and we want to call attention to our renewed efforts in the Windows ecosystem. This update brings many new features and improvements designed to make it easier than ever to integrate Voice AI capabilities into your Windows applications.

Deepgram .NET SDK

The .NET SDK has undergone extensive changes since the last major release, and this blog post will highlight these updates and serve as a guide to start using the new features.

Release v4.0 Highlights

This release significantly restructures the .NET SDK, aligning it more closely with the architectures of our other SDKs. Users will notice structural similarities between projects, especially with the JavaScript, Python, and Go SDKs, in how clients are instantiated and API methods are called.

The notable enhancements from the previous release include:

  • Improved Implementation for Live, PreRecorded, and Manage Clients

  • Provides Text-to-Speech Capabilities, giving your applications a choice of numerous human-like voices

  • Implements Intelligence APIs (Summary, Intent, Topic, Sentiment) for PreRecorded and Text analysis

  • Introduces On-Prem Support for the first time

  • Published a Microphone package for local live-streaming implementations

  • Improved and Independent Timeout Capabilities per API Call

  • Verbosity Logging Levels for Troubleshooting

  • Custom Header and Query Parameters for API calls

We've updated the repository with several .NET code examples covering all of the SDK's APIs. Whether you are a beginner or an experienced developer, these examples provide a practical guide to every aspect of the SDK, from simple transcriptions to complex use cases such as streaming from the microphone. We aim to put our best foot forward with a "lead by example" approach.

Transcribing PreRecorded Audio or Video

The ability to transcribe pre-recorded audio remains a cornerstone feature of Deepgram. It's an ideal solution for transforming recorded meetings into readable minutes and generating intelligence insights through summarization, intents, topics, and sentiment.

Our repository is packed with varied examples, showcasing everything from transcribing a local audio file to obtaining a transcript directly from a URL. Here's a quick look at a sample code snippet for transcribing audio from a URL:

// STEP 1: Create a Deepgram client using the API key from environment variables
var deepgramClient = ClientFactory.CreatePreRecordedClient();

// STEP 2: Call the TranscribeUrl method on the PreRecorded class
var response = await deepgramClient.TranscribeUrl(
  new UrlSource("https://dpgr.am/bueller.wav"),
  new PreRecordedSchema()
  {
    Model = "nova-2",
  });

// STEP 3: Manipulate the response as needed
Console.WriteLine(response);
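
STEP 3 leaves the response handling open. As a minimal sketch, the helper below pulls the first transcript out of a response body using only System.Text.Json. It assumes you have the response as JSON text in the shape of Deepgram's pre-recorded API response (results, then channels, then alternatives, then transcript). The `TranscriptReader` class and `FirstTranscript` method are hypothetical names for illustration and are not part of the SDK.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (not part of the Deepgram SDK): extracts the first
// transcript from a JSON body shaped like results.channels[].alternatives[].
public static class TranscriptReader
{
    // Returns the transcript of the first alternative of the first channel,
    // or null when any step of that path is missing from the payload.
    public static string? FirstTranscript(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);

        if (!doc.RootElement.TryGetProperty("results", out JsonElement results) ||
            !results.TryGetProperty("channels", out JsonElement channels) ||
            channels.ValueKind != JsonValueKind.Array ||
            channels.GetArrayLength() == 0)
        {
            return null;
        }

        if (!channels[0].TryGetProperty("alternatives", out JsonElement alternatives) ||
            alternatives.ValueKind != JsonValueKind.Array ||
            alternatives.GetArrayLength() == 0)
        {
            return null;
        }

        return alternatives[0].TryGetProperty("transcript", out JsonElement transcript)
            ? transcript.GetString()
            : null;
    }
}
```

For a trimmed sample payload such as `{"results":{"channels":[{"alternatives":[{"transcript":"hello world"}]}]}}`, `TranscriptReader.FirstTranscript` returns `hello world`. Returning null for a missing path keeps the caller in charge of deciding whether an empty result is an error.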

Live Streaming Speech-to-Text Transcription

A prominent enhancement in this latest release is the refined Live/Streaming Client implementation. Our focus has been on simplifying the complexities associated with the underlying WebSocket interface. In our previous version, users had to partially manage the WebSocket themselves, which often proved daunting for newcomers and placed extra maintenance and development burdens on the users.

To better understand this improvement, let's explore a practical code example from our repository that feeds an HTTP webcast directly to the Deepgram platform:

// STEP 1: Create a Deepgram client using the API key from environment variables
var liveClient = new LiveClient();

// STEP 2: Subscribe to the transcription result events
liveClient.Subscribe(new EventHandler<ResultResponse>((sender, e) =>
{
  if (e.Channel.Alternatives[0].Transcript == "")
  {
    return;
  }
  Console.WriteLine($"Speaker: {e.Channel.Alternatives[0].Transcript}");
}));

// STEP 3: Set the connection options and connect
var liveSchema = new LiveSchema()
{
    Model = "nova-2",
    Punctuate = true,
    SmartFormat = true,
};
await liveClient.Connect(liveSchema);

// STEP 4: Get the webcast stream and feed into the client
// get the webcast data... this is a blocking operation
try
{
  var url = "http://stream.live.vc.bbcmedia.co.uk/bbc_world_service";
  
  using (HttpClient client = new HttpClient())
  {
    using (Stream receiveStream = await client.GetStreamAsync(url))
    {
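      while (liveClient.State() == WebSocketState.Open)
      {
        byte[] buffer = new byte[2048];
        int bytesRead = await receiveStream.ReadAsync(buffer, 0, buffer.Length);
        if (bytesRead == 0) break; // end of stream

        // Send only the bytes that were actually read
        Array.Resize(ref buffer, bytesRead);
        liveClient.Send(buffer);
      }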
    }
  }
}

catch (Exception e)
{
    Console.WriteLine(e.Message);
}
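
The read-and-send loop in STEP 4 can also be factored into a small reusable helper. The sketch below uses only standard .NET stream APIs. `StreamForwarder`, `ForwardAsync`, and the `keepGoing` callback are hypothetical names for illustration and are not part of the SDK. It forwards only the bytes each read actually returned and stops at end of stream.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the Deepgram SDK).
public static class StreamForwarder
{
    // Reads `source` in chunks of up to `chunkSize` bytes and hands each chunk
    // to `send`. Stops at end of stream or when `keepGoing` returns false.
    // Returns the total number of bytes forwarded.
    public static async Task<long> ForwardAsync(
        Stream source,
        Action<byte[]> send,
        Func<bool> keepGoing,
        int chunkSize = 2048,
        CancellationToken cancellationToken = default)
    {
        long total = 0;
        var buffer = new byte[chunkSize];

        while (keepGoing())
        {
            int bytesRead = await source.ReadAsync(buffer, 0, buffer.Length, cancellationToken);
            if (bytesRead == 0)
            {
                break; // end of stream
            }

            // Copy only the bytes that were read into a fresh array, so a short
            // read never sends stale data and the receiver may keep the array.
            var chunk = new byte[bytesRead];
            Buffer.BlockCopy(buffer, 0, chunk, 0, bytesRead);
            send(chunk);
            total += bytesRead;
        }

        return total;
    }
}
```

With the names from the example above, a call might look like `await StreamForwarder.ForwardAsync(receiveStream, liveClient.Send, () => liveClient.State() == WebSocketState.Open);`. Passing the send target as a delegate keeps the helper independent of any particular client type.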

Giving Your Applications a Human Voice

The most anticipated feature in this SDK release is support for the latest major addition to the Deepgram platform: Text-to-Speech.

Here's a look at a specific case—the invitations example, which showcases what these "hello world" examples are all about:

// STEP 1: Create a Deepgram client using the API key from environment variables
var deepgramClient = ClientFactory.CreateSpeakClient();

// STEP 2: Generate an MP3 file based on the provided text
var response = await deepgramClient.ToFile(
  new TextSource("Hello World!"),
  "test.mp3",
  new SpeakSchema()
  {
    Model = "aura-asteria-en",
  });

// STEP 3: Access the properties of the MP3 file in the response object
Console.WriteLine(response);

It takes very little code to produce a human-like voice in an audio file. To get a sense of the voice quality, take a listen to the voice selection in the documentation.

Amazing! There is work underway to include different languages and variations on accents, tone, etc. Stay tuned for that!

Introducing the Deepgram Microphone Package

A standout addition in this release is the Deepgram Microphone package, which simplifies real-time audio streaming from a microphone to the Deepgram platform. This package is available on NuGet and is designed to help developers integrate live audio streaming into their applications.

To demonstrate this package's power and ease of use, let's look at an example of using the microphone in a real-time transcription application. This example can be found in our GitHub repository.

Here's a snippet to get you started:

// STEP 1: Create a Deepgram client using the API key from environment variables
DeepgramWsClientOptions options = new DeepgramWsClientOptions(null, null, true);
var liveClient = new LiveClient("", options);

// STEP 2: Subscribe to the Open, Results, and Close events
liveClient.Subscribe(new EventHandler<OpenResponse>((sender, e) =>
{
  Console.WriteLine($"\n\n----> {e.Type} received");
}));

liveClient.Subscribe(new EventHandler<ResultResponse>((sender, e) =>
{
  if (e.Channel.Alternatives[0].Transcript.Trim() == "")
  {
    return;
  }

  // Console.WriteLine("Transcription received: " + JsonSerializer.Serialize(e.Transcription));
  Console.WriteLine($"----> Speaker: {e.Channel.Alternatives[0].Transcript}");
}));

liveClient.Subscribe(new EventHandler<CloseResponse>((sender, e) =>
{
  Console.WriteLine($"----> {e.Type} received");
}));

// STEP 3: Set the connection options and connect
var liveSchema = new LiveSchema()
{
  Model = "nova-2",
  Encoding = "linear16",
  SampleRate = 16000,
  Punctuate = true,
  SmartFormat = true,
  InterimResults = true,
  UtteranceEnd = "1000",
  VadEvents = true,
};

await liveClient.Connect(liveSchema);

// STEP 4: Start the Microphone and stream audio data
var microphone = new Microphone(liveClient.Send);
microphone.Start();

// Wait for the user to press a key
Console.ReadKey();

// Stop the microphone
microphone.Stop();

// Stop the connection
await liveClient.Stop();

It’s best to convey the ease of use via a short YouTube clip below.

This example demonstrates setting up a real-time transcription application using the Deepgram Microphone package. This package makes it easy to stream live audio from your local microphone to the Deepgram platform for transcription. This has been a much-anticipated example as it’s a great starting point for testing out the SDK and the Live Transcription capabilities as well as for doing local development.
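
The `Encoding = "linear16"` and `SampleRate = 16000` settings in STEP 3 determine how much raw audio is streamed per second. As a rough sketch of that arithmetic: linear16 is 16-bit PCM, so each sample takes 2 bytes per channel. The `PcmMath` helper is a hypothetical name for illustration, and single-channel (mono) capture is an assumption, since the example does not state a channel count.

```csharp
using System;

// Hypothetical helper (not part of the Deepgram SDK) for linear16 audio sizing.
public static class PcmMath
{
    // linear16 is 16-bit PCM: 2 bytes per sample per channel.
    public const int BytesPerSample = 2;

    // Raw audio bytes produced per second of capture.
    // Assumption: mono unless a channel count is passed explicitly.
    public static int BytesPerSecond(int sampleRate, int channels = 1)
        => sampleRate * channels * BytesPerSample;

    // How many milliseconds of audio a chunk of `chunkBytes` bytes represents.
    public static double ChunkMilliseconds(int chunkBytes, int sampleRate, int channels = 1)
        => 1000.0 * chunkBytes / BytesPerSecond(sampleRate, channels);
}
```

Under those assumptions, `PcmMath.BytesPerSecond(16000)` is 32000, and a 3200-byte chunk carries 100 ms of audio. This is useful when reasoning about buffer sizes or expected bandwidth during local development.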

The Road Ahead

This release was massive, and there is more to come. We are currently looking at initiatives to ease the SDK's maintainability, including automation using GitHub Actions, linting and static checks of all code in the repo, and much more. This massive update also reflects our desire to provide a better user experience on the Windows platform, so expect to see many more Windows-platform and project integrations in the future.

I encourage those interested in using the .NET SDK to try it, report any bugs they encounter, and provide feedback to help make this project even better. If you build any applications using the .NET SDK, drop us a message in Discord or post a question in GitHub Discussions and let us know about what you are building. 

Happy coding!

Sign up for Deepgram

Sign up for a Deepgram account and get $200 in free credit (up to 45,000 minutes). No credit card needed!

Learn more about Deepgram

We encourage you to explore Deepgram by checking out the following resources:

  1. Deepgram API Playground 

  2. Deepgram Documentation

  3. Deepgram Starter Apps

  4. Deepgram Discord Server

  5. Deepgram GitHub Discussions
