
Quickstart Guide for the Deepgram MissionControl API

By Natalie Rutgers | June 18, 2020

In this quickstart, you’ll:

  1. Create a Deepgram account
  2. Get an automatic transcript from the Deepgram General Model
  3. Train an Intermediate Model with one of the free training-ready datasets
  4. Review your Intermediate Model’s Improvement
  5. Deploy your Intermediate Model and use it to get an automatic transcript

Let’s get started!

Create a Deepgram account

To get started, create an account at missioncontrol.deepgram.com.

Your account comes preloaded with a few freebies:

1. 20 audio hours per month of Automatic Speech Recognition
2. The ability to train 2 Intermediate Models
3. The ability to deploy 1 of your Intermediate Models
4. 10 minutes of professional data labeling to help create training data
5. 2 free training-ready datasets
6. Access to 3 of Deepgram’s Beginner models

Get an automatic transcript from the Deepgram General Model

Now that you have your account, let’s make something happen. Deepgram’s API allows you to process both local files and remote files that are publicly accessible. Depending on your audio, select the corresponding curl command below, being sure to swap in your password and the email you used to create your MissionControl account for the username.

If you don’t have an audio file of your own, try out the audio file supplied in the hosted example. We highly recommend running these requests through jq for easy-to-read outputs.

Transcribing a Local File

To test with a file on your computer, run this in a terminal or your favorite API client:

curl \
  -X POST \
  -u USERNAME:PASSWORD \
  -H "Content-Type: audio/wav" \
  --data-binary @myaudio.wav \
  "https://brain.deepgram.com/v2/listen"

Transcribing a Remote File

To test with a remote file that is publicly accessible (e.g. hosted in AWS S3 or another server), run this in a terminal or your favorite API client:

curl \
  -X POST \
  -u USERNAME:PASSWORD \
  -H "Content-Type: application/json" \
  -d '{"url": "https://static.deepgram.com/examples/interview_speech-analytics.wav"}' \
  "https://brain.deepgram.com/v2/listen"

Wait a little bit (often only a few seconds) and you’ll get a JSON response like this:

{
  "metadata": {
    "transaction_key": "iqIt",
    "request_id": "m9IBwpEGuLOd5fDVUcwRoojQeVDIc4wU",
    "sha256": "6b198da276e1108a87e15674ba5e68f4893f85aa584ea96c2b0b5fe32e756bd9",
    "created": "2020-05-01T18:19:17.153Z",
    "duration": 2705.3577,
    "channels": 1
  },
  "results": {
    "channels": [
      {
        "alternatives": [
          {
            "transcript": "hey natalie just joined",
            "confidence": 0.87023026,
            "words": [
              {
                "word": "hey",
                "start": 35.61904,
                "end": 35.77853,
                "confidence": 0.54808563
              },
              {
                "word": "natalie",
                "start": 35.77853,
                "end": 36.27853,
                "confidence": 0.41259128
              },
              ...
            ]
          }
        ]
      }
    ]
  }
}

If you’d like to get a transcript with punctuation, add the ?punctuate=true parameter. Check out the /listen docs to get a complete overview of what’s possible with Deepgram’s transcription API.
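Once you have a response in hand, pulling out the transcript is a matter of walking the JSON. Here’s a minimal Python sketch, assuming the response shape shown above (one channel, top alternative):

```python
# Sketch: extract the transcript and word timings from a /listen response.
# The structure mirrors the sample response above (one channel, top alternative).

def extract_transcript(response):
    """Return the top transcript and its word-level timings."""
    alt = response["results"]["channels"][0]["alternatives"][0]
    return alt["transcript"], alt["words"]

sample = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "hey natalie just joined",
                        "confidence": 0.87,
                        "words": [
                            {"word": "hey", "start": 35.62, "end": 35.78},
                            {"word": "natalie", "start": 35.78, "end": 36.28},
                        ],
                    }
                ]
            }
        ]
    }
}

transcript, words = extract_transcript(sample)
print(transcript)  # hey natalie just joined
print(f'"{words[0]["word"]}" spans {words[0]["start"]}s to {words[0]["end"]}s')
```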

Train an Intermediate Model with one of the free training-ready datasets

Now, let’s get to the good stuff. Let’s improve automatic transcript accuracy by training a model.

If you’re ready to train a model on your own speech data, jump ahead to this tutorial. Otherwise let’s make use of some free datasets.

To get a list of your datasets, run:

curl -X GET -u USERNAME:PASSWORD https://missioncontrol.deepgram.com/v1/datasets

For this example, we’ll use:

{
  "id": "241cb42b-bfdf-4ae9-8d18-47e3183e8ed0",
  "name": "Scott Stephenson On Artificiality",
  "created": "2020-04-22T18:50:35.717852Z",
  "resource_count": 10,
  "total_duration": 2368.0440000000003,
  "status": "LABELED",
  "read_only": true
}

You’ll notice that this dataset is about 40 minutes long and contains 10 resources (audio files with labels). Luckily, it’s already LABELED, so it’s ready for training.
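If your account lists several datasets, a small helper can filter for the ones that are ready to train on. A sketch, assuming the dataset fields shown above; the second entry and its UNLABELED status are made up for illustration:

```python
# Sketch: pick training-ready datasets out of a GET /v1/datasets response.
# Only entries whose status is LABELED are ready for training.

def training_ready(datasets):
    return [d for d in datasets if d["status"] == "LABELED"]

datasets = [
    {"name": "Scott Stephenson On Artificiality", "status": "LABELED",
     "total_duration": 2368.044},
    {"name": "my-raw-uploads", "status": "UNLABELED",
     "total_duration": 910.0},  # hypothetical entry for illustration
]

for d in training_ready(datasets):
    print(f'{d["name"]}: {d["total_duration"] / 60:.1f} minutes of labeled audio')
# Scott Stephenson On Artificiality: 39.5 minutes of labeled audio
```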

A note on training data: Most speech recognition solutions come pre-trained and don’t allow end users to optimize them any further. Here, you’re going to build your speech model from scratch. How? The data that you select for training is going to act like a series of learning templates.

If your audio contains a bunch of examples of people saying a product name or a variety of voices with a particular accent, the model will listen to the audio, look for the corresponding word, and learn to pattern match. This allows it to learn to pay attention to unfamiliar nuances and optimize itself to do so going forward.

Let’s take this dataset and use it to train a new speech model. To do this, let’s submit a curl command that names our model and associates the dataset we selected to it.

curl -X POST -u USERNAME:PASSWORD "https://missioncontrol.deepgram.com/v1/models?dataset-id=241cb42b-bfdf-4ae9-8d18-47e3183e8ed0" -H 'content-type: application/json' -d '{"name": "test-model"}'

You’ll quickly get back a response that shows your new model.

{
  "model_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "version_id": "12345678-1234-1234-1234-1234567890ab",
  "model_name": "test-model",
  "created": "2020-05-01T18:56:40.316185Z",
  "model_type": "USER",
  "wer": null,
  "trained_at": null,
  "status": "CANCELLED"
}

Go ahead and copy the model_id from the response. Plug it into the following command to submit the model for training:

curl -X POST -u USERNAME:PASSWORD "https://missioncontrol.deepgram.com/v1/train?model-id={model-id}&base-model-id=e1eea600-6c6b-400a-a707-a491509e52f1"

You’ll see a response confirming that your model has been submitted and that its status has changed to PENDING:

{
  "id": "a21e82a7-5bac-4b2a-a816-cb2f84e08ca8",
  "model_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "submitted": "2020-05-01T19:12:24.913587Z",
  "finished": null,
  "status": "PENDING"
}

Training will take some time, but you’ll be emailed once your model has finished.
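If you’d rather check programmatically than wait for the email, the response above gives you what you need: finished stays null until training completes. A sketch of the stop condition; the finished sample below, including its timestamp, is hypothetical:

```python
# Sketch: decide whether a training job has completed, based on the
# submit response shown above. "finished" is null until training ends.

def is_done(job):
    return job.get("finished") is not None

pending = {"id": "a21e82a7-5bac-4b2a-a816-cb2f84e08ca8",
           "finished": None, "status": "PENDING"}
finished = {"id": "a21e82a7-5bac-4b2a-a816-cb2f84e08ca8",
            "finished": "2020-05-01T21:03:11Z", "status": "TRAINED"}  # hypothetical

print(is_done(pending))   # False
print(is_done(finished))  # True
```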

Review your Intermediate Model’s Improvement

Congratulations! You’ve trained your first custom model.

Let’s see how it now stacks up against the Deepgram General model by querying /stats:

curl -X GET -u USERNAME:PASSWORD https://missioncontrol.deepgram.com/v1/models/{model-id}/stats

You’ll get a response that looks like this:

{
  "model": {
    "model_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "version_id": "12345678-1234-1234-1234-1234567890ab",
    "model_name": "test-model",
    "created": "2020-04-24T16:30:13.351928Z",
    "model_type": "USER",
    "wer": 0.06614193891609181,
    "trained_at": "2020-04-23T14:48:39.302791Z",
    "status": "TRAINED"
  },
  "wers": [
    {
      "model": {
        "model_id": "e1eea600-6c6b-400a-a707-a491509e52f1",
        "version_id": "db92f481-8072-4081-95df-41cf006fdbc1",
        "model_name": "general",
        "created": "2020-04-22T15:53:48.472754Z",
        "model_type": "DEEPGRAM_BASE",
        "wer": null,
        "trained_at": "2020-04-23T04:48:58.765539Z",
        "status": "DEPLOYED"
      },
      "wer_comparison": 0.07171936177294978
    },
    {
      "model": {
        "model_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
        "version_id": "12345678-1234-1234-1234-1234567890ab",
        "model_name": "test-model",
        "created": "2020-04-24T16:30:13.351928Z",
        "model_type": "USER",
        "wer": 0.06614193891609181,
        "trained_at": "2020-04-23T14:48:39.302791Z",
        "status": "TRAINED"
      },
      "wer_comparison": 0.06614193891609181
    }
  ]
}

The key piece of information delivered here is the word error rate, or WER. That metric gives you a first impression of your model’s general performance.

How did we calculate these word error rates? For machine learning efforts, it’s typical to split a body of data into two datasets: a training set and a validation set. Deepgram automatically does that with the data that you use to train a model, and relies on these datasets to compare your model against the Deepgram General model.

Now, you can see here that the Deepgram General model had a word error rate of 7.17%. That’s pretty darn good considering most off-the-shelf speech recognition models have WERs of around 15-35%.

That said, with just a bit of training, your model did even better: 6.61%. Not too shabby!
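For intuition, WER is the word-level edit distance (substitutions, insertions, and deletions) between the hypothesis and the reference transcript, divided by the number of reference words. An illustrative implementation, not Deepgram’s internal one:

```python
# Illustrative word error rate: word-level edit distance divided by
# the number of reference words. Not Deepgram's internal implementation.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("hey natalie just joined", "hey natalie joined"))  # 0.25
```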

OFF-THE-SHELF ASR: 15-35% WER

DEEPGRAM GENERAL MODEL: 7.17% WER

YOUR INTERMEDIATE MODEL: 6.61% WER

Deploy your Intermediate Model and use it to get an automatic transcript

The true test for speech recognition ultimately comes down to what gets transcribed correctly. To judge that, it’s helpful to get a qualitative sense of how one model performs compared with another. Let’s compare an automatic transcript from your model against one from the Deepgram General model.

First, we’ll need to deploy your Intermediate model for use. Run this curl command to do that:

curl -X POST -u USERNAME:PASSWORD "https://missioncontrol.deepgram.com/v1/deploy?model-id={model-id}"

Sweet, now we can use it to get transcripts. This next step should look familiar. You’re going to select an audio file, and then submit a curl command just like you did at the beginning of the tutorial. This time, however, you’re going to specify your Intermediate model.

Transcribing a Local File with your Intermediate Model

To test with a file on your computer, run this in a terminal or your favorite API client:

curl -X POST -u USERNAME:PASSWORD -H "Content-Type: audio/wav" --data-binary @myaudio.wav "https://brain.deepgram.com/v2/listen?model={version-id}"

Transcribing a Remote File with your Intermediate Model

To test with a remote file that is publicly accessible (e.g. hosted in AWS S3 or another server), run this in a terminal or your favorite API client:

curl -X POST -u USERNAME:PASSWORD -H "Content-Type: application/json" -d '{"url": "https://static.deepgram.com/examples/interview_speech-analytics.wav"}' "https://brain.deepgram.com/v2/listen?model={version-id}"

Now, use the curl commands from the first step of this tutorial to run that same audio file through the Deepgram General model.

Go ahead and listen back to the audio file while you compare the outputs. Is one model better at recognizing certain voices over others? Does one struggle to understand industry jargon, certain accents, or product names?
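To make the comparison concrete, you can diff the two transcripts word by word. A sketch using Python’s difflib; both transcripts here are made-up examples:

```python
# Sketch: highlight where two transcripts disagree, word by word,
# using Python's difflib. Both transcripts are made-up examples.
import difflib

general = "the jupiter probe uses deep grammar analysis".split()
custom = "the jupiter probe uses deepgram analysis".split()

for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=general, b=custom).get_opcodes():
    if op != "equal":
        print(f'{op}: {" ".join(general[i1:i2])!r} -> {" ".join(custom[j1:j2])!r}')
# replace: 'deep grammar' -> 'deepgram'
```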

Training gives you the power to teach a speech model what to pay attention to and where to improve. The key is supplying the right data, data that represents what you’d like the model to focus on.

You have the tools. Let us know what you discover.

View the complete documentation here.

In the next tutorial, learn how to Train a Model to understand your speech data.
