Using Your Own Audio Data to Train a Model

By Natalie Rutgers | June 18, 2020

In the Quickstart Guide for the Deepgram MissionControl API, you learned how to train a model using one of the free training-ready datasets from your Deepgram account.

Now that you’ve had success there, let’s kick things up a notch and build a speech model using your unique audio.

In this guide you’ll:

  1. Create a dataset
  2. Add audio files to your dataset
  3. Add labels to your audio files
  4. Train an Intermediate Model with your datasets

Create a dataset

Assuming you’ve already created a Deepgram account, you can create a new dataset by replacing the USERNAME:PASSWORD with your credentials and submitting the following curl command. Be sure to give your dataset a useful name.

We highly recommend running these requests through jq for easy-to-read outputs.

curl -X POST -u USERNAME:PASSWORD -H 'content-type: application/json' -d '{"name": "MyDataset"}' ""
In no time, you’ll get back a response containing your new dataset’s id.

  {
    "id": "dddddddd-1111-2222-3333-444444444444",
    "name": "MyDataset",
    "created": "2020-05-01T23:23:37.708528Z",
    "resource_count": 0,
    "total_duration": 0,
    "status": "UNLABELED",
    "read_only": false
  }
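If you’re scripting these steps, you can pull the new dataset’s id straight out of the response and reuse it in later calls. A minimal Python sketch, using the sample response above:

```python
import json

# Sample response body from the dataset-creation call
# (field names match the example above).
response_body = """
{
  "id": "dddddddd-1111-2222-3333-444444444444",
  "name": "MyDataset",
  "created": "2020-05-01T23:23:37.708528Z",
  "resource_count": 0,
  "total_duration": 0,
  "status": "UNLABELED",
  "read_only": false
}
"""

dataset = json.loads(response_body)
dataset_id = dataset["id"]  # save this for the upload and training steps
print(dataset_id)
```

This is what the recommended jq pipeline does for you interactively; parsing the JSON in code is handy once you automate the whole workflow.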

Add audio files to your dataset

You’ll notice that the response you received in the last step noted that your dataset has a resource_count of 0. In Deepgram, a resource is made up of an audio file and its corresponding labels. Let’s go ahead and start creating those resources.

To upload an audio file to your dataset use this curl command, being sure to:

  • Swap in your dataset-id
  • Point to the path of your audio file
  • Give your resource a name

The MissionControl API supports both local and publicly accessible remote file uploads. We accept most audio types.

Uploading a Local File

To upload a file from your computer, run this in a terminal or your favorite API client:

curl -X POST -u USERNAME:PASSWORD --data-binary @path/to/file.wav ""

Uploading a Remote File

To upload a remote file that is publicly accessible (e.g., hosted in AWS S3 or on another server), run this in a terminal or your favorite API client:

curl -X POST -u USERNAME:PASSWORD -H "Content-Type: application/json" --data '{"url": ""}' ""

You’ll receive a response like this:

  {
    "id": "ffffffff-0000-0000-0000-ffffffffffff",
    "name": "myfile.wav",
    "created": "2020-05-01T17:08:32.353733Z",
    "duration": 2705.436,
    "status": "UNLABELED",
    "read_only": false
  }

Repeat this process with all the files you’d like to add to your dataset.
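If you have many files, it’s easy to script the repetition. This sketch prints one upload command per local .wav file; the endpoint path, query parameter, and base URL here are assumptions for illustration, so substitute the real MissionControl route:

```python
import pathlib

# Hypothetical values -- substitute your own. The endpoint path and the
# "name" query parameter below are assumptions, not documented routes.
DATASET_ID = "dddddddd-1111-2222-3333-444444444444"
API_URL = "https://missioncontrol.example.com"  # placeholder base URL

def upload_command(audio_path):
    """Build the curl command that uploads one local file as a new resource."""
    return (
        "curl -X POST -u USERNAME:PASSWORD "
        f"--data-binary @{audio_path} "
        f'"{API_URL}/datasets/{DATASET_ID}/resources?name={audio_path.name}"'
    )

# One command per file in a local "audio" directory:
for path in sorted(pathlib.Path("audio").glob("*.wav")):
    print(upload_command(path))
```

You could pipe the printed commands to a shell, or swap `print` for a `subprocess.run` call once you’ve verified the output.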

Add labels to your audio files

You’ll notice that all the responses from the previous step included a status of UNLABELED. We’ll fix this now.

In order for your dataset to be ready for training, you’ll need to associate labels with your audio. These labels pair with your audio to form the ground-truth data your model trains on. For that reason, the labels you upload should be as close to 100% accurate as possible. If you skimp on that step, your model can only hope to be as good as the data you’ve trained it on. That’s right: “garbage in, garbage out.”

There are a couple of ways to go about this step:

  1. If you already have labels, go ahead and upload them.
  2. If you don’t have labels, you can:
    • Request that a professional transcriptionist create some for you. Remember, your free Deepgram account grants you 10 minutes of free professional data labeling.
    • Label your resources yourself in your Deepgram account.

Uploading existing labels

Before you upload, you’ll want to check that the format of your labels matches what Deepgram expects to receive.

Your labels should:

  • Be verbatim. If your labels skip spoken words or paraphrase, your model will learn to do the same.
  • Have numbers and symbols written out. Numerals can be said in a variety of ways, and writing them out removes ambiguity for the speech model. For example, write “four” instead of “4” and “plus” instead of “+”.
Once your labels meet these requirements, upload each one to its resource:

curl -X PUT -u USERNAME:PASSWORD --data-binary @path/to/test.txt "{resource-id}/transcript"
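To illustrate the “write it out” rule, here is a minimal normalization sketch. The mappings are illustrative only, not an official Deepgram list, and a real pipeline would need context-aware handling (e.g., “1st”, “$20”):

```python
# Spell out digits and common symbols so the model sees words, not numerals.
# This mapping is an illustrative assumption, not an official Deepgram list.
SPELL_OUT = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
    "+": "plus", "%": "percent", "&": "and",
}

def normalize_label(text):
    """Replace each digit or symbol with its spoken form, character by character."""
    return "".join(SPELL_OUT.get(ch, ch) for ch in text)

print(normalize_label("4 + 4"))  # -> "four plus four"
```

Running your transcripts through a check like this before uploading helps catch stray numerals that would otherwise teach the model ambiguous spellings.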

Requesting professional data labeling

Labeling speech data is quite time consuming and requires a high level of accuracy. Luckily, your free Deepgram account comes with 10 minutes of free professional data labeling to help you get started.

To request these labels, submit the following command, being sure to specify the list of resource-ids you’d like to have labeled.

*Warning*: If the total duration of the audio submitted in your request exceeds your available labeling credits, your request will fail. If you’d like to increase your labeling credits, request an upgrade.
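A quick pre-flight check helps you avoid a failed request. The sketch below sums the `duration` fields (in seconds, as in the responses above) against your remaining credits; the function name and the credit value are assumptions based on the free account’s 10 minutes:

```python
# Sketch: check total audio duration against labeling credits before submitting.
# Durations are in seconds, matching the "duration" field in the responses above.
FREE_LABELING_SECONDS = 10 * 60  # the free account's 10 minutes of labeling

def within_credits(durations_s, available_s=FREE_LABELING_SECONDS):
    """Return True when the requested audio fits in the available credits."""
    return sum(durations_s) <= available_s

print(within_credits([120.5, 300.0]))  # 420.5 s of audio -> True
print(within_credits([2705.436]))      # ~45 min -> False, the request would fail
```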

curl -X POST -u USERNAME:PASSWORD -H "Content-Type: application/json" --data '{ ... }' ""

The confirmation response for each resource will look like:

  {
    "id": "ffffffff-0000-0000-0000-ffffffffffff",
    "name": "myfile.wav",
    "created": "2020-04-27T20:36:01.813237Z",
    "duration": 3565.548,
    "status": "IN_PROGRESS",
    "read_only": false
  }

Professional labeling will take some time to complete, so kick back while they go to work. You’ll be emailed when your labels are complete.

Labeling yourself with Deepgram

We’ve armed you with a transcript editor in the Data section of your Deepgram account. To start using it, log in to your account, navigate to the Data tab, select your desired dataset, and then click into the resource you’d like to label. Make use of the helpful keyboard shortcuts, and be sure to follow the instructions for optimum training results.

If at any point you decide you’ve had enough, you can always send your resources to professionals for labeling.

Train an Intermediate Model with your datasets

Now that your dataset is training-ready, it’s time to train your model.

Go ahead and submit a curl command that names your model and associates the dataset you prepared with it.

curl -X POST -u USERNAME:PASSWORD -H 'content-type: application/json' -d '{"name": "MyModel"}'  

To associate additional datasets with your model, take advantage of PUT /models/{model-id}/datasets.

You’ll quickly get back a response that shows your new model.

  {
    "model_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "version_id": "12345678-1234-1234-1234-1234567890ab",
    "model_name": "MyModel",
    "created": "2020-05-01T18:56:40.316185Z",
    "model_type": "USER",
    "wer": null,
    "trained_at": null,
    "status": "CANCELLED"
  }

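If you later want to script the PUT /models/{model-id}/datasets call mentioned above, a sketch like this builds the command. The JSON body shape (`dataset_ids`) and the base URL are assumptions for illustration, not documented MissionControl behavior:

```python
# Sketch of attaching extra datasets via PUT /models/{model-id}/datasets.
# The "dataset_ids" body field and the base URL are assumptions.
def attach_datasets_command(model_id, dataset_ids):
    """Build a curl command that associates extra datasets with a model."""
    ids = ", ".join(f'"{d}"' for d in dataset_ids)
    return (
        "curl -X PUT -u USERNAME:PASSWORD "
        "-H 'content-type: application/json' "
        f"-d '{{\"dataset_ids\": [{ids}]}}' "
        f'"https://missioncontrol.example.com/models/{model_id}/datasets"'  # placeholder URL
    )

print(attach_datasets_command(
    "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    ["dddddddd-1111-2222-3333-444444444444"],
))
```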
Go ahead and copy the model_id from the response; we’ll use it to submit the model for training.

Plug that model_id in and run the following command:

curl -X POST -u USERNAME:PASSWORD "{model-id}&base-model-id=e1eea600-6c6b-400a-a707-a491509e52f1"  

You’ll see a response confirming that your model has been submitted and its status has changed to PENDING.

  {
    "status": "PENDING",
    "finished": null
  }

Training will take some time, but you’ll be emailed once your model has finished.

Once it’s finished training, take a look at the steps for reviewing your model’s performance and deploying it for use at scale.

To transcribe with your new model, you’ll need to deploy it to SpeechEngine.

Nice work. You’re on the road to superior transcription!
