
How to Run OpenAI Whisper in Google Colab

By Ross O'Connell
Published Oct 17, 2022
Updated Jun 13, 2024

OpenAI's Whisper is an exciting new model for automatic speech recognition (ASR). It features a simple architecture based on transformers, the same technology that drove recent advancements in natural language processing (NLP), and was trained on 680,000 hours of audio from a wide range of languages. The result is a new leader in open-source solutions for ASR.

The researchers at Deepgram have enjoyed testing Whisper and seeing how it works, and we wanted to make it as easy as possible for you to try it out too. One of the things we've learned in our experiments is that, as with many deep-learning tools, Whisper performs best when it has access to a GPU. While downloading and installing Whisper may be straightforward, configuring it to properly utilize a GPU (if you have one!) is a potential roadblock.

Google Colab provides a great preconfigured environment for trying out new tools like Whisper, so we've set up a simple notebook there to let you see what Whisper can do. We set up the notebook so that you don't need anything extra to run it; you can just click through and go. The notebook will:

  • Install Whisper

  • Download audio from YouTube

  • Transcribe that audio with Whisper

  • Play back the audio in segments so you can check Whisper's work

  • And finally... quantitatively evaluate Whisper's performance by computing the Word Error Rate (WER) for the transcription
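If you'd like a feel for what the install and transcribe steps involve before opening the notebook, here is a minimal sketch. It is not the notebook's code. It assumes the open-source package has been installed with pip install -U openai-whisper and that ffmpeg is available. The helper names transcribe_file and format_segments, and the choice of the "base" model, are ours, purely for illustration.

```python
def transcribe_file(audio_path, model_name="base"):
    """Transcribe one audio file, using the GPU when one is available."""
    # Imported lazily so format_segments() below works without Whisper installed.
    import torch
    import whisper  # assumption: installed via `pip install -U openai-whisper`

    # Fall back to the CPU when no CUDA device is visible.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = whisper.load_model(model_name, device=device)
    # The result is a dict that includes "text" and a list of "segments".
    return model.transcribe(audio_path)


def format_segments(segments):
    """Render Whisper-style segments as '[start - end] text' lines."""
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        lines.append(f"[{seg['start']:.2f}s - {seg['end']:.2f}s] {text}")
    return lines
```

With a file of your own (the path here is hypothetical), calling result = transcribe_file("my_audio.mp3") and then printing each line of format_segments(result["segments"]) gives a timestamped listing. You can compare that listing against the audio one segment at a time.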

We think the files we chose are fun, but if you have files that you want to test Whisper on, it should be easy to upload them and drop them in!
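To score Whisper on your own files you also need a reference transcript to compare against. Word Error Rate is the number of word substitutions, deletions, and insertions needed to turn the reference into Whisper's output, divided by the number of words in the reference. The sketch below computes it in plain Python. It makes simplifying assumptions of our own (lowercasing and splitting on whitespace, with no punctuation handling), so its numbers may differ from what the notebook reports.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    # Assumption: very light normalization (lowercase, whitespace split).
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four reference words.
print(word_error_rate("the quick brown fox", "the quick brown dog"))  # 0.25
```

Because the denominator is the reference length, WER can exceed 1.0 when the transcription contains many extra words.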

Try the Colab

Learn more about Deepgram

Sign up for a Deepgram account and get $200 in free credit (up to 45,000 minutes). No credit card needed!

We encourage you to explore Deepgram by checking out the following resources:

  1. Deepgram API Playground 

  2. Deepgram Documentation

  3. Deepgram Starter Apps
