
New Tech Lets Journalists Find Damning Soundbites

By Scott Stephenson
Published Oct 10, 2016
Updated Jun 13, 2024

Today, we're one month away from election day, and the race for the presidency is closing in on the home stretch. Newsrooms around the country are buzzing with activity: interviews, fact checking, reporting and, of course, combing through huge quantities of videos and recordings of both candidates, hunting for that juicy soundbite that might change public opinion and the course of the election.

Here's an example. I uploaded the entirety of Donald Trump's speech at the Republican National Convention from July 2016 to Deepgram. Now you can search through it just like you would with text.

Search for 'jobs', then listen.

Try also searching for:

  • taxes

  • women

  • security

  • tremendous

  • really

  • "I love"

  • Hillary

  • destruction

  • white house

  • "I'm with her"

You get taken to the exact time when those terms were mentioned during his speech. Click the red timeline markers to jump through the video to hear each audio clip. This is really fun but also shows the power of speech search.

The situation

The Washington Post's David A. Fahrenthold brought to light one such bombshell of a clip, in which Republican candidate Donald Trump was recorded saying a number of things that most would consider very offensive. It's unclear whether the clip, which was recorded in 2005, will make a significant impact on this election. After all, at the time of writing, this news only broke 24 hours ago.

But there is one thing that is clear: Fahrenthold is one heck of an investigative journalist. Regardless of which side of the political aisle you sit on, we can all agree that the world would be a better place if there were more journalists with the wherewithal to slog through reams of documents and countless hours of audio and video.

Text Search is Pretty Easy

If processing this information weren't so labor intensive, there would likely be more Fahrentholds out there. Document search is relatively easy today. If the documents are already digital, searching for keywords is trivial. If they're on paper, it's a matter of scanning the documents, using optical character recognition to turn the images of text into text data, and then running keyword searches.

Google and Ctrl+F have been around for a while

Text search only gets challenging when dealing with a huge corpus of documents with different types of data embedded in them (like charts and tables). But still, it's a tractable problem.

For example, when the Panama Papers leaked last year, it took an international team of 400 journalists from 100 news organizations to sift through the 4.8 million emails, three million database entries, two million PDFs, one million images and 320,000 text documents that an unknown party exfiltrated from Mossack Fonseca. Using Apache Solr and Apache Tika to index and search through the data, journalists were able to map the network of shady dealings and tax sheltering schemes of some of the world's most powerful people and corporations.

It may have taken hundreds of people around the world almost a year to work through all the records, but it was doable.
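To make the idea of indexed keyword search concrete, here is a minimal sketch of an inverted index, the data structure that search engines such as Solr are built on. This is a toy illustration, not the setup the Panama Papers team used: the function names and the sample documents are invented for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical sample documents, invented for this illustration.
documents = {
    "memo-1": "Offshore account opened for the shell company.",
    "memo-2": "Quarterly tax filing for the holding company.",
    "email-7": "Please move the funds to the offshore shell account.",
}

index = build_index(documents)
print(search(index, "offshore shell"))
```

Building the index takes one pass over the corpus. After that, each query is a handful of set lookups rather than a rescan of every document, which is why this approach stays workable even for millions of files.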

Speech Search Was Hard, Until Now

Searching through recordings is much harder. The usual workflow is to transcribe the raw audio into text and then feed that text into a search tool. Human transcription is too time consuming and expensive. Automatic speech-to-text is faster, but then search accuracy becomes the problem. Deepgram fixed that.
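The conventional transcribe-then-search workflow can be sketched in a few lines. This assumes a transcript is already available as a list of (word, start time in seconds) pairs. That format, the function names and the sample data are all hypothetical, and this is not how Deepgram works internally.

```python
def find_phrase(words, phrase):
    """Return the start time (seconds) of every occurrence of phrase.

    words is a list of (word, start_seconds) pairs in spoken order.
    """
    target = phrase.lower().split()
    if not target:
        return []
    # Normalize spoken words: lowercase and drop surrounding punctuation.
    spoken = [w.lower().strip(".,!?\"'") for w, _ in words]
    hits = []
    for i in range(len(spoken) - len(target) + 1):
        if spoken[i:i + len(target)] == target:
            hits.append(words[i][1])
    return hits


def as_timestamp(seconds):
    """Format a number of seconds as M:SS for display."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


# Hypothetical transcript with word-level start times, invented for illustration.
transcript = [
    ("We", 61.0), ("will", 61.3), ("bring", 61.6), ("back", 61.9), ("jobs.", 62.2),
    ("Good", 125.4), ("jobs", 125.8), ("for", 126.1), ("everyone.", 126.3),
]

for start in find_phrase(transcript, "jobs"):
    print(as_timestamp(start))
```

The weak point of this approach is the transcript itself: an exact match like this one only finds a keyword if the transcription step spelled it correctly.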

Deepgram finds speech with A.I.

Deepgram is an artificial intelligence tool that makes searching for keywords in speeches, private conversations and phone calls faster, cheaper and easier than the old way of doing things. Deepgram indexes audio files in less than half the time a human transcriber would need, and costs only 75¢ per hour of audio. Compare that to the 75¢ per minute charged by most human transcription services, and it's a pretty good deal.
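To put the two prices side by side, here is the arithmetic as a short sketch. The rates are the ones quoted above, and the 10-hour workload is an arbitrary example.

```python
# Prices quoted in the article, in cents (integers avoid rounding issues).
DEEPGRAM_CENTS_PER_HOUR = 75
HUMAN_CENTS_PER_MINUTE = 75


def deepgram_cost_cents(hours):
    """Cost of indexing the given hours of audio at the per-hour rate."""
    return DEEPGRAM_CENTS_PER_HOUR * hours


def human_cost_cents(hours):
    """Cost of human transcription at the per-minute rate."""
    return HUMAN_CENTS_PER_MINUTE * 60 * hours


hours = 10  # hypothetical workload, chosen for illustration
print(f"Deepgram: ${deepgram_cost_cents(hours) / 100:.2f}")
print(f"Human:    ${human_cost_cents(hours) / 100:.2f}")
print(f"Ratio:    {human_cost_cents(hours) // deepgram_cost_cents(hours)}x")
```

Because one rate is per hour and the other per minute, the ratio is 60 to 1 regardless of how much audio is processed.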

Deepgram also removes the extra step of feeding transcriptions into a search platform. You can search for keywords directly within the audio recording and jump straight to the times the keyword was mentioned. This lets reporters listen for intonation and inflection, which are lost entirely in transcription. Deepgram makes finding the timestamps of those sound bites a breeze (I'm looking at you, radio and podcast producers).

Search All Speech With Deepgram

Deepgram is a powerful "speech search engine" that makes the process of identifying key words and phrases in speeches and other spoken-word audio recordings fast and easy.

Deepgram is great for finding bits and pieces of single speeches, and it's equally great at searching through a library of audio data. Let's say you came into possession of audio recordings of Hillary Clinton's speeches to big Wall Street firms like Goldman Sachs. You could load them all into Deepgram and use keyword searches to suss out the unifying themes of her speeches.

A more practical use case could involve loading all of a candidate's speeches, television appearances, etc. into Deepgram and searching for hot-button issues. You could identify the moments when candidates change their positions on a certain topic, or correct them on the fly when they deny having said a particular thing.

You Need This

Deepgram's speech search engine won't replace researchers and investigative reporters, but it will make them faster and more effective. Now you can search through dozens or thousands of hours of audio recordings quickly, easily, and without the hassle and expense of transcription.

If you're a journalist or news organization and want to see how Deepgram can fit into your workflow, reach out and say hello.


With contribution from Jason D. Rowley. Image: Gage Skidmore from Flickr.

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.
