Article·Announcements·Jun 13, 2024

We Raised $12 Million to Solve Speech Recognition in the Enterprise

At Deepgram, we're working to change the speech recognition game from the ground up. Learn about our Series A here.

Why Speech Recognition? · Rebuilding Speech From the Ground up · Making Speech Work In The Enterprise


By Scott Stephenson, CEO

Last Updated: Jun 13, 2024


The excitement around speech recognition is real: it has the potential to power the next wave of modern applications and give businesses and vendors a competitive advantage. But with excitement come misaligned expectations. Speech recognition is a messy, tough and persistent problem for enterprises, one that has languished under existing technology providers for decades. At Deepgram, we have been working to change that by rebuilding speech recognition from the ground up. Today, we celebrate a key milestone on our path with a $12 million Series A round led by Wing VC, with participation from NVIDIA, Y Combinator, Compound and SAP.iO.

Why Speech Recognition?

Today, getting actionable information from recorded phone conversations and meetings is time- and resource-intensive, costly and cumbersome. Audio recordings don't play by the same rules as text or data. They're messy and idiosyncratic and go far beyond the short pre-programmed phrases that Siri and Alexa rely on. There's no silver bullet for speech recognition, especially when it comes to speed, scale, accuracy and reliability.

You can read Zach's blog post about the need for Deepgram and why it matters that companies use speech recognition to better interpret customer needs and serve their employees.

Rebuilding Speech From the Ground up

The idea for Deepgram began while I was a PhD student at the University of Michigan. My cofounder and I were researching the detection of dark matter two miles underground, and in the hours not devoted to research, we life-logged (we made devices that recorded backup copies of the audio surrounding us, 24/7). When we tried to go back and find key conversations and specific moments in those audio files, we felt the very real pain of not having a good tool available to help process the recordings and pinpoint valuable timestamps. That was the spark that created Deepgram.

Deepgram has taken an entirely new approach to speech recognition, replacing what hasn't worked (heuristics-based speech processing) with fully end-to-end deep learning. Audio recordings are complex and infinitely varied, meaning there is no single quick fix for speech recognition. That's why we train speech models to learn and adapt under complex, real-world conditions with customers' unique vocabularies, accents, product names and acoustic environments. Companies dealing with challenging audio from conference calls or call centers previously struggled to make speech scalable, precise and fast enough. With Deepgram, they can transform their speech data into an enterprise asset. Our speech recognition reliably acts as a foundational layer within the next generation of business applications, allowing companies to build something with speech that actually works.

Making Speech Work In The Enterprise

Since going to market, we've amassed customers across the call center, retail and tech industries, and partnered with some of the leading large-scale communication and conferencing providers. Developers, data scientists, product managers and CIOs at these companies all trust Deepgram because our unique approach delivers a high level of accuracy quickly and at scale. Our customers and partners work with us because of our vision, our team and our commitment to continually refining and innovating our product. As part of that product innovation, along with our Series A, we're also announcing two new features of our platform:

Real-Time Streaming: an industry-first advancement in speech recognition that lets our customers analyze and transcribe speech as words are being spoken. Simple use cases include "command and control" interactions such as dictating doctor's notes or ordering takeout from your favorite restaurant chain. More complex use cases include long-running real-time transcription for meeting platforms and real-time agent assist that helps call center agents deliver more effective customer service.
On-Premises Deployment: Deepgram On-Premises Deployment provides a private, deployable instance of the Deepgram platform for speech recognition use cases involving confidential, regulated, or otherwise sensitive audio data in the enterprise. It delivers the same scalable, high-performance, high-accuracy speech recognition capability as the Deepgram cloud, while allowing enterprises to manage the solution on-premises.

We're so excited about what's next. The speech recognition opportunity is huge, and the endorsement from these amazing investors validates that we have the team, technology and vision to crack it. We strive to become the de facto speech company by unlocking valuable voice data for our customers, giving them a competitive advantage in their industry. This round is going to help us do just that.

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.
