
State of Voice 2023: Language AI Takes Center Stage

By Jason D. Rowley
Published May 17, 2023
Updated Jun 13, 2024

TL;DR: Download and read the report directly to access even more in-depth insights into the rapidly evolving world of Language AI.

In case it wasn’t abundantly clear, Language AI has either already transformed, is currently transforming, or is positioned to transform just about every industry and occupation in the global economy. Between rapid adoption of artificial intelligence by individuals and enterprises alike, the dizzying speed of academic research and industrial R&D releases, and the din of AI chatter from CEOs, influencers, and developers, it’s evident that we’re in the midst of a technological revolution unseen since the advent of smartphones, or even the World Wide Web.

By “Language AI,” we’re referring to several branches of technology. There’s Voice AI—automatic speech recognition (ASR), transcription, machine translation, natural language understanding, speech synthesis, and more. There’s Conversational AI—chatbots, intelligent virtual assistants, agent-assist solutions, and other software that facilitates discussions. And there’s the rapidly evolving world of Language Models—from the large language models (LLMs) behind OpenAI’s ChatGPT and Anthropic’s Claude to domain-specific models that run fast and pose less risk of going off the proverbial rails.

The Language AI revolution is, in part, enabled by a parallel trend: the diffusion of deep learning throughout the technology industry. Deep neural networks—fine-tuned, massively complex statistical models trained on millions or billions of data points—have demonstrated an uncanny, unreasonable effectiveness at doing just about any task they’re designed to accomplish. For example, end-to-end deep learning (E2EDL) achieves speech-to-text transcription results that are nearly equivalent to human accuracy. Properly provisioned deep neural networks can operate at a virtually limitless scale, and offer significantly faster and more cost-effective transcription compared to human transcribers. Right now, there’s a nontrivial amount of human effort involved in refining deep neural networks, but in the not-too-distant future, self-learning neural networks will become the new normal.

Taken together, these core technologies are the foundation upon which the future of business is built. It’s a bold claim, but as a foundational AI company on a mission to understand human language, it’s one that Deepgram stands by, and it’s one we aim to investigate further. That’s why Deepgram partnered with Opus Research to interview 400 industry leaders to understand how Language AI is reshaping the business landscape.

State of Voice: Understanding Language AI’s Role in the Future of Business

2023 marks the third year that Deepgram and Opus Research have delved into the state of voice and other language AI technologies. In 2021, we started with an examination of automatic speech recognition. In 2022, we expanded the scope of our research to include the entire speech technology industry. Given the Cambrian explosion of language models and advancements in speech and conversational AI we’ve witnessed over the past six months, we’d have been remiss not to widen our lens even further for this year’s industry snapshot.

In October 2022, Opus Research surveyed 400 decision-makers—ranging from team leaders to CEOs—at companies of all sizes, representing nearly a dozen industries, to understand how Voice AI factors into their business processes today, and what they expect for the future. Even though survey data was collected prior to the public release of ChatGPT, the 2023 State of Voice Report paints a compelling picture: speech AI technology is woven into the fabric of the modern enterprise. The meteoric rise of language models in recent months likely only increases the salience of Language AI in the minds of business leaders.

Key Findings From the Survey

If you’re looking for the “Executive Summary to the Executive Summary,” here are some of the most relevant statistics surfaced from our survey data:

  • Voice tech is widespread. 82% of respondents say their companies use voice technology today, up from 76% reported in our 2022 State of Voice report.

  • Voice AI is crucial to the future of business. Two thirds of respondents say voice-enabled experiences are important to the future of their companies’ strategies.

  • Human-like voicebots are inevitable. When it comes to voicebots, the future is here (or at least imminent). 43% of respondents say that voicebots will achieve human parity in less than one year, and another 54% anticipate a one-to-three-year time horizon.

  • Demand for voice data utilization is massive, and unmet. Only 1% of respondents said they transcribe more than 75% of their available audio data. 84% of respondents transcribe less than half of their available audio data.

  • Technical advancements remove roadblocks to adoption. We asked about factors that would drive increased adoption of speech technology. 74% of respondents cited “increased speech-to-text accuracy,” 68% cited “expanded voice intelligence capabilities,” and 64% said “lower cost.” Deepgram’s recent Nova release, paired with massive improvements in language detection and speaker diarization quality, delivers the most accurate, fastest, and most affordable Voice AI solution on the market today.

Survey results highlight the increasing ubiquity and strategic importance of voice technology in businesses, with human-like voicebots on the horizon and untapped potential in voice data utilization. Continued advancements in accuracy, intelligence, and cost efficiency will further propel its adoption.

Customization is Key

Survey data shows that the ability to customize Voice AI models is a major driver of value. This is indicated by two key data points. When asked about which attributes would constitute the “best” speech recognition model, 50% of respondents said that “custom model training” is a top priority. Answering a separate question, 35% of respondents said that the ability to train custom speech models would unlock greater adoption of speech technology in the enterprise.

Customized models allow companies to get even more value out of their voice data. Tailoring voice AI systems to recognize an enterprise’s unique dialects, accents, and industry-specific jargon significantly enhances user experiences and interactions. The true hallmark of this capability, however, is its potential to reduce error rates. Training AI models on company-specific data leads to a substantial increase in voice recognition accuracy—essential for industries with niche vocabularies. In essence, customizability serves as a strategic advantage that optimizes accuracy, minimizes errors, and thereby catalyzes broader adoption of speech technology across various sectors.

The arrival of sophisticated language models is taking Voice AI to new heights. These models can adapt to specific data, sharpening their voice recognition capabilities and ability to grasp context. The upshot? Interactions become smoother and more intuitive. Technical advancements give Voice AI a major boost, paving the way for better personalization and encouraging more industries to hop on the bandwagon.

The Voice AI Shift: From Cost Savings to Value Creation

Once viewed as a simple way to cut costs, Voice AI technology has advanced to the point where it generates tangible value rather than merely saving time and money. Given macroeconomic trends and their attendant constraints on available resources, it’s no surprise that 67% of respondents said that improving productivity is a motivator to adopt speech technology, and 45% said they adopted Voice AI to promote operational efficiency.

However, adopting Voice AI is viewed as a way to grow businesses, too. 54% of respondents said that they adopted speech technology to identify new business opportunities, and 49% were motivated by the prospect of increased business revenue. 

Motivations are one thing, but what about actual results? 79% of respondents said they increased revenue by up to 50% following the adoption of speech technology. 97% of respondents reported at least some cost savings. And 99% of respondents saw gains in productivity.

The Cost of Missing Out On the Language AI Revolution

If nothing else, this year’s report highlights the necessity of Language AI in the enterprise. For some businesses—like contact centers, telehealth, and content moderation providers—Language AI is at the core of their products. For others, Language AI is embedded into business processes, from composing emails with an AI assistant to conducting customer meetings with help from an intelligent agent that cues up talking points on the fly. For nearly everyone, though, Language AI is the source of competitive edge. It’s no wonder that 81% of respondents said they plan to increase their speech technology budget over the next 12 months.

One of the most illuminating questions in our survey asked what would happen to respondents’ businesses without voice capabilities. Three-quarters of respondents said that revenue would either plateau or decline, and the impact of voice technology only becomes more apparent from there. 96% of respondents said that they’d expect the pace of customer acquisition to plateau or decline. A majority (59%) of respondents said that customer satisfaction (CSAT) would decline, and another 40% said CSAT scores would plateau. Similarly, 63% of respondents said their Net Promoter Score (NPS) would decline without voice capabilities; just 1% said NPS would not be affected.

Language AI in general, and speech AI in particular, is positioned to be a key differentiating factor in the coming months and years. Simply put: Companies that incorporate Language AI into their business processes will outcompete companies that do not. Companies that implement their Language AI systems thoughtfully, with consideration for what’s now the literal voice(s) of their customers and brands, will outcompete those that just tack on a speech model and call it a day. 

If there’s one thing we take away from the State of Voice 2023 Report, it’s that Language AI is now mainstream. Hooking an ASR model into your customer-facing phone system is no longer the purview of corporate innovation offices and internal R&D labs; it’s just part of doing business as usual today. You don’t need a PhD in particle physics to implement world-class Language AI anymore. All it takes is a little bit of programming knowledge and an API key.
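To make that concrete, here’s a minimal sketch of what “a little bit of programming knowledge and an API key” can look like in practice: a short Python script that sends a hosted audio file to a speech-to-text endpoint and prints the transcript. The endpoint, query parameters, environment variable name, audio URL, and response shape shown here are illustrative assumptions; check your provider’s current API documentation for the exact details.

```python
# Minimal sketch: transcribe a hosted audio file with a speech-to-text API.
# The endpoint, parameters, and response shape are illustrative assumptions;
# consult your provider's current documentation before relying on them.
import os
import requests

API_KEY = os.environ["SPEECH_API_KEY"]                 # hypothetical env var holding your API key
AUDIO_URL = "https://example.com/call-recording.wav"   # hypothetical hosted audio file

response = requests.post(
    "https://api.deepgram.com/v1/listen",              # pre-recorded transcription endpoint (illustrative)
    params={"model": "nova", "punctuate": "true"},     # request the Nova model with punctuation
    headers={
        "Authorization": f"Token {API_KEY}",
        "Content-Type": "application/json",
    },
    json={"url": AUDIO_URL},                           # point the API at the audio to transcribe
    timeout=60,
)
response.raise_for_status()

# Pull the first transcript alternative out of the JSON response.
data = response.json()
transcript = data["results"]["channels"][0]["alternatives"][0]["transcript"]
print(transcript)
```

A handful of lines like these, wired into an existing phone system or meeting pipeline, is all it takes to start turning raw audio into searchable, analyzable text.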

The democratization of artificial intelligence through developer-first platforms is likely to be the key driver of Language AI adoption in 2023 and beyond. As for what else the future of Language AI holds, Deepgram is the partner you can trust to build it.

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub Discussions.
