Case Study

How Vida Delivers Empathetic Healthcare Voice Agents with Deepgram Aura-2

Vida is an AI Agent Operating System that enables enterprises—especially in healthcare—to rapidly build, deploy, and manage omnichannel AI agents that handle calls, texts, emails, chat, and workflow tasks at scale. Vida needs enterprise-grade, real-time text-to-speech that delivers natural, low-latency, cost-predictable voice for high-volume healthcare agents—capabilities fulfilled by Deepgram’s Aura-2.

Business Needs

Vida is an AI Agent Operating System that deploys agents for many use cases but emphasizes healthcare, benefits, and wellness enterprises. Their platform supports care navigation, medication adherence, benefits literacy, intake and triage, appointment scheduling, claims clarification, and post-visit follow-up. These workflows rely on clear, trustworthy voice communication in real time. To deliver these experiences at scale, Vida needed a straightforward text-to-speech provider that could handle healthcare's unique requirements: Deepgram’s Aura-2.

Key Results at a Glance

Since shifting high-volume workloads to Aura-2, Vida has seen consistent gains in outcomes for customers:

  • Higher task completion rates, including increases in scheduled appointments and Medicare enrollment follow-through
  • Reduced call abandonment, as agents sound more human and start speaking sooner
  • Significant latency improvements that enable a smoother, more natural conversation flow
  • Up to 50% lower TTS spend versus alternative providers in similar quality tiers
  • Scale: Hundreds of millions of TTS characters per month; tens of thousands of calls per day

The Challenge: Quality and Compliance at Healthcare Scale

Across Vida's phone agent deployments, the highest priorities are natural voice quality, accuracy for entity types (dates, times, currency, addresses, IDs, medication names), and extremely low latency for real-time interactions. Scalability and predictable cost per character are essential because their customers operate large call volumes and maintain strict service levels.
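
Per-character pricing is what makes cost predictable: monthly spend scales linearly with character volume. The arithmetic below is a minimal sketch assuming a single flat rate per 1,000 characters; the rates and the 300-million-character volume are hypothetical placeholders, not Deepgram's or Vida's actual figures.

```python
def monthly_tts_cost(chars_per_month: int, usd_per_1k_chars: float) -> float:
    """Estimate monthly TTS spend under a flat per-character price.

    Assumes one flat rate (hypothetical); real contracts may use tiers or commitments.
    """
    if chars_per_month < 0 or usd_per_1k_chars < 0:
        raise ValueError("volume and rate must be non-negative")
    return chars_per_month / 1000 * usd_per_1k_chars


def savings_fraction(cost_before: float, cost_after: float) -> float:
    """Fraction of spend saved when moving from cost_before to cost_after."""
    return (cost_before - cost_after) / cost_before


# Hypothetical volume ("hundreds of millions of characters") and placeholder rates.
volume = 300_000_000
baseline = monthly_tts_cost(volume, usd_per_1k_chars=0.040)
alternative = monthly_tts_cost(volume, usd_per_1k_chars=0.025)
print(f"baseline ${baseline:,.2f}, alternative ${alternative:,.2f}, "
      f"savings {savings_fraction(baseline, alternative):.1%}")
```

Because cost is a single multiplication, doubling call volume doubles spend and nothing else changes, which is the property high-volume operators need for budgeting.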

Vida supports HIPAA workloads and maintains a signed BAA with Deepgram, as several customers require strict controls around PHI handling, access logging, and data isolation. TTS systems must comply with those requirements and avoid unnecessary data retention, and accessibility expectations also apply across their portfolio, especially for members who rely on audio alternatives to digital text.
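
One common way to keep access logs useful without retaining PHI is to record request metadata plus a keyed digest of the spoken text, never the text itself. The sketch below uses Python's standard `hmac` module; the function name, field names, and key handling are hypothetical and do not describe Vida's or Deepgram's implementation.

```python
import hashlib
import hmac
import json
import time


def tts_audit_record(text: str, voice: str, requester: str, secret_key: bytes) -> str:
    """Build a JSON audit-log line for a TTS request without storing the spoken text.

    The keyed HMAC lets an auditor check whether two requests carried the same text
    (given the key), while the log itself contains no PHI. Field names are hypothetical.
    """
    digest = hmac.new(secret_key, text.encode("utf-8"), hashlib.sha256).hexdigest()
    record = {
        "ts": int(time.time()),
        "requester": requester,     # which service or user made the call (access logging)
        "voice": voice,
        "text_chars": len(text),    # size for billing and capacity, not content
        "text_hmac": digest,        # content fingerprint, not content
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    line = tts_audit_record("Your appointment is on 03/11/29 at 10:07 a.m.",
                            voice="aura-2-thalia-en", requester="svc-scheduler",
                            secret_key=b"load-from-a-secrets-manager")
    print(line)
```

A keyed HMAC rather than a plain SHA-256 matters here: short, guessable values such as dates of birth could otherwise be recovered by hashing candidates.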

The core problem: Vida needed a text-to-speech provider that could deliver natural, accurate voice output at healthcare scale while meeting strict compliance requirements—without requiring extensive manual tuning or unpredictable costs.

Why Deepgram Won

Vida continuously evaluates multiple providers across the industry, including ElevenLabs, PlayHT, OpenAI, and Google. Deepgram consistently delivers the strongest balance of naturalness, clarity, latency, and predictable cost.

Non-Negotiables: Real-time latency and conversational flow, cost predictability at scale, HIPAA support and strong PHI controls, and precise rendering of alphanumeric sequences, regulated language, and sensitive information were all mandatory. Deepgram's ability to deliver high accuracy without manual dictionaries or SSML is a major advantage.

The Deciding Moment: Vida exposes voices from several TTS vendors within its platform, yet Deepgram voices remain the most frequently chosen by customers. Internal benchmarks and usage patterns made it clear that Aura-2 provided the best mix of quality and efficiency.

STT + TTS Under One Provider: Vida expressed the importance of this in their own words:

The Solution: Deepgram’s Aura-2 TTS

A standard Vida phone agent call begins when a member calls into a customer's line or receives an automated outreach call. The system receives context from CRM, care management tools, plan data, or EHR integrations. An LLM from Google, OpenAI, Anthropic, or another model provider generates the response text. This text is streamed to Deepgram's Aura-2 TTS API, which returns audio used in a live PSTN or SIP call.
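
Because the LLM emits text incrementally, pipelines like this commonly buffer tokens into sentence-sized chunks and send each chunk to TTS, so audio can start before the full reply exists. The sketch below makes several assumptions: the sentence-boundary rule is deliberately naive, and the request builder targets Deepgram's REST `/v1/speak` endpoint with an example Aura-2 model ID and telephony-style `mulaw` audio at 8 kHz. Verify parameter names against Deepgram's current TTS documentation; a production system would more likely keep a WebSocket open to shave latency.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import Iterable, Iterator

DEEPGRAM_SPEAK_URL = "https://api.deepgram.com/v1/speak"

# Sentence boundary: ".", "!" or "?" followed by whitespace. Deliberately naive; it also
# splits after abbreviations such as "Ct." and would need refinement in practice.
_SENTENCE_END = re.compile(r"[.!?]\s")


def sentence_chunks(tokens: Iterable[str], min_chars: int = 20) -> Iterator[str]:
    """Group streamed LLM tokens into sentence-sized chunks for TTS."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Cut at the last boundary that leaves at least min_chars in the chunk.
        cut = -1
        for match in _SENTENCE_END.finditer(buffer):
            if match.start() + 1 >= min_chars:
                cut = match.start() + 1  # keep the punctuation mark in the chunk
        if cut != -1:
            yield buffer[:cut].strip()
            buffer = buffer[cut:]
    if buffer.strip():
        yield buffer.strip()  # flush the remainder when the LLM stream ends


def build_speak_request(text: str, api_key: str,
                        model: str = "aura-2-thalia-en") -> urllib.request.Request:
    """Build, but do not send, a Deepgram TTS REST request for one text chunk."""
    params = urllib.parse.urlencode({
        "model": model,        # example Aura-2 model ID; verify against the voice list
        "encoding": "mulaw",   # G.711 mu-law, common on PSTN legs (assumption)
        "sample_rate": 8000,   # narrowband telephony rate (assumption)
        "container": "none",   # raw audio without a WAV header (assumption)
    })
    return urllib.request.Request(
        f"{DEEPGRAM_SPEAK_URL}?{params}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Authorization": f"Token {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Stand-in for a streaming LLM response (hypothetical token boundaries).
    fake_tokens = ["Hello there. ", "Your appointment", " is confirmed for ",
                   "Tuesday. See", " you then."]
    for chunk in sentence_chunks(fake_tokens):
        req = build_speak_request(chunk, api_key="YOUR_DEEPGRAM_API_KEY")
        print(req.get_method(), req.full_url, req.data)
        # Sending it: urllib.request.urlopen(req).read() returns the audio bytes.
```

The `min_chars` floor trades a little latency for fewer, more natural-sounding chunks: very short fragments such as "Hello there." are held back and voiced together with the next sentence.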

The agent might explain benefits, provide cost transparency, confirm an appointment, help with care navigation, or drive a task such as Medicare enrollment scheduling. Aura-2's clear and natural speech improves comprehension of key details and helps members feel comfortable interacting with an automated system. Together, better comprehension and greater comfort reduce friction and improve task completion, without requiring SSML tuning or custom dictionaries.

Voice selection logic is tailored to each task, with neutral tones for information-dense flows and empathetic delivery for sensitive contexts. Aura-2 voices are available when selecting a voice configuration while building an agent:
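
Task-based voice selection can be expressed as a small policy table: each task maps to a tone, and each tone maps to a voice. A minimal sketch; the task keys mirror the use cases named earlier, but the mapping itself is hypothetical, and the two model IDs are examples following Deepgram's `aura-2-<voice>-<language>` naming, assigned to tones arbitrarily. Real assignments should come from listening tests against Deepgram's published voice list.

```python
from dataclasses import dataclass
from enum import Enum


class Tone(Enum):
    NEUTRAL = "neutral"        # information-dense flows: benefits, claims, scheduling
    EMPATHETIC = "empathetic"  # sensitive contexts: triage, adherence, follow-up


@dataclass(frozen=True)
class VoiceConfig:
    model: str  # Aura-2 model ID passed to the TTS API
    tone: Tone


# Hypothetical task-to-tone policy.
TASK_TONE = {
    "benefits_literacy": Tone.NEUTRAL,
    "claims_clarification": Tone.NEUTRAL,
    "appointment_scheduling": Tone.NEUTRAL,
    "intake_triage": Tone.EMPATHETIC,
    "medication_adherence": Tone.EMPATHETIC,
    "post_visit_follow_up": Tone.EMPATHETIC,
}

# Placeholder voice per tone; swap in voices chosen by listening tests.
TONE_VOICE = {
    Tone.NEUTRAL: "aura-2-thalia-en",
    Tone.EMPATHETIC: "aura-2-andromeda-en",
}


def select_voice(task: str, default: Tone = Tone.NEUTRAL) -> VoiceConfig:
    """Pick a voice for a task, falling back to the default tone for unknown tasks."""
    tone = TASK_TONE.get(task, default)
    return VoiceConfig(model=TONE_VOICE[tone], tone=tone)


if __name__ == "__main__":
    for task in ("intake_triage", "claims_clarification", "unlisted_task"):
        print(task, select_voice(task))
```

Keeping tone and voice as separate layers means a voice can be swapped for every empathetic flow at once without touching the per-task policy.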

Vida Aura 2 Voices

Vida also uses Deepgram's multilingual STT for inbound understanding, and having both directions covered by the same provider simplifies pipeline behavior and ensures consistent entity handling across languages.

Spotlight: Head-to-Head Alphanumeric Accuracy

While specific use cases vary, Vida's benchmarks consistently show clear patterns where Aura-2 outperforms alternatives:

Dates and timestamps: Alternate vendors often mis-paced dates or read formats inconsistently, such as the slashed date "03/11/29" or the time "10:07 a.m." Aura-2 rendered these cleanly without extra configuration.

Currency and numerals: Some providers merged decimals or over-emphasized unit names, resulting in awkward reads such as "five point seven million dollars" when the intended phrasing was "five point seven million." Aura-2 preserved natural prosody and returned stable pronunciations across currencies and number formats.

Emails and URLs: Competing engines sometimes collapsed punctuation or pronounced URLs too quickly. Aura-2 articulated domain segments and symbols clearly.

Addresses and abbreviations: Alternate systems occasionally misread abbreviations such as "Ter." and "Ct." or inserted pauses in the wrong places. Aura-2 delivered predictable, professional pacing with no special markup.

Passwords and mixed alphanumerics: Mixed sequences such as "P@ssw0rd123" were often flattened or mispronounced by competing vendors. Aura-2 produced accurate, consistent rendering from the first attempt.

Latency behavior: Aura-2 delivered lower time-to-first-audio in streaming tests, resulting in more natural dialog flow. It also showed fewer stalls during long utterances without manual SSML tuning.
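
Both latency signals in such a comparison, time-to-first-audio and stalls during long utterances, can be measured from the timing of chunks arriving off any streaming TTS response. A minimal sketch; the function name and the 0.25-second stall cutoff are hypothetical choices, not Vida's benchmark methodology, and the clock is injectable so the logic can be checked deterministically.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
    stall_threshold_s: float = 0.25,
) -> Tuple[Optional[float], int]:
    """Return (time_to_first_audio_s, stall_count) for a stream of audio chunks.

    The clock starts just before the first chunk is requested, so a generator that
    issues its HTTP request lazily is timed end to end. A stall is any gap between
    consecutive non-empty chunks longer than stall_threshold_s (a hypothetical cutoff).
    """
    start = clock()
    first_audio: Optional[float] = None
    last_arrival = start
    stalls = 0
    for chunk in chunks:
        if not chunk:
            continue  # ignore empty reads
        now = clock()
        if first_audio is None:
            first_audio = now - start
        elif now - last_arrival > stall_threshold_s:
            stalls += 1
        last_arrival = now
    return first_audio, stalls


if __name__ == "__main__":
    def fake_tts_stream():
        # Simulated stream: first audio after ~0.12 s, then a ~0.4 s gap mid-utterance.
        for delay in (0.12, 0.05, 0.40, 0.05):
            time.sleep(delay)
            yield b"\x00" * 160

    ttfa, stalls = measure_stream(fake_tts_stream())
    print(f"time to first audio: {ttfa:.3f} s, stalls: {stalls}")
```

Running the same text set through each vendor with identical network conditions, and comparing the distributions rather than single runs, keeps such a comparison fair.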

The Results

Care teams and operational managers consistently note that members better understand plan details, dollar amounts, and ID numbers when Aura-2 is used. The consistent pronunciation and stable pacing reduce confusion and escalation rates.

Looking Ahead

With Deepgram as their voice AI foundation, Vida is building toward more sophisticated capabilities. The roadmap includes enhanced style controls for dynamic voice delivery, expanded HIPAA compliance features for enterprise healthcare customers, and multilingual support starting with Spanish. New use cases on the horizon—care-plan readouts, proactive outreach campaigns, and structured clinical communication—will extend voice AI's reach across the full patient engagement lifecycle.
