Deepgram vs Twilio: Key Differences for Real-Time Transcription
Comparing Deepgram and Twilio means comparing two different categories of platform. Twilio is communications infrastructure. It routes calls, sends SMS, and manages phone numbers through products like Programmable Voice and related communications APIs. Deepgram is a dedicated speech-to-text API built for transcription accuracy and speed. So when you search "Deepgram vs Twilio," you're really asking an architectural question: should transcription live inside your call platform, or should you extract the audio and route it to a specialized STT layer?
The answer depends on how much control you need. In many production use cases, these platforms work together. Twilio handles phone infrastructure. Deepgram handles transcription. This article breaks down the tradeoffs that matter for your stack.
Key Takeaways
Choose Twilio's managed path for faster setup. Choose Deepgram direct for more STT control.
- Twilio offers two transcription paths: Gather for IVR input and ConversationRelay for voice agent workloads.
- As of 2026, Twilio accounts that first use ConversationRelay after September 12, 2025 default to Deepgram's nova-3-general, per Twilio's changelog.
- Routing Twilio Media Streams to Deepgram's API unlocks full model control, keyterm prompting, and parameters ConversationRelay doesn't expose.
- ConversationRelay charges one per-minute rate that bundles STT and TTS, while a direct Media Streams plus Deepgram setup bills STT separately and leaves TTS to you. Check current rates on Twilio's and Deepgram's pricing pages.
- Both platforms support HIPAA-eligible configurations, but compliance depends on the path you use and how you configure it.
What Each Platform Actually Does
Twilio is the call infrastructure. Deepgram is the speech-to-text layer.
Defining Twilio as Communication Infrastructure
Twilio provides programmable voice, SMS, and video. It connects calls over the PSTN, manages phone numbers, and routes media. Twilio doesn't build its own STT models. It routes audio to third-party providers like Google STT or Deepgram through Gather and ConversationRelay.
Defining Deepgram as a Dedicated STT Layer
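Deepgram's platform is built around speech recognition. Its Nova-3 model is documented in Deepgram's models overview. Output quality is commonly measured with Word Error Rate (WER): the number of substituted, deleted, and inserted words divided by the number of words in a reference transcript. You send audio directly to Deepgram's WebSocket API and get transcription results back.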
Why the Comparison Is Architectural
The real question is where transcription should live in your architecture. You can keep it inside your call platform as a managed feature. Or you can run it as a separate STT layer you control directly.
How Twilio Handles Transcription: Two Paths
Twilio gives you a short-utterance path and a continuous streaming path. Gather is for IVR-style input, while ConversationRelay is for voice agents.
The Gather Verb: Google STT V1, Google STT V2, and Deepgram as speechModel Options
The Gather verb is designed for collecting short utterances, like menu selections or account numbers. Speech recognition isn't enabled by default; you must set input="speech" explicitly. Once it's enabled, the speechModel attribute lets you choose from three families of models: Google STT V1 legacy models, Google STT V2 models like googlev2_telephony, or Deepgram models. Pricing is per utterance, not per minute.
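A minimal sketch of the TwiML such a Gather produces, built with Python's standard xml.etree.ElementTree. The action URL and prompt text are hypothetical placeholders, and googlev2_telephony is used only because it's the example model named above; substitute whichever speechModel value fits your account.

```python
import xml.etree.ElementTree as ET


def gather_speech_twiml(action_url: str, speech_model: str = "googlev2_telephony") -> str:
    """Build TwiML that prompts the caller and collects one spoken utterance."""
    response = ET.Element("Response")
    gather = ET.SubElement(
        response,
        "Gather",
        {
            "input": "speech",            # speech recognition is off unless requested
            "speechModel": speech_model,  # Google V1, Google V2, or a Deepgram model
            "action": action_url,         # Twilio sends the recognized result here
        },
    )
    prompt = ET.SubElement(gather, "Say")
    prompt.text = "Please say your account number."
    return ET.tostring(response, encoding="unicode")


print(gather_speech_twiml("https://example.com/gather-result"))
```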
ConversationRelay: Real-Time WebSocket Transcription with Provider Selection
ConversationRelay is Twilio's managed voice agent layer. It streams continuous transcription over a WebSocket to your server. You receive text transcripts, not raw audio. The transcriptionProvider attribute accepts two values: Google or Deepgram. As of 2026, accounts that first used ConversationRelay after September 12, 2025 default to Deepgram nova-3-general. Legacy accounts still default to Google. When Deepgram is selected, the deepgramSmartFormat attribute is enabled by default. It reformats dates, times, currency, and numbers into conventional written forms.
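The sketch below builds a minimal ConversationRelay TwiML document. It uses Twilio's url attribute to point at your WebSocket server, sets transcriptionProvider explicitly instead of relying on the account default, and joins hints into a comma-separated string, which is an assumption about the expected format. The wss:// address and hint terms are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence


def conversation_relay_twiml(
    ws_url: str,
    provider: str = "Deepgram",
    hints: Optional[Sequence[str]] = None,
) -> str:
    """Build <Connect><ConversationRelay> TwiML that streams transcripts to ws_url."""
    if provider not in ("Google", "Deepgram"):
        raise ValueError("transcriptionProvider accepts only 'Google' or 'Deepgram'")
    attrs = {"url": ws_url, "transcriptionProvider": provider}
    if hints:
        # Assumption: hints is a comma-separated list. Never put sensitive data here.
        attrs["hints"] = ",".join(hints)
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "ConversationRelay", attrs)
    return ET.tostring(response, encoding="unicode")


print(conversation_relay_twiml("wss://example.com/relay", hints=["Nova-3", "ConversationRelay"]))
```

Setting the provider explicitly makes the behavior independent of whether the account falls under the Google or Deepgram default.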
What You Lose When Transcription Is Embedded in the Call Platform
ConversationRelay simplifies integration by handling audio processing for you, but that simplicity comes with constraints. You're limited to two STT providers, and you can't access raw audio for custom processing. The hints attribute provides vocabulary boosting for your STT provider, but it doesn't give you the full Keyterm Prompting parameter control available through Deepgram's direct API. You can't update keyterms mid-stream, TwiML doesn't expose or enforce Deepgram's keyterm token limits, and you can't select Flux or other specialized model variants.
Where Deepgram's Direct API Changes the Equation
The direct path gives you raw audio access and full STT control. It also adds engineering work because you have to bridge Twilio and Deepgram yourself.
Twilio Media Streams as the Audio Extraction Layer
Media Streams gives you access to raw call audio over a WebSocket. As documented in its WebSocket messages reference, Twilio sends 8000 Hz mulaw audio, base64-encoded inside JSON messages. Your server receives these messages, decodes the base64 payload, and forwards the raw bytes to Deepgram's WebSocket endpoint. Because this audio has no container header, the Deepgram connection must declare encoding=mulaw and sample_rate=8000.
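The unwrapping step can be sketched as a small pure function. It assumes the documented shape of a Twilio media message (an event field, plus a media object whose payload holds the base64 audio) and ignores the other event types. extract_mulaw is a hypothetical helper name, not part of any SDK.

```python
import base64
import json
from typing import Optional


def extract_mulaw(raw_message: str) -> Optional[bytes]:
    """Return raw mulaw bytes from a Twilio Media Streams 'media' message.

    Other events (connected, start, stop, mark, dtmf) return None.
    """
    message = json.loads(raw_message)
    if message.get("event") != "media":
        return None
    # media.payload is 8 kHz mulaw audio, base64-encoded
    return base64.b64decode(message["media"]["payload"])


sample = json.dumps({
    "event": "media",
    "streamSid": "MZ0000",
    "media": {"track": "inbound", "chunk": "1", "timestamp": "5",
              "payload": base64.b64encode(b"\xff\x7f").decode("ascii")},
})
print(extract_mulaw(sample))
```

The returned bytes can be sent to Deepgram as-is in a binary WebSocket frame; no resampling is needed when the Deepgram connection declares mulaw at 8000 Hz.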
What Deepgram's Direct API Unlocks
Through the direct API, you get access to Deepgram's full feature set. Keyterm Prompting lets you boost recognition for up to 100 domain-specific terms per request. Multi-word phrases are treated as cohesive units. On Flux, you can update keyterms mid-stream without reconnecting. You also get access to the configurable STT features available for the model you choose.
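Keyterms are passed as repeated keyterm query parameters on the streaming URL. The sketch below assumes Deepgram's wss://api.deepgram.com/v1/listen endpoint, the nova-3 model, and the mulaw/8000 Hz format Twilio delivers. deepgram_listen_url is a hypothetical helper, and its 100-term check mirrors the limit stated above rather than any token limit. Authentication (an Authorization: Token header carrying your API key) is left to whichever WebSocket client you use.

```python
from typing import Sequence
from urllib.parse import quote, urlencode

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"
MAX_KEYTERMS = 100  # per-request limit cited in the text


def deepgram_listen_url(keyterms: Sequence[str], model: str = "nova-3") -> str:
    """Build a streaming URL for Twilio's 8 kHz mulaw audio with keyterm boosting."""
    if len(keyterms) > MAX_KEYTERMS:
        raise ValueError(f"at most {MAX_KEYTERMS} keyterms per request, got {len(keyterms)}")
    params = [
        ("model", model),
        ("encoding", "mulaw"),     # raw, headerless audio from Media Streams
        ("sample_rate", "8000"),
        ("smart_format", "true"),
    ]
    # One keyterm parameter per term; a multi-word phrase stays a single value.
    params += [("keyterm", term) for term in keyterms]
    # quote (rather than the default quote_plus) encodes spaces as %20
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params, quote_via=quote)}"


print(deepgram_listen_url(["order number", "Acme"]))
```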
Integration Complexity: What the Added Step Costs You
The direct path requires a WebSocket bridge server. Your code holds one connection to Twilio and another to Deepgram, and it unwraps Twilio's JSON messages and base64 payloads into the raw binary audio frames Deepgram's WebSocket expects. You also own interruption handling: when the caller starts speaking over the agent, you send Twilio a clear event so it drops any queued outbound audio. Deepgram provides Node.js reference implementations of this pattern. It's not the lightest lift, but the control is worth it if you need full STT access.
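The return leg of the bridge can be sketched the same way. This assumes the Deepgram connection was opened with vad_events=true so Deepgram emits SpeechStarted messages, that final transcripts arrive as Results messages with is_final set, and that the Twilio stream is bidirectional (started with Connect and Stream), which is what makes the clear event meaningful. route_deepgram_event is a hypothetical name; a real bridge would also hand transcripts to your agent logic.

```python
import json
from typing import Optional, Tuple


def route_deepgram_event(raw_message: str, stream_sid: str) -> Tuple[Optional[str], Optional[str]]:
    """Map one Deepgram message to (message_for_twilio, final_transcript)."""
    event = json.loads(raw_message)
    kind = event.get("type")
    if kind == "SpeechStarted":
        # Caller began talking: ask Twilio to drop agent audio it has buffered.
        return json.dumps({"event": "clear", "streamSid": stream_sid}), None
    if kind == "Results" and event.get("is_final"):
        alternatives = event.get("channel", {}).get("alternatives", [])
        transcript = alternatives[0].get("transcript", "") if alternatives else ""
        return None, transcript or None
    # Interim results, Metadata, UtteranceEnd, etc. need no action here.
    return None, None


to_twilio, _ = route_deepgram_event(json.dumps({"type": "SpeechStarted"}), "MZ0000")
print(to_twilio)
```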
Compliance and Deployment Constraints
Both platforms can support regulated workloads, but eligibility depends on the exact path, provider, model, and transcript destination. Validate compliance at the architecture level, not just the vendor level.
Twilio ConversationRelay and HIPAA Eligibility
ConversationRelay is a HIPAA Eligible Service, provided two conditions are met: you configure it properly and you have a signed BAA with Twilio. According to Twilio's transcription guidance, Google and Deepgram (nova-2 or nova-3 in monolingual mode) are both HIPAA-eligible whether transcripts go to a webhook or are persisted. Nova-3 multilingual mode isn't HIPAA-eligible inside ConversationRelay, regardless of transcript destination.
Deepgram's HIPAA BAA and Self-Hosted Options
Deepgram supports HIPAA-aligned deployments and offers BAAs to enterprise customers through its sales team and enterprise agreements. Deepgram also holds SOC 2 Type 2 certification and PCI compliance, as listed in its compliance documentation. If you need full data control, Deepgram offers self-hosted deployment on VPC or on-premises infrastructure. In self-hosted deployments, you can keep audio and transcript processing within infrastructure you control, subject to deployment configuration.
PCI Compliance: Provider Selection Matters in Both Paths
PCI compliance inside ConversationRelay depends on your provider, model, and transcript destination. With webhook delivery, both Google and Deepgram (nova-2 or nova-3 monolingual) are PCI-compliant. Persisted transcripts aren't PCI-compliant for either provider. Twilio also warns that TwiML attributes like hints and welcomeGreeting aren't PCI-protected, so keep sensitive data out of them.
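These rules are easy to encode as a pre-deploy check. The sketch below transcribes only the ConversationRelay rules stated in this section; the RelayTranscription type and its field values are hypothetical. A True result is necessary but not sufficient, since HIPAA eligibility also requires a signed BAA and correct configuration, so re-verify against Twilio's current transcription guidance before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelayTranscription:
    provider: str       # "Google" or "Deepgram"
    model: str          # e.g. "nova-3-general"; not checked for Google
    multilingual: bool  # True when Deepgram Nova-3 runs in multilingual mode
    destination: str    # "webhook" or "persisted"


def _covered_provider_model(cfg: RelayTranscription) -> bool:
    """Google, or Deepgram nova-2 / nova-3 in monolingual mode."""
    if cfg.provider == "Google":
        return True
    return (
        cfg.provider == "Deepgram"
        and cfg.model.startswith(("nova-2", "nova-3"))
        and not cfg.multilingual
    )


def hipaa_eligible(cfg: RelayTranscription) -> bool:
    # Webhook and persisted destinations are both eligible; a BAA is still required.
    return _covered_provider_model(cfg)


def pci_compliant(cfg: RelayTranscription) -> bool:
    # Persisted transcripts are never PCI-compliant for either provider.
    return cfg.destination == "webhook" and _covered_provider_model(cfg)


cfg = RelayTranscription("Deepgram", "nova-3-general", multilingual=True, destination="webhook")
print(hipaa_eligible(cfg), pci_compliant(cfg))  # prints: False False
```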
Pricing Structure and Cost Modeling
Twilio bundles more of the voice stack. Deepgram prices transcription separately, which can make the direct path cheaper at the STT layer.
How Twilio Meters Voice and Transcription Costs
Twilio's voice pricing has two layers. Base PSTN call charges apply regardless of which features you add. Feature charges stack on top of them: ConversationRelay adds a per-minute charge that bundles STT and TTS, Media Streams adds its own per-minute charge, and Gather speech recognition is billed per utterance.
Deepgram's Per-Minute Pricing and How It Stacks Against Bundled Twilio Rates
Deepgram charges per minute of audio processed, with no streaming surcharge. See current rates on Deepgram's pricing page. New accounts start with $200 in free credits. Growth plans, starting at $4,000/year prepaid, offer lower per-minute rates. Smart Formatting is included at no extra cost, but Keyterm Prompting is metered separately as an add-on.
When the Separate-Layer Approach Gets Cheaper at Scale
The Media Streams plus Deepgram direct path can lower STT-layer costs compared with ConversationRelay's bundled rate. This becomes more relevant at scale. But the direct path doesn't include TTS. You'll need to source and pay for that separately. The tradeoff is more engineering work in your bridge server, audio format conversion, and operational monitoring across two external API connections per call.
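A rough per-minute cost model makes the tradeoff concrete. Every rate is a placeholder you fill in from Twilio's and Deepgram's current pricing pages; no real prices are assumed. The sketch simplifies TTS to a per-minute figure, ignores Keyterm Prompting add-on charges, and leaves out the engineering and operations cost of running the bridge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-minute USD rates: placeholders, fill in from current pricing pages."""
    pstn: float                # base call charge, paid on both paths
    conversation_relay: float  # bundles STT and TTS
    media_streams: float
    deepgram_stt: float
    tts: float                 # third-party TTS, normalized to per minute


def relay_cost(minutes: float, r: Rates) -> float:
    """PSTN plus ConversationRelay's bundled STT+TTS charge."""
    return minutes * (r.pstn + r.conversation_relay)


def direct_cost(minutes: float, r: Rates) -> float:
    """PSTN plus Media Streams, Deepgram STT, and separately sourced TTS."""
    return minutes * (r.pstn + r.media_streams + r.deepgram_stt + r.tts)


def direct_savings(minutes: float, r: Rates) -> float:
    """Positive when the direct path is cheaper on metered charges alone."""
    return relay_cost(minutes, r) - direct_cost(minutes, r)
```

Because both functions are linear in minutes, volume doesn't flip which path is cheaper per minute; it changes whether the absolute savings justify the bridge's engineering and monitoring. Volume-tiered plans such as Deepgram's Growth pricing would lower deepgram_stt at higher volumes, which this flat model doesn't capture.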
Which Path Fits Your Architecture
Use ConversationRelay if you want the simplest Twilio-native setup. Use Deepgram direct if you need more model control, broader language coverage, or tighter control over regulated workflows.
When to Use Twilio ConversationRelay with Deepgram as the STT Provider
Choose this path when you want the fastest integration with minimal audio handling. Your server receives text transcripts over a WebSocket. You don't manage audio buffers, format conversion, or interruption logic. This works well for standard voice agent workloads where nova-3-general meets your accuracy requirements. The hints attribute covers basic vocabulary boosting. Elerian AI benefits from having Deepgram's accuracy available directly inside managed call infrastructure.
When to Use Deepgram's API Directly via Twilio Media Streams
Choose this path when you need Keyterm Prompting with full parameter control, access to Flux for voice agent-optimized turn detection, or support for languages available in Nova-3. This tradeoff favors control over convenience. It's also the right choice for regulated workloads that need Nova-3 multilingual mode, since that mode isn't HIPAA-eligible inside ConversationRelay. Vida Health uses Deepgram's direct API for transcription workflows.
When to Skip Twilio Entirely
If your voice infrastructure doesn't run on Twilio, you don't need Twilio as a middleman. Deepgram's WebSocket API accepts audio from any source. You can connect it to SIP trunks, browser-based WebRTC streams, or other audio sources. Deepgram's Voice Agent API combines STT, TTS, and LLM orchestration with bundled pricing for voice agent workflows without a separate telephony dependency.
Get Hands-On
If you want the shortest path to a real answer, test both architectures with your own audio. A quick bake-off usually tells you more than another week of doc reading.
Try the Direct Path
To measure control, route raw audio into Deepgram and compare the transcripts you get with your own keyterms and settings.
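To compare the two paths on the same calls, score each transcript against a human reference with Word Error Rate. This is a standard word-level edit-distance sketch, not Deepgram's or Twilio's scoring code. It only lowercases and splits on whitespace, so normalize punctuation and smart-formatted values (for example "$5" versus "five dollars") the same way on both sides before comparing.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("cancel order four five six", "cancel order for five six"))  # 0.2
```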
Check the Managed Path
If you care more about setup speed, start with ConversationRelay and see whether its provider controls and transcript output already cover your requirements.
Grab Free Credits
To test without a big setup commitment, create a free Deepgram account. New accounts start with $200 in free credits; confirm the current offer at signup.
FAQ
The bottom line is simple: use ConversationRelay for speed and Deepgram direct for control. These five questions cover the most common architecture and compliance decisions.
Can You Use Deepgram Inside Twilio Without Managing Your Own WebSocket Server?
Yes. ConversationRelay handles audio routing for you. Set transcriptionProvider="Deepgram" in your TwiML, and Twilio sends text transcripts to your server.
Does Twilio ConversationRelay Support Deepgram's Keyterm Prompting Feature?
Not directly. The hints attribute provides vocabulary boosting for your STT provider, but it isn't documented by Twilio as a direct mapping to Deepgram's Keyterm Prompting. You don't get mid-stream dynamic updates or the full parameter control available in the direct API.
What Happens to Transcription Accuracy When Twilio Routes Audio to Deepgram Versus Calling Deepgram Directly?
The underlying STT model can be the same in both paths. The difference comes from parameter access, model settings, and whether you need features exposed only in the direct API.
Is Twilio's ConversationRelay Available in All Twilio Regions, or Are There Geographic Restrictions?
Check Twilio's ConversationRelay documentation for current regional availability. If you route audio to Deepgram directly, also check Deepgram's deployment and compliance documentation for the regions and hosting options your architecture needs.
Does Using Deepgram as a Twilio Transcription Provider Affect How Twilio Logs or Stores Call Data?
Transcript handling and compliance depend on your provider, model, destination, and account configuration. Persisted transcripts aren't PCI-compliant for either Deepgram or Google inside ConversationRelay.