True data control in speech-to-text narrows the field fast. This comparison shows what Deepgram, Speechmatics, AssemblyAI, AWS, and Google Cloud publicly document for speech-to-text deployments that keep audio off vendor-managed infrastructure.
Key Takeaways
Here are the main findings from public vendor documentation:
- Speechmatics has the strongest public air-gap positioning.
- Deepgram has the most detailed self-hosted technical docs.
- AWS Transcribe is cloud-only in this comparison.
- Every self-hosted option requires an enterprise sales process.
- No provider here publicly confirms a self-hosted HIPAA BAA scope.
What On-Premise Speech-to-Text Actually Requires
On-premise STT only matters if your data controls match your compliance needs. You need to know what stays local, what leaves the environment, and whether contract terms cover that setup.
What Does "On-Premise" Mean for STT?
"On-premise" covers a wide range. One deployment may process audio locally but still phone home for licensing. Another may run fully disconnected with zero outbound traffic.
A connected container can still keep audio local. But if it sends metadata to the vendor, your compliance review still needs to account for that path.
No-Log Policies, Air-Gap, and VPC Isolation: The Spectrum of Data Control
You'll usually see three levels of control. They aren't interchangeable, even when sales language makes them sound close.
- VPC isolation: Audio travels over a private network path rather than the public internet, but the vendor still processes it on vendor-managed infrastructure.
- Connected container: The STT engine runs on your hardware but still has outbound connectivity.
- Full air-gap: Zero outbound connectivity. All components stay inside your environment.
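The three levels come down to two questions: where the engine runs, and whether anything can leave your environment. A minimal Python sketch of that distinction follows. The names `ControlLevel` and `classify_deployment` are illustrative only and don't come from any vendor's documentation.

```python
from enum import Enum


class ControlLevel(Enum):
    VPC_ISOLATION = "vpc_isolation"
    CONNECTED_CONTAINER = "connected_container"
    FULL_AIR_GAP = "full_air_gap"


def classify_deployment(engine_on_your_hardware: bool,
                        outbound_connectivity: bool) -> ControlLevel:
    """Map two yes/no answers onto the three control levels listed above."""
    if not engine_on_your_hardware:
        # The vendor processes audio remotely, even if the network path is private.
        return ControlLevel.VPC_ISOLATION
    if outbound_connectivity:
        # The engine is local, but licensing or telemetry traffic can still leave.
        return ControlLevel.CONNECTED_CONTAINER
    # The engine is local and nothing leaves the environment.
    return ControlLevel.FULL_AIR_GAP
```

Asking a vendor to answer both questions in writing is usually faster than decoding a deployment diagram.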
Why "Deployment Flexibility" Rarely Means Air-Gapped
Deployment flexibility doesn't equal air-gap support. In this comparison, only Speechmatics publicly documents on-premise deployment as suitable for air-gapped environments.
How the Major STT Providers Handle Self-Hosted Deployment
The provider split is straightforward. Deepgram and Speechmatics give you the clearest self-hosted paths, AssemblyAI documents self-hosting with fewer air-gap details, AWS Transcribe stays cloud-only, and Google Cloud is cloud-first with on-premise paths available under private contract.
Deepgram: Self-Hosted on Your Infrastructure
Deepgram's self-hosted deployment ships as container images that run under Docker or Podman. You can deploy on Kubernetes, Amazon SageMaker, bare-metal servers, or major cloud providers. The architecture uses two containers: an API container and an Engine container.
Audio data and transcripts stay on your infrastructure in self-hosted mode. Containers still establish outbound connections for license verification and usage reporting. A License Proxy can route licensing traffic through a single egress point. Public technical docs don't confirm complete zero-egress operation.
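One way to make that outbound path reviewable is to compare observed container connections against an allowlist that contains only the approved egress point. The sketch below assumes you already export connection records from a firewall or flow log. The hostnames and the `find_unexpected_egress` helper are hypothetical placeholders, not Deepgram endpoints or tooling.

```python
from typing import Iterable, NamedTuple


class Connection(NamedTuple):
    source: str       # container or pod name (hypothetical)
    destination: str  # destination hostname
    port: int


def find_unexpected_egress(connections: Iterable[Connection],
                           allowed: set[tuple[str, int]]) -> list[Connection]:
    """Return every connection whose (host, port) pair is not on the allowlist."""
    return [c for c in connections
            if (c.destination.lower(), c.port) not in allowed]


# Hypothetical values: replace with the egress point your vendor confirms in writing.
ALLOWED = {("license-proxy.internal.example", 443)}

observed = [
    Connection("stt-engine", "license-proxy.internal.example", 443),
    Connection("stt-api", "telemetry.vendor.example", 443),
]
unexpected = find_unexpected_egress(observed, ALLOWED)
```

Here `unexpected` holds the one record that bypasses the proxy, which is the kind of finding a compliance review would want explained.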
Deepgram marketing references self-hosted deployment for on-premise and air-gapped use. The technical docs don't use the term "air-gap" explicitly. If zero egress is a hard requirement, get that confirmed during procurement.
Speechmatics: On-Premise and On-Device Options
Speechmatics has the strongest public air-gap positioning in this comparison. Its product pages categorize on-premise deployment as suitable for secure or air-gapped environments, and a vendor-authored article describes use in courtrooms with no internet connection.
Speechmatics also offers an on-device product for endpoint use. It runs on standard business laptops with roughly 1 CPU core, an AI accelerator, and 800 MB of memory. That's separate from server-side on-premise deployment, but it matters for edge use cases.
The main gap is operational detail. Public sources don't explain how updates reach air-gapped deployments or whether license renewal needs periodic connectivity. Both are worth asking about directly before you sign.
AssemblyAI: Kubernetes and GovCloud Self-Hosting
AssemblyAI documents self-hosted deployment on Kubernetes, AWS ECS, and GovCloud environments. Each instance handles up to 48 concurrent streams. Its Universal-Streaming model is available for self-hosted use.
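The per-instance stream limit turns capacity planning into simple arithmetic. As a rough sketch, the helper below rounds a peak concurrency target up to whole instances. The function name and the headroom parameter are assumptions for illustration, and the 48-stream figure should be confirmed against your contract and hardware.

```python
STREAMS_PER_INSTANCE = 48  # per-instance figure cited above


def instances_needed(peak_concurrent_streams: int, headroom_percent: int = 0) -> int:
    """Instances required for a peak load, with optional headroom (20 means +20%)."""
    if peak_concurrent_streams < 0 or headroom_percent < 0:
        raise ValueError("inputs must be non-negative")
    # Integer arithmetic avoids floating-point rounding surprises.
    scaled_load = peak_concurrent_streams * (100 + headroom_percent)
    capacity = STREAMS_PER_INSTANCE * 100
    # Ceiling division: round any partial instance up to a whole one.
    return -(-scaled_load // capacity)
```

For example, 100 peak streams with 20% headroom is 120 streams, which needs three instances rather than two and a half.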
Public materials don't address air-gapped deployment. Connectivity requirements for the deployed container also aren't published. If zero outbound connectivity matters, that needs direct vendor confirmation.
AWS Transcribe and Google Cloud Speech: Cloud-First by Design
AWS Transcribe is a cloud service. No self-hosted binary or container is discussed in the cited material. The closest pattern is GovCloud paired with Direct Connect and PrivateLink. Audio still travels to AWS-managed infrastructure over a private path.
Google Cloud offers two on-premise paths in this comparison. Speech-to-Text On-Prem runs as a container on GKE or Anthos under private contract. Google Distributed Cloud Air-Gapped is a separate air-gapped product. Speech-to-Text availability on that path still requires direct verification.
Compliance Certifications and What They Mean in Self-Hosted Context
Compliance labels help you shortlist vendors, but they don't answer deployment scope by themselves. For self-hosted STT, you still need confirmation that the agreement or audit boundary covers the exact architecture you plan to run.
HIPAA BAA Coverage in Self-Hosted vs. Cloud Deployments
Deepgram maintains HIPAA-aligned deployments, and BAA terms are handled through sales and enterprise agreements. The public documentation doesn't clarify whether that BAA scope extends to deployments where Deepgram never handles your audio.
The same gap appears elsewhere. Speechmatics lists HIPAA among its compliance claims, but no public source here confirms that it offers a signed BAA. AssemblyAI references BAA coverage for its services, but the self-hosted scope isn't made explicit in the material cited in this article.
AWS Transcribe is described as HIPAA-eligible, and Google Cloud states that its BAA covers Speech-to-Text. This article doesn't establish the same scope for every on-prem or GDC path. If you process PHI, get the deployment mode named in writing.
FedRAMP Authorization: Which Providers Qualify and for What
Only AWS and Google list FedRAMP authorizations relevant to these STT workloads. In both cases, the authorization is platform-level rather than a clean self-hosted STT confirmation.
AWS GovCloud holds FedRAMP High. Google Cloud Speech-to-Text is framed within Google Cloud's FedRAMP High authorization boundary via Assured Workloads. Deepgram, Speechmatics, and AssemblyAI don't confirm FedRAMP authorization in the material compared here.
SOC 2 and Data Residency in Self-Hosted Context
SOC 2 and similar certifications help with baseline screening. They still leave an important scope question for customer-operated infrastructure.
Deepgram holds SOC 2 Type II certification, and Speechmatics lists SOC 2 Type II and ISO/IEC 27001. Public sources in this article don't document whether those scopes extend to customer-run self-hosted environments.
The Enterprise Contract Gate: What No One Tells You Upfront
Self-hosted STT goes through enterprise sales, contract negotiation, and deployment scoping, not a self-serve checkout. In this comparison, every self-hosted path sits behind that process.
Why Self-Hosted STT Requires an Enterprise Agreement
Deepgram requires account representative pre-authorization before container deployment begins. Speechmatics routes on-premise inquiries through sales. AssemblyAI directs buyers to its sales team.
That means procurement takes time. Budget several weeks before deployment starts, especially if legal and security teams need to review outbound traffic, BAA language, or audit terms.
What to Negotiate Before You Sign: SLAs, Model Updates, and Audit Rights
Before signing, get clear answers to these questions:
- What endpoints receive outbound traffic?
- What fields appear in telemetry?
- Does the BAA explicitly cover container deployment?
- Can the container run indefinitely without connectivity?
- How are model updates delivered offline?
Open-Source as an Alternative: When It Makes Sense
Open-source STT can be the cleanest path for strict isolation. It gives you full control, but it also shifts the accuracy, latency, and operations burden onto your team.
Tools like OpenAI Whisper avoid vendor licensing telemetry. The trade-off is operational overhead and, for many production workloads, higher word error rates than commercial self-hosted engines. If you've wrestled with self-hosted ML ops before, you know that "no vendor dependency" can quickly become "all your problem."
How to Choose the Right Deployment Model for Your Workload
The right choice starts with your compliance requirements and ends with your infrastructure reality. If you need verified air-gap support, the field gets narrow quickly.
Compliance-Driven Selection: FedRAMP, HIPAA, and Data Sovereignty
If FedRAMP is mandatory, AWS GovCloud and Google Cloud are the listed options in this comparison. If you need audio to stay on your network, Deepgram and Speechmatics offer the strongest on-network control in this group.
Neither Deepgram nor Speechmatics publicly confirms here that its BAA covers self-hosted deployment. If you need true air-gap for classified environments, Speechmatics has the strongest public documentation.
Infrastructure Capacity and TCO Considerations
Self-hosted STT also means hardware planning. Deepgram's deployment requirements specify at least one NVIDIA GPU with 16 GB VRAM, 4 CPU cores, and 32 GB RAM per STT node.
Only NVIDIA GPUs are supported. Multi-Instance GPU partitioning isn't supported. Start GPU procurement alongside contract review.
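Those requirements are easy to encode as a pre-procurement check. The sketch below validates a candidate node against the figures stated above. `NodeSpec` and `check_node` are hypothetical names for illustration, not part of any Deepgram tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    gpu_vendor: str
    gpu_count: int
    vram_gb_per_gpu: int
    cpu_cores: int
    ram_gb: int
    uses_mig: bool = False  # Multi-Instance GPU partitioning


def check_node(node: NodeSpec) -> list[str]:
    """Return the unmet requirements; an empty list means the node passes."""
    problems = []
    if node.gpu_vendor.strip().lower() != "nvidia":
        problems.append("GPU must be NVIDIA")
    if node.gpu_count < 1:
        problems.append("at least one GPU is required")
    if node.vram_gb_per_gpu < 16:
        problems.append("GPU needs at least 16 GB VRAM")
    if node.cpu_cores < 4:
        problems.append("at least 4 CPU cores are required")
    if node.ram_gb < 32:
        problems.append("at least 32 GB RAM is required")
    if node.uses_mig:
        problems.append("MIG partitioning is not supported")
    return problems
```

Running this over a hardware quote before contract review catches mismatches such as a MIG-partitioned GPU or a node with too little RAM.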
Making the Right Call on Data Control
For regulated buyers, public vendor documentation supports different confidence levels, and those gaps should shape your shortlist.
Decision Framework: Self-Hosted vs. Cloud vs. Open-Source
Cloud APIs fit workloads where audio can transit vendor infrastructure. Self-hosted containers from Deepgram, Speechmatics, or AssemblyAI fit workloads where audio must stay on your network.
Open-source fits teams that need zero vendor dependency and can carry the operational burden. That route can work well, but it rarely feels simple after week two.
How to Evaluate Before You Commit
Start with your own audio. Validate accuracy and latency first, then move procurement questions forward in parallel.
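Accuracy is usually reported as word error rate (WER): substitutions, deletions, and insertions divided by the number of reference words. The sketch below computes it with a standard word-level edit distance. The normalization here (lowercasing and whitespace splitting) is a simplifying assumption; real evaluations usually also normalize punctuation and number formatting.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Scoring each provider's output against the same human-verified transcripts of your own audio gives you a like-for-like number to bring into procurement.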
To get started with Deepgram, review current rates at deepgram.com/pricing, test the cloud API against your domain vocabulary, and then engage the enterprise team for self-hosted scoping.
Getting Started with Deepgram
You can use the cloud API to test your production audio before you enter self-hosted procurement. That gives you a cleaner baseline for accuracy, vocabulary fit, and operational expectations.
New accounts may receive $200 in free credits, so confirm the current offer at signup. Use the results from those tests to build the case for self-hosted deployment.
FAQ
Does on-premise STT work for real-time streaming, or only batch transcription?
It can support real-time streaming, but your deployment design matters as much as the model. If you need live captions or agent assist, check whether your chosen setup documents WebSocket support, how many concurrent streams each instance can hold, and what hardware sits behind it. A self-hosted stack can stream well, but weak network design will still make it feel like molasses.
Can I use Deepgram's self-hosted deployment in an air-gapped environment with no internet access?
Don't assume that from the public technical docs alone. Deepgram references air-gapped deployment in marketing material, but the docs don't confirm zero-egress operation. During procurement, ask for a written answer on license verification, usage reporting, update delivery, and whether the License Proxy still needs any outbound path.
What GPU hardware do I need to run a production on-premise STT deployment?
The documented Deepgram baseline is one NVIDIA GPU with 16 GB VRAM, 4 CPU cores, and 32 GB RAM per STT node. Treat that as a starting point, not your final sizing plan. If you expect sustained streaming loads, test with your own audio, concurrency targets, and latency thresholds before you buy a pile of GPUs you later regret.
Is there a pricing difference between Deepgram's cloud and self-hosted options?
Yes. Self-hosted pricing isn't published and goes through enterprise negotiation, while cloud pricing is listed at deepgram.com/pricing. In practice, your comparison also needs to include hardware, GPU availability, deployment time, and the internal cost of running updates, monitoring, and support.
How does self-hosted deployment affect HIPAA BAA coverage?
It changes what you need to verify. A cloud BAA statement doesn't automatically answer what happens when audio stays on your infrastructure and the vendor mainly provides software, licensing, or support. If you handle PHI, ask for the exact deployment mode to be named in the agreement so your legal and security teams aren't left reading tea leaves.