Rate limit errors have a way of appearing at the worst possible moment: during a demo, right as your customer starts ramping up, or when your agent traffic finally starts scaling.
Nothing kills momentum like a 429 error when your voice agent should be handling 20 concurrent calls but your infrastructure is capped at 15. You’ve built something that works, your users want it, and then you hit a ceiling that has nothing to do with your code.
Today we’re raising that ceiling. With 1,300+ organizations powered by Deepgram, this infrastructure investment is part of our commitment to scaling the platform for the Voice AI economy.
What’s New: Production-Grade Concurrency from Day One
We're tripling default concurrency limits across Voice Agent API, Streaming STT, and TTS. Growth Plan customers get up to 4.5x.
New default concurrency limits:
| API Product | Pay as You Go | Growth Plan |
|---|---|---|
| Voice Agent API (connections) | 45 (was 15) | 60 (was 15) |
| Streaming STT (streams) | 150 (was 50) | 225 (was 50) |
| WSS TTS (streams) | 45 (was 15) | 60 (was 15) |
These changes apply automatically today — no action needed on your end.
Why This Matters for Voice AI Teams
Here’s what the increase means for teams building on Deepgram. As Voice AI moves from pilot to production across enterprise teams, the infrastructure underneath has to keep up. These new defaults are part of Deepgram’s broader platform investments, the same foundation powering teams from startups to household name enterprises using AI at scale.
Built for Teams Serving Multiple Customers
If you’re building a conversational AI platform serving thousands of customers, or a meeting intelligence product processing high volumes for enterprise clients, the math just got 3x better.
Built to scale: With 45 WSS streams, 10 clients can now burst to 4–5 streams each, with headroom for multi-turn conversations and isolation from any single tenant's traffic spike. That's the difference between one customer's surge degrading everyone else and a reliable production system.
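One way to enforce that isolation in your own gateway is to layer a per-tenant cap under the plan-wide limit. This is a minimal sketch, not part of Deepgram's SDK: the `TenantGate` class, the per-tenant ceiling of 5, and the `stream_coro` callback are all illustrative choices you'd tune for your own traffic.

```python
import asyncio

GLOBAL_LIMIT = 45      # plan-wide WSS stream cap (Pay as You Go default)
PER_TENANT_LIMIT = 5   # illustrative per-tenant ceiling; tune for your traffic

# One semaphore guards the whole plan's allotment of concurrent streams.
global_slots = asyncio.Semaphore(GLOBAL_LIMIT)

class TenantGate:
    """Caps each tenant's concurrent streams so one spike can't starve the rest."""

    def __init__(self):
        self._tenants: dict[str, asyncio.Semaphore] = {}

    def _gate(self, tenant_id: str) -> asyncio.Semaphore:
        # Lazily create one semaphore per tenant.
        if tenant_id not in self._tenants:
            self._tenants[tenant_id] = asyncio.Semaphore(PER_TENANT_LIMIT)
        return self._tenants[tenant_id]

    async def run(self, tenant_id: str, stream_coro):
        # Acquire the tenant's slot first, then a global slot,
        # so a bursting tenant queues behind its own cap.
        async with self._gate(tenant_id), global_slots:
            return await stream_coro()
```

With this shape, a tenant that fires 20 requests at once still only holds 5 of the 45 streams at any moment; the rest queue behind its own gate instead of consuming the shared pool.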
Voice Agent stack: If you’re running STT, TTS, and Voice Agent API together, the new limits give you the room to scale the whole stack. You can now run 45+ concurrent agents with headroom for traffic spikes.
What this means in practice:
- Fewer HTTP 429 errors during integration and production scaling
- More reliable user experience across your customer base and regions, with no service failures during heavy growth or demand spikes
- Scale without filing support tickets
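Fewer 429s doesn't mean zero: any client should still degrade gracefully when it hits a limit. A common pattern is jittered exponential backoff around the connection attempt. This is a generic sketch, not Deepgram SDK code; `RateLimited` and the `connect` callback stand in for whatever your HTTP or WebSocket client raises and does on a 429.

```python
import random
import time

class RateLimited(Exception):
    """Stand-in for an HTTP 429 response from the API."""

def with_backoff(connect, max_attempts=5, base_delay=0.5):
    """Retry a connection attempt on 429s with jittered exponential backoff.

    `connect` is your own function that raises RateLimited on a 429;
    any other error propagates immediately.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the 429 to the caller
            # 0.5s, 1s, 2s, ... plus jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

The jitter matters as much as the doubling: if every client retries on the same schedule after a spike, they all collide again at the same instant.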
Use cases that scale faster:
- Conversational AI platforms scaling agents across multiple customers and regions
- Meeting intelligence products processing high volumes for enterprise clients and multiple languages
- Contact center analytics teams serving hundreds of locations or multiple regions
- Healthcare, legal, and financial teams running high-throughput, multi-tenant workloads
Transparent Upgrade Paths
We publish our concurrency defaults by payment plan. You know exactly what you get, with no surprises and no support tickets to figure out your limits. Concurrency scales automatically as you move to higher plans, with Enterprise offering the highest limits. Additional capacity beyond your plan is available if you need it. Reach out to us for details.
Guaranteed Capacity from Day One
Some vendors market “unlimited” concurrency but implement dynamic scaling: 10% ramp-up periods every 60 seconds when you exceed 70% utilization. During a traffic spike, that’s a 25-minute wait to scale from 100 to 1,000 streams, assuming perfect ramp-up conditions. Your application waits while their infrastructure catches up, and your customers pay the price when you're trying to deliver sub-second response times.
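The 25-minute figure falls out of simple compounding: at 10% growth per 60-second window, capacity multiplies by 1.1 each minute, so reaching 10x your current limit takes log(10)/log(1.1) ≈ 24.2 steps, rounded up to 25. A quick check:

```python
import math

current, target, growth = 100, 1_000, 1.10  # 10% ramp per 60-second window

# Smallest n with current * growth**n >= target
minutes = math.ceil(math.log(target / current) / math.log(growth))
print(minutes)  # 25 minutes before capacity catches up, best case
```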
Other vendors start concurrency so low that builders are forced into spend-based tier advancement and manual approvals just to reach production-grade limits.
Deepgram starts with high, guaranteed floors, all available immediately. The infrastructure is pre-provisioned for scale, so you can move from prototype to production without waiting for permission.
Your critical voice infrastructure keeps up with your growth.
A Few Things to Know
These are guaranteed floors, with transparent limits. You know exactly what capacity you have from day one.
This is a permanent platform enhancement, built into your plan going forward.
Some deployments may take longer. Regional (EU) or self-hosted deployments may not reflect these changes immediately. Questions about your deployment? Contact your account team for specifics.
On a contract? Your account team will walk you through how these changes apply to your plan.
Getting Started
If You’re Already Building
- Review updated API Rate Limits documentation
- Test scaling scenarios with your new limits
If You’re Evaluating for Your Team
- Review Pricing page for updated plan-specific defaults
- Reach out to us if you need concurrency above new defaults
- On Pay as You Go? Growth Plan gets you up to 4.5x concurrency. Upgrade here.
Try It Now
Get your API key: Sign up for a Deepgram account and get $200 in free credits.
Your infrastructure scales with your success. We’ve increased default concurrency so you can focus on building voice AI that works, without waiting for permission to grow. That’s what building for the Voice AI economy looks like in practice.

