Rate limit errors have a way of appearing at the worst possible moment: during a demo, right as your customer starts ramping up, or when your agent traffic finally starts scaling.
Nothing kills momentum like a 429 error when your voice agent should be handling 20 concurrent calls but your infrastructure is capped at 15. You’ve built something that works, your users want it, and then you hit a ceiling that has nothing to do with your code.
Today we’re raising that ceiling. With 1,300+ organizations powered by Deepgram, this infrastructure investment is part of our commitment to scaling the platform for the Voice AI economy.
What’s New: Production-Grade Concurrency from Day One
We're tripling default concurrency limits across Voice Agent API, Streaming STT, and TTS. Growth Plan customers get up to 4.5x.
New default concurrency limits:
| API Product | Pay as You Go | Growth Plan |
|---|---|---|
| Voice Agent API (connections) | 45 (was 15) | 60 (was 15) |
| Streaming STT (streams) | 150 (was 50) | 225 (was 50) |
| WSS TTS (streams) | 45 (was 15) | 60 (was 15) |
These changes apply automatically today — no action needed on your end.
Why This Matters for Voice AI Teams
Here’s what the increase means for teams building on Deepgram. As Voice AI moves from pilot to production across enterprise teams, the infrastructure underneath has to keep up. These new defaults are part of Deepgram’s broader platform investments, the same foundation powering teams from startups to household name enterprises using AI at scale.
Built for Teams Serving Multiple Customers
If you’re building a conversational AI platform serving thousands of customers, or a meeting intelligence product processing high volumes for enterprise clients, the math just got 3x better.
Built to scale: With 45 WSS streams, 10 clients can now burst to 4–5 streams each, with headroom for multi-turn conversations and isolation from any single tenant's traffic spike. That's the difference between one customer's surge degrading everyone else and a reliable production system.
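One way to enforce that isolation in your own gateway is to layer a per-tenant cap under the plan-wide limit. This is a minimal sketch, not part of Deepgram's SDK: the `TenantGate` class, the per-tenant ceiling of 5, and the `stream_coro` callback are all illustrative choices you'd tune for your own traffic.

```python
import asyncio

GLOBAL_LIMIT = 45      # plan-wide WSS stream cap (Pay as You Go default)
PER_TENANT_LIMIT = 5   # illustrative per-tenant ceiling; tune for your traffic

# One semaphore guards the whole plan's allotment of concurrent streams.
global_slots = asyncio.Semaphore(GLOBAL_LIMIT)

class TenantGate:
    """Caps each tenant's concurrent streams so one spike can't starve the rest."""

    def __init__(self):
        self._tenants: dict[str, asyncio.Semaphore] = {}

    def _gate(self, tenant_id: str) -> asyncio.Semaphore:
        # Lazily create one semaphore per tenant.
        if tenant_id not in self._tenants:
            self._tenants[tenant_id] = asyncio.Semaphore(PER_TENANT_LIMIT)
        return self._tenants[tenant_id]

    async def run(self, tenant_id: str, stream_coro):
        # Acquire the tenant's slot first, then a global slot,
        # so a bursting tenant queues behind its own cap.
        async with self._gate(tenant_id), global_slots:
            return await stream_coro()
```

With this shape, a tenant that fires 20 requests at once still only holds 5 of the 45 streams at any moment; the rest queue behind its own gate instead of consuming the shared pool.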
Voice Agent stack: If you’re running STT, TTS, and Voice Agent API together, the new limits give you the room to scale the whole stack. You can now run 45+ concurrent agents with headroom for traffic spikes.
What this means in practice:
- Fewer HTTP 429 errors during integration and production scaling
- More reliable user experience across your customer base and regions, with no service failures during heavy growth or demand spikes
- Scale without filing support tickets
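Fewer 429s doesn't mean zero: any client should still degrade gracefully when it hits a limit. A common pattern is jittered exponential backoff around the connection attempt. This is a generic sketch, not Deepgram SDK code; `RateLimited` and the `connect` callback stand in for whatever your HTTP or WebSocket client raises and does on a 429.

```python
import random
import time

class RateLimited(Exception):
    """Stand-in for an HTTP 429 response from the API."""

def with_backoff(connect, max_attempts=5, base_delay=0.5):
    """Retry a connection attempt on 429s with jittered exponential backoff.

    `connect` is your own function that raises RateLimited on a 429;
    any other error propagates immediately.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the 429 to the caller
            # 0.5s, 1s, 2s, ... plus jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

The jitter matters as much as the doubling: if every client retries on the same schedule after a spike, they all collide again at the same instant.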
Use cases that scale faster:
- Conversational AI platforms scaling agents across multiple customers and regions
- Meeting intelligence products processing high volumes for enterprise clients and multiple languages
- Contact center analytics teams serving hundreds of locations or multiple regions
- Healthcare, legal, and financial teams running high-throughput, multi-tenant workloads
Transparent Upgrade Paths
We publish our concurrency defaults by payment plan. You know exactly what you get, with no surprises and no support tickets to figure out your limits. Concurrency scales automatically as you move to higher plans, with Enterprise offering the highest limits. Additional capacity beyond your plan is available if you need it. Reach out to us for details.
Guaranteed Capacity from Day One
Some vendors market “unlimited” concurrency but implement dynamic scaling: 10% ramp-up periods every 60 seconds when you exceed 70% utilization. During a traffic spike, that’s a 25-minute wait to scale from 100 to 1,000 streams, assuming perfect ramp-up conditions. Your application waits while their infrastructure catches up, and your customers pay the price when you're trying to deliver sub-second response times.
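The 25-minute figure falls out of simple compounding: at 10% growth per 60-second window, capacity multiplies by 1.1 each minute, so reaching 10x your current limit takes log(10)/log(1.1) ≈ 24.2 steps, rounded up to 25. A quick check:

```python
import math

current, target, growth = 100, 1_000, 1.10  # 10% ramp per 60-second window

# Smallest n with current * growth**n >= target
minutes = math.ceil(math.log(target / current) / math.log(growth))
print(minutes)  # 25 minutes before capacity catches up, best case
```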
Other vendors start concurrency so low that builders are forced into spend-based tier advancement and manual approvals just to reach production-grade limits.
Deepgram starts with high, guaranteed floors, all available immediately. The infrastructure is pre-provisioned for scale, so you can move from prototype to production without waiting for permission.
Your critical voice infrastructure keeps up with your growth.
A Few Things to Know
These are guaranteed floors, with transparent limits. You know exactly what capacity you have from day one.
This is a permanent platform enhancement, built into your plan going forward.
Some deployments may take longer. Regional (EU) or self-hosted deployments may not reflect these changes immediately. Questions about your deployment? Contact your account team for specifics.
On a contract? Your account team will walk you through how these changes apply to your plan.
Getting Started
If You’re Already Building
- Review updated API Rate Limits documentation
- Test scaling scenarios with your new limits
If You’re Evaluating for Your Team
- Review Pricing page for updated plan-specific defaults
- Reach out to us if you need concurrency above new defaults
- On Pay as You Go? Growth Plan gets you up to 4.5x concurrency. Upgrade here.
Try It Now
Get your API key: Sign up for a Deepgram account and get $200 in free credits.
Your infrastructure scales with your success. We’ve increased default concurrency so you can focus on building voice AI that works, without waiting for permission to grow. That’s what building for the Voice AI economy looks like in practice.

