Article·Nov 14, 2025

How We Took Aura-2’s TTFB from <200ms to 90ms: Engineering Real-Time Voice AI at Scale

Instead of scaling hardware, we at Deepgram reengineered the runtime for parallelism and orchestration, achieving stable sub-200ms latency and, in steady-state conditions, around 90ms. Here are the details!

8 min read
Headshot of Adam Sypniewski

By Adam Sypniewski

CTO

Updated