Article · Oct 28, 2025

Accelerating Streaming STT Inference Through Custom Kernels

With Flux, we wanted to offer low-latency streaming speech-to-text (STT) suitable for voice agents, without sacrificing accuracy or concurrency. This led us to a tricky cache-management problem. Here's our nifty solution! Warning: in-depth technical details ahead.

10 min read

By Josh Gevirtz

By Jack Kearney

Staff Research Scientist