Customer Stories

Telnyx Powers Real-Time Voice AI at Carrier Scale with Deepgram Flux

Telnyx is a global communications and connectivity platform that operates its own carrier-grade network. To make voice AI feel truly real-time for developers and enterprises, Telnyx embedded Deepgram Flux directly into its media plane, treating speech as critical infrastructure rather than an external add-on. By running Flux on Telnyx-managed GPUs at its Points of Presence (PoPs) worldwide, Telnyx delivers ultra-low-latency voice experiences with natural turn-taking that keeps pace with live human conversation.

Telnyx is a next-generation communications platform that powers secure, scalable voice, messaging, and AI-driven connectivity over a private global IP network.

Visit: Telnyx

Key Results

Telnyx's embedded deployment of Deepgram Flux has enabled:

  • Ultra-low-latency, natural conversations with reliable barge-in and smooth turn-taking
  • Stable performance during regional call spikes, without throttling or degraded behavior
  • A simpler, more predictable operational model, with all speech processing kept inside the Telnyx network perimeter

The Challenge: Latency as the First Failure Mode

Most voice platforms still follow a similar pattern: terminate PSTN or SIP calls at a regional PoP, send audio over the public internet to third-party AI APIs, and wait on remote transcription before resuming the call. On paper, many of these APIs support streaming STT, but in practice Telnyx saw unpredictable latency, unstable partial transcripts, and brittle end-of-turn behavior, especially under load.

For Telnyx's customers, latency is the primary constraint. Once round-trip delay climbs beyond a few hundred milliseconds, conversations feel mechanical and "queued," regardless of word-level accuracy. Users notice jitter, delayed barge-in, and inconsistent behavior far more than small transcription errors.
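This constraint can be made concrete with a rough latency budget. The sketch below compares an off-net pattern (media shipped over the public internet to a remote STT API) with an on-net pattern (STT colocated with the PoP); all component figures are illustrative assumptions, not Telnyx measurements.

```python
# Rough round-trip latency budget for one voice AI turn.
# All figures are illustrative assumptions, not measured Telnyx numbers.
BUDGET_MS = 300  # beyond this, conversation starts to feel "queued"

def total_latency(components: dict) -> int:
    """Sum per-component latencies (milliseconds) for one turn."""
    return sum(components.values())

# Off-net pattern: media crosses the public internet to a remote STT API.
off_net = {
    "pstn_to_pop": 20,
    "pop_to_remote_api": 60,   # public-internet hop, highly variable
    "remote_stt": 150,
    "response_path": 120,
}

# On-net pattern: STT runs on GPUs colocated with the PoP.
on_net = {
    "pstn_to_pop": 20,
    "pop_to_local_gpu": 2,     # same low-latency fabric as call media
    "local_stt": 120,
    "response_path": 100,
}

for name, parts in (("off-net", off_net), ("on-net", on_net)):
    t = total_latency(parts)
    print(f"{name}: {t} ms ({'over' if t > BUDGET_MS else 'within'} budget)")
```

Even with optimistic numbers for the remote API itself, the extra public-internet hops push the off-net pattern past the point where conversation feels mechanical.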

On top of that, Telnyx needed to support noisy telephony audio, short utterances, numbers, and interruptions at carrier scale, without forcing audio off-net to external providers. The traditional "ship media to remote APIs" model was fundamentally misaligned with Telnyx's role as a carrier.

To solve this, Telnyx reframed speech as a physics problem. Rather than adding more buffering or heuristics around legacy APIs, the company chose to bring the AI as close to the media as possible and to treat speech-to-text as a first-class, real-time network service.

The Solution: Deepgram Flux Inside the Telnyx Media Plane

Telnyx selected Deepgram Flux as its primary real-time speech-to-text engine and deployed it on Telnyx-owned GPUs that are physically colocated with its telephony PoPs in key regions around the world. Instead of routing audio across the public internet, Telnyx runs Flux inside its own network perimeter, on the same low-latency fabric that carries call media.

When an inbound PSTN or SIP call lands on a Telnyx PoP, audio never leaves the Telnyx network. Media is streamed directly to Deepgram Flux running in the same region, where it is transcribed in real time. Flux provides both stable partial transcripts and final results, which Telnyx feeds into its call control and agent orchestration layer. That orchestration drives LLM reasoning and text-to-speech (TTS) responses, which are then streamed back to the caller over the same media path.

This architecture keeps telephony, STT, LLM, and TTS tightly synchronized. Because Deepgram Flux is running at the edge, inside the media plane, Telnyx can deliver sub-second end-to-end latency under load while preserving the reliability and observability of its carrier network.
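The loop described above can be sketched as a minimal pipeline. The STT, LLM, and TTS functions here are stand-in stubs for illustration, not the Deepgram or Telnyx APIs; the point is the shape of the orchestration, which acts only on final transcripts.

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    text: str
    is_final: bool  # Flux emits both stable partials and final results

def transcribe(audio_chunks):
    """Stand-in for in-region STT: yields partials, then a final transcript."""
    words = []
    for chunk in audio_chunks:
        words.append(chunk)
        yield Transcript(" ".join(words), is_final=False)
    yield Transcript(" ".join(words), is_final=True)

def llm_reply(text: str) -> str:
    """Stand-in for LLM reasoning on a completed turn."""
    return f"You said: {text}"

def tts(text: str) -> bytes:
    """Stand-in for TTS; the real output streams back over the media path."""
    return text.encode()

def handle_call(audio_chunks):
    """Orchestration: respond only once a final transcript closes the turn."""
    for transcript in transcribe(audio_chunks):
        if transcript.is_final:
            return tts(llm_reply(transcript.text))

audio = ["book", "a", "demo"]
print(handle_call(audio))  # prints b'You said: book a demo'
```

In the real deployment each of these stages streams; the sketch collapses that to show where end-of-turn detection gates the LLM and TTS calls.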

Implementation in Practice

From a product and engineering perspective, Deepgram Flux is now treated like any other core Telnyx service:

On-net, real-time streaming — Inbound PSTN and SIP traffic terminates at a Telnyx PoP, and audio is streamed to Flux on Telnyx-managed GPUs in the same region. Flux is configured for telephony-grade audio, including 16 kHz input and common telephony codecs.
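A connection to such a stream is typically configured through query parameters on a WebSocket URL. The sketch below only builds that URL; the endpoint host, model identifier, and parameter names are illustrative assumptions, not confirmed Deepgram Flux values.

```python
from urllib.parse import urlencode

def build_stream_url(base: str, params: dict) -> str:
    """Compose a WebSocket URL for a real-time STT stream."""
    return f"{base}?{urlencode(params)}"

# Hypothetical in-region, on-net endpoint and assumed parameter names.
url = build_stream_url(
    "wss://stt.example.internal/v1/listen",
    {
        "model": "flux",          # assumed model identifier
        "encoding": "linear16",   # common telephony-friendly PCM encoding
        "sample_rate": 16000,     # 16 kHz input, as in the deployment above
        "channels": 1,
    },
)
print(url)
```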

Turn-taking and orchestration — Flux's partial and final transcripts feed Telnyx's call control and agent orchestration layer. The system relies on accurate end-of-turn detection to know when a speaker is truly finished, enabling natural barge-in and rapid back-and-forth exchanges even with short utterances, overlaps, and background noise.
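The barge-in behavior above amounts to a small state machine: the agent speaks until caller speech is detected, then yields the floor, and retakes it only when end-of-turn detection fires. The event and action names below are invented for illustration, not a Telnyx or Deepgram API.

```python
class TurnManager:
    """Minimal turn-taking sketch with barge-in support."""

    def __init__(self):
        self.agent_speaking = False

    def on_agent_speak(self):
        self.agent_speaking = True

    def on_caller_speech(self):
        """Caller audio detected: barge in by stopping agent playback."""
        if self.agent_speaking:
            self.agent_speaking = False
            return "stop_playback"   # cut TTS immediately
        return None

    def on_end_of_turn(self, transcript: str):
        """Fired by end-of-turn detection: the caller is truly finished."""
        self.agent_speaking = True   # agent takes the floor to respond
        return f"respond_to: {transcript}"

tm = TurnManager()
tm.on_agent_speak()
print(tm.on_caller_speech())              # prints stop_playback
print(tm.on_end_of_turn("cancel my order"))  # prints respond_to: cancel my order
```

The reliability of this loop depends entirely on the accuracy of the end-of-turn signal: firing early clips callers mid-sentence, firing late produces the awkward pauses the deployment is designed to avoid.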

LLM and TTS in the same loop — Once a turn is complete, the orchestration layer calls into LLM-based reasoning and then into TTS, streaming synthesized responses back over the same media path. Because everything runs inside the Telnyx network, timing across STT, LLM, and TTS is tightly controlled.

Regional isolation and scale — Telnyx operates this pattern in multiple regions, aligning speech workloads with its global PoPs across North America, Europe, the Middle East, and Asia-Pacific. Each region can be scaled and monitored independently, matching the company's existing approach to core network services.

This approach allowed Telnyx to meet strict requirements around sub-second latency, high concurrency without rate shaping, real-world accuracy on telephony audio, and deployment control on Telnyx infrastructure via Deepgram's self-hosted solution.

Outcomes: More Natural Conversations, Simpler Operations

Since embedding Deepgram Flux in its media plane, Telnyx has seen a step-change in the quality and reliability of voice AI experiences.

For callers and end-users:

  • Conversations feel more responsive and natural, with smooth turn-taking, fewer awkward pauses, and a stronger sense of speaking to a live agent rather than a buffered system.
  • Partial transcripts are more stable, allowing Telnyx's orchestration layer to handle barge-in cleanly and avoid the "flapping" behavior that breaks natural dialogue.
  • In internal load tests simulating regional call spikes, STT performance remained stable without introducing noticeable delay, even under conditions that previously would have forced throttling or degraded behavior.

For internal teams:

  • Engineering and SRE can track latency distributions, tail behavior, and error patterns for STT alongside other core network components.
  • Product teams benefit from clearer performance characteristics and fewer edge cases to design around.
  • GTM teams can confidently position Telnyx voice AI as carrier-grade, in-region, and powered by infrastructure Telnyx directly controls.
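Tracking latency distributions and tail behavior for STT like any other network component can be sketched with a simple percentile computation over per-turn latency samples. The numbers below are synthetic, not Telnyx data.

```python
import math

def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(p/100 * n)."""
    ranked = sorted(samples)
    rank = math.ceil(p / 100 * len(ranked))
    return ranked[max(rank - 1, 0)]

# Synthetic per-turn STT latencies in milliseconds; one slow outlier
# dominates the tail even though the median looks healthy.
latencies = [110, 95, 130, 102, 480, 99, 125, 118, 105, 140]

for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies, p)} ms")
```

This is why tail percentiles, not averages, are the metric that matters for conversational feel: a single slow turn is what callers remember.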

Privacy, Security, and What's Next

Because Deepgram Flux runs on Telnyx-owned GPUs inside Telnyx PoPs, audio never leaves the Telnyx network perimeter for transcription.

  • Data locality preserved by default, with workloads running in-region where calls originate
  • Encryption in transit and strict access controls across all media
  • Speech data handled under the same policies and auditability that underpin Telnyx's SOC 2, GDPR, and HIPAA-eligible posture

Looking ahead, Telnyx is continuing to expand this architecture:

  • More in-region GPU deployments to support additional geographies and higher concurrency
  • Broader language and dialect coverage for markets like the Middle East, Australia, and New Zealand
  • Deeper integration between telephony, STT (powered by Deepgram Flux), LLM reasoning, TTS, and enterprise systems such as CRMs and scheduling tools
  • Agent assistance and live coaching — using real-time transcription and call control to augment human agents during time-sensitive calls

For Telnyx, the takeaway is clear: real-time voice AI only works when it is architected like the rest of the network. By colocating Deepgram Flux with its media plane, Telnyx has turned speech from a distant API call into core infrastructure that powers the next generation of global, low-latency voice experiences.

Learn more about Deepgram's voice AI platform and Telnyx's voice AI agents.

Try Deepgram for free with our API Playground

Test your own audio files, or quickly explore Deepgram's capabilities with our pre-recorded samples. Try it now for a seamless audio API experience!