TL;DR: Deepgram Flux Multilingual (flux-general-multi) provides real-time streaming speech-to-text for 10 languages with language_hint biasing, automatic language detection via TurnInfo, and native code-switching. One model, one WebSocket connection.
If your real-time speech-to-text pipeline handles more than one language, you've probably built this stack: a language detection service, per-language models, and routing logic to connect them. Unlike approaches that require a separate model per language with external routing, Deepgram Flux Multilingual (flux-general-multi) handles all ten languages in a single streaming connection with automatic language detection, native code-switching, and a new language_hint parameter that biases detection toward languages you expect. If you're new to Deepgram, Flux is our real-time conversational STT model built for voice agents. It comes with turn detection, interruption handling, and barge-in awareness out of the box. Grab a free API key and follow along.
Step 1: Connect to Deepgram Flux Multilingual
If you're already using the Deepgram Python SDK with Flux, this is a one-line change: swap flux-general-en for flux-general-multi in your model parameter.
If you're starting fresh, install the SDK (pip install deepgram-sdk) and connect:
```python
from deepgram import DeepgramClient
from deepgram.core.events import EventType

client = DeepgramClient("your-api-key")  # or set DEEPGRAM_API_KEY env var

# Connect to Flux Multilingual — same API, new model
connection = client.listen.v2.connect(
    model="flux-general-multi",
    encoding="linear16",
    sample_rate=16000,
)
```

That's the same client.listen.v2.connect() you already use for Flux. The model name is the only change. Turn detection, interruption handling, and barge-in all carry over from Flux.
If you prefer working with the WebSocket directly (or you're using a language without an official SDK), the raw endpoint works too:
```
wss://api.deepgram.com/v2/listen?model=flux-general-multi&encoding=linear16&sample_rate=16000
```

All the code samples below use the Python SDK. The JavaScript and Java SDKs support the same multilingual streaming features: the same language_hints configuration and the same TurnInfo.languages / TurnInfo.languages_hinted outputs.
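If you're assembling that URL in code, the query string can be built with the standard library. A minimal sketch: build_flux_url is a hypothetical helper, and the endpoint and parameters are the ones from the example above.

```python
from urllib.parse import urlencode

def build_flux_url(model="flux-general-multi", encoding="linear16", sample_rate=16000):
    """Build the Flux Multilingual WebSocket URL from streaming parameters."""
    query = urlencode({"model": model, "encoding": encoding, "sample_rate": sample_rate})
    return f"wss://api.deepgram.com/v2/listen?{query}"

print(build_flux_url())
# wss://api.deepgram.com/v2/listen?model=flux-general-multi&encoding=linear16&sample_rate=16000
```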
Step 2: Choose your language_hint strategy
This is the design decision that matters most in your integration. language_hint is a new optional parameter: an array of BCP-47 language codes that biases the model toward the languages you expect. Think of it as a prior, not a hard constraint: it narrows the model's hypothesis space without locking it in. If someone speaks a language outside your hint set, the model can still detect it.
The right strategy depends on what your application knows when the connection opens.
If you know the language (the caller selected it, or this is a single-language queue):

```python
connection.send_configure(language_hints=["es"])
```

Single-hint accuracy is close to a dedicated monolingual model. This is the highest-accuracy configuration you can set.
If you know the candidate languages: for example, a support desk handling English, Spanish, and French:

```python
connection.send_configure(language_hints=["en", "es", "fr"])
```

The model narrows its search space to your expected set but can switch between them turn-by-turn as the caller does.
If you don't know: for example, a globally available, unpredictable caller base:
Omit language_hint entirely and Flux Multilingual auto-detects across all ten supported languages. You trade a small accuracy margin for zero configuration.
If you expect code-switching: bilingual callers mixing languages in the same sentence:

```python
connection.send_configure(language_hints=["en", "es"])
```

This is the scenario that usually breaks per-language routing entirely. A caller says "I need help with my cuenta": two languages in one utterance. Flux Multilingual handles this natively because it's one multilingual model, not separate models stitched together. There's no routing decision to get wrong.
A good rule of thumb: if your application has signal about likely languages at connection time (user selection, queue config, account locale), hint them. If it doesn't, omit hints and lean on per-turn detection.
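That rule of thumb can be sketched as a small selection function. This is a hypothetical helper: the signal names user_selection, queue_languages, and account_locale are illustrative, not part of the Deepgram API.

```python
def choose_language_hints(user_selection=None, queue_languages=None, account_locale=None):
    """Pick language_hints from whatever signal is available at connection time.

    Returns a list of BCP-47 codes to pass to send_configure, or None to
    omit hints entirely and rely on auto-detection.
    """
    if user_selection:                        # caller picked a language explicitly
        return [user_selection]
    if queue_languages:                       # queue is configured for a known set
        return list(queue_languages)
    if account_locale:                        # weaker signal: bias, don't lock in
        return [account_locale.split("-")[0]]
    return None                               # no signal: full auto-detect

print(choose_language_hints(account_locale="es-MX"))  # ['es']
```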
Step 3: Read per-turn language detection from TurnInfo
This is where the architecture really simplifies. Instead of running a separate language detection service before your real-time STT pipeline, Deepgram Flux Multilingual tells you what language it detected on every turn as a first-class field in the transcription output.
Every TurnInfo event now includes two new fields:
- languages — BCP-47 codes for all languages detected in that turn, sorted by how much of the turn was in each language (primary language first). When there's no transcript, this is empty.
- languages_hinted — the hint set that was active when this turn was processed. This is useful for debugging when behavior isn't what you expected.
Here's what it looks like when you wire it up:
```python
def on_turn_info(event):
    if event.event == "EndOfTurn" and event.languages:
        primary_lang = event.languages[0]
        print(f"[{primary_lang}] {event.transcript}")

        # Use the detected language to drive downstream decisions
        if primary_lang == "es":
            switch_tts_voice("es")
            update_llm_prompt(language="es")

connection.on(EventType.TURN_INFO, on_turn_info)
```

The raw TurnInfo JSON looks like this if you're working at the WebSocket level:
```json
{
  "type": "TurnInfo",
  "event": "EndOfTurn",
  "transcript": "I need help with my cuenta",
  "languages_hinted": ["en", "es"],
  "languages": ["en", "es"],
  "words": [
    {"word": "I", "confidence": 0.98},
    {"word": "need", "confidence": 0.96},
    {"word": "help", "confidence": 0.97},
    {"word": "with", "confidence": 0.95},
    {"word": "my", "confidence": 0.97},
    {"word": "cuenta", "confidence": 0.93}
  ],
  "end_of_turn_confidence": 0.86
}
```

languages[0] is the primary detected language, which is the one spoken most in that turn. You can use it to switch a TTS voice, update an LLM system prompt, or route to a language-specific agent queue.
One thing to watch for in production: short utterances and noisy audio can sometimes produce one-off language detections. If you're making consequential decisions based on language (switching an agent, changing a voice), it's worth requiring the same languages[0] for 2-3 consecutive turns before committing. That gives you stability without losing the ability to react when a caller genuinely switches.
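One way to implement that debounce is a small stateful helper fed from your TurnInfo handler. LanguageStabilizer is a hypothetical name, not part of the SDK; it just encodes the "same primary language for N consecutive turns" rule described above.

```python
class LanguageStabilizer:
    """Commit to a language only after it leads N consecutive turns."""

    def __init__(self, required_turns=3):
        self.required_turns = required_turns
        self.candidate = None
        self.streak = 0
        self.committed = None

    def observe(self, languages):
        """Feed TurnInfo.languages for one turn; return the committed language."""
        if not languages:                 # empty turn: keep current state
            return self.committed
        primary = languages[0]
        if primary == self.candidate:
            self.streak += 1
        else:
            self.candidate, self.streak = primary, 1
        if self.streak >= self.required_turns:
            self.committed = primary      # stable: safe to switch voice/agent
        return self.committed

stab = LanguageStabilizer(required_turns=2)
for turn in (["en"], ["es"], ["es"], ["es", "en"]):
    print(stab.observe(turn))
# None, None, es, es
```

A one-off "es" turn in an otherwise English call never reaches the threshold, so the TTS voice stays put; a genuine switch commits after two turns.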
Step 4: Reconfigure language_hint mid-stream
Language hints aren't a one-time setting. If your application learns something after the stream opens, you can update hints without closing the connection:
```python
# Caller started in English, now clearly speaking French
connection.send_configure(language_hints=["fr"])
```

This is really useful for scenarios like a caller selecting a language in an IVR menu, or the first few turns stabilizing on a language you didn't expect. A non-empty array replaces the current hints. An empty array [] clears them and returns to full auto-detect. Omitting the field leaves hints unchanged, which is useful when you're adjusting other streaming parameters (like end-of-turn thresholds) without touching language settings.
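Those replace/clear/omit semantics can be made explicit with a tiny helper. This is a hypothetical sketch of the rules above, not an SDK function; in practice send_configure handles this for you.

```python
_UNSET = object()  # sentinel to distinguish "omitted" from "empty list"

def language_hint_update(hints=_UNSET):
    """Build the language_hints portion of a Configure message.

    Non-empty list -> replaces current hints; empty list -> clears them
    (full auto-detect); omitted -> leaves hints unchanged.
    """
    if hints is _UNSET:
        return {}                              # field omitted: hints unchanged
    return {"language_hints": list(hints)}     # [] clears, non-empty replaces

print(language_hint_update(["fr"]))  # {'language_hints': ['fr']}
print(language_hint_update([]))      # {'language_hints': []}
print(language_hint_update())        # {}
```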
If a reconfiguration fails, you'll get a ConfigureFailure event. Because it's non-fatal, your stream keeps running. You can log it and decide whether to retry or fall back to a default configuration.
Step 5: Handle the two language_hint errors that fail silently
There are two specific errors to design for; both return 400 responses.
Unsupported language code. The initial release supports ten languages. If you pass a code outside that set, you'll get a 400 error.
language_hint on the wrong model. Hints only work with flux-general-multi. If a hint configuration gets applied to a flux-general-en connection, it returns a 400:
```json
{
  "code": "INVALID_PARAMETER",
  "description": "language_hint is not supported for model flux-general-en"
}
```

Both are straightforward to handle. Surface them in your logs and monitoring so they don't silently degrade your multilingual experience.
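One way to avoid the first error entirely is to validate hint codes client-side before sending them. A sketch under one assumption: SUPPORTED_HINTS below is a placeholder, not the official list — fill it with the actual ten codes from the Deepgram docs.

```python
# Placeholder: replace with the ten supported codes from the Deepgram docs.
SUPPORTED_HINTS = {"en", "es", "fr"}

def validate_hints(hints, supported=SUPPORTED_HINTS):
    """Split hints into (valid, rejected) so bad codes can be logged
    instead of triggering a 400 on the live connection."""
    valid = [h for h in hints if h in supported]
    rejected = [h for h in hints if h not in supported]
    return valid, rejected

valid, rejected = validate_hints(["en", "es", "xx"])
print(valid, rejected)  # ['en', 'es'] ['xx']
```

Send only the valid list, and alert on anything in rejected so misconfigured locales show up in monitoring rather than as dropped hints.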
What if you're migrating from a multi-model setup?
If you're running the detection-then-routing architecture described at the top, the migration is straightforward: the detection service is replaced by TurnInfo.languages, the per-language models are replaced by flux-general-multi, and the routing logic simplifies to reading a field from the transcription events you're already processing.
One thing that doesn't change: flux-general-en is still available. If your application is English-only, keep using it.
Start building with Deepgram Flux Multilingual
Sign up for a free Deepgram API key, set your model to flux-general-multi, and start streaming.
→ API reference — language_hint and TurnInfo fields
→ Python SDK | JS SDK | Java SDK