The web is now a primary surface for voice agents. Support portals, scheduling tools, internal copilots, marketing sites, in-product help. And yet, getting a voice agent onto a web page has been the slowest part of building one.
You spin up an agent with the Voice Agent API in an afternoon. Then you spend a sprint on the browser side: mic capture, audio worklets, playback queues, reconnection, KeepAlive, token rotation without leaking your API key, and a UI that doesn't look like a 2014 chat widget. By the time the agent speaks on the page, you've shipped a small audio pipeline you didn't really want to own.
Today we're shipping the Browser Agent SDK: four composable npm packages that drop a Deepgram voice agent into any web app, with a clean path from one-line widget to full framework-agnostic control.
Four Layers, One Install Away
Each layer builds on the one below it. Install the highest layer you need and the rest comes with it.
- @deepgram/agents-widget: drop-in widget with six layouts (sidebar, floating, inline, button, embedded, or orb). No framework required.
- @deepgram/ui: pre-built React components (conversation view, animated orb, mic and speaker controls, waveform visualizer). Themed through CSS custom properties so your design system stays in charge.
- @deepgram/react: AgentProvider and hooks for state, conversation history, microphone control, audio playback, and client-side function calling scoped to component lifecycle.
- @deepgram/agents: the framework-agnostic core. WebSocket client, microphone capture, and player. Use it with Vue, Svelte, Angular, or vanilla JavaScript.
All four layers share the same reconnection logic, playback-aware mode tracking, audio buffering, optional Silero VAD, KeepAlive pings, and typed event emitter. The hard parts of browser voice are handled once and inherited everywhere.
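The typed event emitter follows the familiar pattern of mapping event names to payload types so handlers are checked at compile time. A minimal sketch of that pattern (the event names below are illustrative, not the SDK's actual event map):

```typescript
// Illustrative event map: event name -> payload type.
// These names are examples, not the SDK's real events.
type Events = {
  transcript: string;
  audio: Uint8Array;
};

// A tiny typed emitter: on() and emit() are checked against the
// event map, so passing the wrong payload type fails at compile time.
class TypedEmitter<E extends Record<string, unknown>> {
  private handlers: { [K in keyof E]?: Array<(payload: E[K]) => void> } = {};

  on<K extends keyof E>(event: K, fn: (payload: E[K]) => void): void {
    (this.handlers[event] ??= []).push(fn);
  }

  emit<K extends keyof E>(event: K, payload: E[K]): void {
    for (const fn of this.handlers[event] ?? []) fn(payload);
  }
}
```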
Ship in Minutes
```
npm install @deepgram/agents-widget
```

```javascript
import { init } from "@deepgram/agents-widget";

init({
  tokenFactory: () => fetch("/api/deepgram-token").then((r) => r.text()),
  agent: "YOUR_AGENT_ID",
  layout: "floating",
});
```

That's the whole client. Your server endpoint mints a short-lived token using the Deepgram auth grant, the SDK calls it on every connect and reconnect, and your API key never touches the browser. The token rides the Sec-WebSocket-Protocol header because that's the only custom header browsers permit on WebSocket handshakes.
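The server side of that token exchange can be a few lines. A hedged sketch, assuming an auth grant endpoint at /v1/auth/grant that returns an access_token (check the Deepgram auth docs for the exact path and response shape):

```typescript
// Sketch of a token-minting endpoint. The /v1/auth/grant path and
// { access_token } response shape are assumptions; verify against
// the Deepgram auth documentation before relying on them.
function grantRequest(apiKey: string): { url: string; init: RequestInit } {
  return {
    url: "https://api.deepgram.com/v1/auth/grant",
    init: {
      method: "POST",
      // The long-lived API key stays server-side, in this header only.
      headers: { Authorization: `Token ${apiKey}` },
    },
  };
}

// Handler body for your /api/deepgram-token route: mint a fresh
// short-lived token per request, so the browser only ever sees tokens.
async function handleTokenRequest(apiKey: string): Promise<string> {
  const { url, init } = grantRequest(apiKey);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`grant failed: ${res.status}`);
  const { access_token } = await res.json();
  return access_token;
}
```

Because tokenFactory is called on every connect and reconnect, expired tokens are never a problem: each handshake gets a fresh one.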
Reference a pre-configured Agent ID from Deepgram Console, define listen / think / speak inline in code, or combine both. Console changes propagate to the agent without a redeploy.
What You Can Build
Drop a voice helper into a marketing or docs site. Use the widget in floating or button layout. Five-minute setup, no React required.
Embed a voice copilot in a React product. Use @deepgram/react for state and the @deepgram/ui components for the UI. Theme it with your existing CSS variables. Scope function calls to the component lifecycle so they're cleaned up when the user navigates away.
Build a custom voice experience in Vue, Svelte, or vanilla JS. Use @deepgram/agents directly. You get AgentSession, AgentMicrophone, and AgentPlayer and bring your own UI. Same reconnection and buffering as every other layer.
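The core wiring is conceptually simple: microphone frames go up to the session, agent audio comes down to the player. A rough sketch of that shape using small local interfaces. The class names (AgentSession, AgentMicrophone, AgentPlayer) are the SDK's; the method signatures below are assumptions for illustration only:

```typescript
// Local stand-ins for the core primitives. Names match the SDK;
// these method signatures are illustrative assumptions.
interface SessionLike {
  on(event: "audio", handler: (chunk: Uint8Array) => void): void;
  sendAudio(frame: Uint8Array): void;
}
interface MicrophoneLike {
  onFrame(handler: (frame: Uint8Array) => void): void;
}
interface PlayerLike {
  enqueue(chunk: Uint8Array): void;
}

// Mic frames flow up to the session; agent audio flows down to the player.
function wireUp(session: SessionLike, mic: MicrophoneLike, player: PlayerLike): void {
  mic.onFrame((frame) => session.sendAudio(frame));
  session.on("audio", (chunk) => player.enqueue(chunk));
}
```

Your UI then subscribes to whatever state it wants to render; the buffering and reconnection behavior underneath is the same as in the widget.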
Production-Grade Audio by Default
Most of the bugs in browser voice come from the same places: a reconnection storm that DDoSes your token endpoint, audio frames dropped before the server is ready, the agent thinking it has finished talking while the user is still hearing tail audio, and idle WebSockets that a proxy kills after 60 seconds.
The Browser Agent SDK handles these out of the box. Reconnection uses exponential backoff with jitter and configurable ceilings. Microphone frames captured before the server's SettingsApplied are queued and flushed. The SDK switches from speaking to listening only after the audio queue actually drains in the browser, so the agent does not interrupt its own tail audio. KeepAlive heartbeats prevent idle disconnects.
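Full-jitter backoff is worth seeing concretely: the raw delay doubles per attempt up to a ceiling, then a uniform random fraction of it is used so a fleet of reconnecting clients doesn't stampede your token endpoint in lockstep. A sketch with hypothetical base and ceiling values (the SDK's actual values are configurable):

```typescript
// Exponential backoff with full jitter. The base and ceiling here are
// illustrative defaults, not the SDK's; rng is injectable for testing.
function backoffDelay(
  attempt: number,
  base = 500,       // ms delay ceiling for attempt 0
  ceiling = 30_000, // never wait longer than this
  rng: () => number = Math.random,
): number {
  const raw = Math.min(base * 2 ** attempt, ceiling);
  // Full jitter: pick uniformly in [0, raw) so clients desynchronize.
  return Math.floor(rng() * raw);
}
```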
Visualizations (orb and waveform) use Canvas 2D, not WebGL, so they work on low-power devices without a GPU.
Ship in Minutes, Customize for Months
You can start with the widget today and graduate to the React layer when you need custom UI, or drop to the framework-agnostic core when you outgrow React. You don't have to change vendors as you grow. Same connection logic, same audio defaults, same agent.