Article·Announcements·Aug 13, 2025
5 min read

Voice Agent API Just Leveled Up: GPT-5 + GPT-OSS-20B

GPT-5 and GPT-OSS-20B are now live in the Deepgram Voice Agent API and Playground, giving developers more choice for reasoning depth, latency, cost efficiency, and open-source flexibility. Test them side-by-side, benchmark in your domain, and deploy instantly to production.
By Hasan Jilani, Director of Product Marketing

Last week, we shared our perspective on GPT-5 and the future of voice AI, looking at what its improvements in reasoning, context handling, and adaptability could mean for real-time voice applications. Now you can try it for yourself.

As of this week, GPT-5 and GPT-OSS-20B, both from OpenAI, are available in the Deepgram Voice Agent API and in the Deepgram Playground. That means you can benchmark them with your own prompts, hear how they respond in your application’s domain, and decide whether the higher reasoning of GPT-5, the speed of GPT-5-mini, the cost efficiency of GPT-5-nano, or the flexibility of GPT-OSS-20B is the right fit for your production stack.

This post walks you through the capabilities of each model, why they matter for voice-first developers, and exactly how to test and deploy them.

GPT-5 Support Across All Tiers

GPT-5 is available in three tiers:

  • gpt-5 – Full reasoning and context handling

  • gpt-5-mini – Balanced performance for speed and accuracy

  • gpt-5-nano – Lightweight, cost-efficient option for fast responses

Why GPT-5 Matters for Voice Agents

LLM upgrades are not just about more parameters or higher benchmark scores. GPT-5 brings practical improvements that are especially important for voice-first applications:

  • Better context retention – Handles longer, multi-turn conversations without forgetting earlier details, reducing the need for the user to repeat themselves.

  • Faster, more reliable reasoning – Works through multi-step requests with fewer dead ends. For example, if a caller changes their order mid-stream, GPT-5 can adjust without starting over.

  • Improved ambiguity resolution – More accurately interprets unclear requests and can ask clarifying questions, which is crucial for real-time intent resolution.

For developers building on the Voice Agent API, this means less prompt engineering overhead and higher success rates for real-world tasks.

How to Use GPT-5
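
To switch the agent’s LLM, you name the model in the think block of the Settings message you send when the Voice Agent WebSocket opens. The Python sketch below is a minimal illustration rather than a canonical client: the endpoint URL, the exact Settings field names, and the listen/speak model choices are assumptions to verify against the current Voice Agent docs.

```python
import asyncio
import json
import os

import websockets  # pip install websockets

# Assumed Voice Agent WebSocket endpoint -- confirm against the current Deepgram docs.
AGENT_URL = "wss://agent.deepgram.com/v1/agent/converse"

# Minimal Settings sketch: only the "think" block needs to change per model.
SETTINGS = {
    "type": "Settings",
    "agent": {
        "listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
        "think": {
            # Swap in "gpt-5-mini" or "gpt-5-nano" here as needed.
            "provider": {"type": "open_ai", "model": "gpt-5"},
            "prompt": "You are a concise, helpful voice assistant.",
        },
        "speak": {"provider": {"type": "deepgram", "model": "aura-2-thalia-en"}},
    },
}


async def main() -> None:
    headers = {"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"}
    # On websockets < 14, pass extra_headers= instead of additional_headers=.
    async with websockets.connect(AGENT_URL, additional_headers=headers) as ws:
        await ws.send(json.dumps(SETTINGS))  # configure the agent's think model
        print(await ws.recv())               # first server message, to confirm the settings were accepted


if __name__ == "__main__":
    asyncio.run(main())
```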

Replace "gpt-5" with "gpt-5-mini" or "gpt-5-nano" as needed.

Ready to try it yourself? Sign up for the Deepgram Console and start building with the Voice Agent API today.

GPT-OSS-20B Support

We have also added support for GPT-OSS-20B, one of OpenAI’s new open-weight models, in the Voice Agent API. This gives you a fully open-weight, large-scale model option for your agents.

Why GPT-OSS-20B Matters for Voice Agents

  • Open weights – Full transparency for experimentation, self-hosting, and tuning

  • 20B parameters – Strong enough for multi-turn reasoning and complex task flows

  • Groq hosting – Optimized inference performance to keep latency low

For developers, GPT-OSS-20B provides open-source flexibility with performance that is viable for many production-grade voice applications, especially where model transparency or customization is a requirement.

How to Use GPT-OSS-20B

Use "openai/gpt-oss-20b" to run GPT-OSS-20B on Groq-hosted infrastructure.

Trying Both in the Playground

You can test GPT-5 or GPT-OSS-20B instantly in the Deepgram Playground without changing your production configuration. This makes it ideal for side-by-side benchmarking before committing to a model.

Screen recording of a user selecting GPT-5 or GPT-OSS-20B from the model dropdown in Deepgram’s Playground.

Steps to Test:

  1. Open the Deepgram Playground and select the Voice Agent API example.

  2. Choose OpenAI (for GPT-5, GPT-5-mini, GPT-5-nano) or Groq (for GPT-OSS-20B) as the LLM provider.

  3. Select the model you want to test from the dropdown.

  4. Provide a sample prompt or start a live voice session.

  5. Monitor both response latency and quality of reasoning in the output panel.

What to Compare:

  • Reasoning depth – How well does the model handle multi-step or ambiguous requests?

  • Context retention – Can it maintain accuracy across a long back-and-forth conversation?

  • Latency – Measure time-to-first-token and total response time.

  • Cost – Keep an eye on token usage for the same interaction.

  • Error recovery – Does the model gracefully handle interruptions or malformed requests?

Tips for Better Evaluation:

  • Use the same test script or voice scenario for each model to keep comparisons fair.

  • Try both short commands (“Schedule a meeting for 2 PM”) and hypothetical complex requests (“If I were to book a meeting and send invites, what steps would you take?”) to hear how the model structures responses, even though the Playground will not execute the workflow.

  • For latency-sensitive use cases, track round-trip response time from speech input to audible reply; a minimal timing sketch follows this list.

  • If you are considering GPT-OSS-20B, experiment with custom system prompts to take advantage of its open weights for domain-specific tuning.
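
To make the latency comparison concrete, here is a minimal, provider-agnostic timing sketch. It assumes only that your client can observe three moments in a turn: when you finish sending the user’s utterance, when the first agent audio arrives, and when the last agent audio arrives; none of the names below are part of the Deepgram API.

```python
import time
from dataclasses import dataclass


@dataclass
class TurnTimer:
    """Stopwatch for one agent turn; all names here are illustrative."""

    t_sent: float | None = None          # finished sending the user's utterance
    t_first_audio: float | None = None   # first agent audio chunk received
    t_last_audio: float | None = None    # last agent audio chunk received

    def mark_sent(self) -> None:
        self.t_sent = time.perf_counter()

    def mark_first_audio(self) -> None:
        if self.t_first_audio is None:   # only the first chunk counts
            self.t_first_audio = time.perf_counter()

    def mark_last_audio(self) -> None:
        self.t_last_audio = time.perf_counter()

    def report(self) -> dict[str, float]:
        # Time-to-first-audio approximates time-to-first-token as heard by the caller.
        return {
            "time_to_first_audio_s": self.t_first_audio - self.t_sent,
            "total_turn_s": self.t_last_audio - self.t_sent,
        }
```

Call mark_sent() where you stop streaming microphone audio, call the mark_*_audio() methods in your receive loop, and log report() per turn for each model you test.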

From test to production is just one step. Create your Deepgram Console account and deploy your chosen model instantly.

Model Comparison

  • gpt-5 – Deepest reasoning and context handling for complex, multi-step conversations

  • gpt-5-mini – Balanced speed and accuracy

  • gpt-5-nano – Lightweight and cost-efficient for fast responses

  • gpt-oss-20b (Groq-hosted) – Open weights and 20B parameters, with low-latency inference

Wrap-Up

With GPT-5 and GPT-OSS-20B now available in the Voice Agent API, you can match model capabilities more precisely to your application’s needs. Whether you want the reasoning depth of GPT-5, the balanced speed of GPT-5-mini, the low-latency performance of GPT-5-nano, or the transparency of GPT-OSS-20B, you can try them all in the Playground, benchmark them side-by-side, and deploy instantly to production.

The quickest way to understand how these models will impact your voice agent is to run your own scenarios, listen to the differences, measure the response times, and see how they handle your domain-specific prompts. Every insight you gain now can directly improve your success rate when serving real users.

Start exploring today in the Deepgram Playground and Console.
