Article·Announcements·Aug 13, 2025
5 min read

Voice Agent API Just Leveled Up: GPT-5 + GPT-OSS-20B

GPT-5 and GPT-OSS-20B are now live in the Deepgram Voice Agent API and Playground, giving developers more choice for reasoning depth, latency, cost efficiency, and open-source flexibility. Test them side-by-side, benchmark in your domain, and deploy instantly to production.
By Hasan Jilani, Director of Product Marketing

Last week, we shared our perspective on GPT-5 and the future of voice AI, looking at what its improvements in reasoning, context handling, and adaptability could mean for real-time voice applications. Now you can try it for yourself.

As of this week, GPT-5 and GPT-OSS-20B, both from OpenAI, are available in the Deepgram Voice Agent API and in the Deepgram Playground. That means you can benchmark them with your own prompts, hear how they respond in your application’s domain, and decide whether the higher reasoning of GPT-5, the speed of GPT-5-mini, the cost efficiency of GPT-5-nano, or the flexibility of GPT-OSS-20B is the right fit for your production stack.

This post walks you through the capabilities of each model, why they matter for voice-first developers, and exactly how to test and deploy them.

GPT-5 Support Across All Tiers

GPT-5 is available in three tiers:

  • gpt-5 – Full reasoning and context handling

  • gpt-5-mini – Balanced performance for speed and accuracy

  • gpt-5-nano – Lightweight, cost-efficient option for fast responses

Why GPT-5 Matters for Voice Agents

LLM upgrades are not just about more parameters or higher benchmark scores. GPT-5 brings practical improvements that are especially important for voice-first applications:

  • Better context retention – Handles longer, multi-turn conversations without forgetting earlier details, reducing the need for the user to repeat themselves.

  • Faster, more reliable reasoning – Works through multi-step requests with fewer dead ends. For example, if a caller changes their order mid-stream, GPT-5 can adjust without starting over.

  • Improved ambiguity resolution – More accurately interprets unclear requests and can ask clarifying questions, which is crucial for real-time intent resolution.

For developers building on the Voice Agent API, this means less prompt engineering overhead and higher success rates for real-world tasks.

How to Use GPT-5
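
To switch the agent’s LLM, you name the model in the think block of the Settings message you send when the Voice Agent WebSocket opens. The Python sketch below is a minimal illustration rather than a canonical client: the endpoint URL, the exact Settings field names, and the listen/speak model choices are assumptions to verify against the current Voice Agent docs.

```python
import asyncio
import json
import os

import websockets  # pip install websockets

# Assumed Voice Agent WebSocket endpoint -- confirm against the current Deepgram docs.
AGENT_URL = "wss://agent.deepgram.com/v1/agent/converse"

# Minimal Settings sketch: only the "think" block needs to change per model.
SETTINGS = {
    "type": "Settings",
    "agent": {
        "listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
        "think": {
            # Swap in "gpt-5-mini" or "gpt-5-nano" here as needed.
            "provider": {"type": "open_ai", "model": "gpt-5"},
            "prompt": "You are a concise, helpful voice assistant.",
        },
        "speak": {"provider": {"type": "deepgram", "model": "aura-2-thalia-en"}},
    },
}


async def main() -> None:
    headers = {"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"}
    # On websockets < 14, pass extra_headers= instead of additional_headers=.
    async with websockets.connect(AGENT_URL, additional_headers=headers) as ws:
        await ws.send(json.dumps(SETTINGS))  # configure the agent's think model
        print(await ws.recv())               # first server message, to confirm the settings were accepted


if __name__ == "__main__":
    asyncio.run(main())
```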

Replace "gpt-5" with "gpt-5-mini" or "gpt-5-nano" as needed.

Ready to try it yourself? Sign up for the Deepgram Console and start building with the Voice Agent API today.

GPT-OSS-20B Support

We have also added support for GPT-OSS-20B, one of OpenAI’s new open-weight models, in the Voice Agent API. This gives you a fully open-weight, large-scale model option for your agents.

Why GPT-OSS-20B Matters for Voice Agents

  • Open weights – Full transparency for experimentation, self-hosting, and tuning

  • 20B parameters – Strong enough for multi-turn reasoning and complex task flows

  • Groq hosting – Optimized inference performance to keep latency low

For developers, GPT-OSS-20B provides open-source flexibility with performance that is viable for many production-grade voice applications, especially where model transparency or customization is a requirement.

How to Use GPT-OSS-20B

Use "openai/gpt-oss-20b" to run GPT-OSS-20B on Groq-hosted infrastructure.

Trying Both in the Playground

You can test GPT-5 or GPT-OSS-20B instantly in the Deepgram Playground without changing your production configuration. This makes it ideal for side-by-side benchmarking before committing to a model.

Screen recording of a user selecting GPT-5 or GPT-OSS-20B from the model dropdown in Deepgram’s Playground.

Steps to Test:

  1. Open the Deepgram Playground and select the Voice Agent API example.

  2. Choose OpenAI (for GPT-5, GPT-5-mini, GPT-5-nano) or Groq (for GPT-OSS-20B) as the LLM provider.

  3. Select the model you want to test from the dropdown.

  4. Provide a sample prompt or start a live voice session.

  5. Monitor both response latency and quality of reasoning in the output panel.

What to Compare:

  • Reasoning depth – How well does the model handle multi-step or ambiguous requests?

  • Context retention – Can it maintain accuracy across a long back-and-forth conversation?

  • Latency – Measure time-to-first-token and total response time.

  • Cost – Keep an eye on token usage for the same interaction.

  • Error recovery – Does the model gracefully handle interruptions or malformed requests?

Tips for Better Evaluation:

  • Use the same test script or voice scenario for each model to keep comparisons fair.

  • Try both short commands (“Schedule a meeting for 2 PM”) and hypothetical complex requests (“If I were to book a meeting and send invites, what steps would you take?”) to hear how the model structures responses, even though the Playground will not execute the workflow.

  • For latency-sensitive use cases, track round-trip response time from speech input to audible reply; a minimal timing sketch follows this list.

  • If you are considering GPT-OSS-20B, experiment with custom system prompts to take advantage of its open weights for domain-specific tuning.
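
To make the latency comparison concrete, here is a minimal, provider-agnostic timing sketch. It assumes only that your client can observe three moments in a turn: when you finish sending the user’s utterance, when the first agent audio arrives, and when the last agent audio arrives; none of the names below are part of the Deepgram API.

```python
import time
from dataclasses import dataclass


@dataclass
class TurnTimer:
    """Stopwatch for one agent turn; all names here are illustrative."""

    t_sent: float | None = None          # finished sending the user's utterance
    t_first_audio: float | None = None   # first agent audio chunk received
    t_last_audio: float | None = None    # last agent audio chunk received

    def mark_sent(self) -> None:
        self.t_sent = time.perf_counter()

    def mark_first_audio(self) -> None:
        if self.t_first_audio is None:   # only the first chunk counts
            self.t_first_audio = time.perf_counter()

    def mark_last_audio(self) -> None:
        self.t_last_audio = time.perf_counter()

    def report(self) -> dict[str, float]:
        # Time-to-first-audio approximates time-to-first-token as heard by the caller.
        return {
            "time_to_first_audio_s": self.t_first_audio - self.t_sent,
            "total_turn_s": self.t_last_audio - self.t_sent,
        }
```

Call mark_sent() where you stop streaming microphone audio, call the mark_*_audio() methods in your receive loop, and log report() per turn for each model you test.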

From test to production is just one step. Create your Deepgram Console account and deploy your chosen model instantly.

Model Comparison

  • gpt-5 – Deepest reasoning and context handling for complex, multi-step conversations

  • gpt-5-mini – Balanced speed and accuracy

  • gpt-5-nano – Lightweight and cost-efficient for fast responses

  • gpt-oss-20b (Groq-hosted) – Open weights and 20B parameters, with low-latency inference

Wrap-Up

With GPT-5 and GPT-OSS-20B now available in the Voice Agent API, you can match model capabilities more precisely to your application’s needs. Whether you want the reasoning depth of GPT-5, the balanced speed of GPT-5-mini, the low-latency performance of GPT-5-nano, or the transparency of GPT-OSS-20B, you can try them all in the Playground, benchmark them side-by-side, and deploy instantly to production.

The quickest way to understand how these models will impact your voice agent is to run your own scenarios, listen to the differences, measure the response times, and see how they handle your domain-specific prompts. Every insight you gain now can directly improve your success rate when serving real users.

Start exploring today in the Deepgram Playground and Console.
