# ask-ai
r
Is there a way to make GPT5 think for less time?
t
Thanks for reaching out, Trevor. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Could you help me understand a bit better—are you asking about speeding up GPT-5’s response time specifically within a LiveKit app, or is it something else? If it’s about LiveKit, I’d love to know more about how you’ve got things set up!
r
Currently, GPT-5 thinks for too long for a responsive conversation, but I think OpenAI exposes a faster endpoint / possibly a way to limit how long it thinks for
t
Hey there! Just wanted to let you know that LiveKit works great with OpenAI models like GPT-4 and GPT-3.5, though we don’t have any word yet on GPT-5 support. If you’re aiming to speed things up, here are some handy tips:

- **Use the OpenAI Realtime API:** it’s built for low-latency voice responses (around 300 ms).
- **Pick faster models:** models like GPT-3.5 tend to be quicker than the more advanced ones.
- **Streaming optimizations:** process responses incrementally, use smaller audio frame sizes (e.g. 20 ms instead of 100 ms), and encourage the model to keep answers concise.
- **Make the most of your setup:** take advantage of WebRTC for fast audio transfer, global edge network routing, and built-in load balancing.

If you have access to OpenAI’s endpoints, it’s worth checking their docs for parameters that limit response length or speed things up even more.

Sources: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/models.py | Live Conversations with AI using ChatGPT and WebRTC | OpenAI and LiveKit partner to turn Advanced Voice into an API
r
What does `max_completion_tokens` do?
t
The `max_completion_tokens` parameter in the OpenAI API lets you cap how many tokens the model can generate in its response. Tokens are chunks of text, roughly words or parts of words. By tuning this setting, you control how long the model’s replies can be, which helps avoid responses that run way too long. When you’re working with LiveKit, you can set this parameter either when initializing the language model or right in the chat method, so you have the flexibility to keep responses just the right size.
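As an illustration, here is roughly what a Chat Completions request body with `max_completion_tokens` looks like. This is a sketch of the raw JSON you would send with curl or an HTTP client; the model name is an assumption, and actually sending it requires an OpenAI API key.

```python
import json

# Sketch: a Chat Completions request body that caps total output tokens.
# max_completion_tokens counts all generated tokens, including any hidden
# reasoning tokens on reasoning models, not just the visible reply.
body = {
    "model": "gpt-5",  # assumed model name
    "messages": [
        {"role": "user", "content": "Summarize WebRTC in one sentence."}
    ],
    "max_completion_tokens": 256,
}
print(json.dumps(body, indent=2))
```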
r
can I do this just for thinking?
t
Hey there! So, the `max_completion_tokens` parameter sets a cap on the total output, covering both the reasoning (“thinking”) tokens and the visible content. But just a heads-up: it doesn’t limit the thinking portion alone. Some models offer something like a `thinking_budget` parameter to control how long they spend thinking, but that’s pretty model-specific and isn’t something you’ll find directly in the OpenAI API. If you want to limit thinking time specifically, you might need to dig into model-specific settings or try out other approaches. Hope that helps! Sources: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/llm.py | Adjusting Model Parameters | LiveKit Docs | tools/server/README.md
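For what it’s worth, OpenAI’s reasoning models do expose a `reasoning_effort` parameter in the Chat Completions API (the Responses API uses a nested `reasoning.effort` field) that trades thinking time for speed; on GPT-5 the accepted values reportedly include `"minimal"`. A minimal sketch of the request body, where the model name and the `"minimal"` value are assumptions:

```python
import json

# Sketch: asking the model to spend less effort reasoning before answering.
# reasoning_effort is an OpenAI parameter for reasoning models; the model
# name and the "minimal" value are assumptions here, not LiveKit options.
body = {
    "model": "gpt-5",
    "messages": [{"role": "user", "content": "What's 2 + 2?"}],
    "reasoning_effort": "minimal",  # e.g. "minimal" | "low" | "medium" | "high"
}
print(json.dumps(body))
```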
r
how do I adjust the params that get sent to openAI? I want to use things like reasoning_effort and verbosity
t
Hey there! Right now, it looks like parameters like `reasoning_effort` and `verbosity` aren’t directly adjustable through OpenAI API requests via LiveKit; they don’t appear as configurable options in the docs. If you’re hoping to tweak how the model behaves, there might be other parameters or settings we can explore together. Just let me know if you want a hand with that or if you have any questions. I’m here to help!
r
this is part of the body that would get sent to openai if I made the curl requests myself, can I not adjust these myself?
t
Hey there! Just a quick heads-up: the documentation doesn’t clearly say whether you can tweak parameters like `reasoning_effort` and `verbosity` directly within LiveKit’s OpenAI integration. If you’re sending API requests straight to OpenAI, you might be able to include those parameters in the request body, but this isn’t something that’s officially covered in LiveKit’s docs. If you’re hoping to customize the OpenAI API request body while using LiveKit, just give me a shout! I’d be happy to help out based on what you’re working with or what you need.
r
> If you’re hoping to customize the OpenAI API request body while using LiveKit, just give me a shout! I’d be happy to help out based on what you’re working with or what you need.

This is what I want to do.
t
Hey there! Just a quick heads-up: the docs don’t specifically mention that you can tweak parameters like `reasoning_effort` and `verbosity` directly within LiveKit’s OpenAI integration. If you’re sending API requests straight to OpenAI, you might be able to add these parameters to the request body, but this isn’t clearly outlined as a LiveKit feature. If you’re hoping to customize the OpenAI API request body inside LiveKit, just give me a shout! I’d be happy to help out based on what you’re working with or what you need. Sources: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/real...
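Since the transcript never lands on a concrete answer: if you are building the request body yourself (curl or an HTTP client), nothing stops you from adding these fields to the JSON. A minimal sketch, assuming `reasoning_effort` and `verbosity` are accepted by the model you are targeting; these field names are assumptions about the OpenAI endpoint, not options exposed by LiveKit’s plugin:

```python
import json

# Sketch: merging model-specific extras into the same JSON body you would
# otherwise send with curl to the Chat Completions endpoint. The extra
# field names are assumptions about the target model, not LiveKit options.
base = {
    "model": "gpt-5",
    "messages": [{"role": "user", "content": "Hello"}],
}
extras = {"reasoning_effort": "low", "verbosity": "low"}
payload = {**base, **extras}
print(json.dumps(payload, indent=2))
```

If you are using the official OpenAI Python SDK rather than raw HTTP, its `create()` methods also accept an `extra_body=` keyword for passing fields the SDK does not model explicitly.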