late-businessperson-58140
08/19/2025, 5:50 AM

tall-airline-16756
08/19/2025, 5:50 AM

late-businessperson-58140
08/19/2025, 6:00 AM

tall-airline-16756
08/19/2025, 6:00 AM
It looks like the LLM.with_azure() method doesn't recognize a parameter called max_tokens. No worries, though: you'll want to use max_response_output_tokens instead to set the maximum number of tokens in the response.
Here’s a quick fix for your code:
from livekit.plugins.openai import LLM

azure_llm = LLM.with_azure(
    model="gpt-4o",
    temperature=0.8,
    max_response_output_tokens=1000,  # use this instead of max_tokens
)
Also, don't forget to include other important parameters like azure_deployment, azure_endpoint, api_version, and your authentication details (api_key or entra_token) if you need them; see the sketch below.
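A fuller call might look something like this (a minimal sketch, not verbatim from the docs: the endpoint, deployment name, API version, and environment variable name are placeholders you'd swap for your own):

import os
from livekit.plugins.openai import LLM

azure_llm = LLM.with_azure(
    model="gpt-4o",
    azure_deployment="gpt-4o",  # placeholder: your Azure deployment name
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder endpoint
    api_version="2024-10-01-preview",  # assumed version; check your Azure resource
    api_key=os.environ["AZURE_OPENAI_API_KEY"],  # or pass entra_token= instead
    temperature=0.8,
)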
Sources: LLM | LiveKit Agents | livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py | Azure integration guide | LiveKit Docs | Azure OpenAI LLM integration guide | LiveKit Docs

late-businessperson-58140
08/19/2025, 6:04 AM

tall-airline-16756
08/19/2025, 6:05 AM
It turns out the LLM.with_azure() method doesn't support the parameter max_response_output_tokens either. No worries, though: you can use max_completion_tokens instead to control how long the response is when you're working with the standard LLM class.
Here’s a little tweak to your code:
from livekit.plugins.openai import LLM

azure_llm = LLM.with_azure(
    model="gpt-4o",
    temperature=0.8,
    max_completion_tokens=1000,  # use this instead of max_response_output_tokens
)
And if you're dealing with real-time audio and really need max_response_output_tokens, the way to go is RealtimeModel.with_azure() instead of LLM.with_azure(); a rough sketch is below. Hope that helps!
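Here's a minimal sketch of that, assuming the installed plugin version accepts these keyword arguments; the deployment name, endpoint, API version, and environment variable name are placeholders, not values from this thread:

import os
from livekit.plugins import openai

realtime_model = openai.realtime.RealtimeModel.with_azure(
    azure_deployment="gpt-4o-realtime-preview",  # placeholder deployment name
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder endpoint
    api_version="2024-10-01-preview",  # assumed version; check your Azure resource
    api_key=os.environ["AZURE_OPENAI_API_KEY"],  # or entra_token= for Entra auth
    temperature=0.8,
    max_response_output_tokens=1000,  # supported on the realtime model per the message above
)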
Sources: LiveKit Docs | plugins/openai/src/llm.ts | livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/llm.py