late-businessperson-58140
08/19/2025, 5:50 AM

tall-airline-16756
08/19/2025, 5:50 AM

late-businessperson-58140
08/19/2025, 6:00 AM

tall-airline-16756
08/19/2025, 6:00 AM
It looks like the LLM.with_azure() method doesn't recognize a parameter called max_tokens. No worries, though: you'll want to use max_response_output_tokens instead to set the maximum number of tokens in the response.
Here’s a quick fix for your code:
from livekit.plugins.openai import LLM

azure_llm = LLM.with_azure(
    model="gpt-4o",
    temperature=0.8,
    max_response_output_tokens=1000,  # use this instead of max_tokens
)
Also, don't forget to include other important parameters like azure_deployment, azure_endpoint, api_version, and your authentication details (api_key or entra_token) if you need them; see the sketch below.
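A fuller call might look something like this (a minimal sketch, not verbatim from the docs: the endpoint, deployment name, API version, and environment variable name are placeholders you'd swap for your own):

import os
from livekit.plugins.openai import LLM

azure_llm = LLM.with_azure(
    model="gpt-4o",
    azure_deployment="gpt-4o",  # placeholder: your Azure deployment name
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder endpoint
    api_version="2024-10-01-preview",  # assumed version; check your Azure resource
    api_key=os.environ["AZURE_OPENAI_API_KEY"],  # or pass entra_token= instead
    temperature=0.8,
)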
Sources: LLM | LiveKit Agents | livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py | Azure integration guide | LiveKit Docs | Azure OpenAI LLM integration guide | LiveKit Docs

late-businessperson-58140
08/19/2025, 6:04 AM

tall-airline-16756
08/19/2025, 6:05 AM
It turns out the LLM.with_azure() method doesn't support the parameter max_response_output_tokens either. No worries, though: you can use max_completion_tokens instead to control how long the response is when you're working with the standard LLM class.
Here’s a little tweak to your code:
from livekit.plugins.openai import LLM

azure_llm = LLM.with_azure(
    model="gpt-4o",
    temperature=0.8,
    max_completion_tokens=1000,  # use this instead of max_response_output_tokens
)
And if you're dealing with real-time audio and really need max_response_output_tokens, the way to go is RealtimeModel.with_azure() instead of LLM.with_azure(); a rough sketch is below. Hope that helps!
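Here's a minimal sketch of that, assuming the installed plugin version accepts these keyword arguments; the deployment name, endpoint, API version, and environment variable name are placeholders, not values from this thread:

import os
from livekit.plugins import openai

realtime_model = openai.realtime.RealtimeModel.with_azure(
    azure_deployment="gpt-4o-realtime-preview",  # placeholder deployment name
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder endpoint
    api_version="2024-10-01-preview",  # assumed version; check your Azure resource
    api_key=os.environ["AZURE_OPENAI_API_KEY"],  # or entra_token= for Entra auth
    temperature=0.8,
    max_response_output_tokens=1000,  # supported on the realtime model per the message above
)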
Sources: LiveKit Docs | plugins/openai/src/llm.ts | livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/llm.py