how to Disable LLM thinking while using from livek...
# ask-ai
How to disable LLM thinking while using `from livekit.plugins import openai`?
Thanks for reaching out, Mohamed. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To disable LLM thinking while using `from livekit.plugins import openai`, you can modify the LLM output to remove the thinking tokens before they reach the TTS engine. Implement a callback function that removes the `<think>` and `</think>` tags:
from typing import AsyncIterable

from livekit.agents.pipeline import VoicePipelineAgent


async def _before_tts_cb(agent: VoicePipelineAgent, text: str | AsyncIterable[str]):
    if isinstance(text, str):
        # Non-streaming text: strip the thinking tags before the text reaches TTS
        return text.replace("<think>", "").replace("</think>", "")
    else:
        # Streaming text: filter each chunk as it arrives
        async def process_stream():
            async for chunk in text:
                # Drop the opening tag; replace the closing tag with a short
                # transition phrase so the spoken reply starts naturally
                yield chunk.replace("<think>", "").replace("</think>", "Okay, I'm ready to respond.")

        return process_stream()


agent = VoicePipelineAgent(
    ...,  # Your other configurations (vad, stt, llm, tts, chat_ctx)
    before_tts_cb=_before_tts_cb,
)
This ensures the LLM thinking tokens are removed or replaced before the text reaches the TTS stage. Sources: delay-handling/agent.py
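If it helps, here is a rough end-to-end sketch of where that callback plugs in. It assumes the older v0.x `VoicePipelineAgent` pipeline with Silero VAD and OpenAI STT/LLM/TTS; the model name is a placeholder, so swap in whichever components you actually use:

from livekit.agents import AutoSubscribe, JobContext, llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import openai, silero


async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    initial_ctx = llm.ChatContext().append(
        role="system",
        text="You are a helpful voice assistant.",
    )

    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=openai.STT(),
        llm=openai.LLM(model="your-reasoning-model"),  # placeholder model name
        tts=openai.TTS(),
        chat_ctx=initial_ctx,
        before_tts_cb=_before_tts_cb,  # the callback defined above
    )
    agent.start(ctx.room, participant)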
I am using `AgentSession`.
If you want to turn off LLM thinking while using `AgentSession`, a neat way is to tweak the LLM output by stripping out the "thinking" tags before anything else happens. Here's a simple example showing how you can do this by overriding the `llm_node` method on your `Agent` subclass:
# Define this on your Agent subclass (a full usage sketch follows below).
async def llm_node(self, chat_ctx, tools, model_settings=None):
    async def process_stream():
        # Run the LLM manually so its streamed output can be filtered
        async with self.llm.chat(chat_ctx=chat_ctx, tools=tools, tool_choice=None) as stream:
            async for chunk in stream:
                if chunk is None:
                    continue

                # Streamed chunks carry their text in `chunk.delta.content`;
                # fall back to the chunk's string form otherwise
                content = getattr(chunk.delta, 'content', None) if hasattr(chunk, 'delta') else str(chunk)
                if content is None:
                    # Nothing textual to filter (e.g. a tool call); pass it through unchanged
                    yield chunk
                    continue

                # Strip the <think> / </think> markers emitted by reasoning models
                processed_content = content.replace("<think>", "").replace("</think>", "")
                if processed_content != content:
                    if hasattr(chunk, 'delta') and hasattr(chunk.delta, 'content'):
                        chunk.delta.content = processed_content
                    else:
                        chunk = processed_content

                yield chunk

    return process_stream()
This tweak intercepts the LLM's output stream and removes those "thinking" tags before the rest of the pipeline kicks in. If you'd like help with any part of this, just let me know!
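For completeness, here is a rough sketch of how the override could sit on an `Agent` subclass and run inside an `AgentSession`. The class name `ThinkingFreeAgent`, the model name, and the plugin choices (Silero VAD, OpenAI STT/LLM/TTS) are placeholders for illustration, not part of your setup:

from livekit.agents import Agent, AgentSession, JobContext
from livekit.plugins import openai, silero


class ThinkingFreeAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a helpful voice assistant.")

    # Paste the llm_node override from above into this class body.
    async def llm_node(self, chat_ctx, tools, model_settings=None):
        ...


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    session = AgentSession(
        vad=silero.VAD.load(),
        stt=openai.STT(),
        llm=openai.LLM(model="your-reasoning-model"),  # placeholder model name
        tts=openai.TTS(),
    )

    await session.start(room=ctx.room, agent=ThinkingFreeAgent())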