how to Disable LLM thinking while using from livek...
# ask-ai
How to disable LLM thinking while using `from livekit.plugins import openai`?
Thanks for reaching out, Mohamed. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To disable LLM thinking while using `from livekit.plugins import openai`, you can modify the LLM output to remove the thinking tokens before they reach the TTS engine. Implement a callback function that removes the `<think>` and `</think>` tags:
from typing import AsyncIterable

from livekit.agents.pipeline import VoicePipelineAgent


async def _before_tts_cb(agent: VoicePipelineAgent, text: str | AsyncIterable[str]):
    if isinstance(text, str):
        # Non-streaming text: strip the thinking tags before the text reaches TTS
        return text.replace("<think>", "").replace("</think>", "")
    else:
        # Streaming text: filter each chunk as it arrives
        async def process_stream():
            async for chunk in text:
                # Drop the opening tag; replace the closing tag with a short
                # transition phrase so the spoken reply starts naturally
                yield chunk.replace("<think>", "").replace("</think>", "Okay, I'm ready to respond.")

        return process_stream()


agent = VoicePipelineAgent(
    ...,  # Your other configurations (vad, stt, llm, tts, chat_ctx)
    before_tts_cb=_before_tts_cb,
)
This ensures the LLM thinking tokens are removed or replaced before the text reaches the TTS stage. Sources: delay-handling/agent.py
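If it helps, here is a rough end-to-end sketch of where that callback plugs in. It assumes the older v0.x `VoicePipelineAgent` pipeline with Silero VAD and OpenAI STT/LLM/TTS; the model name is a placeholder, so swap in whichever components you actually use:

from livekit.agents import AutoSubscribe, JobContext, llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import openai, silero


async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    initial_ctx = llm.ChatContext().append(
        role="system",
        text="You are a helpful voice assistant.",
    )

    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=openai.STT(),
        llm=openai.LLM(model="your-reasoning-model"),  # placeholder model name
        tts=openai.TTS(),
        chat_ctx=initial_ctx,
        before_tts_cb=_before_tts_cb,  # the callback defined above
    )
    agent.start(ctx.room, participant)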
I am using `AgentSession`.
If you want to turn off LLM thinking while using `AgentSession`, a neat way is to tweak the LLM output by stripping out the "thinking" tags before anything else happens. Here's a simple example showing how you can do this by overriding the `llm_node` method on your `Agent` subclass:
# Define this on your Agent subclass (a full usage sketch follows below).
async def llm_node(self, chat_ctx, tools, model_settings=None):
    async def process_stream():
        # Run the LLM manually so its streamed output can be filtered
        async with self.llm.chat(chat_ctx=chat_ctx, tools=tools, tool_choice=None) as stream:
            async for chunk in stream:
                if chunk is None:
                    continue

                # Streamed chunks carry their text in `chunk.delta.content`;
                # fall back to the chunk's string form otherwise
                content = getattr(chunk.delta, 'content', None) if hasattr(chunk, 'delta') else str(chunk)
                if content is None:
                    # Nothing textual to filter (e.g. a tool call); pass it through unchanged
                    yield chunk
                    continue

                # Strip the <think> / </think> markers emitted by reasoning models
                processed_content = content.replace("<think>", "").replace("</think>", "")
                if processed_content != content:
                    if hasattr(chunk, 'delta') and hasattr(chunk.delta, 'content'):
                        chunk.delta.content = processed_content
                    else:
                        chunk = processed_content

                yield chunk

    return process_stream()
This tweak intercepts the LLM's output stream and removes those "thinking" tags before the rest of the pipeline kicks in. If you'd like help with any part of this, just let me know!
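For completeness, here is a rough sketch of how the override could sit on an `Agent` subclass and run inside an `AgentSession`. The class name `ThinkingFreeAgent`, the model name, and the plugin choices (Silero VAD, OpenAI STT/LLM/TTS) are placeholders for illustration, not part of your setup:

from livekit.agents import Agent, AgentSession, JobContext
from livekit.plugins import openai, silero


class ThinkingFreeAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a helpful voice assistant.")

    # Paste the llm_node override from above into this class body.
    async def llm_node(self, chat_ctx, tools, model_settings=None):
        ...


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    session = AgentSession(
        vad=silero.VAD.load(),
        stt=openai.STT(),
        llm=openai.LLM(model="your-reasoning-model"),  # placeholder model name
        tts=openai.TTS(),
    )

    await session.start(room=ctx.room, agent=ThinkingFreeAgent())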