# ask-ai
a
How does this work? How are the `_run` methods in the LLM modules called?
t
Thanks for reaching out, Aditya. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
The `_run` methods in the LLM modules act as the core execution engine for handling LLM interactions within LiveKit. They are called automatically by the `LLMStream` class lifecycle. When an `LLMStream` is initialized, it starts a main task which repeatedly calls `_run()` via the `_main_task()` method. This method manages retries and error handling if API errors occur. Each provider-specific subclass implements its own `_run()` method, which is responsible for:

- Sending requests to the specific LLM provider's API
- Processing streaming responses and emitting standardized `ChatChunk` events
- Handling error recovery and connection issues

The `_run` method processes the LLM's streamed response, converts it to LiveKit's format, and communicates through event channels for further action. Error handling and retries are coordinated by the higher-level `_main_task()` wrapper. The entire system is designed for robust streaming, fallback, and recovery across multiple LLM providers. This ensures provider independence and consistent behavior across different LLM integrations. For further technical specifics (e.g., code samples of invocation and provider communication), please request more targeted details.

Sources: livekit-agents/livekit/agents/llm/fallback_adapter.py
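
To make that lifecycle concrete, here's a minimal, self-contained sketch of the pattern. All names here (`BaseLLMStream`, `FakeProviderStream`, `APIError`, `ChatChunk`) are illustrative stand-ins, not LiveKit's actual classes:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ChatChunk:
    """Illustrative stand-in for a standardized streaming chunk."""
    delta: str


class APIError(Exception):
    pass


class BaseLLMStream:
    """Toy model of the lifecycle described above: the stream owns a
    main task that wraps the provider-specific _run() with retries."""

    def __init__(self, max_retries: int = 3) -> None:
        self._max_retries = max_retries
        self._event_ch: asyncio.Queue = asyncio.Queue()
        # The main task starts automatically when the stream is created.
        self._task = asyncio.create_task(self._main_task())

    async def _main_task(self) -> None:
        # Coordinates retries and error handling around _run().
        for attempt in range(self._max_retries):
            try:
                await self._run()
                break
            except APIError:
                if attempt == self._max_retries - 1:
                    raise
                await asyncio.sleep(2 ** attempt)  # simple backoff
        await self._event_ch.put(None)  # sentinel: end of stream

    async def _run(self) -> None:
        # Each provider-specific subclass implements this.
        raise NotImplementedError


class FakeProviderStream(BaseLLMStream):
    async def _run(self) -> None:
        # Pretend these tokens came from a provider's streaming API and
        # were converted into standardized ChatChunk events.
        for token in ("Hello", ", ", "world"):
            await self._event_ch.put(ChatChunk(delta=token))


async def main() -> None:
    stream = FakeProviderStream()
    while (chunk := await stream._event_ch.get()) is not None:
        print(chunk.delta, end="")
    print()


asyncio.run(main())
```

The key idea is the split of responsibilities: the base class owns the lifecycle (`_main_task()` with retries and the end-of-stream signal), while each subclass only supplies the provider-specific `_run()`.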
a
How can I skip this method for some specific messages?
t
Hey there! I totally get what you're looking for. From what I can see, the `LLMStream` classes don't have a built-in way to skip processing certain messages out of the box. But don't worry, I've got a couple of friendly solutions for you! Here are two approaches that might help:

- **Agent/pre-LLM hooks:** The `WakeWordHandler` comes with a super handy `before_llm_callback` that agents like `VoicePipelineAgent` use. Think of it like a gatekeeper: if this callback says "nope" (returns `False`), the whole LLM process gets skipped for that message. It's like catching things before they even start!
- **Subclass override:** If you're looking for something more tailored, you can create your own custom `LLMStream` and tweak the `_run` method. Add some checks at the beginning to decide whether you want to process the message or do something different (see the sketch after the sources below).

The best part? Most developers handle this by filtering messages before they even start processing, which keeps things clean and efficient. Want me to walk you through a specific example? Just let me know the details, and I'll help you craft the perfect solution!

Sources: pipeline-llm/llm_powered_content_filter.py | agent_extensions/agent_extensions/handlers/wake_word_handler.py | livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/experimental/realti...
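
To illustrate the subclass-override approach, here's a minimal sketch built on the same toy `BaseLLMStream` model as the earlier example. Again, these classes are illustrative stand-ins (not LiveKit's real API), and the `/mute`/`/ignore` skip rule is purely hypothetical:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ChatChunk:
    delta: str


class BaseLLMStream:
    """Same toy model as the earlier sketch (not LiveKit's real class)."""

    def __init__(self, chat_ctx: list[dict]) -> None:
        self._chat_ctx = chat_ctx
        self._event_ch: asyncio.Queue = asyncio.Queue()

    async def _run(self) -> None:
        # Stand-in for the real provider call that streams a reply.
        await self._event_ch.put(ChatChunk(delta="(provider reply)"))
        await self._event_ch.put(None)


class FilteringLLMStream(BaseLLMStream):
    # Hypothetical rule: silently drop "slash commands".
    SKIP_PREFIXES = ("/mute", "/ignore")

    async def _run(self) -> None:
        last = self._chat_ctx[-1]["content"] if self._chat_ctx else ""
        if last.startswith(self.SKIP_PREFIXES):
            # Short-circuit: end the stream without calling the provider.
            await self._event_ch.put(None)
            return
        await super()._run()


async def main() -> None:
    for text in ("/mute for a bit", "hello there"):
        stream = FilteringLLMStream([{"role": "user", "content": text}])
        await stream._run()
        chunks = []
        while (c := await stream._event_ch.get()) is not None:
            chunks.append(c.delta)
        print(f"{text!r} -> {''.join(chunks) or '(skipped)'}")


asyncio.run(main())
```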
a
Give an example of this.
t
Hey there! Let me walk you through a super cool way to use the `before_llm_callback` in a `WakeWordHandler` that helps you skip LLM inference for specific messages in LiveKit. Trust me, it's easier than it sounds!
```python
from enum import Enum
from typing import Optional


class ListeningState(Enum):
    # Minimal state enum assumed by the handler (values illustrative).
    IDLE = "idle"
    LISTENING = "listening"


class WakeWordHandler:
    def __init__(self, wake_word: str = "sam"):
        self._current_state = ListeningState.IDLE
        self._wake_word = wake_word.lower()

    async def before_llm_callback(self, agent, chat_ctx) -> Optional[bool]:
        last_message = chat_ctx.messages[-1] if chat_ctx.messages else None

        if last_message and last_message.role == "user":
            cleaned_message = last_message.content.lower().strip()
            # If the message is only the wake word (with optional trailing
            # punctuation), return False to skip LLM inference.
            if cleaned_message in (self._wake_word, f"{self._wake_word}.", f"{self._wake_word}?"):
                return False
        # None means "no opinion": let the default LLM step run.
        return None
```
Connecting it with `VoicePipelineAgent`:

```python
wake_word_handler = WakeWordHandler(wake_word="sam")

agent = VoicePipelineAgent(
    vad=...,  # Voice activity detector
    stt=...,  # Speech-to-text
    llm=...,  # LLM instance
    tts=...,  # Text-to-speech
    chat_ctx=initial_ctx,
    before_llm_cb=wake_word_handler.before_llm_callback,  # Pass the callback
)

agent.start(ctx.room, participant)
```
Here's the magic: when someone just says the wake word (like "sam", "sam.", or "sam?"), the `before_llm_callback` returns `False`. This means no LLM processing happens, and the `_run` method takes a break. Pretty neat, right? This approach gives you the flexibility to control exactly which messages trigger LLM inference, all based on the conversation's context. It's like having a smart assistant who knows exactly when to listen and when to stay quiet!

Sources: BeforeLLMCallback | LiveKit Agents | chain-of-thought-tts/README.mdoc | VoicePipelineAgent | LiveKit Docs