# ask-ai
**Mitesh:**
Hey there, I wanted to understand if there is a way I can control the context window of the session, which would essentially lower my token cost. Moreover, I don't want to always send a large context window to my LLM.
**AI Assistant:**
Thanks for reaching out, Mitesh. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! You've got the power to control the context window of a session in LiveKit, which helps you make the most of your token usage and avoid sending huge context windows to LLMs. You can do this easily by using the `before_llm_cb` callback in the `VoicePipelineAgent`. This lets you tweak the `ChatContext` just before it gets sent to the LLM. For instance, if you want to keep things simple, you can truncate the context to only include the last 15 messages like this:
```python
from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent

async def truncate_context(assistant: VoicePipelineAgent, chat_ctx: llm.ChatContext):
    # Keep only the 15 most recent messages before each LLM call
    if len(chat_ctx.messages) > 15:
        chat_ctx.messages = chat_ctx.messages[-15:]

agent = VoicePipelineAgent(
    ...
    before_llm_cb=truncate_context,
)
```
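You can apply the same idea by token budget instead of message count. Here's a rough sketch; note that `tiktoken` and the 2,000-token budget are illustrative assumptions, not something LiveKit provides:

```python
import tiktoken  # assumption: a tiktoken-style tokenizer; swap in your model's own tokenizer

from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent

MAX_TOKENS = 2000  # hypothetical budget for conversation history

async def truncate_by_tokens(assistant: VoicePipelineAgent, chat_ctx: llm.ChatContext):
    enc = tiktoken.get_encoding("cl100k_base")
    kept, total = [], 0
    # Walk backwards from the newest message, keeping as many as fit in the budget.
    for msg in reversed(chat_ctx.messages):
        n = len(enc.encode(str(msg.content or "")))
        if total + n > MAX_TOKENS:
            break
        kept.append(msg)
        total += n
    chat_ctx.messages = list(reversed(kept))
```

You'd wire it up the same way, via `before_llm_cb=truncate_by_tokens`.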
There are other cool ways to manage this too, like the token-count cutoff sketched above, filtering by content type, or keeping a sliding window of the most recent conversation turns. If you want a hand setting any of these up, just give me a shout. I'm here to help! Sources: Vision Agent Quickstart | LiveKit Docs | VoicePipelineAgent | LiveKit Docs
**Mitesh:**
I am not using the VoicePipelineAgent. Below is how the agent is created:

```python
class VoiceAgent(Agent):
    def __init__(self, ctx: JobContext):
        # Define tools
        super().__init__(
            instructions="You are helpful assistant which provides information related to world monuments",
            turn_detection=EnglishModel(),
            allow_interruptions=True,
            vad=silero.VAD.load(),
            stt=speech_to_text,
            tts=text_to_speech,
            llm=large_language_model,
        )
```
**AI Assistant:**
Since you're working with a custom agent class derived from `Agent`, you've got some great options to manage the context window using the built-in context management features of the `Agent` class. Here are a few friendly approaches you can try:

1. Set Initial Context: You can pass a limited `ChatContext` to your agent when you initialize it, which helps control the size of the initial context.
2. Dynamic Context Management: Feel free to use the `chat_ctx` property to access and update the context on the fly while your agent is running.
3. Override the LLM Node: If you want to get a bit more hands-on, you can customize how the context window works by overriding the `llm_node` method. For example, applying a sliding window to keep only the last N messages can be really handy.

These options give you some flexible ways to manage your context window effectively. If you want to dive deeper or need a hand with any of this, just let me know. I'm here to help! Sources: Agents 0.x migration guide | LiveKit Docs | Workflows | LiveKit Docs | livekit-agents/livekit/agents/voice/agent.py
**Mitesh:**
Can you share code snippets for how to achieve all of those above? Moreover, also suggest best practices.
**AI Assistant:**
Here are some detailed code snippets for each way you can manage the context window in your custom agent class, along with a few best practices to keep in mind:

1. Set Initial Context: You can pass a limited `ChatContext` when initializing to keep the initial context size in check.
2. Dynamic Context Management: You can access and update the context on the fly during runtime.
3. Override the LLM Node: Another option is to customize the context window logic by overriding the `llm_node` method.

Best Practices:
• Try using a sliding window approach to keep the conversation flowing smoothly while managing token usage.
• If your LLM has strict token limits, consider truncating context based on token count.
• Filter out unnecessary content types like images or metadata to keep the context lean.
• Test your setup with real-world examples to make sure your assistant stays helpful and responsive.

If you want me to explain anything further or need more examples, just let me know. I'm here to help!
**Mitesh:**
Can you share example code? I don't see any code in your earlier comment.
**AI Assistant:**
Here are some example code snippets to help you manage the context window in your custom agent class:

1. Set Initial Context: You can start by passing a limited `ChatContext` during initialization to keep the initial context size in check.
2. Dynamic Context Management: You can also access and update the context dynamically as your agent runs.
3. Override the LLM Node: For even more control, try overriding the `llm_node` method to implement your own context window logic.

Best Practices:
• Using a sliding window approach helps keep the conversation flowing smoothly while managing token usage.
• If your LLM has strict token limits, try truncating context based on token count.
• Filtering out unnecessary content types, like images or metadata, can really help keep the context size optimized.
• Don't forget to test your setup with real-world scenarios to make sure your assistant stays effective and responsive.

If you want me to walk you through any part of this or need more examples, just let me know. I'm happy to help!
**Mitesh:**
Provide me the code implementation for: "Using a sliding window approach helps keep the conversation flowing smoothly while managing token usage."
**AI Assistant:**
Here’s a friendly example of how you can use a sliding window approach to manage the context window in your custom agent class. This way, you keep only the most recent messages, helping the conversation stay clear and relevant while managing token usage efficiently:
```python
from livekit.agents import Agent, ChatContext

class SlidingWindowAgent(Agent):
    def __init__(self, max_messages=10):
        self.max_messages = max_messages
        super().__init__(
            instructions="You are a helpful assistant.",
            # other parameters...
        )

    async def llm_node(self, chat_ctx, tools, model_settings):
        # Apply sliding window logic to limit context size before the LLM call
        chat_ctx = self._apply_sliding_window(chat_ctx)
        return Agent.default.llm_node(self, chat_ctx, tools, model_settings)

    def _apply_sliding_window(self, chat_ctx):
        """Helper method to apply sliding window logic."""
        if len(chat_ctx.messages) > self.max_messages:
            # Keep only the last N messages
            chat_ctx.messages = chat_ctx.messages[-self.max_messages:]
        return chat_ctx
```
How It Works:
• The `max_messages` parameter sets how many recent messages you want to keep in the context.
• The `llm_node` override applies the sliding window logic right before sending the context to the LLM.
• The `_apply_sliding_window` helper method makes sure the context stays trimmed to the right size.

This setup helps keep your conversations smooth and on point while making the best use of tokens. If you want me to walk you through it or help with anything else, just let me know!
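To try it end to end, you'd hand an instance to your session in the usual worker entrypoint. Here's a minimal sketch, assuming your agent's constructor fills in the stt/llm/tts components the way your original VoiceAgent does:

```python
from livekit import agents
from livekit.agents import AgentSession

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()
    # Session-level options (VAD, turn detection, etc.) would go here;
    # the agent itself supplies instructions and pipeline components.
    session = AgentSession()
    await session.start(agent=SlidingWindowAgent(max_messages=10), room=ctx.room)

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```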
**Mitesh:**
I tried the implementation of `llm_node`; it looks like the `ChatContext` object does not have a `messages` attribute. I got an error: AttributeError: 'ChatContext' object has no attribute 'messages'
**AI Assistant:**
Hey there! Just a quick heads-up about the Python implementation of LiveKit: the `ChatContext` object doesn't have a `messages` attribute. Instead, it uses an `items` property, which is a list of `ChatItem` objects. These can be `ChatMessage`, `FunctionCall`, or `FunctionCallOutput` types. If you want to use a sliding window approach, you can take advantage of the `truncate` method that `ChatContext` provides. Here's an updated example that might help:
```python
from livekit.agents import Agent, ChatContext

class SlidingWindowAgent(Agent):
    def __init__(self, max_items=10):
        self.max_items = max_items
        super().__init__(
            instructions="You are a helpful assistant.",
            # other parameters...
        )

    async def llm_node(self, chat_ctx, tools, model_settings):
        # Apply sliding window logic: keep only the last N items
        chat_ctx.truncate(max_items=self.max_items)
        return Agent.default.llm_node(self, chat_ctx, tools, model_settings)
```
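If you ever need finer-grained control than `truncate`, you can filter `items` directly. A minimal sketch follows; it assumes (as in current livekit-agents source) that `items` returns the live list and that each item carries a `type` field such as "message", so double-check against your installed version:

```python
from livekit.agents import Agent

class FilteringAgent(Agent):
    async def llm_node(self, chat_ctx, tools, model_settings):
        # Drop function-call records and keep only plain chat messages.
        # Slice-assign so the underlying list is mutated in place.
        chat_ctx.items[:] = [it for it in chat_ctx.items if it.type == "message"]
        return Agent.default.llm_node(self, chat_ctx, tools, model_settings)
```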
Just a couple of things to keep in mind:
• The `truncate` method is made exactly for limiting how many items stay in the context.
• So instead of using `messages`, go with `items` or just use `truncate` to manage your context smoothly.

If anything's unclear or you want me to walk you through more examples, just let me know. I'm here to help! Sources: Voice pipeline nodes | LiveKit Docs | agent/main.py | ChatContext | LiveKit Agents | livekit-agents/livekit/agents/llm/chat_context.py | agents/src/llm/chat_context.ts
**Mitesh:**
Where do I get a list of methods that can be overridden in the `Agent` class? Share a link to the documentation.
**AI Assistant:**
Hey there! If you're curious about which methods you can override in the `Agent` class, a great place to start is by checking out the LiveKit Agents source code. Here are some important methods you might want to customize:

• Lifecycle Methods:
  ◦ `on_enter()`: This one's called whenever the task starts.
  ◦ `on_exit()`: Gets triggered when the task finishes.
  ◦ `on_user_turn_completed(turn_ctx, new_message)`: Happens right after the user finishes speaking and before the LLM responds.
• Pipeline Node Methods:
  ◦ `stt_node(audio, model_settings)`: Turns audio frames into speech events.
  ◦ `llm_node(chat_ctx, tools, model_settings)`: Manages text generation using an LLM.
  ◦ `transcription_node(text, model_settings)`: Wraps up transcription from text segments.
  ◦ `tts_node(text, model_settings)`: Converts text into audio.
  ◦ `realtime_audio_output_node(audio, model_settings)`: Handles audio from the realtime LLM session before it plays back.

If you want to dive deeper or see some examples, the LiveKit Agents source code is super helpful and you can find it here: https://github.com/livekit/agents Feel free to reach out if you want a hand with any specific method. There's also a small lifecycle sketch below to give you a feel for the hooks.