# ask-ai
**Mitesh:**
Hey there, I wanted to understand if there is a way I can control the context window of the session, which would essentially lower my token cost. Moreover, I don't want to always send a large context window to my LLM.
**AI Assistant:**
Thanks for reaching out, Mitesh. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! You've got the power to control the context window of a session in LiveKit, which helps you make the most of your token usage and avoid sending huge context windows to LLMs. You can do this easily by using the `before_llm_cb` callback in the `VoicePipelineAgent`. This lets you tweak the `ChatContext` just before it gets sent to the LLM. For instance, if you want to keep things simple, you can truncate the context to only include the last 15 messages like this:
```python
from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent

async def truncate_context(assistant: VoicePipelineAgent, chat_ctx: llm.ChatContext):
    # Keep only the 15 most recent messages before each LLM call
    if len(chat_ctx.messages) > 15:
        chat_ctx.messages = chat_ctx.messages[-15:]

agent = VoicePipelineAgent(
    ...
    before_llm_cb=truncate_context,
)
```
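You can apply the same idea by token budget instead of message count. Here's a rough sketch; note that `tiktoken` and the 2,000-token budget are illustrative assumptions, not something LiveKit provides:

```python
import tiktoken  # assumption: a tiktoken-style tokenizer; swap in your model's own tokenizer

from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent

MAX_TOKENS = 2000  # hypothetical budget for conversation history

async def truncate_by_tokens(assistant: VoicePipelineAgent, chat_ctx: llm.ChatContext):
    enc = tiktoken.get_encoding("cl100k_base")
    kept, total = [], 0
    # Walk backwards from the newest message, keeping as many as fit in the budget.
    for msg in reversed(chat_ctx.messages):
        n = len(enc.encode(str(msg.content or "")))
        if total + n > MAX_TOKENS:
            break
        kept.append(msg)
        total += n
    chat_ctx.messages = list(reversed(kept))
```

You'd wire it up the same way, via `before_llm_cb=truncate_by_tokens`.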
There are other cool ways to manage this too, like the token-count cutoff sketched above, filtering by content type, or keeping a sliding window of the most recent conversation turns. If you want a hand setting any of these up, just give me a shout. I'm here to help! Sources: Vision Agent Quickstart | LiveKit Docs | VoicePipelineAgent | LiveKit Docs
**Mitesh:**
I am not using the VoicePipelineAgent. Below is how the agent is created:

```python
class VoiceAgent(Agent):
    def __init__(self, ctx: JobContext):
        # Define tools
        super().__init__(
            instructions="You are helpful assistant which provides information related to world monuments",
            turn_detection=EnglishModel(),
            allow_interruptions=True,
            vad=silero.VAD.load(),
            stt=speech_to_text,
            tts=text_to_speech,
            llm=large_language_model,
        )
```
**AI Assistant:**
Since you're working with a custom agent class derived from `Agent`, you've got some great options to manage the context window using the built-in context management features of the `Agent` class. Here are a few friendly approaches you can try:

1. Set Initial Context: You can pass a limited `ChatContext` to your agent when you initialize it, which helps control the size of the initial context.
2. Dynamic Context Management: Feel free to use the `chat_ctx` property to access and update the context on the fly while your agent is running.
3. Override the LLM Node: If you want to get a bit more hands-on, you can customize how the context window works by overriding the `llm_node` method. For example, applying a sliding window to keep only the last N messages can be really handy.

These options give you some flexible ways to manage your context window effectively. If you want to dive deeper or need a hand with any of this, just let me know. I'm here to help! Sources: Agents 0.x migration guide | LiveKit Docs | Workflows | LiveKit Docs | livekit-agents/livekit/agents/voice/agent.py
**Mitesh:**
Can you share code snippets for how to achieve all of those above? Moreover, also suggest best practices.
**AI Assistant:**
Here are some detailed code snippets for each way you can manage the context window in your custom agent class, along with a few best practices to keep in mind:

1. Set Initial Context: You can pass a limited `ChatContext` when initializing to keep the initial context size in check.
2. Dynamic Context Management: You can access and update the context on the fly during runtime.
3. Override the LLM Node: Another option is to customize the context window logic by overriding the `llm_node` method.

Best Practices:
• Try using a sliding window approach to keep the conversation flowing smoothly while managing token usage.
• If your LLM has strict token limits, consider truncating context based on token count.
• Filter out unnecessary content types like images or metadata to keep the context lean.
• Test your setup with real-world examples to make sure your assistant stays helpful and responsive.

If you want me to explain anything further or need more examples, just let me know. I'm here to help!
**Mitesh:**
Can you share example code? I don't see any code in your earlier comment.
**AI Assistant:**
Here are some example code snippets to help you manage the context window in your custom agent class:

1. Set Initial Context: You can start by passing a limited `ChatContext` during initialization to keep the initial context size in check.
2. Dynamic Context Management: You can also access and update the context dynamically as your agent runs.
3. Override the LLM Node: For even more control, try overriding the `llm_node` method to implement your own context window logic.

Best Practices:
• Using a sliding window approach helps keep the conversation flowing smoothly while managing token usage.
• If your LLM has strict token limits, try truncating context based on token count.
• Filtering out unnecessary content types, like images or metadata, can really help keep the context size optimized.
• Don't forget to test your setup with real-world scenarios to make sure your assistant stays effective and responsive.

If you want me to walk you through any part of this or need more examples, just let me know. I'm happy to help!
**Mitesh:**
Provide me the code implementation for: "Using a sliding window approach helps keep the conversation flowing smoothly while managing token usage."
**AI Assistant:**
Here’s a friendly example of how you can use a sliding window approach to manage the context window in your custom agent class. This way, you keep only the most recent messages, helping the conversation stay clear and relevant while managing token usage efficiently:
```python
from livekit.agents import Agent, ChatContext

class SlidingWindowAgent(Agent):
    def __init__(self, max_messages=10):
        self.max_messages = max_messages
        super().__init__(
            instructions="You are a helpful assistant.",
            # other parameters...
        )

    async def llm_node(self, chat_ctx, tools, model_settings):
        # Apply sliding window logic to limit context size before the LLM call
        chat_ctx = self._apply_sliding_window(chat_ctx)
        return Agent.default.llm_node(self, chat_ctx, tools, model_settings)

    def _apply_sliding_window(self, chat_ctx):
        """Helper method to apply sliding window logic."""
        if len(chat_ctx.messages) > self.max_messages:
            # Keep only the last N messages
            chat_ctx.messages = chat_ctx.messages[-self.max_messages:]
        return chat_ctx
```
How It Works:
• The `max_messages` parameter sets how many recent messages you want to keep in the context.
• The `llm_node` override applies the sliding window logic right before sending the context to the LLM.
• The `_apply_sliding_window` helper method makes sure the context stays trimmed to the right size.

This setup helps keep your conversations smooth and on point while making the best use of tokens. If you want me to walk you through it or help with anything else, just let me know!
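To try it end to end, you'd hand an instance to your session in the usual worker entrypoint. Here's a minimal sketch, assuming your agent's constructor fills in the stt/llm/tts components the way your original VoiceAgent does:

```python
from livekit import agents
from livekit.agents import AgentSession

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()
    # Session-level options (VAD, turn detection, etc.) would go here;
    # the agent itself supplies instructions and pipeline components.
    session = AgentSession()
    await session.start(agent=SlidingWindowAgent(max_messages=10), room=ctx.room)

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```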
**Mitesh:**
I tried the implementation of `llm_node`; it looks like the `ChatContext` object does not have a `messages` attribute. I got an error: AttributeError: 'ChatContext' object has no attribute 'messages'
**AI Assistant:**
Hey there! Just a quick heads-up about the Python implementation of LiveKit: the `ChatContext` object doesn't have a `messages` attribute. Instead, it uses an `items` property, which is a list of `ChatItem` objects. These can be `ChatMessage`, `FunctionCall`, or `FunctionCallOutput` types. If you want to use a sliding window approach, you can take advantage of the `truncate` method that `ChatContext` provides. Here's an updated example that might help:
```python
from livekit.agents import Agent, ChatContext

class SlidingWindowAgent(Agent):
    def __init__(self, max_items=10):
        self.max_items = max_items
        super().__init__(
            instructions="You are a helpful assistant.",
            # other parameters...
        )

    async def llm_node(self, chat_ctx, tools, model_settings):
        # Apply sliding window logic: keep only the last N items
        chat_ctx.truncate(max_items=self.max_items)
        return Agent.default.llm_node(self, chat_ctx, tools, model_settings)
```
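If you ever need finer-grained control than `truncate`, you can filter `items` directly. A minimal sketch follows; it assumes (as in current livekit-agents source) that `items` returns the live list and that each item carries a `type` field such as "message", so double-check against your installed version:

```python
from livekit.agents import Agent

class FilteringAgent(Agent):
    async def llm_node(self, chat_ctx, tools, model_settings):
        # Drop function-call records and keep only plain chat messages.
        # Slice-assign so the underlying list is mutated in place.
        chat_ctx.items[:] = [it for it in chat_ctx.items if it.type == "message"]
        return Agent.default.llm_node(self, chat_ctx, tools, model_settings)
```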
Just a couple of things to keep in mind:
• The `truncate` method is made exactly for limiting how many items stay in the context.
• So instead of using `messages`, go with `items` or just use `truncate` to manage your context smoothly.

If anything's unclear or you want me to walk you through more examples, just let me know. I'm here to help! Sources: Voice pipeline nodes | LiveKit Docs | agent/main.py | ChatContext | LiveKit Agents | livekit-agents/livekit/agents/llm/chat_context.py | agents/src/llm/chat_context.ts
**Mitesh:**
Where do I get a list of methods that can be overridden in the `Agent` class? Share a link to the documentation.
**AI Assistant:**
Hey there! If you're curious about which methods you can override in the `Agent` class, a great place to start is by checking out the LiveKit Agents source code. Here are some important methods you might want to customize:

• Lifecycle Methods:
  ◦ `on_enter()`: This one's called whenever the task starts.
  ◦ `on_exit()`: Gets triggered when the task finishes.
  ◦ `on_user_turn_completed(turn_ctx, new_message)`: Happens right after the user finishes speaking and before the LLM responds.
• Pipeline Node Methods:
  ◦ `stt_node(audio, model_settings)`: Turns audio frames into speech events.
  ◦ `llm_node(chat_ctx, tools, model_settings)`: Manages text generation using an LLM.
  ◦ `transcription_node(text, model_settings)`: Wraps up transcription from text segments.
  ◦ `tts_node(text, model_settings)`: Converts text into audio.
  ◦ `realtime_audio_output_node(audio, model_settings)`: Handles audio from the realtime LLM session before it plays back.

If you want to dive deeper or see some examples, the LiveKit Agents source code is super helpful and you can find it here: https://github.com/livekit/agents Feel free to reach out if you want a hand with any specific method. There's also a small lifecycle sketch below to give you a feel for the hooks.