# ask-ai
s
We are using a custom implementation of `llm_node` where, instead of passing through the default `llm_node` (which directly yields chunks), we invoke our own LLM implementation to stream responses. The issue arises when calling `session.say` inside the custom `llm_node`:
• Scenario 1 (not awaited): The `session.say` output is spoken, but only after the agent session has already progressed and delivered the main response from the agent. This introduces an unintended delay.
• Scenario 2 (awaited): The `session.say` output is never spoken, and the execution of the `llm_node` function effectively halts without throwing an error. This blocks TTS for all subsequent user inputs.
Our use case requires `session.say` to trigger speech immediately at the point of invocation (e.g., to provide a short acknowledgment such as "processing your request" before the main response). Since Scenario 1 introduces delayed playback and Scenario 2 results in deadlock, neither option is viable. How can we resolve this and use `session.say` effectively inside `llm_node`?
t
Thanks for reaching out, Abhi. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To help you resolve the issue with `session.say` in your custom `llm_node` implementation and get immediate speech without delays or deadlocks, here are some tips to consider:
• Use Pre-Synthesized Audio: If you can, provide pre-generated audio via the `audio` parameter of `session.say`. This skips TTS latency and gives faster playback.
• Avoid Blocking: Don't await the `SpeechHandle` returned by `session.say`; this keeps things running smoothly by allowing other tasks to continue without waiting.
• Fast Acknowledgments: For quick responses like "Processing your request," call `session.say` with a short canned phrase (or generate one with a fast LLM model) so the user hears an acknowledgment while the main LLM is still working.
• Control Chat Context: To keep your conversation history clean, use the `add_to_chat_ctx` parameter to exclude filler messages (e.g., `add_to_chat_ctx=False`).
Here's a simple example to illustrate this:

```python
async def on_user_turn_completed(self, turn_ctx: ChatContext, new_message: ChatMessage):
    # Trigger an immediate acknowledgment without blocking the main response
    self.session.say("One moment, I'm thinking.", add_to_chat_ctx=False)
    # Continue with the long-running LLM task
```

I hope these suggestions help! If you have any more questions or need a hand with anything else, please don't hesitate to ask.
Sources: examples/src/realtime_turn_detector.ts | Agent speech and audio | LiveKit Docs | examples/voice_agents/llamaindex-rag/retrieval.py
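To make the pre-synthesized-audio idea a bit more concrete, here is a minimal sketch. It assumes `session.say` accepts an `audio` stream of `rtc.AudioFrame` objects (as described above) and that you have a short pre-recorded clip; the `canned_ack_frames` helper is purely illustrative:

```python
import wave

from livekit import rtc
from livekit.agents import Agent


async def canned_ack_frames(path: str = "ack.wav"):
    # Illustrative helper: decode a short, pre-recorded 16-bit mono WAV clip
    # into LiveKit audio frames so TTS can be skipped entirely.
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        data = wav.readframes(wav.getnframes())
    yield rtc.AudioFrame(
        data=data,
        sample_rate=sample_rate,
        num_channels=1,
        samples_per_channel=len(data) // 2,  # 2 bytes per 16-bit sample
    )


class AckAgent(Agent):
    async def on_user_turn_completed(self, turn_ctx, new_message):
        # The canned clip plays immediately; the text is only used for the transcript.
        self.session.say(
            "Processing your request.",
            audio=canned_ack_frames(),   # keyword name assumed from the answer above
            add_to_chat_ctx=False,
        )
```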
s
The issue is that the behaviour of `session.say()` inside `llm_node` is not the same as outside it. We are using this method in other places as well, but the problem only happens with `llm_node`.
t
Thanks so much for your question! The difference in `session.say` behavior inside `llm_node` compared to outside it happens because `llm_node` is part of the LLM processing pipeline. Inside `llm_node`, the output is intercepted and can be modified in real time before it reaches the TTS engine. This allows for transformations like filtering, replacing tokens, or truncating responses, which can influence how `session.say` works. Outside `llm_node`, on the other hand, for example in lifecycle hooks like `on_user_turn_completed`, `session.say` sends text directly for TTS synthesis and playback without any intermediate processing. That's why the behavior you see is different. If you'd like `session.say` to behave consistently inside `llm_node`, you might need to adjust the processing logic within the node to ensure immediate playback or to skip certain transformations. I'm happy to offer guidance on how to adjust the `llm_node` logic to make this happen, just let me know!
Sources: livekit-agents/livekit/agents/voice/agent.py | multi-agent/long_or_short_agent.py | Modifying LLM output before TTS | LiveKit Docs | Pipeline nodes and hooks | LiveKit Docs
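Putting the two answers together, one way to avoid the Scenario 2 deadlock inside a custom `llm_node` is to fire `session.say` without awaiting it and keep yielding chunks to the pipeline. This is only a sketch: `stream_my_llm` is a placeholder for your own LLM integration, and whether the acknowledgment plays before or after the main response still depends on how the session schedules speech (the delay you saw in Scenario 1).

```python
from livekit.agents import Agent


class CustomLLMAgent(Agent):
    async def llm_node(self, chat_ctx, tools, model_settings):
        # Fire-and-forget: do NOT await the SpeechHandle. Awaiting it here
        # blocks this generator, while the pipeline that would play the speech
        # is itself waiting on this generator, which matches the Scenario 2
        # deadlock.
        self.session.say("Processing your request.", add_to_chat_ctx=False)

        # stream_my_llm() is a placeholder for your custom LLM call. Yielding
        # its text (or ChatChunk objects) feeds the normal TTS pipeline for
        # the main response.
        async for text in stream_my_llm(chat_ctx):
            yield text
```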
s
Can you give more details on what `llm_node` logic needs to be changed?
t
I'm glad you're looking to customize the `llm_node` logic to make sure `session.say` plays back immediately without delays or transformations. You can achieve this by overriding `llm_node` in your own `Agent` subclass. Here are a couple of approaches to guide you:
• Directly Process LLM Output: Override `llm_node` to intercept and modify the LLM output stream, for instance to replace certain tokens or phrases before sending them to the TTS engine:

```python
from livekit.agents import Agent

class SimpleAgent(Agent):
    async def llm_node(self, chat_ctx, tools, model_settings=None):
        async def process_stream():
            async with self.llm.chat(chat_ctx=chat_ctx, tools=tools, tool_choice=None) as stream:
                async for chunk in stream:
                    if chunk is None:
                        continue

                    # Extract and process content
                    content = getattr(chunk.delta, 'content', None) if hasattr(chunk, 'delta') else str(chunk)
                    if content is None:
                        yield chunk
                        continue

                    processed_content = content.replace("<tool_call>", "").replace("found", "Okay, I'm ready to respond.")

                    if processed_content != content:
                        if hasattr(chunk, 'delta') and hasattr(chunk.delta, 'content'):
                            chunk.delta.content = processed_content
                        else:
                            chunk = processed_content

                    yield chunk

        return process_stream()
```

• Immediate Speech for Tool Calls: Detect tool calls in the LLM output and trigger `session.say` right away. Here's an example to illustrate that:

```python
from livekit.agents import Agent
from livekit.agents.llm import ChatChunk

class FunctionAgent(Agent):
    async def llm_node(self, chat_ctx, tools, model_settings):
        activity = self._activity
        tool_call_detected = False

        async for chunk in super().llm_node(chat_ctx, tools, model_settings):
            if isinstance(chunk, ChatChunk) and chunk.delta and chunk.delta.tool_calls and not tool_call_detected:
                tool_call_detected = True
                activity.agent.session.say("Sure, I'll check that for you.")  # Immediate response

            yield chunk
```

• Additional Optimizations:
◦ Consider using pre-synthesized audio with the `audio` parameter in `session.say` to reduce TTS latency.
◦ Setting `allow_interruptions=False` in `session.say` can help ensure smooth playback without interruptions.
◦ If you're manually initiating speech, disabling `preemptive_generation` can help avoid conflicts.
I hope these tips help you gain fine-grained control over your LLM output stream and keep playback immediate and smooth. If you'd like, I'm happy to help with any part of the implementation, just let me know!
Sources: Pipeline nodes and hooks | LiveKit Docs | examples/voice_agents/llamaindex-rag/retrieval.py | Modifying LLM output before TTS | LiveKit Docs | pipeline-llm/replacing_llm_output.py | Agent speech and audio | LiveKit Docs
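As a rough sketch of the optimization knobs listed above (the `preemptive_generation` option name is taken from this thread, so verify it against your livekit-agents version):

```python
from livekit.agents import AgentSession

# Disable preemptive generation when you plan to drive acknowledgments
# manually from llm_node (option name assumed, see note above).
session = AgentSession(preemptive_generation=False)

# An acknowledgment that should not be cut off by the user and should not
# end up in the chat history.
handle = session.say(
    "Sure, I'll check that for you.",
    allow_interruptions=False,
    add_to_chat_ctx=False,
)
```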
s
Can you explain the "activity" in the FunctionAgent code and how it differs from "session"?
t
I'm glad you asked about the difference between `activity` and `session` in the FunctionAgent code! Here's a breakdown of their roles and scope:
• `AgentSession` (session): Think of this as the big-picture manager. It oversees the entire lifecycle and state of the agent within a LiveKit room. It takes care of session-level events like connecting to the room, managing input/output streams, and handling transitions between different `AgentActivity` instances. The session persists through agent updates or handoffs and keeps the global conversation context intact.
• `AgentActivity` (activity): This handles the hands-on, real-time tasks like speech recognition, language model inference, and speech synthesis. It is in charge of scheduling speech, detecting turns, and managing user interruptions. Each activity is linked to a single agent configuration and is created and managed by the session. When the agent's configuration changes, the session swaps out the current activity for a new one.
So, to sum it up: the `session` is your overall manager for the agent's lifecycle, while the `activity` focuses on the real-time work for the current agent instance. If you have any more questions or want me to explain anything further, just let me know!
Sources: Agents Core Architecture | livekit/livekit_composite | DeepWiki | livekit-agents/livekit/agents/voice/agent.py
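A small sketch of how the two relate inside an agent method, assuming `activity.agent.session` resolves as in the FunctionAgent example above (note that `_activity` is a private attribute, so prefer `self.session` wherever it is enough):

```python
from livekit.agents import Agent


class FunctionAgent(Agent):
    async def llm_node(self, chat_ctx, tools, model_settings):
        activity = self._activity   # current AgentActivity; private attribute, swapped on handoff
        session = self.session      # long-lived AgentSession for the whole conversation

        # Both paths reach the same AgentSession, so the earlier
        # activity.agent.session.say(...) is equivalent to self.session.say(...).
        assert activity.agent.session is session

        async for chunk in super().llm_node(chat_ctx, tools, model_settings):
            yield chunk
```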
s
Would `generate_reply` be applicable for this situation if we want to use that method?
t
Absolutely! The `generate_reply` method can be used in your custom `llm_node` implementation to trigger immediate speech or responses. It works by dynamically prompting the LLM to create a response based on instructions or user input, and then kicks off speech generation through the TTS pipeline. Here are some examples of how it can be applied:
• Session Start: Call `self.session.generate_reply()` inside the `on_enter` method to start a response right when the session begins:

```python
class SimpleAgent(Agent):
    async def on_enter(self):
        self.session.generate_reply()
```

• Custom Instructions: If you want to guide the response more specifically, you can provide instructions like this:

```python
await self.session.generate_reply(
    instructions="Tell the user we're about to run some slow operations.",
    allow_interruptions=False
)
```

• Proactive Responses: It's also handy within tool functions or your `llm_node` to keep users informed during asynchronous tasks.
Here are some key parameters to keep in mind:
• `instructions`: The context or directions you give the LLM to shape its response.
• `user_input`: Direct user input to generate a response from and add to the chat history.
• `allow_interruptions`: Controls whether the speech can be interrupted (defaults to `True`).
If you'd like, I'm happy to help you further with integrating `generate_reply` into your `llm_node` logic, just let me know!
Sources: Agent speech and audio | LiveKit Docs | Agent speech and audio | LiveKit Docs | agents/src/voice/agent_session.ts | pipeline-llm/replacing_llm_output.py | livekit-agents/livekit/agents/voice/agent_activity.py
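For completeness, here is a small sketch of the `user_input` parameter described above; the keyword names follow the parameter list in this answer, so double-check them against your livekit-agents version:

```python
from livekit.agents import Agent


class SimpleAgent(Agent):
    async def on_enter(self):
        handle = self.session.generate_reply(
            user_input="What are your opening hours?",  # recorded in the chat history as the user's turn
            allow_interruptions=True,
        )
        await handle  # optional: wait until playback of the generated reply finishes
```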
s
Can we solve this issue using AgentTask? If yes, how can it be used?
t
Thanks for reaching out! From what we have, `AgentTask` isn't specifically built to handle issues with `session.say` in `llm_node`. But there are some good alternatives to help you get immediate speech or responses:
• Pre-Synthesized Audio: Use pre-generated audio with `session.say` to skip TTS latency and get quicker playback.
• Preemptive Generation: Enable `preemptive_generation` in `AgentSession` so the response starts generating even before the user finishes their turn; this helps cut down on wait time.
• Custom TTS Node: Another option is customizing the `tts_node` to handle text chunks and dynamically interrupt or adjust responses as needed.
If you have a particular use case in mind for `AgentTask` or want me to explain anything in more detail, just let me know!
Sources: tests/test_agent_session.py | Agent speech and audio | LiveKit Docs | Agent speech and audio | LiveKit Docs | pipeline-tts/short_replies_only.py
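If you want to explore the custom `tts_node` route, here is a rough sketch following the override pattern used in the LiveKit pipeline-node examples; adjust the signature if your SDK version expects an async generator rather than a returned iterable:

```python
from livekit.agents import Agent


class FilteringAgent(Agent):
    async def tts_node(self, text, model_settings):
        async def adjusted(stream):
            async for chunk in stream:
                # Illustrative tweak: strip markdown asterisks so they are not read aloud.
                yield chunk.replace("*", "")

        # Delegate actual synthesis to the default TTS node.
        return Agent.default.tts_node(self, adjusted(text), model_settings)
```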
s
Please explain it in more detail
t
AgentTask is designed to carry out specific, stateful actions within a conversation session. It supports `session.say()` for immediate speech, and here are some important points about how it works:
• Speech Handling: When you use `session.say()` inside an AgentTask, it works just like it does in a regular Agent. It triggers speech synthesis through the TTS plugin or plays pre-synthesized audio, and it returns a `SpeechHandle`, which gives you control over playback, such as waiting for it to finish or adding callbacks.
• Session Control: An AgentTask takes temporary control of the session as soon as it starts and releases it once you call `complete()`. This helps make sure that the speech inside the task happens in order, without interruptions from the previous agent.
• Interruption Management: If you're using a `RealtimeModel` LLM with server-side turn detection, note that setting `allow_interruptions=False` in `session.say()` will be ignored. To handle interruptions effectively, you can turn off server-side turn detection and use VAD-based handling configured at the session or task level.
• Stateful Operations: AgentTasks shine when you need to manage isolated, stateful processes, like gathering user consent or carrying out specific tasks. For example, your task might use `session.generate_reply()` to ask for input and then call `self.complete()` to wrap things up.
While AgentTask offers a neat and organized way to manage speech and logic, it doesn't change the inherent latency or real-time behavior of `session.say()`. The timing still depends on TTS speed and your session settings. If you'd like more details or examples, just let me know!
Sources: tests/test_agent_session.py | pipeline-llm/interrupt_user.py | agents/src/voice/generation.ts | Workflows | LiveKit Docs | Workflows | LiveKit Docs
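To make the consent-style example above concrete, here is a minimal `AgentTask` sketch based on the workflow pattern described in the LiveKit docs; the class name and tool name are illustrative, and the `AgentTask[bool]` / `complete()` details should be checked against your livekit-agents version:

```python
from livekit.agents import AgentTask, function_tool


class CollectConsent(AgentTask[bool]):
    def __init__(self) -> None:
        super().__init__(
            instructions="Ask the user for permission to record the call and wait for a clear yes or no."
        )

    async def on_enter(self) -> None:
        # Immediate, non-blocking acknowledgment, kept out of the chat history.
        self.session.say("One quick question before we continue.", add_to_chat_ctx=False)
        # Let the LLM phrase the actual consent question.
        self.session.generate_reply()

    @function_tool
    async def record_consent(self, granted: bool) -> None:
        """Call this when the user has clearly granted or refused consent."""
        # Return the result and hand control back to the previous agent.
        self.complete(granted)
```

Per that workflow pattern, the task is typically awaited from another agent (for example, `granted = await CollectConsent()`), though the exact invocation may differ between SDK versions.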