# ask-ai
s
We are using a custom implementation of `llm_node` where, instead of passing through the default `llm_node` (which directly yields chunks), we invoke our own LLM implementation to stream responses. The issue arises when calling `session.say` inside the custom `llm_node`:
• Scenario 1 (not awaited): The `session.say` output is spoken, but only after the agent session has already progressed and delivered the main response from the agent. This introduces an unintended delay.
• Scenario 2 (awaited): The `session.say` output is never spoken, and the execution of the `llm_node` function effectively halts without throwing an error. This blocks TTS for all subsequent user inputs.
Our use case requires `session.say` to trigger speech immediately at the point of invocation (e.g., to provide a short acknowledgment such as "processing your request" before the main response). Since Scenario 1 introduces delayed playback and Scenario 2 results in deadlock, neither option is viable. How can we resolve this and use `session.say` effectively inside `llm_node`?
t
Thanks for reaching out, Abhi. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To help you resolve the issue with `session.say` in your custom `llm_node` implementation and get immediate speech without delays or deadlocks, here are some tips to consider:
• Use Pre-Synthesized Audio: If you can, provide pre-generated audio via the `audio` parameter of `session.say`. This skips TTS latency and gives faster playback.
• Avoid Blocking: Don't await the `SpeechHandle` returned by `session.say`; this keeps things running smoothly by allowing other tasks to continue without waiting.
• Fast Acknowledgments: For quick responses like "Processing your request," call `session.say` with a short canned phrase (or generate one with a fast LLM model) so the user hears an acknowledgment while the main LLM is still working.
• Control Chat Context: To keep your conversation history clean, use the `add_to_chat_ctx` parameter to exclude filler messages (e.g., `add_to_chat_ctx=False`).
Here's a simple example to illustrate this:

```python
async def on_user_turn_completed(self, turn_ctx: ChatContext, new_message: ChatMessage):
    # Trigger an immediate acknowledgment without blocking the main response
    self.session.say("One moment, I'm thinking.", add_to_chat_ctx=False)
    # Continue with the long-running LLM task
```

I hope these suggestions help! If you have any more questions or need a hand with anything else, please don't hesitate to ask.
Sources: examples/src/realtime_turn_detector.ts | Agent speech and audio | LiveKit Docs | examples/voice_agents/llamaindex-rag/retrieval.py
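To make the pre-synthesized-audio idea a bit more concrete, here is a minimal sketch. It assumes `session.say` accepts an `audio` stream of `rtc.AudioFrame` objects (as described above) and that you have a short pre-recorded clip; the `canned_ack_frames` helper is purely illustrative:

```python
import wave

from livekit import rtc
from livekit.agents import Agent


async def canned_ack_frames(path: str = "ack.wav"):
    # Illustrative helper: decode a short, pre-recorded 16-bit mono WAV clip
    # into LiveKit audio frames so TTS can be skipped entirely.
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        data = wav.readframes(wav.getnframes())
    yield rtc.AudioFrame(
        data=data,
        sample_rate=sample_rate,
        num_channels=1,
        samples_per_channel=len(data) // 2,  # 2 bytes per 16-bit sample
    )


class AckAgent(Agent):
    async def on_user_turn_completed(self, turn_ctx, new_message):
        # The canned clip plays immediately; the text is only used for the transcript.
        self.session.say(
            "Processing your request.",
            audio=canned_ack_frames(),   # keyword name assumed from the answer above
            add_to_chat_ctx=False,
        )
```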
s
The issue is that the behaviour of `session.say()` inside `llm_node` is not the same as outside it. We are using this method in other places as well, but the problem only happens with `llm_node`.
t
Thanks so much for your question! The difference in `session.say` behavior inside `llm_node` compared to outside it happens because `llm_node` is part of the LLM processing pipeline. Inside `llm_node`, the output is intercepted and can be modified in real time before it reaches the TTS engine. This allows for transformations like filtering, replacing tokens, or truncating responses, which can influence how `session.say` works. Outside `llm_node`, on the other hand, for example in lifecycle hooks like `on_user_turn_completed`, `session.say` sends text directly for TTS synthesis and playback without any intermediate processing. That's why the behavior you see is different. If you'd like `session.say` to behave consistently inside `llm_node`, you might need to adjust the processing logic within the node to ensure immediate playback or to skip certain transformations. I'm happy to offer guidance on how to adjust the `llm_node` logic to make this happen, just let me know!
Sources: livekit-agents/livekit/agents/voice/agent.py | multi-agent/long_or_short_agent.py | Modifying LLM output before TTS | LiveKit Docs | Pipeline nodes and hooks | LiveKit Docs
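Putting the two answers together, one way to avoid the Scenario 2 deadlock inside a custom `llm_node` is to fire `session.say` without awaiting it and keep yielding chunks to the pipeline. This is only a sketch: `stream_my_llm` is a placeholder for your own LLM integration, and whether the acknowledgment plays before or after the main response still depends on how the session schedules speech (the delay you saw in Scenario 1).

```python
from livekit.agents import Agent


class CustomLLMAgent(Agent):
    async def llm_node(self, chat_ctx, tools, model_settings):
        # Fire-and-forget: do NOT await the SpeechHandle. Awaiting it here
        # blocks this generator, while the pipeline that would play the speech
        # is itself waiting on this generator, which matches the Scenario 2
        # deadlock.
        self.session.say("Processing your request.", add_to_chat_ctx=False)

        # stream_my_llm() is a placeholder for your custom LLM call. Yielding
        # its text (or ChatChunk objects) feeds the normal TTS pipeline for
        # the main response.
        async for text in stream_my_llm(chat_ctx):
            yield text
```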
s
Can you give more details on what `llm_node` logic needs to be changed?
t
I'm glad you're looking to customize the `llm_node` logic to make sure `session.say` plays back immediately without delays or transformations. You can achieve this by overriding `llm_node` in your own `Agent` subclass. Here are a couple of approaches to guide you:
• Directly Process LLM Output: Override `llm_node` to intercept and modify the LLM output stream, for instance to replace certain tokens or phrases before sending them to the TTS engine:

```python
from livekit.agents import Agent

class SimpleAgent(Agent):
    async def llm_node(self, chat_ctx, tools, model_settings=None):
        async def process_stream():
            async with self.llm.chat(chat_ctx=chat_ctx, tools=tools, tool_choice=None) as stream:
                async for chunk in stream:
                    if chunk is None:
                        continue

                    # Extract and process content
                    content = getattr(chunk.delta, 'content', None) if hasattr(chunk, 'delta') else str(chunk)
                    if content is None:
                        yield chunk
                        continue

                    processed_content = content.replace("<tool_call>", "").replace("found", "Okay, I'm ready to respond.")

                    if processed_content != content:
                        if hasattr(chunk, 'delta') and hasattr(chunk.delta, 'content'):
                            chunk.delta.content = processed_content
                        else:
                            chunk = processed_content

                    yield chunk

        return process_stream()
```

• Immediate Speech for Tool Calls: Detect tool calls in the LLM output and trigger `session.say` right away. Here's an example to illustrate that:

```python
from livekit.agents import Agent
from livekit.agents.llm import ChatChunk

class FunctionAgent(Agent):
    async def llm_node(self, chat_ctx, tools, model_settings):
        activity = self._activity
        tool_call_detected = False

        async for chunk in super().llm_node(chat_ctx, tools, model_settings):
            if isinstance(chunk, ChatChunk) and chunk.delta and chunk.delta.tool_calls and not tool_call_detected:
                tool_call_detected = True
                activity.agent.session.say("Sure, I'll check that for you.")  # Immediate response

            yield chunk
```

• Additional Optimizations:
◦ Consider using pre-synthesized audio with the `audio` parameter in `session.say` to reduce TTS latency.
◦ Setting `allow_interruptions=False` in `session.say` can help ensure smooth playback without interruptions.
◦ If you're manually initiating speech, disabling `preemptive_generation` can help avoid conflicts.
I hope these tips help you gain fine-grained control over your LLM output stream and keep playback immediate and smooth. If you'd like, I'm happy to help with any part of the implementation, just let me know!
Sources: Pipeline nodes and hooks | LiveKit Docs | examples/voice_agents/llamaindex-rag/retrieval.py | Modifying LLM output before TTS | LiveKit Docs | pipeline-llm/replacing_llm_output.py | Agent speech and audio | LiveKit Docs
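As a rough sketch of the optimization knobs listed above (the `preemptive_generation` option name is taken from this thread, so verify it against your livekit-agents version):

```python
from livekit.agents import AgentSession

# Disable preemptive generation when you plan to drive acknowledgments
# manually from llm_node (option name assumed, see note above).
session = AgentSession(preemptive_generation=False)

# An acknowledgment that should not be cut off by the user and should not
# end up in the chat history.
handle = session.say(
    "Sure, I'll check that for you.",
    allow_interruptions=False,
    add_to_chat_ctx=False,
)
```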
s
Can you explain the "activity" in the FunctionAgent code and how it differs from "session"?
t
I'm glad you asked about the difference between `activity` and `session` in the FunctionAgent code! Here's a breakdown of their roles and scope:
• `AgentSession` (session): Think of this as the big-picture manager. It oversees the entire lifecycle and state of the agent within a LiveKit room. It takes care of session-level events like connecting to the room, managing input/output streams, and handling transitions between different `AgentActivity` instances. The session persists through agent updates or handoffs and keeps the global conversation context intact.
• `AgentActivity` (activity): This handles the hands-on, real-time tasks like speech recognition, language model inference, and speech synthesis. It is in charge of scheduling speech, detecting turns, and managing user interruptions. Each activity is linked to a single agent configuration and is created and managed by the session. When the agent's configuration changes, the session swaps out the current activity for a new one.
So, to sum it up: the `session` is your overall manager for the agent's lifecycle, while the `activity` focuses on the real-time work for the current agent instance. If you have any more questions or want me to explain anything further, just let me know!
Sources: Agents Core Architecture | livekit/livekit_composite | DeepWiki | livekit-agents/livekit/agents/voice/agent.py
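A small sketch of how the two relate inside an agent method, assuming `activity.agent.session` resolves as in the FunctionAgent example above (note that `_activity` is a private attribute, so prefer `self.session` wherever it is enough):

```python
from livekit.agents import Agent


class FunctionAgent(Agent):
    async def llm_node(self, chat_ctx, tools, model_settings):
        activity = self._activity   # current AgentActivity; private attribute, swapped on handoff
        session = self.session      # long-lived AgentSession for the whole conversation

        # Both paths reach the same AgentSession, so the earlier
        # activity.agent.session.say(...) is equivalent to self.session.say(...).
        assert activity.agent.session is session

        async for chunk in super().llm_node(chat_ctx, tools, model_settings):
            yield chunk
```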
s
Would `generate_reply` be applicable for this situation if we want to use that method?
t
Absolutely! The `generate_reply` method can be used in your custom `llm_node` implementation to trigger immediate speech or responses. It works by dynamically prompting the LLM to create a response based on instructions or user input, and then kicks off speech generation through the TTS pipeline. Here are some examples of how it can be applied:
• Session Start: Call `self.session.generate_reply()` inside the `on_enter` method to start a response right when the session begins:

```python
class SimpleAgent(Agent):
    async def on_enter(self):
        self.session.generate_reply()
```

• Custom Instructions: If you want to guide the response more specifically, you can provide instructions like this:

```python
await self.session.generate_reply(
    instructions="Tell the user we're about to run some slow operations.",
    allow_interruptions=False
)
```

• Proactive Responses: It's also handy within tool functions or your `llm_node` to keep users informed during asynchronous tasks.
Here are some key parameters to keep in mind:
• `instructions`: The context or directions you give the LLM to shape its response.
• `user_input`: Direct user input to generate a response from and add to the chat history.
• `allow_interruptions`: Controls whether the speech can be interrupted (defaults to `True`).
If you'd like, I'm happy to help you further with integrating `generate_reply` into your `llm_node` logic, just let me know!
Sources: Agent speech and audio | LiveKit Docs | Agent speech and audio | LiveKit Docs | agents/src/voice/agent_session.ts | pipeline-llm/replacing_llm_output.py | livekit-agents/livekit/agents/voice/agent_activity.py
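For completeness, here is a small sketch of the `user_input` parameter described above; the keyword names follow the parameter list in this answer, so double-check them against your livekit-agents version:

```python
from livekit.agents import Agent


class SimpleAgent(Agent):
    async def on_enter(self):
        handle = self.session.generate_reply(
            user_input="What are your opening hours?",  # recorded in the chat history as the user's turn
            allow_interruptions=True,
        )
        await handle  # optional: wait until playback of the generated reply finishes
```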
s
Can we solve this issue using AgentTask? If yes, how can it be used?
t
Thanks for reaching out! From what we have, `AgentTask` isn't specifically built to handle issues with `session.say` in `llm_node`. But there are some good alternatives to help you get immediate speech or responses:
• Pre-Synthesized Audio: Use pre-generated audio with `session.say` to skip TTS latency and get quicker playback.
• Preemptive Generation: Enable `preemptive_generation` in `AgentSession` so the response starts generating even before the user finishes their turn; this helps cut down on wait time.
• Custom TTS Node: Another option is customizing the `tts_node` to handle text chunks and dynamically interrupt or adjust responses as needed.
If you have a particular use case in mind for `AgentTask` or want me to explain anything in more detail, just let me know!
Sources: tests/test_agent_session.py | Agent speech and audio | LiveKit Docs | Agent speech and audio | LiveKit Docs | pipeline-tts/short_replies_only.py
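If you want to explore the custom `tts_node` route, here is a rough sketch following the override pattern used in the LiveKit pipeline-node examples; adjust the signature if your SDK version expects an async generator rather than a returned iterable:

```python
from livekit.agents import Agent


class FilteringAgent(Agent):
    async def tts_node(self, text, model_settings):
        async def adjusted(stream):
            async for chunk in stream:
                # Illustrative tweak: strip markdown asterisks so they are not read aloud.
                yield chunk.replace("*", "")

        # Delegate actual synthesis to the default TTS node.
        return Agent.default.tts_node(self, adjusted(text), model_settings)
```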
s
Please explain it in more detail
t
AgentTask is designed to carry out specific, stateful actions within a conversation session. It supports `session.say()` for immediate speech, and here are some important points about how it works:
• Speech Handling: When you use `session.say()` inside an AgentTask, it works just like it does in a regular Agent. It triggers speech synthesis through the TTS plugin or plays pre-synthesized audio, and it returns a `SpeechHandle`, which gives you control over playback, such as waiting for it to finish or adding callbacks.
• Session Control: An AgentTask takes temporary control of the session as soon as it starts and releases it once you call `complete()`. This helps make sure that the speech inside the task happens in order, without interruptions from the previous agent.
• Interruption Management: If you're using a `RealtimeModel` LLM with server-side turn detection, note that setting `allow_interruptions=False` in `session.say()` will be ignored. To handle interruptions effectively, you can turn off server-side turn detection and use VAD-based handling configured at the session or task level.
• Stateful Operations: AgentTasks shine when you need to manage isolated, stateful processes, like gathering user consent or carrying out specific tasks. For example, your task might use `session.generate_reply()` to ask for input and then call `self.complete()` to wrap things up.
While AgentTask offers a neat and organized way to manage speech and logic, it doesn't change the inherent latency or real-time behavior of `session.say()`. The timing still depends on TTS speed and your session settings. If you'd like more details or examples, just let me know!
Sources: tests/test_agent_session.py | pipeline-llm/interrupt_user.py | agents/src/voice/generation.ts | Workflows | LiveKit Docs | Workflows | LiveKit Docs
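To make the consent-style example above concrete, here is a minimal `AgentTask` sketch based on the workflow pattern described in the LiveKit docs; the class name and tool name are illustrative, and the `AgentTask[bool]` / `complete()` details should be checked against your livekit-agents version:

```python
from livekit.agents import AgentTask, function_tool


class CollectConsent(AgentTask[bool]):
    def __init__(self) -> None:
        super().__init__(
            instructions="Ask the user for permission to record the call and wait for a clear yes or no."
        )

    async def on_enter(self) -> None:
        # Immediate, non-blocking acknowledgment, kept out of the chat history.
        self.session.say("One quick question before we continue.", add_to_chat_ctx=False)
        # Let the LLM phrase the actual consent question.
        self.session.generate_reply()

    @function_tool
    async def record_consent(self, granted: bool) -> None:
        """Call this when the user has clearly granted or refused consent."""
        # Return the result and hand control back to the previous agent.
        self.complete(granted)
```

Per that workflow pattern, the task is typically awaited from another agent (for example, `granted = await CollectConsent()`), though the exact invocation may differ between SDK versions.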