# ask-ai
w
How would I go about making a TTS node that only works via session.say (i.e. does not auto-say the LLM output)? Or would it be easier to make a custom LLM node that does not stream its output into the TTS node?
t
Thanks for reaching out, Drew. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To create a TTS node that only works via `session.say()` and does not auto-say LLM output, you can override the `tts_node` method in your Agent class. This allows you to control when the TTS node processes text, ensuring it only triggers through explicit calls to `session.say()`. Here's a high-level approach (a sketch follows the list):

1. Extend the Agent class: create a subclass of `Agent` and override the `tts_node` method.
2. Modify the TTS node: in the overridden `tts_node`, add logic to process text only when it was explicitly requested via `session.say()`.
3. Configure the agent: set up your Agent with the necessary components (STT, LLM, TTS, and VAD).
4. Implement session control: use `session.say()` to trigger TTS output and `session.generate_reply()` for LLM responses that should not be spoken automatically.

This approach gives you fine-grained control over when the TTS node processes text, allowing you to separate manual and automatic responses in your Agent. Let me know if you need further details or code examples!
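For example, here's a minimal sketch of that approach. It assumes the livekit-agents 1.x node-override API (where `Agent.default.tts_node()` gives you the default synthesis behavior) and that `session.say()` routes through the agent's `tts_node` in your version; the `say_manually()` helper and the `_tts_enabled` flag are just illustrative names, and import paths can vary slightly between releases:

```python
from collections.abc import AsyncIterable

from livekit import rtc
from livekit.agents import Agent, ModelSettings


class ManualSpeechAgent(Agent):
    """Only synthesizes speech for text that was explicitly requested."""

    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful assistant.")
        self._tts_enabled = False  # gate: only speak when explicitly allowed

    async def say_manually(self, text: str) -> None:
        # Illustrative helper: open the gate, speak via session.say(), then close it.
        self._tts_enabled = True
        try:
            handle = self.session.say(text)
            await handle  # SpeechHandle is awaitable; waits for playback to finish
        finally:
            self._tts_enabled = False

    async def tts_node(
        self, text: AsyncIterable[str], model_settings: ModelSettings
    ) -> AsyncIterable[rtc.AudioFrame]:
        if not self._tts_enabled:
            # Drain the incoming text without producing audio, so auto-generated
            # LLM replies are never spoken.
            async for _ in text:
                pass
            return
        # Gate is open: fall back to the default synthesis behavior.
        async for frame in Agent.default.tts_node(self, text, model_settings):
            yield frame
```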
w
what is the high level approach to implement an LLM node that does not stream its output to the tts node
t
If you want to set up an LLM node that doesn't stream its output to the TTS node, a great way to do this is by overriding the `llm_node` method in your custom Agent class. This lets you catch and tweak the LLM output before it reaches the TTS node, or even stop it from streaming altogether. Here's a simple way to think about it (a sketch follows the list):

1. Create a custom Agent class: start by subclassing the base `Agent` class.
2. Override the `llm_node` method: this is where you can intercept the LLM output stream.
3. Modify or process the output: adjust the LLM output however you need before yielding it.
4. Prevent streaming to TTS: after you're done processing, raise a `StopResponse` exception to stop the output from going to the TTS node.

This way, you have full control over what the LLM outputs and can make sure it doesn't automatically stream to the TTS node. If you'd like, I'm happy to share more details or some example code to help you out!
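As a starting point, here's a rough sketch of that idea. It assumes the livekit-agents 1.x signatures for `llm_node` and that raising `StopResponse` from inside the node ends the turn before anything reaches TTS; the `last_reply` attribute is just an illustrative place to keep the unspoken text, and the exact `ChatChunk` fields may differ in your version:

```python
from livekit.agents import Agent, ModelSettings, StopResponse, llm


class TextOnlyAgent(Agent):
    """Runs the LLM but never streams its output to the TTS node."""

    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful assistant.")
        self.last_reply: str = ""  # illustrative: keep the unspoken reply around

    async def llm_node(
        self,
        chat_ctx: llm.ChatContext,
        tools: list[llm.FunctionTool],
        model_settings: ModelSettings,
    ):
        # Run the default LLM inference, but collect the chunks instead of yielding them.
        parts: list[str] = []
        async for chunk in Agent.default.llm_node(self, chat_ctx, tools, model_settings):
            # Chunks may arrive as plain strings or ChatChunk objects depending on setup.
            if isinstance(chunk, str):
                parts.append(chunk)
            elif chunk.delta and chunk.delta.content:
                parts.append(chunk.delta.content)

        self.last_reply = "".join(parts)
        # ...process the text here (log it, send it over a data channel, etc.)

        # Ending the turn here means nothing is handed to the TTS node.
        raise StopResponse()
```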
w
what's the difference between Agent and AgentSession
t
Hey there! Just wanted to break down the difference between the Agent and AgentSession in LiveKit for you:

• Agent: This is where the AI's personality and smarts live. It includes all the tools, instructions, and functions that shape how the agent chats with users and gets things done.
• AgentSession: Think of this as the session's manager. It kicks things off and keeps everything running smoothly by handling components like STT, TTS, VAD, and LLM, plus managing the connection to the room and media streams.

So, simply put, the Agent is all about what the agent does, while the AgentSession looks after how everything works during a session. If you want me to explain anything more, just give me a shout!

Sources: livekit-agents/livekit/agents/voice/__init__.py
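Here's a minimal sketch of how the two fit together, loosely based on the standard voice-agent quickstart (the specific plugins and model names here are just examples; swap in whatever you're using):

```python
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import cartesia, deepgram, openai, silero


# The Agent defines what the assistant is: its instructions, tools, and behavior.
class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a friendly voice assistant.")


# The AgentSession defines how it runs: it wires STT, LLM, TTS, and VAD together
# and manages the room connection and media streams for the session.
async def entrypoint(ctx: agents.JobContext) -> None:
    await ctx.connect()
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(),
        vad=silero.VAD.load(),
    )
    await session.start(agent=Assistant(), room=ctx.room)


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```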
w
what is the difference between say and generate_reply methods?
t
Hey there! Just wanted to help clarify the difference between the `say` and `generate_reply` methods in LiveKit:

• `say()`: This one's great for predefined messages. It takes your text and turns it into speech using TTS or plays pre-recorded audio. Perfect for those steady, reliable responses like greetings or notifications.
• `generate_reply()`: This method is a bit more dynamic: it creates responses on the fly using the Language Model (LLM) based on what the user says or instructions you give, then converts that response to speech using TTS or a realtime model. It's awesome for interactive, context-aware chats.

Both will give you a `SpeechHandle` object so you can keep track of what your agent is saying. So, think of `say()` for fixed messages and `generate_reply()` when you want something more flexible and conversational. If you want me to explain anything else, just let me know; I'm here to help!
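Here's a tiny usage sketch, assuming an `AgentSession` that has already been started (awaiting the returned `SpeechHandle` is optional; it just waits for playback to finish):

```python
from livekit.agents import AgentSession


async def greet(session: AgentSession) -> None:
    # say(): speak a fixed, predefined message directly via TTS.
    handle = session.say("Thanks for calling! How can I help you today?")
    await handle  # optionally wait until playback finishes

    # generate_reply(): let the LLM compose the reply, then speak it via TTS.
    handle = session.generate_reply(
        instructions="Ask the caller what they need help with, in one short sentence."
    )
    await handle
```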
w
show me in the code where the llm output is streamed into the tts.
t
Just to give you a quick overview: the LLM output is streamed into the TTS inside the `perform_tts_inference` function found in `livekit/agents/voice/generation.py`. Here's the flow:

1. LLM output generation: the `perform_llm_inference` function takes your input, runs it through the LLM, and sends text chunks over the `text_ch` channel.
2. TTS processing: the `perform_tts_inference` function picks up that text stream from `text_ch` and uses the TTS node to turn it into audio.
3. Audio frame streaming: the audio frames get sent through the `audio_ch` channel so they can be played back.

If you want me to walk you through any part in more detail or share some code snippets, just let me know; I'm happy to help!
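This isn't the actual library code, but here's a small self-contained asyncio sketch of that same channel pattern (the LLM stage produces text into a text channel, the TTS stage consumes it and produces audio into an audio channel); the real `generation.py` uses LiveKit's own channel and node abstractions:

```python
import asyncio


async def llm_stage(text_ch: asyncio.Queue) -> None:
    # Stand-in for perform_llm_inference: push text chunks onto the text channel.
    for chunk in ["Hello ", "from ", "the ", "LLM."]:
        await text_ch.put(chunk)
    await text_ch.put(None)  # sentinel: text stream finished


async def tts_stage(text_ch: asyncio.Queue, audio_ch: asyncio.Queue) -> None:
    # Stand-in for perform_tts_inference: read text, "synthesize", emit audio frames.
    while (chunk := await text_ch.get()) is not None:
        await audio_ch.put(f"<audio frame for {chunk!r}>")  # pretend synthesis
    await audio_ch.put(None)  # sentinel: audio stream finished


async def playback(audio_ch: asyncio.Queue) -> None:
    # Stand-in for the playout path that consumes audio_ch.
    while (frame := await audio_ch.get()) is not None:
        print(frame)


async def main() -> None:
    text_ch: asyncio.Queue = asyncio.Queue()
    audio_ch: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(
        llm_stage(text_ch),
        tts_stage(text_ch, audio_ch),
        playback(audio_ch),
    )


if __name__ == "__main__":
    asyncio.run(main())
```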