wooden-librarian-60399
08/05/2025, 6:03 PM

tall-airline-16756
08/05/2025, 6:03 PM
To have your agent speak only through session.say() and not auto-say LLM output, you can override the tts_node method in your Agent class. This allows you to control when the TTS node processes text, ensuring it only triggers through explicit calls to session.say().
Here’s a high-level approach:
1. Extend the Agent Class: Create a subclass of Agent and override the tts_node method.
2. Modify the TTS Node: In the overridden tts_node, add logic to process text only when explicitly called via session.say().
3. Configure the Agent: Set up your Agent with the necessary components (STT, LLM, TTS, and VAD).
4. Implement Session Control: Use session.say() to trigger TTS output and session.generate_reply() for LLM responses that should not be spoken automatically.
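The steps above can be sketched without the framework itself. This is a hypothetical, framework-free illustration of the gating idea, not the real LiveKit API: GatedAgent and its internals are invented for the example, and a flag set only inside say() decides whether the text stream reaching tts_node is forwarded or silently dropped.

```python
import asyncio
from typing import AsyncIterable


class GatedAgent:
    """Hypothetical sketch: forwards text to TTS only during an explicit say()."""

    def __init__(self) -> None:
        self._speaking_via_say = False  # True only while say() is running

    async def tts_node(self, text: AsyncIterable[str]) -> AsyncIterable[str]:
        # Forward text only when an explicit say() call is in progress;
        # LLM output arriving outside say() is consumed and discarded.
        async for chunk in text:
            if self._speaking_via_say:
                yield chunk  # a real agent would synthesize audio here

    async def say(self, message: str) -> list[str]:
        self._speaking_via_say = True
        try:
            async def source() -> AsyncIterable[str]:
                yield message

            return [chunk async for chunk in self.tts_node(source())]
        finally:
            self._speaking_via_say = False
```

In the real Agent subclass the same flag trick works because tts_node receives every text stream destined for speech, whether it came from say() or from the LLM.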
This approach gives you fine-grained control over when the TTS node processes text, allowing you to separate manual and automatic responses in your Agent. Let me know if you need further details or code examples!

wooden-librarian-60399
08/05/2025, 6:04 PM

tall-airline-16756
08/05/2025, 6:05 PM
To intercept the LLM output before it is spoken, you can override the llm_node method in your custom Agent class. This lets you catch and tweak the LLM output before it reaches the TTS node, or even stop it from streaming altogether.
Here’s a simple way to think about it:
1. Create a Custom Agent Class: Start by subclassing the base Agent class.
2. Override the llm_node Method: This is where you can intercept the LLM output stream.
3. Modify or Process the Output: Feel free to adjust the LLM output however you need before yielding it.
4. Prevent Streaming to TTS: After you’re done processing, raise a StopResponse exception to stop the output from going to the TTS node.
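Here is a framework-free sketch of that flow. StopResponse and llm_node are the names from the conversation above, but this StopResponse class and InterceptingAgent are stand-ins invented for the example, not the real LiveKit types:

```python
import asyncio
from typing import AsyncIterable


class StopResponse(Exception):
    """Stand-in for the StopResponse exception described above; raising it
    from llm_node tells the session to discard the turn instead of speaking it."""


class InterceptingAgent:
    """Hypothetical sketch: llm_node sees every LLM chunk before TTS would."""

    def __init__(self) -> None:
        self.captured: list[str] = []

    async def llm_node(self, chunks: AsyncIterable[str]) -> AsyncIterable[str]:
        async for chunk in chunks:
            self.captured.append(chunk)  # inspect or modify the output here
        raise StopResponse()  # prevent the response from reaching TTS
        yield  # unreachable; present only to make this an async generator


async def drive(agent: InterceptingAgent) -> bool:
    async def llm_output() -> AsyncIterable[str]:
        yield "secret "
        yield "draft"

    try:
        async for _ in agent.llm_node(llm_output()):
            pass
    except StopResponse:
        return True  # the session would drop the turn here
    return False
```

Because nothing is yielded before the exception, no text ever flows downstream, yet the full LLM output is still available in `captured` for your own processing.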
This way, you have full control over what the LLM outputs and can make sure it doesn’t automatically stream to the TTS node. If you’d like, I’m happy to share more details or some example code to help you out!

wooden-librarian-60399
08/05/2025, 6:06 PM

tall-airline-16756
08/05/2025, 6:07 PM

wooden-librarian-60399
08/05/2025, 6:07 PM

tall-airline-16756
08/05/2025, 6:08 PM
Both say() and generate_reply() return a SpeechHandle object so you can keep track of what your agent is saying. So, think of say() for fixed messages and generate_reply() when you want something more flexible and conversational. If you want me to explain anything else, just let me know—I’m here to help!

wooden-librarian-60399
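A framework-free model can make the distinction concrete. SpeechHandle and the two method names come from the discussion above, but MiniSession and everything inside it are hypothetical stand-ins, not the real LiveKit AgentSession:

```python
from typing import Callable


class SpeechHandle:
    """Stand-in for the handle both calls return, tracking what was said."""

    def __init__(self, text: str) -> None:
        self.text = text


class MiniSession:
    """Hypothetical model of the two speech entry points."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self._llm = llm
        self.spoken: list[str] = []  # everything sent to TTS, in order

    def say(self, text: str) -> SpeechHandle:
        # Fixed message: goes straight to TTS; the LLM is never consulted.
        self.spoken.append(text)
        return SpeechHandle(text)

    def generate_reply(self, instructions: str) -> SpeechHandle:
        # Dynamic message: the LLM produces the text first, then it is spoken.
        reply = self._llm(instructions)
        self.spoken.append(reply)
        return SpeechHandle(reply)
```

The key difference is simply whether the LLM sits in the path before TTS: say() bypasses it, generate_reply() goes through it.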
08/05/2025, 6:09 PM

tall-airline-16756
08/05/2025, 6:09 PM
Internally, the LLM output is streamed to the TTS node by the perform_tts_inference function found in livekit/agents/voice/generation.py. Here’s the flow:
1. LLM Output Generation: The perform_llm_inference function takes your input, runs it through the LLM, and sends text chunks over the text_ch channel.
2. TTS Processing: Then, the perform_tts_inference function picks up that text stream from text_ch and uses the TTS node to turn it into audio.
3. Audio Frame Streaming: Finally, the audio frames get sent through the audio_ch channel so they can be played back.
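That three-stage flow can be sketched with plain asyncio queues standing in for the channels. The function names mirror the ones mentioned above, but these bodies are illustrative stand-ins, not the real LiveKit implementations; here "audio frames" are just labeled strings:

```python
import asyncio


async def perform_llm_inference(prompt: str, text_ch: asyncio.Queue) -> None:
    # Stage 1: stand-in LLM streams text chunks over text_ch.
    for word in prompt.split():
        await text_ch.put(word)
    await text_ch.put(None)  # sentinel: text stream finished


async def perform_tts_inference(text_ch: asyncio.Queue, audio_ch: asyncio.Queue) -> None:
    # Stage 2: read the text stream and emit pretend audio frames on audio_ch.
    while (chunk := await text_ch.get()) is not None:
        await audio_ch.put(f"audio<{chunk}>")
    await audio_ch.put(None)  # sentinel: audio stream finished


async def run_pipeline(prompt: str) -> list[str]:
    # Stage 3: collect audio frames from audio_ch for playback.
    text_ch: asyncio.Queue = asyncio.Queue()
    audio_ch: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(
        perform_llm_inference(prompt, text_ch),
        perform_tts_inference(text_ch, audio_ch),
    )
    frames = []
    while (frame := await audio_ch.get()) is not None:
        frames.append(frame)
    return frames
```

Running the two stages concurrently is the point of the channel design: TTS can start synthesizing the first chunk before the LLM has finished generating the rest.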
If you want me to walk you through any part in more detail or share some code snippets, just let me know—I’m happy to help!