able-jackal-84140
07/13/2025, 4:17 PM
async def llm_node(
    self,
    chat_ctx: llm.ChatContext,
    tools: list[FunctionTool],
    model_settings: ModelSettings
) -> AsyncIterable[llm.ChatChunk]:
    # Insert custom preprocessing here
    async for chunk in Agent.default.llm_node(self, chat_ctx, tools, model_settings):
        # Insert custom postprocessing here
        if "कृपया" in chunk.delta.content:
            logger.info("\n\n\n TOOL IS HAPPENING \n\n\n")
            # await self.session.say("Playing audio", audio=audio_generator("waiting_message_hindi.wav"))
            # continue
        logger.info(f"This is the current chunks coming: {chunk}")
        yield chunk
tall-airline-16756
07/13/2025, 4:17 PM
You can use the session.say() method along with a custom audio parameter. This approach allows you to bypass TTS and directly send a pre-synthesized audio file (such as a WAV file) to the user. Here's a step-by-step approach suitable for your setup:
async def play_audio_file(self):
    audio_path = Path(__file__).parent / "audio.wav"
    with wave.open(str(audio_path), 'rb') as wav_file:
        num_channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        num_frames = wav_file.getnframes()
        frames = wav_file.readframes(num_frames)

    audio_frame = rtc.AudioFrame(
        data=frames,
        sample_rate=sample_rate,
        num_channels=num_channels,
        samples_per_channel=num_frames
    )

    async def audio_generator():
        yield audio_frame

    await self.session.say("Playing audio", audio=audio_generator())
You can trigger this logic from within your llm_node
when conditions are met (such as specific text being detected in a chunk).
Alternatively, for more advanced control (background playback, stop, loop), the BackgroundAudioPlayer
class can be used. Let me know if you want the details for that method.
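If you'd like a head start on that, here's a rough sketch of what it might look like (this assumes the v1.x agents API; start_waiting_audio and the ctx/session arguments are placeholders for your own job context and agent session, so double-check the exact signatures against the current docs):

from livekit.agents import AudioConfig, BackgroundAudioPlayer

async def start_waiting_audio(ctx, session):
    # Create the player; it can also take ambient_sound / thinking_sound defaults
    background_audio = BackgroundAudioPlayer()
    # Attach it to the room and the running agent session
    await background_audio.start(room=ctx.room, agent_session=session)
    # play() returns a handle you can keep around to stop playback later
    handle = background_audio.play(AudioConfig("waiting_message_hindi.wav", volume=0.8))
    return handle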
Sources: basics/playing_audio.py | Agent speech and audio | LiveKit Docs | examples/voice_agents/error_callback.py | meditation_assistant/README.md
able-jackal-84140
07/13/2025, 4:18 PM
tall-airline-16756
07/13/2025, 4:18 PM
Here's how you can play a custom audio file from within your llm_node using session.say() in a LiveKit agent. It's a neat trick for replacing part of your output stream with a custom audio response when a specific phrase pops up.
import logging
import wave
from pathlib import Path

from livekit import rtc
from livekit.agents import Agent, llm

logger = logging.getLogger("agent")

# Defined as a method on your Agent subclass
async def llm_node(self, chat_ctx: llm.ChatContext, tools: list, model_settings):
    async for chunk in Agent.default.llm_node(self, chat_ctx, tools, model_settings):
        # Skip chunks without text content (e.g. tool calls) before matching
        if chunk.delta and chunk.delta.content and "कृपया" in chunk.delta.content:
            # Log for debugging
            logger.info("\n\n\n TOOL IS HAPPENING \n\n\n")

            # Path to your audio file
            audio_path = Path(__file__).parent / "waiting_message_hindi.wav"

            # Read and prepare the audio file
            with wave.open(str(audio_path), 'rb') as wav_file:
                num_channels = wav_file.getnchannels()
                sample_rate = wav_file.getframerate()
                samples_per_channel = wav_file.getnframes()
                frames = wav_file.readframes(samples_per_channel)

            audio_frame = rtc.AudioFrame(
                data=frames,
                sample_rate=sample_rate,
                num_channels=num_channels,
                samples_per_channel=samples_per_channel
            )

            # Create an async generator for the frame
            async def audio_generator():
                yield audio_frame

            # Say the custom audio
            await self.session.say("Playing audio", audio=audio_generator())
            continue  # Optionally continue or not, based on your flow

        logger.info(f"This is the current chunks coming: {chunk}")
        yield chunk
• Here's the deal: this code checks each chunk for the trigger phrase कृपया. When it matches, it reads and plays the custom WAV file using session.say(). The custom audio will replace the default text-to-speech.
• Just swap out waiting_message_hindi.wav with your own audio file path.
• Feel free to tweak the logic to fit your specific needs or add more triggers and conditions.
Want to explore background playback or more advanced audio control? I'm happy to help you out!
able-jackal-84140
07/13/2025, 4:21 PM
tall-airline-16756
07/13/2025, 4:22 PM
Having trouble with session.say(audio=...)? No worries, I've got some friendly troubleshooting tips to help you out!
• TTS Plugin Required: Let's first make sure you've got a Text-to-Speech plugin set up. Think of it like having the right microphone for a podcast: the say() method needs its TTS model to work. If you don't have one, you'll want to use generate_reply() or double-check that your session's TTS is fully ready to go.
• Valid WAV Format: Audio can be a bit picky! Your WAV file needs to match the right specs, typically 48kHz, mono audio. It's like making sure your favorite playlist is set to the right audio settings. (There's a quick validation sketch right after this list.)
• Audio Frame Construction: Here's a quick way to set up your audio frame that should do the trick:
  audio_frame = rtc.AudioFrame(
      data=frames,
      sample_rate=sample_rate,    # ideally 48000
      num_channels=num_channels,  # ideally 1
      samples_per_channel=wav_file.getnframes()
  )
• Session State: Just a quick check: is your agent session up and running smoothly? Make sure it's not stuck or in an error state.
• Queue/Buffers: Seeing some audio hiccups? Try increasing your audio source queue size to 1000ms or larger. It's like giving your audio a bit more breathing room.
• Logging: Take a peek at those agent and backend logs. They're like detective notes that can help us track down any sneaky audio issues.
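By the way, if you want to sanity-check a WAV file before handing it to session.say(), here's a tiny sketch using Python's built-in wave module (check_wav is just an illustrative helper name, not part of the LiveKit API):

import wave

def check_wav(path: str) -> None:
    # Print the specs that matter for session.say(audio=...)
    with wave.open(path, 'rb') as wav_file:
        channels = wav_file.getnchannels()
        rate = wav_file.getframerate()
        width = wav_file.getsampwidth()
        print(f"{path}: {channels} channel(s), {rate} Hz, {width * 8}-bit")
        if channels != 1:
            print("warning: expected mono (1 channel) audio")
        if rate != 48000:
            print("warning: expected a 48kHz sample rate")
        if width != 2:
            print("warning: rtc.AudioFrame expects 16-bit PCM samples")

check_wav("waiting_message_hindi.wav")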
Could you help me confirm a few things?
• Is your session's TTS plugin active and ready to go?
• Is your audio file mono and 48kHz?
• Do you have any logs from when you tried to play the audio?
Don't worry—we'll get to the bottom of this together!
Sources: Speech & Audio Processing | livekit/livekit_composite | DeepWiki | Agent speech and audio | LiveKit Docs | tests/test_tts.py | basics/playing_audio.py | agent_extensions/agent_extensions/utils/wav_player.py
able-jackal-84140
07/13/2025, 4:23 PM
tall-airline-16756
07/13/2025, 4:23 PM
able-jackal-84140
07/13/2025, 4:31 PM
tall-airline-16756
07/13/2025, 4:31 PM