How to play the audio using AudioFrame in llm_node...
# ask-ai
a
How to play the audio using AudioFrame in llm_node?
t
Thanks for reaching out, Aditya. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To play audio using `AudioFrame` in `llm_node`, follow this process:

1. *Read your audio data*, for example from a WAV file. Ensure the format is 16-bit signed integer (int16), interleaved by channel.
2. Create an `AudioFrame` by passing the audio data, sample rate, number of channels, and samples per channel.
3. Wrap the `AudioFrame` in an async generator that yields the frame.
4. Call `session.say()` with the async generator as the `audio` argument to play the audio.

Example:
```python
import wave
from livekit import rtc

# Read 16-bit PCM audio from a WAV file
with wave.open('audio.wav', 'rb') as wav_file:
    num_channels = wav_file.getnchannels()
    sample_rate = wav_file.getframerate()
    num_frames = wav_file.getnframes()
    frames = wav_file.readframes(num_frames)

# samples_per_channel is the number of frames per channel;
# data must be int16 samples, interleaved by channel
audio_frame = rtc.AudioFrame(
    data=frames,
    sample_rate=sample_rate,
    num_channels=num_channels,
    samples_per_channel=num_frames,
)

# session.say() accepts an async iterable of AudioFrame objects
async def audio_generator():
    yield audio_frame

await session.say('Playing audio', audio=audio_generator())
```
• The data length must be at least `num_channels * samples_per_channel * sizeof(int16)` bytes; see the sanity-check sketch below.
• This will play the specified audio through the agent's voice pipeline. More details on the constructor and data requirements are available in the documentation.
• Sources: livekit-rtc/livekit/rtc/audio_source.py | basics/playing_audio.py
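If it helps, here is a minimal sketch of how you could verify that byte-length requirement before constructing the frame. It assumes the `frames`, `num_channels`, and `num_frames` variables from the WAV example above; the constant name is just for illustration.

```python
# Sanity-check the buffer size against what AudioFrame expects
BYTES_PER_SAMPLE = 2  # sizeof(int16)

expected_bytes = num_channels * num_frames * BYTES_PER_SAMPLE
if len(frames) < expected_bytes:
    raise ValueError(
        f"audio buffer too short: got {len(frames)} bytes, need {expected_bytes}"
    )
```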
a
I don't want to use session.say
t
Hey there! Not a fan of using `session.say()`? No worries - here are some friendly alternatives for playing or injecting audio using `AudioFrame` in `llm_node`. Let me walk you through two handy ways to do this:

1. Override `tts_node`: If you're working with pipeline models, you can customize your agent's `tts_node` method to create your own `AudioFrame` objects. Check out this quick example (there's a fuller sketch after this list, too):
```python
async def tts_node(self, text, model_settings):
    # Yield your custom AudioFrame(s)
    yield your_audio_frame
```
2. Override `realtime_audio_output_node`: For realtime models, this method is your go-to for generating custom `AudioFrame` objects in your audio stream.
```python
async def realtime_audio_output_node(self, audio, model_settings):
    # Yield your custom AudioFrame(s) into the realtime audio stream
    yield your_audio_frame
```
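To flesh out the `tts_node` option a bit, here is a rough sketch of an agent that answers with pre-loaded `AudioFrame` objects instead of synthesized speech. The class name, constructor argument, and text-draining behaviour are illustrative assumptions, not LiveKit APIs; if your livekit-agents version exposes a default TTS node, you could also delegate the remaining text to it instead of discarding it.

```python
from livekit import rtc
from livekit.agents import Agent


class CannedAudioAgent(Agent):
    """Sketch: answer with pre-recorded AudioFrames instead of synthesized speech."""

    def __init__(self, frames: list[rtc.AudioFrame], **kwargs) -> None:
        super().__init__(**kwargs)
        # Frames are assumed to be built elsewhere (e.g. from a WAV file) with a
        # sample rate and channel count that match your audio output
        self._frames = frames

    async def tts_node(self, text, model_settings):
        # Drain the incoming text stream (nothing is synthesized in this sketch)...
        async for _ in text:
            pass
        # ...and yield the pre-loaded AudioFrames instead
        for frame in self._frames:
            yield frame
```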
• These methods are super flexible - they give you full control over injecting, modifying, or streaming audio frames right through your agent's pipeline, without touching `session.say()`.
• Just remember to create your `AudioFrame` objects with the right data and metadata (like sample rate and channels); there's a chunking sketch below, and detailed examples in the previous messages if you need a hand.
• Pro tip: The `llm_node` typically handles text, while the audio magic happens in `tts_node` (for text-to-speech) or `realtime_audio_output_node` (for live audio agents). Pick the override that best matches your model pipeline, and you'll be streaming audio like a pro in no time!
• Sources: examples/avatar_agents/audio_wave/avatar_runner.py | home_assistant/homeautomation.py
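As a rough illustration of that metadata point, here is a minimal sketch that slices a 16-bit PCM WAV file into fixed-size `AudioFrame` chunks you could yield from either override. The helper name and the 10 ms chunk size are assumptions for illustration, not a LiveKit API.

```python
import wave
from livekit import rtc

async def wav_frames(path: str, frame_ms: int = 10):
    # Hypothetical helper: split a 16-bit PCM WAV file into frame_ms-sized AudioFrames
    with wave.open(path, 'rb') as wav_file:
        sample_rate = wav_file.getframerate()
        num_channels = wav_file.getnchannels()
        data = wav_file.readframes(wav_file.getnframes())

    samples_per_channel = sample_rate * frame_ms // 1000
    chunk_bytes = samples_per_channel * num_channels * 2  # 2 bytes per int16 sample

    # Any trailing partial chunk is dropped for simplicity
    for offset in range(0, len(data) - chunk_bytes + 1, chunk_bytes):
        yield rtc.AudioFrame(
            data=data[offset:offset + chunk_bytes],
            sample_rate=sample_rate,
            num_channels=num_channels,
            samples_per_channel=samples_per_channel,
        )
```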
a
I want to stream audio directly to the user if certain text is found.
t
Hey there! Let me walk you through a way to stream audio when you detect specific text in your agent. It's actually pretty straightforward, so I'll break it down into friendly steps:

1. Inside your `llm_node` method, keep an eye out for your target phrase in the generated text.
2. When you spot that phrase, generate some AudioFrame(s), maybe through your `tts_node` or by pulling from a file or buffer.
3. Send those audio frames right away through your processing pipeline.

Super easy! Here's a simple example to help you visualize:
```python
async def llm_node(self, chat_ctx, tools, model_settings):
    # llm_stream stands for the stream of chunks produced by your LLM step
    async for chunk in llm_stream:
        if isinstance(chunk, str) and "your phrase" in chunk:
            await self._generate_audio_response("Phrase detected!")
        yield chunk

async def _generate_audio_response(self, response_text):
    # tts_node expects an async iterable of text; a lambda can't contain `yield`,
    # so use a small async generator instead
    async def text_source():
        yield response_text

    audio_stream = self.tts_node(text_source(), ModelSettings())
    async for audio_frame in audio_stream:
        await self._handle_audio_frame(audio_frame)
```
• Pro tip: You can swap out `tts_node` with your own AudioFrame sources if you want to play pre-recorded audio or get creative with synthesis; there's a sketch of that below.
• How you actually stream to the user depends on your specific audio output setup, so check your `realtime_audio_output_node` or audio pipeline config.
• Want to see a full, detailed example? The source code has you covered!
• Sources: tests/utils.py
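Following that pro tip, a minimal sketch of the pre-recorded variant could look like this. It leans on the hypothetical `wav_frames()` helper sketched in the previous message and on the same `_handle_audio_frame()` hook used above, so treat those names as placeholders for your own audio source and output handling.

```python
async def _generate_audio_response(self, response_text):
    # Variant that plays a pre-recorded clip instead of synthesizing response_text;
    # wav_frames() and _handle_audio_frame() are placeholders from the sketches above
    async for audio_frame in wav_frames('phrase_detected.wav'):
        await self._handle_audio_frame(audio_frame)
```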