How to play the audio using AudioFrame in llm_node...
# ask-ai
a
How to play the audio using AudioFrame in llm_node?
t
Thanks for reaching out, Aditya. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To play audio using `AudioFrame` in `llm_node`, follow this process:

1. *Read your audio data*, for example from a WAV file. Ensure the format is 16-bit signed integer (int16), interleaved by channel.
2. Create an `AudioFrame` by passing the audio data, sample rate, number of channels, and samples per channel.
3. Wrap the `AudioFrame` in an async generator that yields the frame.
4. Call `session.say()` with the async generator as the `audio` argument to play the audio.

Example:
```python
import wave
from livekit import rtc

# Read 16-bit PCM audio from a WAV file
with wave.open('audio.wav', 'rb') as wav_file:
    num_channels = wav_file.getnchannels()
    sample_rate = wav_file.getframerate()
    num_frames = wav_file.getnframes()
    frames = wav_file.readframes(num_frames)

# samples_per_channel is the number of frames per channel;
# data must be int16 samples, interleaved by channel
audio_frame = rtc.AudioFrame(
    data=frames,
    sample_rate=sample_rate,
    num_channels=num_channels,
    samples_per_channel=num_frames,
)

# session.say() accepts an async iterable of AudioFrame objects
async def audio_generator():
    yield audio_frame

await session.say('Playing audio', audio=audio_generator())
```
• The data length must be at least `num_channels * samples_per_channel * sizeof(int16)` bytes; see the sanity-check sketch below.
• This will play the specified audio through the agent's voice pipeline. More details on the constructor and data requirements are available in the documentation.
• Sources: livekit-rtc/livekit/rtc/audio_source.py | basics/playing_audio.py
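If it helps, here is a minimal sketch of how you could verify that byte-length requirement before constructing the frame. It assumes the `frames`, `num_channels`, and `num_frames` variables from the WAV example above; the constant name is just for illustration.

```python
# Sanity-check the buffer size against what AudioFrame expects
BYTES_PER_SAMPLE = 2  # sizeof(int16)

expected_bytes = num_channels * num_frames * BYTES_PER_SAMPLE
if len(frames) < expected_bytes:
    raise ValueError(
        f"audio buffer too short: got {len(frames)} bytes, need {expected_bytes}"
    )
```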
a
I don't want to use session.say
t
Hey there! Not a fan of using `session.say()`? No worries - here are some friendly alternatives for playing or injecting audio using `AudioFrame` in `llm_node`. Let me walk you through two handy ways to do this:

1. Override `tts_node`: If you're working with pipeline models, you can customize your agent's `tts_node` method to create your own `AudioFrame` objects. Check out this quick example (there's a fuller sketch after this list, too):
```python
async def tts_node(self, text, model_settings):
    # Yield your custom AudioFrame(s)
    yield your_audio_frame
```
2. Override `realtime_audio_output_node`: For realtime models, this method is your go-to for generating custom `AudioFrame` objects in your audio stream.
```python
async def realtime_audio_output_node(self, audio, model_settings):
    # Yield your custom AudioFrame(s) into the realtime audio stream
    yield your_audio_frame
```
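To flesh out the `tts_node` option a bit, here is a rough sketch of an agent that answers with pre-loaded `AudioFrame` objects instead of synthesized speech. The class name, constructor argument, and text-draining behaviour are illustrative assumptions, not LiveKit APIs; if your livekit-agents version exposes a default TTS node, you could also delegate the remaining text to it instead of discarding it.

```python
from livekit import rtc
from livekit.agents import Agent


class CannedAudioAgent(Agent):
    """Sketch: answer with pre-recorded AudioFrames instead of synthesized speech."""

    def __init__(self, frames: list[rtc.AudioFrame], **kwargs) -> None:
        super().__init__(**kwargs)
        # Frames are assumed to be built elsewhere (e.g. from a WAV file) with a
        # sample rate and channel count that match your audio output
        self._frames = frames

    async def tts_node(self, text, model_settings):
        # Drain the incoming text stream (nothing is synthesized in this sketch)...
        async for _ in text:
            pass
        # ...and yield the pre-loaded AudioFrames instead
        for frame in self._frames:
            yield frame
```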
• These methods are super flexible - they give you full control over injecting, modifying, or streaming audio frames right through your agent's pipeline, without touching `session.say()`.
• Just remember to create your `AudioFrame` objects with the right data and metadata (like sample rate and channels); there's a chunking sketch below, and detailed examples in the previous messages if you need a hand.
• Pro tip: The `llm_node` typically handles text, while the audio magic happens in `tts_node` (for text-to-speech) or `realtime_audio_output_node` (for live audio agents). Pick the override that best matches your model pipeline, and you'll be streaming audio like a pro in no time!
• Sources: examples/avatar_agents/audio_wave/avatar_runner.py | home_assistant/homeautomation.py
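As a rough illustration of that metadata point, here is a minimal sketch that slices a 16-bit PCM WAV file into fixed-size `AudioFrame` chunks you could yield from either override. The helper name and the 10 ms chunk size are assumptions for illustration, not a LiveKit API.

```python
import wave
from livekit import rtc

async def wav_frames(path: str, frame_ms: int = 10):
    # Hypothetical helper: split a 16-bit PCM WAV file into frame_ms-sized AudioFrames
    with wave.open(path, 'rb') as wav_file:
        sample_rate = wav_file.getframerate()
        num_channels = wav_file.getnchannels()
        data = wav_file.readframes(wav_file.getnframes())

    samples_per_channel = sample_rate * frame_ms // 1000
    chunk_bytes = samples_per_channel * num_channels * 2  # 2 bytes per int16 sample

    # Any trailing partial chunk is dropped for simplicity
    for offset in range(0, len(data) - chunk_bytes + 1, chunk_bytes):
        yield rtc.AudioFrame(
            data=data[offset:offset + chunk_bytes],
            sample_rate=sample_rate,
            num_channels=num_channels,
            samples_per_channel=samples_per_channel,
        )
```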
a
I want to stream audio directly to the user if certain text is found.
t
Hey there! Let me walk you through a way to stream audio when you detect specific text in your agent. It's actually pretty straightforward, so I'll break it down into friendly steps:

1. Inside your `llm_node` method, keep an eye out for your target phrase in the generated text.
2. When you spot that phrase, generate some AudioFrame(s), maybe through your `tts_node` or by pulling from a file or buffer.
3. Send those audio frames right away through your processing pipeline.

Super easy! Here's a simple example to help you visualize:
```python
async def llm_node(self, chat_ctx, tools, model_settings):
    # llm_stream stands for the stream of chunks produced by your LLM step
    async for chunk in llm_stream:
        if isinstance(chunk, str) and "your phrase" in chunk:
            await self._generate_audio_response("Phrase detected!")
        yield chunk

async def _generate_audio_response(self, response_text):
    # tts_node expects an async iterable of text; a lambda can't contain `yield`,
    # so use a small async generator instead
    async def text_source():
        yield response_text

    audio_stream = self.tts_node(text_source(), ModelSettings())
    async for audio_frame in audio_stream:
        await self._handle_audio_frame(audio_frame)
```
• Pro tip: You can swap out `tts_node` with your own AudioFrame sources if you want to play pre-recorded audio or get creative with synthesis; there's a sketch of that below.
• How you actually stream to the user depends on your specific audio output setup, so check your `realtime_audio_output_node` or audio pipeline config.
• Want to see a full, detailed example? The source code has you covered!
• Sources: tests/utils.py
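Following that pro tip, a minimal sketch of the pre-recorded variant could look like this. It leans on the hypothetical `wav_frames()` helper sketched in the previous message and on the same `_handle_audio_frame()` hook used above, so treat those names as placeholders for your own audio source and output handling.

```python
async def _generate_audio_response(self, response_text):
    # Variant that plays a pre-recorded clip instead of synthesizing response_text;
    # wav_frames() and _handle_audio_frame() are placeholders from the sketches above
    async for audio_frame in wav_frames('phrase_detected.wav'):
        await self._handle_audio_frame(audio_frame)
```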