able-jackal-84140
07/11/2025, 7:55 AM
tall-airline-16756
07/11/2025, 7:56 AM
• session.say(): Use the session.say() method to play audio files. You can yield audio frames in an async generator and pass them to session.say(). For example, load a WAV file and stream it as audio output while the function executes.
• BackgroundAudioPlayer: For more advanced playback (e.g., on-demand or looping audio), use the BackgroundAudioPlayer class. It lets you start background audio and trigger playback with the play() method. You can play file paths, built-in clips, or custom audio frame generators, and the returned handle gives you control over playback (you can stop it or await completion). Both approaches let you play audio as user feedback while a function call is processing, which is useful for keeping users engaged during delays.
• See the detailed LiveKit implementation examples for both session.say() and BackgroundAudioPlayer in function tool scripts.
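To make the first bullet concrete, here's a minimal sketch of the WAV-loading side using only Python's standard library wave module. The chunking below is plain Python; the rtc.AudioFrame wrapping shown in the docstring is an assumption based on LiveKit's frame constructor, and wav_pcm_chunks is a hypothetical helper name:

```python
import wave
from collections.abc import Iterator

FRAME_MS = 10  # chunk duration; audio frames are commonly 10 ms

def wav_pcm_chunks(path: str) -> Iterator[bytes]:
    """Yield fixed-duration chunks of raw PCM from a WAV file.

    Inside an agent you would wrap each chunk, e.g.:
        rtc.AudioFrame(data=chunk, sample_rate=rate, num_channels=channels,
                       samples_per_channel=len(chunk) // (2 * channels))
    and yield those frames from an async generator passed to session.say().
    """
    with wave.open(path, "rb") as wav:
        samples_per_chunk = wav.getframerate() * FRAME_MS // 1000
        # readframes returns b"" at end of file, which ends the loop
        while chunk := wav.readframes(samples_per_chunk):
            yield chunk
```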
Let me know if you need code samples for a specific method or further details on integrating audio playback into your function calls.
Sources: tests/test_agent_session.py | Agent speech and audio | LiveKit Docs | delay-handling/README.mdoc | basics/playing_audio.py
able-jackal-84140
07/11/2025, 7:56 AM
tall-airline-16756
07/11/2025, 7:57 AM
• You can pass a custom audio stream (an async iterator of rtc.AudioFrame objects) to BackgroundAudioPlayer.play() or session.say().
• Just make sure your audio frames are formatted correctly: aim for a 48 kHz sample rate, 1 channel, and 16-bit PCM data.
• Want continuous playback? Your iterator should keep producing frames. Since loop=True won't work for iterators, you'll need to build the looping into the iterator itself.
• Here's a quick example using `BackgroundAudioPlayer` (note: start the player before triggering playback):

```python
from collections.abc import AsyncIterator

from livekit import rtc
from livekit.agents import BackgroundAudioPlayer

async def my_audio_stream() -> AsyncIterator[rtc.AudioFrame]:
    # yield custom rtc.AudioFrame objects here
    ...

background_player = BackgroundAudioPlayer()
await background_player.start(room=room)
play_handle = background_player.play(my_audio_stream())
```
• And here's how you'd use it with `session.say()`:

```python
await session.say("text to speak", audio=my_audio_stream(), allow_interruptions=True)
```
Pro tip: If you're starting with raw bytes, convert them to rtc.AudioFrame objects before yielding them from your async iterator.
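To make that conversion concrete, here's a small helper that splits a raw PCM buffer into frame-sized pieces and computes the samples_per_channel value each frame needs. chunk_pcm is a hypothetical name, and the rtc.AudioFrame call shown in the docstring is an assumption based on LiveKit's frame constructor:

```python
BYTES_PER_SAMPLE = 2  # assuming 16-bit PCM

def chunk_pcm(
    pcm: bytes,
    sample_rate: int = 48000,
    num_channels: int = 1,
    frame_ms: int = 10,
) -> list[tuple[bytes, int]]:
    """Split raw PCM bytes into (data, samples_per_channel) pairs.

    Each pair maps onto a frame:
        rtc.AudioFrame(data=data, sample_rate=sample_rate,
                       num_channels=num_channels,
                       samples_per_channel=samples_per_channel)
    """
    step = sample_rate * frame_ms // 1000 * num_channels * BYTES_PER_SAMPLE
    return [
        (pcm[i : i + step], len(pcm[i : i + step]) // (BYTES_PER_SAMPLE * num_channels))
        for i in range(0, len(pcm), step)
    ]
```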
Need help creating a generator for a specific audio format or want to dive deeper? I'm all ears – just ask!
Sources: livekit-agents/livekit/agents/tts/tts.py | livekit-rtc/livekit/rtc/audio_mixer.py | Agent speech and audio | LiveKit Docs
able-jackal-84140
07/11/2025, 8:01 AM
tall-airline-16756
07/11/2025, 8:02 AM
You can use the audio_frames_from_file utility to turn your audio files into streamable frames:
```python
from livekit.agents.utils import audio_frames_from_file
from livekit import rtc

async def audio_generator(file_path: str):
    async for frame in audio_frames_from_file(file_path, sample_rate=48000, num_channels=1):
        yield frame
```
This audio_generator takes your audio file and turns it into rtc.AudioFrame objects ready for playback. It works with WAV, MP3, AAC, and most other common audio formats.
• With session.say():

```python
await session.say("Playing audio", audio=audio_generator("/path/to/your.wav"))
```
• With BackgroundAudioPlayer (start the player before playing):

```python
background_player = BackgroundAudioPlayer()
await background_player.start(room=room)
background_player.play(audio_generator("/path/to/your.wav"))
```
No need to handle audio decoding manually – audio_frames_from_file has got your back! You can customize sample_rate and num_channels to match your specific audio and playback needs.
Need help with a specific audio format or want a manual WAV loader example? Just give me a shout – I'm here to help!
Sources: livekit-agents/livekit/agents/voice/avatar/_datastream_io.py | basics/playing_audio.py
able-jackal-84140
07/11/2025, 8:05 AM
tall-airline-16756
07/11/2025, 8:05 AM
One important thing to know about session.say() with custom audio: if you provide a custom audio stream, the text you include (like "Playing audio") won't actually be turned into spoken words.
Think of it like this: only the audio you supply will be heard by users – the text-to-speech system is bypassed entirely. The text still plays a useful role behind the scenes, though, feeding the chat context and the transcription.
Quick summary: users hear only the audio you stream in; the text is just there to provide context for the transcript.
Sources: basics/playing_audio.py | Agent speech and audio | LiveKit Docs | livekit-agents/livekit/agents/voice/agent_activity.py