able-jackal-84140
07/14/2025, 7:21 AM

tall-airline-16756
07/14/2025, 7:23 AM
Are you generating `AudioFrame` objects directly from your `tts_node` implementation, or are you using another method (such as `session.say()` with a custom audio generator)? Please describe your current approach in more detail so I can provide specific troubleshooting steps or recommendations.
Sources: Agent speech and audio | LiveKit Docs | Voice pipeline nodes | LiveKit Docs

able-jackal-84140
07/14/2025, 7:59 AM

tall-airline-16756
07/14/2025, 8:00 AM
Let's walk through how to return `AudioFrame` objects in your `tts_node`. I know this might sound technical, but I'll break it down like we're just chatting.
• Your `tts_node` method needs to return an `AsyncIterable[rtc.AudioFrame]` or an async generator that creates `AudioFrame` objects - think of it like serving up audio slices.
• Each `AudioFrame` is like an audio package that needs a few key details:
◦ data: raw PCM bytes (the actual sound)
◦ sample_rate: the audio's frequency (like 16000, 24000, or 48000 Hz)
◦ num_channels: whether it's mono (1) or stereo (2)
◦ samples_per_channel: how many audio samples are in each channel
• Try to keep your frame sizes consistent - something like 100ms per frame works great.
• Pro tip: Use `utils.audio.AudioByteStream` to help manage your audio buffering smoothly (there's a sketch of this right after the example below).
• Don't worry about playback - the agent pipeline will handle sending your audio to room participants automatically!
Here's a friendly example to help you visualize:
async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
    async for text_chunk in text:
        # Generate PCM for the chunk; `synthesize_pcm` is a hypothetical stand-in
        # for whatever produces your raw 16-bit PCM bytes.
        pcm_bytes = synthesize_pcm(text_chunk)
        yield rtc.AudioFrame(
            data=pcm_bytes,
            sample_rate=16000,
            num_channels=1,
            samples_per_channel=1600,  # 100ms at 16kHz
        )
Just a heads up - your custom audio won't play if any `AudioFrame` details are off. The good news? The process is pretty straightforward. If you're still stuck, double-check your PCM data and frame settings. The `AudioByteStream` can be a real lifesaver for keeping things tidy!
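Here's a minimal sketch of that buffering pattern. It assumes the write()/flush() interface of `utils.audio.AudioByteStream` (and the same hypothetical `synthesize_pcm()` from above standing in for your real synthesis backend):

from typing import AsyncIterable

from livekit.agents import ModelSettings, utils

async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
    # Buffer incoming PCM and emit consistent 100ms frames (1600 samples at 16kHz mono).
    bstream = utils.audio.AudioByteStream(
        sample_rate=16000,
        num_channels=1,
        samples_per_channel=1600,
    )
    async for text_chunk in text:
        pcm_bytes = synthesize_pcm(text_chunk)  # hypothetical synthesis backend
        for frame in bstream.write(pcm_bytes):
            yield frame
    # Flush whatever is left in the buffer as a final (possibly shorter) frame.
    for frame in bstream.flush():
        yield frame

The nice part is that your synthesis backend can return chunks of any size - the stream takes care of slicing them into uniform frames.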
Sources: Agent speech and audio | LiveKit Docs | examples/voice_agents/speedup_output_audio.py | basics/playing_audio.py

able-jackal-84140
07/14/2025, 11:46 AM

tall-airline-16756
07/14/2025, 11:46 AM
Wondering how to play your custom audio from `tts_node`? No worries, I've got you covered! Here's a friendly walkthrough to help you get your audio playing smoothly.
1. First up, grab or create your audio as raw PCM bytes - think of it like preparing your favorite playlist.
2. Now, let's craft those `AudioFrame` objects with the right details:
◦ data (those PCM bytes you just prepared)
◦ sample_rate (like 16000 Hz - think of it as the audio's rhythm)
◦ num_channels (1 for solo, 2 if you want stereo vibes)
◦ samples_per_channel (e.g., 1600 for a crisp 100ms frame at 16kHz)
3. Simply yield each `AudioFrame` from your async generator in `tts_node`. Easy peasy!
4. The LiveKit agent pipeline will take care of playing these frames to everyone in the room. Magic, right?
Pro tip: If something sounds off and the audio isn't playing, double-check those `AudioFrame` settings. Make sure your sample rate and other parameters are spot on, and your PCM data is looking good.
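To make that double-check concrete: LiveKit audio frames carry 16-bit PCM, so each sample is 2 bytes per channel and your data buffer's length must equal samples_per_channel × num_channels × 2. A quick worked check:

# 100ms of mono audio at 16kHz:
sample_rate = 16000
num_channels = 1
samples_per_channel = sample_rate // 10  # 1600 samples per 100ms frame
bytes_per_sample = 2                     # 16-bit PCM
expected_len = samples_per_channel * num_channels * bytes_per_sample
assert expected_len == 3200  # a valid frame's data buffer must be exactly this long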
Bonus hack: Need to handle big or streaming audio? Try `utils.audio.AudioByteStream` - it's like a helpful audio buffer buddy.
Here's a quick example to get you started:
async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
    # Generate or load your custom PCM audio; this placeholder is just
    # 3200 bytes of 16-bit mono silence (100ms at 16kHz).
    pcm_bytes = b"\x00\x00" * 1600
    yield rtc.AudioFrame(
        data=pcm_bytes,
        sample_rate=16000,
        num_channels=1,
        samples_per_channel=1600,
    )
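And here's roughly where that generator lives - a minimal sketch assuming you're overriding `tts_node` on an `Agent` subclass (treat the class name and the silence payload as illustrative):

from typing import AsyncIterable

from livekit import rtc
from livekit.agents import Agent, ModelSettings

class CustomAudioAgent(Agent):
    async def tts_node(
        self, text: AsyncIterable[str], model_settings: ModelSettings
    ) -> AsyncIterable[rtc.AudioFrame]:
        # Consume the incoming text stream, then yield custom frames;
        # the pipeline plays whatever this node produces.
        async for _ in text:
            pass
        yield rtc.AudioFrame(
            data=b"\x00\x00" * 1600,  # 100ms of 16-bit silence at 16kHz mono
            sample_rate=16000,
            num_channels=1,
            samples_per_channel=1600,
        )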
able-jackal-84140
07/14/2025, 11:47 AM

tall-airline-16756
07/14/2025, 11:48 AM
Want to play a WAV file in your `tts_node`? It's actually pretty straightforward. You can read your WAV file and create `AudioFrame` objects from its PCM data. Let me walk you through an example in Python:
import wave

from livekit import rtc

with wave.open("audio.wav", "rb") as wav_file:
    num_channels = wav_file.getnchannels()
    sample_rate = wav_file.getframerate()
    num_frames = wav_file.getnframes()
    frames = wav_file.readframes(num_frames)  # raw PCM bytes
    # Note: AudioFrame expects 16-bit PCM, so the WAV's sample width should be 2 bytes.
    frame = rtc.AudioFrame(
        data=frames,
        sample_rate=sample_rate,
        num_channels=num_channels,
        samples_per_channel=num_frames,
    )
Just yield these `AudioFrame` objects inside your `tts_node`, and you're good to go!
Pro tip: If you're working with larger files and want to stream them smoothly, use `audio_frames_from_file` from `livekit.agents.utils.audio`:
from livekit.agents.utils.audio import audio_frames_from_file

async for frame in audio_frames_from_file("audio.wav", sample_rate=48000, num_channels=1):
    yield frame
This way, your custom audio will be streamed perfectly for playback. Easy peasy!
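Put together, that fragment might sit inside the node like this - a sketch assuming the same `tts_node` override shape as earlier in the thread:

from typing import AsyncIterable

from livekit.agents import ModelSettings
from livekit.agents.utils.audio import audio_frames_from_file

async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
    # Drain the incoming text stream, then stream the WAV as 48kHz mono frames.
    async for _ in text:
        pass
    async for frame in audio_frames_from_file("audio.wav", sample_rate=48000, num_channels=1):
        yield frame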
Sources: agent_extensions/agent_extensions/utils/wav_player.py | examples/receive-audio/index.ts | examples/play_from_disk/src/main.rs | livekit-rtc/livekit/rtc/audio_frame.py | livekit-agents/livekit/agents/tts/tts.py | basics/playing_audio.py