# ask-ai
a
Can I manually play custom audio within a custom `tts_node`? Currently it's not being played.
t
Thanks for reaching out, Aditya. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Are you currently attempting to play custom audio by yielding `AudioFrame` objects directly from your `tts_node` implementation, or are you using another method (such as `session.say()` with a custom audio generator)? Please describe your current approach in more detail, so I can provide specific troubleshooting steps or recommendations.

Sources: Agent speech and audio | LiveKit Docs | Voice pipeline nodes | LiveKit Docs
a
I want to use `AudioFrame`
t
Hey there! Let me walk you through how to play custom audio using `AudioFrame` objects in your `tts_node`. I know this might sound technical, but I'll break it down like we're just chatting.

- Your `tts_node` method needs to return an `AsyncIterable[rtc.AudioFrame]` or an async generator that creates `AudioFrame` objects - think of it like serving up audio slices.
- Each `AudioFrame` is like an audio package that needs a few key details:
  - `data`: raw PCM bytes (the actual sound)
  - `sample_rate`: the audio's frequency (like 16000, 24000, or 48000 Hz)
  - `num_channels`: whether it's mono (1) or stereo (2)
  - `samples_per_channel`: how many audio samples are in each channel
- Try to keep your frame sizes consistent - something like 100 ms per frame works great.
- Pro tip: use `utils.audio.AudioByteStream` to help manage your audio buffering smoothly (there's a sketch of it after the example below).
- Don't worry about playback - the agent pipeline will handle sending your audio to room participants automatically!

Here's a friendly example to help you visualize:
```python
from collections.abc import AsyncIterable

from livekit import rtc
from livekit.agents import ModelSettings

async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
    async for text_chunk in text:
        # Generate PCM audio for the chunk with your TTS engine
        # (placeholder: 100 ms of 16-bit mono silence at 16 kHz)
        pcm_bytes = b"\x00\x00" * 1600
        yield rtc.AudioFrame(
            data=pcm_bytes,
            sample_rate=16000,
            num_channels=1,
            samples_per_channel=1600,  # 100 ms at 16 kHz
        )
```
Just a heads up - your custom audio won't play if any `AudioFrame` details are off. The good news? The process is pretty straightforward. If you're still stuck, double-check your PCM data and frame settings. The `AudioByteStream` can be a real lifesaver for keeping things tidy!

Sources: Agent speech and audio | LiveKit Docs | examples/voice_agents/speedup_output_audio.py | basics/playing_audio.py
a
I want to play custom audio - how can I do that?
t
Hey there! Want to play a custom audio file or stream in your `tts_node`? No worries, I've got you covered! Here's a friendly walkthrough to help you get your audio playing smoothly.

1. First up, grab or create your audio as raw PCM bytes - think of it like preparing your favorite playlist.
2. Now, let's craft those `AudioFrame` objects with the right details:
   - `data` (those PCM bytes you just prepared)
   - `sample_rate` (like 16000 Hz - think of it as the audio's rhythm)
   - `num_channels` (1 for solo, 2 if you want stereo vibes)
   - `samples_per_channel` (e.g., 1600 for a crisp 100 ms frame at 16 kHz)
3. Simply yield each `AudioFrame` from your async generator in `tts_node`. Easy peasy!
4. The LiveKit agent pipeline will take care of playing these frames to everyone in the room. Magic, right?

Pro tip: If something sounds off and the audio isn't playing, double-check those `AudioFrame` settings. Make sure your sample rate and other parameters are spot on, and your PCM data is looking good.

Bonus hack: Need to handle big or streaming audio? Try `utils.audio.AudioByteStream` - it's like a helpful audio buffer buddy (see the sketch earlier in this thread).

Here's a quick example to get you started:
```python
from collections.abc import AsyncIterable

from livekit import rtc
from livekit.agents import ModelSettings

async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
    # Generate or load your custom PCM audio
    # (placeholder: 100 ms of 16-bit mono silence at 16 kHz)
    pcm_bytes = b"\x00\x00" * 1600
    yield rtc.AudioFrame(
        data=pcm_bytes,
        sample_rate=16000,
        num_channels=1,
        samples_per_channel=1600,  # 1600 samples = 100 ms at 16 kHz
    )
```
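By the way, in case it helps to see where this method actually lives: `tts_node` is an override on your `Agent` subclass, and the pipeline calls it whenever the agent needs to speak. A rough sketch (the class name and placeholder audio are illustrative):

```python
from collections.abc import AsyncIterable

from livekit import rtc
from livekit.agents import Agent, ModelSettings

class CustomAudioAgent(Agent):  # illustrative name
    async def tts_node(
        self, text: AsyncIterable[str], model_settings: ModelSettings
    ) -> AsyncIterable[rtc.AudioFrame]:
        # Consume the incoming text (even if you ignore it) and yield audio
        async for _text_chunk in text:
            pcm_bytes = b"\x00\x00" * 1600  # placeholder: 100 ms of silence
            yield rtc.AudioFrame(
                data=pcm_bytes,
                sample_rate=16000,
                num_channels=1,
                samples_per_channel=1600,
            )
```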
a
But how do I load the custom audio?
t
Hey there! Want to know a cool trick for loading custom audio files in your `tts_node`? It's actually pretty straightforward. You can read your WAV file and create `AudioFrame` objects from its PCM data. Let me walk you through an example in Python:
```python
import wave

from livekit import rtc

with wave.open("audio.wav", "rb") as wav_file:
    num_channels = wav_file.getnchannels()
    sample_rate = wav_file.getframerate()
    num_frames = wav_file.getnframes()
    # Raw PCM bytes; AudioFrame expects 16-bit PCM, so the WAV
    # should have a sample width of 2 bytes
    frames = wav_file.readframes(num_frames)

frame = rtc.AudioFrame(
    data=frames,
    sample_rate=sample_rate,
    num_channels=num_channels,
    samples_per_channel=num_frames,  # the whole file as one frame
)
```
Just yield these `AudioFrame` objects inside your `tts_node`, and you're good to go!

Pro tip: If you're working with larger files and want to stream them smoothly, use `audio_frames_from_file` from `livekit.agents.utils.audio`:
```python
from livekit.agents.utils.audio import audio_frames_from_file

async for frame in audio_frames_from_file("audio.wav", sample_rate=48000, num_channels=1):
    yield frame
```
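And if you want the file to play in addition to the normal synthesized speech (say, a chime before each reply), one approach is to chain it with the default pipeline. This sketch assumes `Agent.default.tts_node` is available as the stock implementation, as shown in the voice pipeline docs (the file path and class name are illustrative):

```python
from collections.abc import AsyncIterable

from livekit import rtc
from livekit.agents import Agent, ModelSettings
from livekit.agents.utils.audio import audio_frames_from_file

class ChimeAgent(Agent):  # illustrative name
    async def tts_node(
        self, text: AsyncIterable[str], model_settings: ModelSettings
    ) -> AsyncIterable[rtc.AudioFrame]:
        # First, play the pre-recorded audio file
        async for frame in audio_frames_from_file("chime.wav", sample_rate=48000, num_channels=1):
            yield frame
        # Then hand the text to the default TTS pipeline
        async for frame in Agent.default.tts_node(self, text, model_settings):
            yield frame
```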
Either way, your custom audio will be streamed smoothly for playback. Easy peasy!

Sources: agent_extensions/agent_extensions/utils/wav_player.py | examples/receive-audio/index.ts | examples/play_from_disk/src/main.rs | livekit-rtc/livekit/rtc/audio_frame.py | livekit-agents/livekit/agents/tts/tts.py | basics/playing_audio.py