able-jackal-84140
07/14/2025, 7:21 AM

tall-airline-16756
07/14/2025, 7:23 AM
Are you generating `AudioFrame` objects directly from your `tts_node` implementation, or are you using another method (such as `session.say()` with a custom audio generator)? Please describe your current approach in more detail so I can provide specific troubleshooting steps or recommendations.
Sources: Agent speech and audio | LiveKit Docs | Voice pipeline nodes | LiveKit Docs

able-jackal-84140
07/14/2025, 7:59 AM

tall-airline-16756
07/14/2025, 8:00 AM
Let's walk through how to return `AudioFrame` objects in your `tts_node`. I know this might sound technical, but I'll break it down like we're just chatting.
• Your `tts_node` method needs to return an `AsyncIterable[rtc.AudioFrame]` or an async generator that creates `AudioFrame` objects - think of it like serving up audio slices.
• Each `AudioFrame` is like an audio package that needs a few key details:
◦ data: raw PCM bytes (the actual sound)
◦ sample_rate: the audio's frequency (like 16000, 24000, or 48000 Hz)
◦ num_channels: whether it's mono (1) or stereo (2)
◦ samples_per_channel: how many audio samples are in each channel
• Try to keep your frame sizes consistent - something like 100ms per frame works great.
• Pro tip: Use `utils.audio.AudioByteStream` to help manage your audio buffering smoothly (there's a sketch of this right after the example below).
• Don't worry about playback - the agent pipeline will handle sending your audio to room participants automatically!
Here's a friendly example to help you visualize:
async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
    async for text_chunk in text:
        # Generate PCM for the chunk; `synthesize_pcm` is a hypothetical stand-in
        # for whatever produces your raw 16-bit PCM bytes.
        pcm_bytes = synthesize_pcm(text_chunk)
        yield rtc.AudioFrame(
            data=pcm_bytes,
            sample_rate=16000,
            num_channels=1,
            samples_per_channel=1600,  # 100ms at 16kHz
        )
Just a heads up - your custom audio won't play if any `AudioFrame` details are off. The good news? The process is pretty straightforward. If you're still stuck, double-check your PCM data and frame settings. The `AudioByteStream` can be a real lifesaver for keeping things tidy!
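Here's a minimal sketch of that buffering pattern. It assumes the write()/flush() interface of `utils.audio.AudioByteStream` (and the same hypothetical `synthesize_pcm()` from above standing in for your real synthesis backend):

from typing import AsyncIterable

from livekit.agents import ModelSettings, utils

async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
    # Buffer incoming PCM and emit consistent 100ms frames (1600 samples at 16kHz mono).
    bstream = utils.audio.AudioByteStream(
        sample_rate=16000,
        num_channels=1,
        samples_per_channel=1600,
    )
    async for text_chunk in text:
        pcm_bytes = synthesize_pcm(text_chunk)  # hypothetical synthesis backend
        for frame in bstream.write(pcm_bytes):
            yield frame
    # Flush whatever is left in the buffer as a final (possibly shorter) frame.
    for frame in bstream.flush():
        yield frame

The nice part is that your synthesis backend can return chunks of any size - the stream takes care of slicing them into uniform frames.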
Sources: Agent speech and audio | LiveKit Docs | examples/voice_agents/speedup_output_audio.py | basics/playing_audio.py

able-jackal-84140
07/14/2025, 11:46 AM

tall-airline-16756
07/14/2025, 11:46 AM
Wondering how to play your custom audio from `tts_node`? No worries, I've got you covered! Here's a friendly walkthrough to help you get your audio playing smoothly.
1. First up, grab or create your audio as raw PCM bytes - think of it like preparing your favorite playlist.
2. Now, let's craft those `AudioFrame` objects with the right details:
◦ data (those PCM bytes you just prepared)
◦ sample_rate (like 16000 Hz - think of it as the audio's rhythm)
◦ num_channels (1 for solo, 2 if you want stereo vibes)
◦ samples_per_channel (e.g., 1600 for a crisp 100ms frame at 16kHz)
3. Simply yield each `AudioFrame` from your async generator in `tts_node`. Easy peasy!
4. The LiveKit agent pipeline will take care of playing these frames to everyone in the room. Magic, right?
Pro tip: If something sounds off and the audio isn't playing, double-check those `AudioFrame` settings. Make sure your sample rate and other parameters are spot on, and your PCM data is looking good.
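To make that double-check concrete: LiveKit audio frames carry 16-bit PCM, so each sample is 2 bytes per channel and your data buffer's length must equal samples_per_channel × num_channels × 2. A quick worked check:

# 100ms of mono audio at 16kHz:
sample_rate = 16000
num_channels = 1
samples_per_channel = sample_rate // 10  # 1600 samples per 100ms frame
bytes_per_sample = 2                     # 16-bit PCM
expected_len = samples_per_channel * num_channels * bytes_per_sample
assert expected_len == 3200  # a valid frame's data buffer must be exactly this long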
Bonus hack: Need to handle big or streaming audio? Try `utils.audio.AudioByteStream` - it's like a helpful audio buffer buddy.
Here's a quick example to get you started:
async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
    # Generate or load your custom PCM audio; this placeholder is just
    # 3200 bytes of 16-bit mono silence (100ms at 16kHz).
    pcm_bytes = b"\x00\x00" * 1600
    yield rtc.AudioFrame(
        data=pcm_bytes,
        sample_rate=16000,
        num_channels=1,
        samples_per_channel=1600,
    )
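And here's roughly where that generator lives - a minimal sketch assuming you're overriding `tts_node` on an `Agent` subclass (treat the class name and the silence payload as illustrative):

from typing import AsyncIterable

from livekit import rtc
from livekit.agents import Agent, ModelSettings

class CustomAudioAgent(Agent):
    async def tts_node(
        self, text: AsyncIterable[str], model_settings: ModelSettings
    ) -> AsyncIterable[rtc.AudioFrame]:
        # Consume the incoming text stream, then yield custom frames;
        # the pipeline plays whatever this node produces.
        async for _ in text:
            pass
        yield rtc.AudioFrame(
            data=b"\x00\x00" * 1600,  # 100ms of 16-bit silence at 16kHz mono
            sample_rate=16000,
            num_channels=1,
            samples_per_channel=1600,
        )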
able-jackal-84140
07/14/2025, 11:47 AM

tall-airline-16756
07/14/2025, 11:48 AM
Want to play a WAV file in your `tts_node`? It's actually pretty straightforward. You can read your WAV file and create `AudioFrame` objects from its PCM data. Let me walk you through an example in Python:
import wave

from livekit import rtc

with wave.open("audio.wav", "rb") as wav_file:
    num_channels = wav_file.getnchannels()
    sample_rate = wav_file.getframerate()
    num_frames = wav_file.getnframes()
    frames = wav_file.readframes(num_frames)  # raw PCM bytes
    # Note: AudioFrame expects 16-bit PCM, so the WAV's sample width should be 2 bytes.
    frame = rtc.AudioFrame(
        data=frames,
        sample_rate=sample_rate,
        num_channels=num_channels,
        samples_per_channel=num_frames,
    )
Just yield these `AudioFrame` objects inside your `tts_node`, and you're good to go!
Pro tip: If you're working with larger files and want to stream them smoothly, use `audio_frames_from_file` from `livekit.agents.utils.audio`:
from livekit.agents.utils.audio import audio_frames_from_file

async for frame in audio_frames_from_file("audio.wav", sample_rate=48000, num_channels=1):
    yield frame
This way, your custom audio will be streamed perfectly for playback. Easy peasy!
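Put together, that fragment might sit inside the node like this - a sketch assuming the same `tts_node` override shape as earlier in the thread:

from typing import AsyncIterable

from livekit.agents import ModelSettings
from livekit.agents.utils.audio import audio_frames_from_file

async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
    # Drain the incoming text stream, then stream the WAV as 48kHz mono frames.
    async for _ in text:
        pass
    async for frame in audio_frames_from_file("audio.wav", sample_rate=48000, num_channels=1):
        yield frame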
Sources: agent_extensions/agent_extensions/utils/wav_player.py | examples/receive-audio/index.ts | examples/play_from_disk/src/main.rs | livekit-rtc/livekit/rtc/audio_frame.py | livekit-agents/livekit/agents/tts/tts.py | basics/playing_audio.py