# ask-ai
c
how do i receive bytes of output audio from my TTS in livekit’s agent session?
t
Thanks for reaching out, Faraz. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To receive bytes of output audio from your TTS in a LiveKit agent session, the process typically involves the following steps:

1. The TTS plugin makes an API request to the provider's service (e.g., Neuphonic, Hume, Groq, Inworld, or Resemble).
2. The response contains audio data, often base64 encoded.
3. The audio data is decoded into bytes where needed.
4. The bytes are pushed to an `output_emitter` object using its `push()` method.
5. Finally, the emitter is flushed with `output_emitter.flush()`.

For example, in the Neuphonic plugin:
```python
audio_bytes = base64.b64decode(parsed_message["data"]["audio"])
output_emitter.push(audio_bytes)
```
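For context, `parsed_message` in that snippet comes from the plugin's websocket receive loop. A plausible surrounding loop might look like the sketch below; the message shape and the `receive_audio` wrapper are illustrative assumptions, not Neuphonic's exact code:

```python
import base64
import json

import aiohttp


async def receive_audio(ws: aiohttp.ClientWebSocketResponse, output_emitter) -> None:
    # Hypothetical receive loop: parse each text frame and push any audio payload.
    async for msg in ws:
        if msg.type != aiohttp.WSMsgType.TEXT:
            continue
        parsed_message = json.loads(msg.data)
        audio_b64 = parsed_message.get("data", {}).get("audio")
        if audio_b64:
            output_emitter.push(base64.b64decode(audio_b64))
    output_emitter.flush()
```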
And in the Groq plugin:
```python
async for data, _ in resp.content.iter_chunks():
    output_emitter.push(data)
```
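Here `resp` is a streaming HTTP response. With aiohttp, the surrounding request might look like this sketch, where the URL and payload are placeholders rather than Groq's actual endpoint:

```python
import aiohttp


async def stream_synthesis(session: aiohttp.ClientSession, output_emitter) -> None:
    # Placeholder endpoint and payload, for illustration only.
    async with session.post("https://example.com/tts", json={"text": "hello"}) as resp:
        resp.raise_for_status()
        # iter_chunks() yields (bytes, end_of_http_chunk) pairs as data arrives.
        async for data, _ in resp.content.iter_chunks():
            output_emitter.push(data)
    output_emitter.flush()
```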
The `AudioEmitter` object handles the streaming of these audio bytes to the LiveKit agent session. Sources: GitHub::livekit::agents | livekit-plugins/livekit-plugins-resemble/livekit/...
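Putting those steps together, here is a minimal sketch of what a plugin-style stream implementation can look like. It assumes the `AudioEmitter` interface used by recent livekit-agents plugins (`initialize()`, `push()`, `flush()`); the sample rate, mime type, and the `_receive_messages()` helper are illustrative assumptions, not any particular plugin's code:

```python
import base64

from livekit.agents import tts


class ExampleChunkedStream(tts.ChunkedStream):
    """Sketch only: decodes base64 audio from a provider and emits the bytes."""

    async def _run(self, output_emitter: tts.AudioEmitter) -> None:
        # initialize() tells the emitter how to interpret the pushed bytes;
        # the values below are assumptions for illustration.
        output_emitter.initialize(
            request_id="example-request",
            sample_rate=24000,
            num_channels=1,
            mime_type="audio/mp3",
        )
        async for parsed_message in self._receive_messages():
            audio_b64 = parsed_message.get("data", {}).get("audio")
            if audio_b64:
                output_emitter.push(base64.b64decode(audio_b64))
        output_emitter.flush()

    async def _receive_messages(self):
        # Hypothetical helper standing in for the provider's websocket/HTTP loop.
        yield {"data": {"audio": base64.b64encode(b"\x00\x00").decode()}}
```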
c
how does this work for the elevenlabs plugin? what about the cartesia plugin?
t
Hey there! Let me break down how the ElevenLabs and Cartesia plugins handle audio output. 😊

For the ElevenLabs plugin:

1. The plugin connects to the ElevenLabs API and receives audio data in base64 encoded format.
2. The audio data gets decoded into bytes and pushed to the emitter:

   ```python
   if data.get("audio"):
       b64data = base64.b64decode(data["audio"])
       output_emitter.push(b64data)
   ```

3. The `output_emitter` then organizes the decoded bytes into streaming-friendly audio frames.

The Cartesia plugin has its own twist, with two paths:

1. In the ChunkedStream approach, audio data arrives in chunks from the Cartesia API and gets pushed straight to the `output_emitter`:

   ```python
   async for data, _ in resp.content.iter_chunks():
       output_emitter.push(data)
   ```

2. With the WebSocket method, there's an extra step: the base64 data gets decoded first:

   ```python
   if data.get("data"):
       b64data = base64.b64decode(data["data"])
       output_emitter.push(b64data)
   ```

The coolest part? In both plugins, the `AudioEmitter` is the behind-the-scenes hero, making sure those audio bytes stream smoothly to the LiveKit agent session. 🎉 Sources: GitHub::livekit::agents | livekit-plugins/livekit-plugins-elevenlabs/README... | GitHub::livekit::agents | livekit-plugins/livekit-plugins-cartesia/livekit/...
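And if what you want is to grab the bytes on your side rather than inside a plugin, a common pattern is to call the plugin's `synthesize()` method and read the frames it yields. A minimal sketch, assuming the standard livekit-agents TTS interface where each yielded event carries an `rtc.AudioFrame`, and that an ElevenLabs API key is configured in the environment:

```python
from livekit.plugins import elevenlabs


async def collect_tts_bytes(text: str) -> bytes:
    # Any TTS plugin with the same interface works; ElevenLabs is just an example.
    tts = elevenlabs.TTS()
    pcm = bytearray()
    async for audio in tts.synthesize(text):
        # Each event wraps an rtc.AudioFrame; frame.data is the raw PCM
        # buffer for that chunk (assumption: exposed as a memoryview).
        pcm.extend(audio.frame.data.tobytes())
    return bytes(pcm)
```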