# ask-ai
c
how do i receive bytes of output audio from my TTS in livekit’s agent session?
t
Thanks for reaching out, Faraz. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To receive bytes of output audio from your TTS in a LiveKit agent session, the process typically involves the following steps:

1. The TTS plugin makes an API request to the provider's service (e.g., Neuphonic, Hume, Groq, Inworld, or Resemble).
2. The response contains audio data, often base64 encoded.
3. The audio data is decoded into bytes where needed.
4. The bytes are pushed to an `output_emitter` object using its `push()` method.
5. Finally, the emitter is flushed with `output_emitter.flush()`.

For example, in the Neuphonic plugin:
```python
audio_bytes = base64.b64decode(parsed_message["data"]["audio"])
output_emitter.push(audio_bytes)
```
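For context, `parsed_message` in that snippet comes from the plugin's websocket receive loop. A plausible surrounding loop might look like the sketch below; the message shape and the `receive_audio` wrapper are illustrative assumptions, not Neuphonic's exact code:

```python
import base64
import json

import aiohttp


async def receive_audio(ws: aiohttp.ClientWebSocketResponse, output_emitter) -> None:
    # Hypothetical receive loop: parse each text frame and push any audio payload.
    async for msg in ws:
        if msg.type != aiohttp.WSMsgType.TEXT:
            continue
        parsed_message = json.loads(msg.data)
        audio_b64 = parsed_message.get("data", {}).get("audio")
        if audio_b64:
            output_emitter.push(base64.b64decode(audio_b64))
    output_emitter.flush()
```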
And in the Groq plugin:
```python
async for data, _ in resp.content.iter_chunks():
    output_emitter.push(data)
```
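Here `resp` is a streaming HTTP response. With aiohttp, the surrounding request might look like this sketch, where the URL and payload are placeholders rather than Groq's actual endpoint:

```python
import aiohttp


async def stream_synthesis(session: aiohttp.ClientSession, output_emitter) -> None:
    # Placeholder endpoint and payload, for illustration only.
    async with session.post("https://example.com/tts", json={"text": "hello"}) as resp:
        resp.raise_for_status()
        # iter_chunks() yields (bytes, end_of_http_chunk) pairs as data arrives.
        async for data, _ in resp.content.iter_chunks():
            output_emitter.push(data)
    output_emitter.flush()
```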
The `AudioEmitter` object handles the streaming of these audio bytes to the LiveKit agent session. Sources: GitHub::livekit::agents | livekit-plugins/livekit-plugins-resemble/livekit/...
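Putting those steps together, here is a minimal sketch of what a plugin-style stream implementation can look like. It assumes the `AudioEmitter` interface used by recent livekit-agents plugins (`initialize()`, `push()`, `flush()`); the sample rate, mime type, and the `_receive_messages()` helper are illustrative assumptions, not any particular plugin's code:

```python
import base64

from livekit.agents import tts


class ExampleChunkedStream(tts.ChunkedStream):
    """Sketch only: decodes base64 audio from a provider and emits the bytes."""

    async def _run(self, output_emitter: tts.AudioEmitter) -> None:
        # initialize() tells the emitter how to interpret the pushed bytes;
        # the values below are assumptions for illustration.
        output_emitter.initialize(
            request_id="example-request",
            sample_rate=24000,
            num_channels=1,
            mime_type="audio/mp3",
        )
        async for parsed_message in self._receive_messages():
            audio_b64 = parsed_message.get("data", {}).get("audio")
            if audio_b64:
                output_emitter.push(base64.b64decode(audio_b64))
        output_emitter.flush()

    async def _receive_messages(self):
        # Hypothetical helper standing in for the provider's websocket/HTTP loop.
        yield {"data": {"audio": base64.b64encode(b"\x00\x00").decode()}}
```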
c
how does this work for the elevenlabs plugin? what about the cartesia plugin?
t
Hey there! Let me break down how the ElevenLabs and Cartesia plugins handle audio output. 😊

For the ElevenLabs plugin:

1. The plugin connects to the ElevenLabs API and receives audio data in base64 encoded format.
2. The audio data gets decoded into bytes and pushed to the emitter:

   ```python
   if data.get("audio"):
       b64data = base64.b64decode(data["audio"])
       output_emitter.push(b64data)
   ```

3. The `output_emitter` then organizes the decoded bytes into streaming-friendly audio frames.

The Cartesia plugin has its own twist, with two paths:

1. In the ChunkedStream approach, audio data arrives in chunks from the Cartesia API and gets pushed straight to the `output_emitter`:

   ```python
   async for data, _ in resp.content.iter_chunks():
       output_emitter.push(data)
   ```

2. With the WebSocket method, there's an extra step: the base64 data gets decoded first:

   ```python
   if data.get("data"):
       b64data = base64.b64decode(data["data"])
       output_emitter.push(b64data)
   ```

The coolest part? In both plugins, the `AudioEmitter` is the behind-the-scenes hero, making sure those audio bytes stream smoothly to the LiveKit agent session. 🎉 Sources: GitHub::livekit::agents | livekit-plugins/livekit-plugins-elevenlabs/README... | GitHub::livekit::agents | livekit-plugins/livekit-plugins-cartesia/livekit/...
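And if what you want is to grab the bytes on your side rather than inside a plugin, a common pattern is to call the plugin's `synthesize()` method and read the frames it yields. A minimal sketch, assuming the standard livekit-agents TTS interface where each yielded event carries an `rtc.AudioFrame`, and that an ElevenLabs API key is configured in the environment:

```python
from livekit.plugins import elevenlabs


async def collect_tts_bytes(text: str) -> bytes:
    # Any TTS plugin with the same interface works; ElevenLabs is just an example.
    tts = elevenlabs.TTS()
    pcm = bytearray()
    async for audio in tts.synthesize(text):
        # Each event wraps an rtc.AudioFrame; frame.data is the raw PCM
        # buffer for that chunk (assumption: exposed as a memoryview).
        pcm.extend(audio.frame.data.tobytes())
    return bytes(pcm)
```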