# ask-ai
We're implementing voice cloning with Cartesia TTS in a LiveKit Python agent. When we pass a custom voice ID (like `voice_1754415960525_jye0sqx9d`) to `cartesia.TTS(voice=voice_id)`, we get errors:

1. `APIError: no audio frames were pushed for text`
2. `RuntimeError: BufferedSentenceStream is closed`

The agent connects successfully, STT works, and the LLM generates responses, but TTS fails. How should we properly handle custom/cloned voice IDs with Cartesia? Should we use a different TTS provider for cloned voices, or is there a specific format or validation needed for Cartesia voice IDs?
Thanks for reaching out, Andrew. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To handle custom or cloned voice IDs with Cartesia TTS in your LiveKit Python agent, follow these steps:

- **Verify the voice ID:** Ensure the custom voice ID exists in your Cartesia account. You can use the Cartesia API to list available voices and confirm the ID.
- **Initialization:** Pin a specific model (e.g., `sonic-2-2025-03-07`) when initializing the TTS instance with a custom voice ID. Example:

  ```python
  tts = cartesia.TTS(voice="your_custom_voice_id", model="sonic-2-2025-03-07")
  ```
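The verification step can be sketched with a plain stdlib HTTP call. The endpoint, header names, and response shape below are assumptions based on Cartesia's public REST API, so check the current docs before relying on them:

```python
import json
import urllib.request

CARTESIA_BASE = "https://api.cartesia.ai"  # assumed base URL

def list_voice_ids(api_key: str) -> list[str]:
    """Fetch the voices visible to this API key and return their IDs."""
    req = urllib.request.Request(
        f"{CARTESIA_BASE}/voices",
        headers={
            "X-API-Key": api_key,              # assumed auth header
            "Cartesia-Version": "2024-06-10",  # assumed version header
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        data = json.load(resp)
    # Some API versions wrap the list in a "data" key; handle both.
    voices = data["data"] if isinstance(data, dict) else data
    return [v["id"] for v in voices]

def voice_exists(voice_ids: list[str], voice_id: str) -> bool:
    """Pure membership check, kept separate so it is testable offline."""
    return voice_id in voice_ids
```

If `voice_exists(list_voice_ids(api_key), my_voice_id)` returns `False`, Cartesia will never produce audio for that voice, which matches the "no audio frames were pushed" failure.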
- **Error handling:** For the errors you mentioned:
  - `APIError: no audio frames were pushed for text`: occurs when no audio is generated. Verify the voice ID, ensure the text is non-empty and valid, and check your API quota.
  - `RuntimeError: BufferedSentenceStream is closed`: happens when the stream lifecycle is mismanaged. Ensure streams are closed and recreated as needed rather than reused after closing.
- **Dynamic updates:** Use `update_options` to change the voice dynamically at runtime if needed.

If these steps do not resolve the issue, let me know, and we can explore further troubleshooting or alternative solutions.

Sources: livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/tts.p...
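One thing worth checking before anything else: Cartesia voice IDs are typically UUIDs, while `voice_1754415960525_jye0sqx9d` looks like a client-generated identifier. A quick format check (a heuristic based on that assumption about Cartesia's ID scheme, not an official validator) can catch this early:

```python
import uuid

def looks_like_cartesia_voice_id(voice_id: str) -> bool:
    """Heuristic: Cartesia voice IDs are typically UUIDs (assumption).
    A client-side ID like 'voice_<timestamp>_<slug>' fails to parse
    here and would likely be rejected by the API, surfacing as the
    'no audio frames were pushed for text' error."""
    try:
        uuid.UUID(voice_id)
        return True
    except ValueError:
        return False
```

If this returns `False` for your cloned voice, use the voice ID that Cartesia's clone endpoint returned rather than an identifier generated on your side.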