my TTS audio is distorted. Can you provide any gui...
# ask-ai
c
my TTS audio is distorted. Can you provide any guidance on how to resolve this? I am using eleven labs and the playback through the simli avatar is choppy
t
Thanks for reaching out, Aaron. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Distorted or choppy TTS audio with ElevenLabs playback through a Simli avatar can result from several factors. Please try the following:
• Audio Format: Ensure ElevenLabs is set to mp3_22050_32 encoding for compatibility.
• Tokenization: Use auto_mode with SentenceTokenizer for smoother streaming.
• Streaming Latency: Set streaming_latency=3 (or 4 for higher tolerance to instability).
• Chunk Schedule: Fine-tune chunk_length_schedule, e.g., [120, 160, 250, 290], to fit your network and use case.
• Network Quality: Choppy audio can also arise from network congestion or packet loss. If possible, test under stable wired connections.
For best results, combine sentence-based streaming (auto_mode and SentenceTokenizer) with optimized latency and chunk settings. Adjust voice and SSML settings if required for clarity. If issues persist, further details about your configuration or network may identify additional optimizations. Let me know if you need detailed steps for specific settings, or if you’re using any custom configuration.
Sources: tests/test_tts.py | Bringing Zoom's end-to-end optimizations to WebRTC | ElevenLabs TTS integration guide | LiveKit Docs
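A note on the encoding name used above: ElevenLabs output format names follow a codec_samplerate_bitrate pattern, so mp3_22050_32 is MP3 at 22,050 Hz and 32 kbps. A sample-rate mismatch between the TTS output and whatever consumes it can be one source of distortion, so it may help to know exactly what you are requesting. Here is a minimal sketch that splits such a name apart. parse_output_format is a hypothetical helper for illustration and is not part of the ElevenLabs or LiveKit APIs.

```python
from typing import NamedTuple, Optional


class OutputFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None for names without a bitrate part


def parse_output_format(name: str) -> OutputFormat:
    """Split a name like 'mp3_22050_32' into codec, sample rate, and bitrate.

    Hypothetical helper: it only understands 'codec_rate' and
    'codec_rate_bitrate' shaped names.
    """
    parts = name.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"unrecognized output format name: {name!r}")
    codec = parts[0]
    sample_rate_hz = int(parts[1])
    bitrate_kbps = int(parts[2]) if len(parts) == 3 else None
    return OutputFormat(codec, sample_rate_hz, bitrate_kbps)


fmt = parse_output_format("mp3_22050_32")
print(f"{fmt.codec} at {fmt.sample_rate_hz} Hz, {fmt.bitrate_kbps} kbps")
```

You can compare the parsed sample rate against whatever rate the rest of your audio pipeline is configured for.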
c
can you show an example configuration?
t
Hey there! Let me walk you through a Python configuration for ElevenLabs TTS that'll make your LiveKit avatar sound super smooth. I've got some awesome settings that'll help reduce those pesky audio hiccups.
from livekit.plugins import elevenlabs  # LiveKit's ElevenLabs plugin, not the standalone ElevenLabs SDK
from livekit.agents import tokenize

tts = elevenlabs.TTS(
    encoding="mp3_22050_32",  # Optimized encoding and sample rate
    voice_id="your_voice_id", # Replace with your chosen ElevenLabs voice ID
    model="eleven_turbo_v2_5", # Recommended model
    auto_mode=True,  # Enables latency-optimized streaming
    word_tokenizer=tokenize.basic.SentenceTokenizer(), # Sentence-level chunking
    streaming_latency=3,  # Default optimized latency
    chunk_length_schedule=[120, 160, 250, 290], # Tuned for real-time stability
    inactivity_timeout=300,  # 5 minutes inactivity timeout
    enable_ssml_parsing=True,  # Enable for improved pronunciation control
    voice_settings=elevenlabs.VoiceSettings(
        stability=0.71,
        similarity_boost=0.5,
        style=0.0,
        use_speaker_boost=True
    )
)
Just swap out your_voice_id with the ID of the ElevenLabs voice you want to use. These settings target common playback problems by combining sentence-level streaming, a moderate latency optimization level, and a compact audio encoding. Need more details about a specific part of your setup? Just ask!
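If you want to catch out-of-range values before they reach the TTS constructor, a small pre-flight check can help. The sketch below is only an illustration: check_tts_settings is a hypothetical helper, and the ranges it enforces (streaming_latency from 0 to 4, each chunk_length_schedule entry from 50 to 500) are assumptions based on my reading of the ElevenLabs documentation, so please confirm them against the version you are running.

```python
from typing import List, Sequence

# Assumed limits; verify against the ElevenLabs / LiveKit docs for your version.
LATENCY_RANGE = (0, 4)
CHUNK_RANGE = (50, 500)


def check_tts_settings(
    streaming_latency: int, chunk_length_schedule: Sequence[int]
) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems: List[str] = []
    lo, hi = LATENCY_RANGE
    if not lo <= streaming_latency <= hi:
        problems.append(
            f"streaming_latency={streaming_latency} is outside {lo}..{hi}"
        )
    lo, hi = CHUNK_RANGE
    for value in chunk_length_schedule:
        if not lo <= value <= hi:
            problems.append(
                f"chunk_length_schedule entry {value} is outside {lo}..{hi}"
            )
    return problems


for problem in check_tts_settings(3, [120, 160, 250, 290]):
    print(problem)  # prints nothing for the settings suggested above
```

Running the check at startup and logging any problems makes a misconfigured value easier to spot than a vague audio glitch later on.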
c
how would you update this
from livekit.plugins.elevenlabs import tts as elevenlabs_tts
logger.debug("[Worker Agent] Initializing ElevenLabs TTS with model=eleven_turbo_v2_5...")
try:
    # Simplified configuration with minimal parameters for stability
    tts_plugin = elevenlabs_tts.TTS(
        model="eleven_turbo_v2_5",
        voice_id="gs0tAILXbY5DNrJrsM6F",
        auto_mode=1,
        word_tokenizer="SentenceTokenizer",
        voice_settings=None,  # Use default voice settings
        language="en",
        streaming_latency=3,   # Moderate latency
        api_key=os.getenv("ELEVEN_API_KEY", "")
    )
t
Hey there! Your configuration is close to what I'd suggest for smooth, real-time playback. A few tips to help you optimize your ElevenLabs integration:
• auto_mode: Use auto_mode=True (a Python boolean) instead of 1. It makes the intent clearer.
• word_tokenizer: Pass an actual SentenceTokenizer object rather than the string "SentenceTokenizer". The plugin appears to expect a tokenizer instance, so a string is unlikely to work as intended.
• voice_settings: For more natural-sounding audio, consider specifying custom voice settings (see the example below), unless you're happy with the default ElevenLabs voice settings.
• chunk_length_schedule: This setting can noticeably affect streaming performance, especially on an unpredictable network.
• encoding: For smoother avatar playback, try setting encoding="mp3_22050_32" if your plugin version supports it.
• enable_ssml_parsing: Turn this on if you want more control over how words are pronounced.
Here is how it might look all put together:
import os

from livekit.agents import tokenize  # tokenizers live in livekit.agents, not the elevenlabs package
from livekit.plugins.elevenlabs import tts as elevenlabs_tts

tts_plugin = elevenlabs_tts.TTS(
    model="eleven_turbo_v2_5",
    voice_id="gs0tAILXbY5DNrJrsM6F",
    auto_mode=True,
    word_tokenizer=tokenize.basic.SentenceTokenizer(),
    voice_settings=elevenlabs_tts.VoiceSettings(  # Voice settings for increased clarity
        stability=0.71,
        similarity_boost=0.5,
        style=0.0,
        use_speaker_boost=True,
    ),
    language="en",
    streaming_latency=3,
    chunk_length_schedule=[120, 160, 250, 290],
    encoding="mp3_22050_32",
    enable_ssml_parsing=True,
    api_key=os.getenv("ELEVEN_API_KEY", "")
)
Just a friendly reminder: double-check that each parameter is supported by your specific plugin version. If a parameter raises an error, check which arguments the TTS constructor in your installed version actually accepts. This setup is aimed at smooth, low-latency ElevenLabs TTS playback for your avatars.
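One way to check which arguments your installed version accepts is to inspect the constructor's signature with Python's standard inspect module. The sketch below is a rough illustration, not part of the plugin: split_supported_kwargs is a hypothetical helper, and FakeTTS is a stand-in class so the example runs without LiveKit installed. In your own code you would pass the real TTS class instead.

```python
import inspect
from typing import Any, Callable, Dict, Tuple


def split_supported_kwargs(
    factory: Callable[..., Any], kwargs: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into (supported, unsupported) based on factory's signature."""
    params = inspect.signature(factory).parameters
    # If the callable takes **kwargs, it accepts any keyword by name.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    supported = {k: v for k, v in kwargs.items() if k in params}
    unsupported = {k: v for k, v in kwargs.items() if k not in params}
    return supported, unsupported


class FakeTTS:
    """Stand-in for a TTS class; the real constructor's parameters may differ."""

    def __init__(self, *, model: str, voice_id: str, auto_mode: bool = False):
        self.model = model
        self.voice_id = voice_id
        self.auto_mode = auto_mode


wanted = {
    "model": "eleven_turbo_v2_5",
    "voice_id": "your_voice_id",
    "auto_mode": True,
    "enable_ssml_parsing": True,  # not accepted by FakeTTS
}
supported, unsupported = split_supported_kwargs(FakeTTS, wanted)
print("unsupported:", sorted(unsupported))
tts = FakeTTS(**supported)
```

Logging the unsupported names, rather than silently dropping them, keeps it visible when a setting you rely on is not being applied.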