my TTS audio is distorted. Can you provide any guidance on how to resolve this? | LiveKit Community #ask-ai


chilly-soccer-432

07/10/2025, 2:49 PM

my TTS audio is distorted. Can you provide any guidance on how to resolve this? I am using eleven labs and the playback through the simli avatar is choppy

tall-airline-16756

07/10/2025, 2:50 PM

Thanks for reaching out, Aaron. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.

tall-airline-16756

07/10/2025, 2:50 PM

Distorted or choppy TTS audio with ElevenLabs playback through a Simli avatar can result from several factors. Please try the following:

• Audio Format: Ensure ElevenLabs is set to `mp3_22050_32` encoding for compatibility.
• Tokenization: Use `auto_mode` with `SentenceTokenizer` for smoother streaming.
• Streaming Latency: Set `streaming_latency=3` (or adjust it if you need more tolerance to instability).
• Chunk Schedule: Fine-tune `chunk_length_schedule`, e.g., `[120, 160, 250, 290]`, to fit your network and use case. Note that ElevenLabs documents `auto_mode` as disabling the chunk schedule, so this setting mainly matters when `auto_mode` is off.
• Network Quality: Choppy audio can also arise from network congestion or packet loss. If possible, test over a stable wired connection.

For best results, combine sentence-based streaming (`auto_mode` and `SentenceTokenizer`) with optimized latency settings (and, when `auto_mode` is off, a tuned chunk schedule). Adjust voice and SSML settings if required for clarity. If issues persist, further details about your configuration or network may reveal additional optimizations. Let me know if you need detailed steps for specific settings, or if you're using any custom configuration.

Sources: tests/test_tts.py | Bringing Zoom's end-to-end optimizations to WebRTC | ElevenLabs TTS integration guide | LiveKit Docs

chilly-soccer-432

07/10/2025, 3:33 PM

can you show an example configuration?

tall-airline-16756

07/10/2025, 3:34 PM

Hey there! Let me walk you through a Python configuration for ElevenLabs TTS that'll make your LiveKit avatar sound super smooth. I've got some awesome settings that'll help reduce those pesky audio hiccups.


from livekit.agents import tokenize
from livekit.plugins import elevenlabs

tts = elevenlabs.TTS(
    encoding="mp3_22050_32",  # MP3, 22,050 Hz sample rate, 32 kbps
    voice_id="your_voice_id",  # Replace with your chosen ElevenLabs voice ID
    model="eleven_turbo_v2_5",  # Recommended low-latency model
    auto_mode=True,  # Sentence-at-a-time streaming; ElevenLabs disables the chunk schedule in this mode
    word_tokenizer=tokenize.basic.SentenceTokenizer(),  # Sentence-level chunking
    streaming_latency=3,  # Streaming latency optimization level
    chunk_length_schedule=[120, 160, 250, 290],  # Only takes effect when auto_mode is off
    inactivity_timeout=300,  # WebSocket inactivity timeout in seconds (5 minutes)
    enable_ssml_parsing=True,  # Enable for improved pronunciation control
    voice_settings=elevenlabs.VoiceSettings(
        stability=0.71,
        similarity_boost=0.5,
        style=0.0,
        use_speaker_boost=True,
    ),
)

Just swap out `your_voice_id` with the ElevenLabs voice you love. These settings target common playback problems through sentence-level streaming, latency tuning, and a consistent audio encoding. Need more details about a specific part of your setup? I'm all ears, just ask!
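"Sentence streaming" means text from the LLM is buffered until a sentence boundary before it is sent for synthesis, so the TTS never receives half-finished phrases. The sketch below is a naive, regex-based stand-in for a sentence tokenizer; `stream_sentences` is a hypothetical helper, not LiveKit's `tokenize.basic.SentenceTokenizer`, which handles many more edge cases. It consumes streamed text fragments and yields complete sentences, flushing any remainder when the stream ends.

```python
import re
from typing import Iterable, Iterator

# Naive boundary: '.', '!' or '?' followed by whitespace. Real tokenizers also
# handle abbreviations, decimals, quotes, and similar cases.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(fragments: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as streamed text fragments arrive."""
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is a finished sentence; the last may still grow.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush trailing text when the stream ends


print(list(stream_sentences(["Hello the", "re. How are", " you? Fine"])))
# → ['Hello there.', 'How are you?', 'Fine']
```

Note that "Hello there." is only emitted once the next fragment shows whitespace after the period, which is the small delay sentence-level streaming trades for smoother prosody.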

chilly-soccer-432

07/10/2025, 3:36 PM

how would you update this


import logging
import os

from livekit.plugins.elevenlabs import tts as elevenlabs_tts

logger = logging.getLogger(__name__)

logger.debug("[Worker Agent] Initializing ElevenLabs TTS with model=eleven_turbo_v2_5...")
try:
    # Simplified configuration with minimal parameters for stability
    tts_plugin = elevenlabs_tts.TTS(
        model="eleven_turbo_v2_5",
        voice_id="gs0tAILXbY5DNrJrsM6F",
        auto_mode=1,
        word_tokenizer="SentenceTokenizer",
        voice_settings=None,  # Use default voice settings
        language="en",
        streaming_latency=3,   # Moderate latency
        api_key=os.getenv("ELEVEN_API_KEY", "")
    )
except Exception:
    # Log the failure with a traceback, then let the caller handle it
    logger.exception("[Worker Agent] Failed to initialize ElevenLabs TTS")
    raise

tall-airline-16756

07/10/2025, 3:37 PM

Hey there! I noticed your configuration is super close to being perfect for smooth, real-time playback. Let me walk you through some friendly tips to help you optimize your ElevenLabs integration:

• auto_mode: Pro tip: use `auto_mode=True` (a Python boolean) instead of `auto_mode=1`. It'll make your code a bit clearer!
• word_tokenizer: Pass an actual tokenizer object, such as `tokenize.basic.SentenceTokenizer()`, rather than the string `"SentenceTokenizer"`. The plugin calls methods on this object, so a plain string is unlikely to work.
• voice_settings: Want your audio to sound more natural? Consider specifying some custom voice settings (I've got an example below) unless you're happy with ElevenLabs' default voice settings.
• chunk_length_schedule: This controls how much text ElevenLabs buffers before generating each audio chunk, which can help streaming performance when your network is unpredictable. ElevenLabs documents that `auto_mode` disables the chunk schedule, so this matters mainly with `auto_mode` off.
• encoding: For the smoothest avatar playback, try setting `encoding="mp3_22050_32"` if you can.
• enable_ssml_parsing: If you want more control over how words are pronounced, definitely turn this on!

Want to see how this might look all put together? Check out this config:


import os

from livekit.agents import tokenize
from livekit.plugins.elevenlabs import VoiceSettings
from livekit.plugins.elevenlabs import tts as elevenlabs_tts

tts_plugin = elevenlabs_tts.TTS(
    model="eleven_turbo_v2_5",
    voice_id="gs0tAILXbY5DNrJrsM6F",
    auto_mode=True,
    word_tokenizer=tokenize.basic.SentenceTokenizer(),
    voice_settings=VoiceSettings(  # Voice settings for increased clarity
        stability=0.71,
        similarity_boost=0.5,
        style=0.0,
        use_speaker_boost=True,
    ),
    language="en",
    streaming_latency=3,
    chunk_length_schedule=[120, 160, 250, 290],  # Ignored by ElevenLabs while auto_mode is on
    encoding="mp3_22050_32",
    enable_ssml_parsing=True,
    api_key=os.getenv("ELEVEN_API_KEY", ""),
)

Just a friendly reminder: double-check that each parameter works with your specific plugin version. If you hit any snags, check which arguments the `TTS` constructor accepts in the version you have installed. This setup is all about getting that smooth, optimized ElevenLabs TTS experience for your avatars!
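One way to verify which arguments are available is to inspect the constructor signature of the TTS class your installed plugin exposes. The sketch below uses Python's standard `inspect` module; `split_supported_kwargs` is a hypothetical helper, and `fake_tts` is a stand-in so the snippet runs without LiveKit installed (its parameters are invented and do not reflect the real plugin signature).

```python
import inspect
from typing import Any, Callable, Dict, Tuple


def split_supported_kwargs(
    func: Callable[..., Any], kwargs: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into those func's signature accepts by keyword and the rest.

    If func takes **kwargs, every key counts as accepted, since the signature
    alone cannot tell which ones are actually used.
    """
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    # Positional-only parameters cannot be passed by keyword, so skip them.
    keyword_kinds = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    accepted = {name for name, p in params.items() if p.kind in keyword_kinds}
    supported = {k: v for k, v in kwargs.items() if k in accepted}
    unsupported = {k: v for k, v in kwargs.items() if k not in accepted}
    return supported, unsupported


# Hypothetical stand-in for a plugin constructor (NOT the real LiveKit signature).
def fake_tts(*, model: str, voice_id: str, streaming_latency: int = 3) -> None:
    pass


desired = {"model": "eleven_turbo_v2_5", "voice_id": "abc", "auto_mode": True}
supported, unsupported = split_supported_kwargs(fake_tts, desired)
print("unsupported:", sorted(unsupported))
# → unsupported: ['auto_mode']
```

With the plugin installed, you could pass the class itself, e.g. `split_supported_kwargs(elevenlabs_tts.TTS, desired)`, because `inspect.signature` on a class reports its `__init__` parameters (without `self`).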
