chilly-soccer-432
07/10/2025, 2:49 PM
tall-airline-16756
07/10/2025, 2:50 PM
tall-airline-16756
07/10/2025, 2:50 PM
mp3_22050_32 encoding for compatibility.
• Tokenization: Use auto_mode with SentenceTokenizer for smoother streaming.
• Streaming Latency: Set streaming_latency=3 (or 4 for higher tolerance to instability).
• Chunk Schedule: Fine-tune chunk_length_schedule, e.g., [120, 160, 250, 290], to fit your network and use case.
• Network Quality: Choppy audio can also arise from network congestion or packet loss. If possible, test under stable wired connections.
For best results, combine sentence-based streaming (auto_mode and SentenceTokenizer) with optimized latency and chunk settings. Adjust voice and SSML settings if required for clarity. If issues persist, further details about your configuration or network may identify additional optimizations.
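To make the chunk schedule less abstract, here is a minimal sketch of how such a schedule can be read: the n-th chunk is released once at least schedule[n] characters are buffered, and the last value is reused once the schedule runs out. This is an illustrative model only, not the service's actual implementation; `simulate_chunk_schedule` and the "eager" schedule are hypothetical names and values made up for the example.

```python
from typing import Iterable, List


def simulate_chunk_schedule(pieces: Iterable[str], schedule: List[int]) -> List[str]:
    """Group incoming text into generation chunks (illustrative model only).

    The n-th chunk is released once at least schedule[n] characters are
    buffered; after the schedule is exhausted, its last value keeps being used.
    """
    chunks: List[str] = []
    buffer = ""
    for piece in pieces:
        buffer += piece
        # Pick the threshold for the chunk we are currently filling.
        threshold = schedule[min(len(chunks), len(schedule) - 1)]
        if len(buffer) >= threshold:
            chunks.append(buffer)
            buffer = ""
    if buffer:
        # Flush whatever is left when the input ends.
        chunks.append(buffer)
    return chunks


if __name__ == "__main__":
    words = ["word "] * 200  # 1000 characters arriving 5 at a time
    schedules = {
        "default": [120, 160, 250, 290],
        "eager": [50, 90, 120, 150],  # hypothetical values, for comparison only
    }
    for name, schedule in schedules.items():
        sizes = [len(chunk) for chunk in simulate_chunk_schedule(words, schedule)]
        print(name, sizes)
```

With the [120, 160, 250, 290] schedule, the 1000 characters are grouped into chunks of 120, 160, 250, 290 and a final 180. Smaller early values release the first audio sooner at the cost of less context per chunk, which is the trade-off to tune against your network.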
Let me know if you need detailed steps for specific settings, or if you’re using any custom configuration.
• Sources: tests/test_tts.py | Bringing Zoom's end-to-end optimizations to WebRTC | ElevenLabs TTS integration guide | LiveKit Docs
chilly-soccer-432
07/10/2025, 3:33 PM
tall-airline-16756
07/10/2025, 3:34 PM
# Assumes the LiveKit ElevenLabs plugin; tokenize comes from livekit.agents
from livekit.agents import tokenize
from livekit.plugins import elevenlabs

tts = elevenlabs.TTS(
    encoding="mp3_22050_32",  # Optimized encoding and sample rate
    voice_id="your_voice_id",  # Replace with your chosen ElevenLabs voice ID
    model="eleven_turbo_v2_5",  # Recommended model
    auto_mode=True,  # Enables latency-optimized streaming
    word_tokenizer=tokenize.basic.SentenceTokenizer(),  # Sentence-level chunking
    streaming_latency=3,  # Moderate latency optimization
    chunk_length_schedule=[120, 160, 250, 290],  # Tuned for real-time stability
    inactivity_timeout=300,  # 5 minutes inactivity timeout
    enable_ssml_parsing=True,  # Enable for improved pronunciation control
    voice_settings=elevenlabs.VoiceSettings(
        stability=0.71,
        similarity_boost=0.5,
        style=0.0,
        use_speaker_boost=True,
    ),
)
Just swap out your_voice_id with the ElevenLabs voice you want. These settings target common playback problems through sentence-level streaming, tuned latency, and a compatible audio encoding.
Need more details about a specific part of your setup? Just ask!
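As a rough illustration of what sentence-level streaming means, the sketch below buffers incoming text fragments and only releases complete sentences. It is a simplified stand-in, not the plugin's SentenceTokenizer: `stream_sentences` is a hypothetical helper, and the regex boundary rule is deliberately naive.

```python
import re
from typing import Iterable, Iterator

# Naive boundary: '.', '!' or '?' followed by whitespace. Real tokenizers
# handle abbreviations, numbers and other languages far better.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(fragments: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they are available."""
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part ends at a sentence boundary.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be an unfinished sentence; keep buffering it.
        buffer = parts[-1]
    if buffer.strip():
        # Flush the remainder when the input ends.
        yield buffer.strip()


if __name__ == "__main__":
    llm_output = ["Hello the", "re. How are", " you? I am", " fine"]
    for sentence in stream_sentences(llm_output):
        print(repr(sentence))
```

Sending whole sentences gives the synthesizer enough context for natural prosody, while still starting playback before the full reply has been generated.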
chilly-soccer-432
07/10/2025, 3:36 PM
import logging
import os

from livekit.plugins.elevenlabs import tts as elevenlabs_tts

logger = logging.getLogger(__name__)

logger.debug("[Worker Agent] Initializing ElevenLabs TTS with model=eleven_turbo_v2_5...")
try:
    # Simplified configuration with minimal parameters for stability
    tts_plugin = elevenlabs_tts.TTS(
        model="eleven_turbo_v2_5",
        voice_id="gs0tAILXbY5DNrJrsM6F",
        auto_mode=1,
        word_tokenizer="SentenceTokenizer",
        voice_settings=None,  # Use default voice settings
        language="en",
        streaming_latency=3,  # Moderate latency
        api_key=os.getenv("ELEVEN_API_KEY", ""),
    )
except Exception:
    # Log and re-raise so initialization failures are visible
    logger.exception("[Worker Agent] Failed to initialize ElevenLabs TTS")
    raise
tall-airline-16756
07/10/2025, 3:37 PM
auto_mode=True (Python boolean) instead of 1. It'll make your code a bit clearer!
• word_tokenizer: Pass an actual tokenizer object, such as tokenize.basic.SentenceTokenizer(), rather than the string "SentenceTokenizer". If your plugin version only accepts strings, make sure the string maps to a sentence-based tokenizer.
• voice_settings: For more natural-sounding audio, consider specifying custom voice settings (example below) unless you're happy with the default ElevenLabs voice.
• chunk_length_schedule: Setting this explicitly, e.g., [120, 160, 250, 290], can improve streaming performance, especially on an unstable network.
• encoding: For the smoothest avatar playback, try setting encoding="mp3_22050_32" if you can.
• enable_ssml_parsing: If you want more control over how words are pronounced, definitely turn this on!
Want to see how this might look all put together? Check out this config:
import os

from livekit.agents import tokenize  # tokenizers ship with livekit.agents
from livekit.plugins.elevenlabs import tts as elevenlabs_tts

tts_plugin = elevenlabs_tts.TTS(
    model="eleven_turbo_v2_5",
    voice_id="gs0tAILXbY5DNrJrsM6F",
    auto_mode=True,
    word_tokenizer=tokenize.basic.SentenceTokenizer(),
    # Voice settings for increased clarity (VoiceSettings object rather than a plain dict)
    voice_settings=elevenlabs_tts.VoiceSettings(
        stability=0.71,
        similarity_boost=0.5,
        style=0.0,
        use_speaker_boost=True,
    ),
    language="en",
    streaming_latency=3,
    chunk_length_schedule=[120, 160, 250, 290],
    encoding="mp3_22050_32",
    enable_ssml_parsing=True,
    api_key=os.getenv("ELEVEN_API_KEY", ""),
)
Just a friendly reminder - double-check that each parameter works with your specific plugin version. If you hit any snags with parameters, take a moment to verify what arguments are available. This setup is all about getting that smooth, optimized ElevenLabs TTS experience for your avatars!
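One way to check which arguments your installed version accepts is to inspect the constructor's signature before calling it. The sketch below shows the idea with Python's standard inspect module. `split_supported_kwargs` is a hypothetical helper, and `FakeTTS` is only a stand-in so the example runs without the plugin installed; its parameter list is made up and does not reflect the real class.

```python
import inspect
from typing import Any, Callable, Dict, Tuple


def split_supported_kwargs(
    factory: Callable[..., Any], kwargs: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into those the callable accepts and those it does not."""
    params = inspect.signature(factory).parameters
    # A **kwargs parameter means any keyword is accepted at call time.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    supported = {k: v for k, v in kwargs.items() if k in params}
    unsupported = {k: v for k, v in kwargs.items() if k not in params}
    return supported, unsupported


class FakeTTS:
    """Stand-in constructor for the example; NOT the real plugin class."""

    def __init__(self, *, model: str, voice_id: str, auto_mode: bool = False,
                 streaming_latency: int = 0) -> None:
        self.model = model
        self.voice_id = voice_id
        self.auto_mode = auto_mode
        self.streaming_latency = streaming_latency


if __name__ == "__main__":
    wanted = {
        "model": "eleven_turbo_v2_5",
        "voice_id": "your_voice_id",
        "auto_mode": True,
        "enable_ssml_parsing": True,  # not accepted by FakeTTS
    }
    supported, unsupported = split_supported_kwargs(FakeTTS, wanted)
    print("unsupported:", sorted(unsupported))
    tts = FakeTTS(**supported)
```

In a real project you would pass the plugin's TTS class in place of FakeTTS and log the unsupported names instead of silently dropping them.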