nice-fish-21757 07/14/2025, 2:14 PM
tall-airline-16756 07/14/2025, 2:14 PM
tall-airline-16756 07/14/2025, 2:16 PM
min_sentence_len (minimum sentence length in characters, default ~20) and stream_context_len (context length, default ~10).
• TTS provider options: Many providers (e.g., ElevenLabs, Google, Cartesia, Resemble) specify default buffered word counts or chunk length schedules, or let you pass a custom tokenizer object with your chosen settings. For instance, with ElevenLabs, you can set chunk_length_schedule and auto_mode (reduces buffering for lower latency).
• Audio buffer size: If you encounter frame capture or latency issues, audio source configurations let you set queue_size_ms (recommendation: 1000-2000 ms for stability). Larger buffers reduce the risk of frame errors but increase latency.
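To make the effect of min_sentence_len concrete, here is a toy sketch in plain Python. It is not LiveKit's tokenizer: the function name merge_short_sentences and the naive punctuation split are assumptions made for illustration, and stream_context_len is not modeled. It only shows the general idea that short sentences are held back and merged until the buffered text reaches the minimum length.

```python
import re


def merge_short_sentences(text: str, min_sentence_len: int = 20) -> list[str]:
    """Split text into sentences, merging short ones until each emitted
    chunk has at least min_sentence_len characters (illustrative only)."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    pieces = [p.strip() for p in re.split(r"(?<=[.!?])\s+", text) if p.strip()]
    chunks: list[str] = []
    buffer = ""
    for piece in pieces:
        buffer = f"{buffer} {piece}".strip()
        if len(buffer) >= min_sentence_len:
            chunks.append(buffer)
            buffer = ""
    if buffer:
        chunks.append(buffer)  # flush whatever remains at end of input
    return chunks


text = "Hi. Sure. I can help you with that today. Ok."
print(merge_short_sentences(text, min_sentence_len=20))
print(merge_short_sentences(text, min_sentence_len=1))
```

A higher minimum means the first chunk waits for more text before it can be sent to the TTS engine, which is where the added latency comes from.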
Configuration is typically done via constructor parameters. Example for a sentence tokenizer:
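from livekit.agents import tokenize

# Example values only; tune them for your latency needs.
tokenizer = tokenize.basic.SentenceTokenizer(
    min_sentence_len=25,
    stream_context_len=15,
)
# SomeTTS is a placeholder; the tokenizer parameter name varies by provider.
tts = SomeTTS(tokenizer=tokenizer)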
Or with ElevenLabs:
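from livekit.plugins import elevenlabs

# Check the plugin docs: auto_mode is described as bypassing the chunk schedule.
tts = elevenlabs.TTS(
    chunk_length_schedule=[80, 120, 200, 260],
    auto_mode=True,
)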
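As a rough mental model of what a chunk length schedule does, the sketch below buffers incoming text and releases it once the buffer reaches the current threshold, then moves to the next threshold (the last one repeats). This is a simplified illustration, not ElevenLabs' or LiveKit's code; chunk_by_schedule is a hypothetical helper, and the real service may split text differently.

```python
def chunk_by_schedule(pieces: list[str], schedule: list[int]) -> list[str]:
    """Release buffered text each time its length reaches the current
    threshold in the schedule (toy model of schedule-based buffering)."""
    chunks: list[str] = []
    buffer = ""
    step = 0
    for piece in pieces:
        buffer += piece
        # Once the schedule is exhausted, keep using its last threshold.
        threshold = schedule[min(step, len(schedule) - 1)]
        if len(buffer) >= threshold:
            chunks.append(buffer)
            buffer = ""
            step += 1
    if buffer:
        chunks.append(buffer)  # flush the remainder at end of stream
    return chunks


# Six 50-character pieces against a short example schedule.
pieces = ["a" * 50] * 6
print([len(c) for c in chunk_by_schedule(pieces, [80, 120])])
```

Smaller early thresholds let the first audio start sooner; larger later thresholds give the engine more context per request.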
The right balance of buffer size and minimum sentence length depends on how much latency you can tolerate: smaller values tend to start audio sooner, while larger values tend to give steadier output. Let me know which provider you're using or if you need a code example for a specific environment.
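For the audio buffer side of that trade-off, a quick back-of-the-envelope calculation can help when picking queue_size_ms. The sketch below assumes 16-bit PCM, 10 ms frames, and a 48 kHz mono source; those numbers and the queue_budget helper are illustrative assumptions, so substitute your own audio format.

```python
def queue_budget(queue_size_ms: int, sample_rate: int = 48000,
                 num_channels: int = 1, frame_ms: int = 10) -> dict:
    """Estimate how much audio a queue of queue_size_ms can hold."""
    bytes_per_sample = 2  # assumption: 16-bit PCM
    frames = queue_size_ms // frame_ms
    samples_per_channel = sample_rate * queue_size_ms // 1000
    return {
        "frames": frames,
        "samples_per_channel": samples_per_channel,
        "bytes": samples_per_channel * num_channels * bytes_per_sample,
    }


print(queue_budget(1000))
print(queue_budget(2000))
```

The queue size is also the worst-case delay the buffer can add: a full 2000 ms queue means up to two seconds of audio is waiting ahead of any new frame.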
Sources: livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/tts.p... | SentenceTokenizer | LiveKit Agents | ElevenLabs integration guide | LiveKit Docs | Speech & Audio Processing | livekit/livekit_composite | DeepWiki