# ask-ai
r
Is it possible to only send a few sentences at a time to the TTS provider? I want the LLM to generate a long script to be read out, but then send it to TTS in smaller chunks
t
Thanks for reaching out, John. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! Yes, LiveKit can definitely send text to TTS providers in smaller chunks instead of all at once. This is handled by the `StreamPacer` class, which smartly buffers sentences and sends them in manageable chunks based on things like audio duration and text length. Here's how you can do it:
• Split your text into chunks (like 2-4 sentences) before sending them to the TTS provider.
• Use methods like `push_text`, `flush`, and `end_input` to keep the flow smooth and controlled.

Here's a quick example:
```python
chunks = _text_to_chunks(streamed_text)
for chunk in chunks:
    stream.push_text(chunk)
stream.flush()
stream.end_input()
```
If you want, I’m here to help with any part of this—just let me know! Sources: livekit-plugins/livekit-plugins-lmnt/livekit/plugins/lmnt/tts.py | plugins/test/src/tts.ts
r
this is my current tts_node
```python
def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
    pronunciation_replacements = {
        "reveri": "reverie",
    }

    replaced_text = tokenize.utils.replace_words(text=text, replacements=pronunciation_replacements)
    return Agent.default.tts_node(self, text=replaced_text, model_settings=model_settings)
```
t
If you'd like to send text in smaller chunks to the TTS provider, you can tweak your `tts_node` method to handle and yield text bit by bit. Here's a simple example to get you started:
```python
async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
    max_chunk_size = 100  # Example chunk size

    async def process_text():
        current_chunk = []
        current_length = 0

        async for chunk in text:
            current_chunk.append(chunk)
            current_length += len(chunk)

            if current_length >= max_chunk_size or chunk.endswith(('.', '!', '?')):
                yield ' '.join(current_chunk)
                current_chunk = []
                current_length = 0

        if current_chunk:
            yield ' '.join(current_chunk)

    return Agent.default.tts_node(self, process_text(), model_settings)
```
This way, the text gets split into manageable chunks (like around 100 characters or at sentence breaks) and sent to the TTS provider step by step. If you want me to walk you through it or help with anything else, just let me know!
r
that seems to completely break the prose and ssml parsing
t
When working with SSML parsing and breaking up text for TTS, here are some handy tips to keep your prose sounding natural and smooth:
• Turn on SSML parsing: set `enable_ssml_parsing=True` in your TTS settings so those SSML tags get handled properly.
• Go for sentence-based tokenization: instead of chopping text word by word, try a sentence tokenizer like `tokenize.blingfire.SentenceTokenizer()` to keep the context together.
• Adjust chunk lengths: use `chunk_length_schedule` to set chunk sizes; something like `[80, 120, 200, 260]` works well to keep the speech flowing naturally.

Here's a quick example to show you what I mean:
```python
tts = elevenlabs.TTS(
    enable_ssml_parsing=True,
    word_tokenizer=tokenize.blingfire.SentenceTokenizer(),
    chunk_length_schedule=[80, 120, 200, 260],
    voice_id="your-voice-id",
    model="eleven_turbo_v2_5"
)
```
This setup helps keep your SSML tags intact and makes sure the speech sounds just right. If you want to dive deeper or have any questions, I’m here to help! Sources: ElevenLabs TTS integration guide | LiveKit Docs | livekit-agents/livekit/agents/voice/transcription/filters.py
r
that doesn't work, would elevenlabs have any issues if there are lots of ssml breaks sent in the stream?
t
Hey there! Just a quick heads-up about ElevenLabs TTS and SSML parsing. It works great when enabled, but if SSML tags are incomplete or not formatted quite right, it can cause some hiccups in the streamed audio. Also, splitting SSML tags across different chunks might interrupt the flow of the audio stream. To keep things running smoothly, here are a few tips:
• Make sure all your SSML tags are properly closed and formatted.
• Keep an eye on chunk lengths so you don't accidentally split SSML tags across chunks.
• Be prepared for cases where incomplete XML might pop up at the end of a stream.

If you want, I'm here to help you with these steps or chat more about how ElevenLabs TTS handles SSML. Just let me know!
Sources: tests/test_tts.py | plugins/elevenlabs/src/tts.test.ts | ElevenLabs integration guide | LiveKit Docs
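One way to act on the second tip, sketched below as a heuristic rather than an official LiveKit or ElevenLabs utility, is to buffer text and only flush it to the TTS stream once every SSML tag opened in the buffer has been closed:

```python
import re

# Matches an SSML/XML tag, capturing: leading slash, tag name, trailing slash.
_TAG_RE = re.compile(r"<(/?)([a-zA-Z][\w-]*)[^>]*?(/?)>")

def ssml_tags_balanced(text: str) -> bool:
    # Returns True when every tag opened in `text` is closed,
    # so the buffer is safe to flush to the TTS stream.
    stack = []
    for closing, name, self_closing in _TAG_RE.findall(text):
        if self_closing:        # e.g. <break time="300ms"/>
            continue
        if closing:             # e.g. </prosody>
            if not stack or stack[-1] != name:
                return False    # mismatched closing tag
            stack.pop()
        else:                   # e.g. <prosody rate="slow">
            stack.append(name)
    return not stack
```

In a chunking loop you would keep appending incoming text while `ssml_tags_balanced` is False, and flush once it turns True, so no tag is ever split across chunks.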
r
how can i do that if I don't know what the llm will send back?
t
When working with unknown LLM output for SSML parsing and chunking in ElevenLabs TTS, a helpful approach is to preprocess the text before sending it to the TTS engine. Here's a friendly guide on how to do that:
• Enable SSML parsing: set `enable_ssml_parsing=True` in your TTS configuration so it can understand SSML tags properly.
• Define a chunking strategy: use `chunk_length_schedule` to decide how to split the text, like `[80, 120, 200, 260]`; this helps keep things smooth.
• Preprocess LLM output: create a callback function to clean up or tweak the text before it gets to the TTS engine, for instance removing any unwanted tokens or fixing malformed SSML tags.

Here's a simple example of such a callback:
```python
async def before_tts_callback(text: str | AsyncIterable[str]):
    if isinstance(text, str):
        return text.replace("unknown_tag", "")
    else:
        async def stream_processor():
            async for chunk in text:
                yield chunk.replace("unknown_tag", "")
        return stream_processor()
```
This little step helps make sure any unexpected LLM outputs are handled smoothly. If you want to chat more or need a hand with anything else, just let me know! Sources: ElevenLabs TTS integration guide | LiveKit Docs | Voice pipeline nodes | LiveKit Docs
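When you truly can't predict what the LLM will emit, a blunt but safe fallback inside such a callback is to strip anything tag-shaped before it reaches the TTS engine. A minimal sketch (the function name is illustrative, not part of any API):

```python
import re

# Matches anything shaped like an SSML/XML tag.
_SSML_TAG_RE = re.compile(r"</?[a-zA-Z][^>]*?>")

def strip_ssml_tags(text: str) -> str:
    # Remove tag-shaped fragments so malformed SSML can't reach the TTS engine,
    # then collapse the double spaces left behind.
    without_tags = _SSML_TAG_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", without_tags)
```

For a streamed input you would additionally need to buffer across chunk boundaries, since a `<` can arrive in one chunk and its `>` in the next.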
r
I'm using auto_mode = True (the only way i can get it not to crash) so my chunk_length_schedule is disabled.
t
When you're using `auto_mode=True`, the `chunk_length_schedule` gets disabled, but don't worry, you can still manage SSML parsing and chunking smoothly:
• Enable SSML parsing: set `enable_ssml_parsing=True` to make sure SSML tags are processed properly.
• Use a sentence tokenizer: swap out the default tokenizer for `SentenceTokenizer`. This helps handle full sentences or phrases nicely, especially with `auto_mode` on.

Here's a quick example to get you started:
```python
tts = elevenlabs.TTS(
    enable_ssml_parsing=True,
    word_tokenizer=tokenize.SentenceTokenizer(),
    auto_mode=True
)
```
This way, you get SSML parsing and natural chunking without needing `chunk_length_schedule`. If you have any questions or want to chat more about it, just let me know!
r
using the SentenceTokenizer increases the number of hallucinations elevenlabs is getting
t
Hey there! I totally get how tricky it can be dealing with hallucinations in ElevenLabs TTS, especially when using `SentenceTokenizer` or `auto_mode`. While there's no specific magic fix yet, here are a few tips you might find helpful:
• Try preprocessing your text to clear out any ambiguous or super complex phrases before sending it over to the TTS engine.
• Play around with different tokenization methods, or turn off `auto_mode` to have more control over how the text gets chunked.
• Consider adding SSML tags to guide pronunciation or add emphasis where it's needed.

If you're still running into issues, just give me a shout. I'm here to help and can definitely pass this along for a deeper look.
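The pronunciation tip can extend the word-replacement approach from the `tts_node` shown earlier: instead of swapping spellings, wrap known-tricky words in SSML `<phoneme>` tags. The mapping, function name, and Arpabet string below are all illustrative, and whether ElevenLabs honors `<phoneme>` depends on the model, so check their docs before relying on it:

```python
import re

# Hypothetical mapping; the Arpabet pronunciation shown is illustrative only.
PHONEME_MAP = {
    "reveri": '<phoneme alphabet="cmu-arpabet" ph="R EH1 V ER0 IY0">reveri</phoneme>',
}

def add_pronunciation_hints(text: str) -> str:
    # Wrap known-tricky words in SSML phoneme tags before sending to TTS.
    for word, ssml in PHONEME_MAP.items():
        text = re.sub(rf"\b{re.escape(word)}\b", ssml, text, flags=re.IGNORECASE)
    return text
```

This only touches words you've explicitly listed, so it can't mangle the rest of the LLM's prose the way blind chunking can.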