# ask-ai
r
Is it possible to only send a few sentences at a time to the TTS provider? I want the LLM to generate a long script to be read out, but then send it to TTS in smaller chunks
t
Thanks for reaching out, John. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! Yes, LiveKit can definitely send text to TTS providers in smaller chunks instead of all at once. This is handled by the `StreamPacer` class, which smartly buffers sentences and sends them in manageable chunks based on things like audio duration and text length. Here's how you can do it:
• Split your text into chunks (like 2-4 sentences) before sending them to the TTS provider.
• Use methods like `push_text`, `flush`, and `end_input` to keep the flow smooth and controlled.

Here's a quick example:
```python
chunks = _text_to_chunks(streamed_text)
for chunk in chunks:
    stream.push_text(chunk)
stream.flush()
stream.end_input()
```
If you want, I’m here to help with any part of this—just let me know! Sources: livekit-plugins/livekit-plugins-lmnt/livekit/plugins/lmnt/tts.py | plugins/test/src/tts.ts
r
this is my current tts_node
```python
def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
    pronunciation_replacements = {
        "reveri": "reverie",
    }

    replaced_text = tokenize.utils.replace_words(text=text, replacements=pronunciation_replacements)
    return Agent.default.tts_node(self, text=replaced_text, model_settings=model_settings)
```
t
If you'd like to send text in smaller chunks to the TTS provider, you can tweak your `tts_node` method to handle and yield text bit by bit. Here's a simple example to get you started:
```python
async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
    max_chunk_size = 100  # Example chunk size

    async def process_text():
        current_chunk = []
        current_length = 0

        async for chunk in text:
            current_chunk.append(chunk)
            current_length += len(chunk)

            if current_length >= max_chunk_size or chunk.endswith(('.', '!', '?')):
                yield ' '.join(current_chunk)
                current_chunk = []
                current_length = 0

        if current_chunk:
            yield ' '.join(current_chunk)

    return Agent.default.tts_node(self, process_text(), model_settings)
```
This way, the text gets split into manageable chunks (like around 100 characters or at sentence breaks) and sent to the TTS provider step by step. If you want me to walk you through it or help with anything else, just let me know!
r
that seems to completely break the prose and ssml parsing
t
When working with SSML parsing and breaking up text for TTS, here are some handy tips to keep your prose sounding natural and smooth:
• Turn on SSML parsing: set `enable_ssml_parsing=True` in your TTS settings so those SSML tags get handled properly.
• Go for sentence-based tokenization: instead of chopping text word by word, try a sentence tokenizer like `tokenize.blingfire.SentenceTokenizer()` to keep the context together.
• Adjust chunk lengths: use `chunk_length_schedule` to set chunk sizes; something like `[80, 120, 200, 260]` works well to keep the speech flowing naturally.

Here's a quick example to show you what I mean:
```python
tts = elevenlabs.TTS(
    enable_ssml_parsing=True,
    word_tokenizer=tokenize.blingfire.SentenceTokenizer(),
    chunk_length_schedule=[80, 120, 200, 260],
    voice_id="your-voice-id",
    model="eleven_turbo_v2_5"
)
```
This setup helps keep your SSML tags intact and makes sure the speech sounds just right. If you want to dive deeper or have any questions, I’m here to help! Sources: ElevenLabs TTS integration guide | LiveKit Docs | livekit-agents/livekit/agents/voice/transcription/filters.py
r
that doesn't work, would elevenlabs have any issues if there are lots of ssml breaks sent in the stream?
t
Hey there! Just a quick heads-up about ElevenLabs TTS and SSML parsing. It works great when enabled, but if SSML tags are incomplete or not formatted quite right, it can cause some hiccups in the streamed audio. Also, splitting SSML tags across different chunks might interrupt the flow of the audio stream. To keep things running smoothly, here are a few tips:
• Make sure all your SSML tags are properly closed and formatted.
• Keep an eye on chunk lengths so you don't accidentally split SSML tags across chunks.
• Be prepared for cases where incomplete XML might pop up at the end of a stream.

If you want, I'm here to help you with these steps or chat more about how ElevenLabs TTS handles SSML. Just let me know!
Sources: tests/test_tts.py | plugins/elevenlabs/src/tts.test.ts | ElevenLabs integration guide | LiveKit Docs
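One way to act on the second tip, sketched below as a heuristic rather than an official LiveKit or ElevenLabs utility, is to buffer text and only flush it to the TTS stream once every SSML tag opened in the buffer has been closed:

```python
import re

# Matches an SSML/XML tag, capturing: leading slash, tag name, trailing slash.
_TAG_RE = re.compile(r"<(/?)([a-zA-Z][\w-]*)[^>]*?(/?)>")

def ssml_tags_balanced(text: str) -> bool:
    # Returns True when every tag opened in `text` is closed,
    # so the buffer is safe to flush to the TTS stream.
    stack = []
    for closing, name, self_closing in _TAG_RE.findall(text):
        if self_closing:        # e.g. <break time="300ms"/>
            continue
        if closing:             # e.g. </prosody>
            if not stack or stack[-1] != name:
                return False    # mismatched closing tag
            stack.pop()
        else:                   # e.g. <prosody rate="slow">
            stack.append(name)
    return not stack
```

In a chunking loop you would keep appending incoming text while `ssml_tags_balanced` is False, and flush once it turns True, so no tag is ever split across chunks.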
r
how can i do that if I don't know what the llm will send back?
t
When working with unknown LLM output for SSML parsing and chunking in ElevenLabs TTS, a helpful approach is to preprocess the text before sending it to the TTS engine. Here's a friendly guide on how to do that:
• Enable SSML parsing: set `enable_ssml_parsing=True` in your TTS configuration so it can understand SSML tags properly.
• Define a chunking strategy: use `chunk_length_schedule` to decide how to split the text, like `[80, 120, 200, 260]`; this helps keep things smooth.
• Preprocess LLM output: create a callback function to clean up or tweak the text before it gets to the TTS engine, for instance removing any unwanted tokens or fixing malformed SSML tags.

Here's a simple example of such a callback:
```python
async def before_tts_callback(text: str | AsyncIterable[str]):
    if isinstance(text, str):
        return text.replace("unknown_tag", "")
    else:
        async def stream_processor():
            async for chunk in text:
                yield chunk.replace("unknown_tag", "")
        return stream_processor()
```
This little step helps make sure any unexpected LLM outputs are handled smoothly. If you want to chat more or need a hand with anything else, just let me know! Sources: ElevenLabs TTS integration guide | LiveKit Docs | Voice pipeline nodes | LiveKit Docs
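When you truly can't predict what the LLM will emit, a blunt but safe fallback inside such a callback is to strip anything tag-shaped before it reaches the TTS engine. A minimal sketch (the function name is illustrative, not part of any API):

```python
import re

# Matches anything shaped like an SSML/XML tag.
_SSML_TAG_RE = re.compile(r"</?[a-zA-Z][^>]*?>")

def strip_ssml_tags(text: str) -> str:
    # Remove tag-shaped fragments so malformed SSML can't reach the TTS engine,
    # then collapse the double spaces left behind.
    without_tags = _SSML_TAG_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", without_tags)
```

For a streamed input you would additionally need to buffer across chunk boundaries, since a `<` can arrive in one chunk and its `>` in the next.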
r
I'm using auto_mode = True (the only way i can get it not to crash) so my chunk_length_schedule is disabled.
t
When you're using `auto_mode=True`, the `chunk_length_schedule` gets disabled, but don't worry, you can still manage SSML parsing and chunking smoothly:
• Enable SSML parsing: set `enable_ssml_parsing=True` to make sure SSML tags are processed properly.
• Use a sentence tokenizer: swap out the default tokenizer for `SentenceTokenizer`. This helps handle full sentences or phrases nicely, especially with `auto_mode` on.

Here's a quick example to get you started:
```python
tts = elevenlabs.TTS(
    enable_ssml_parsing=True,
    word_tokenizer=tokenize.SentenceTokenizer(),
    auto_mode=True
)
```
This way, you get SSML parsing and natural chunking without needing `chunk_length_schedule`. If you have any questions or want to chat more about it, just let me know!
r
using the SentenceTokenizer increases the number of hallucinations elevenlabs is getting
t
Hey there! I totally get how tricky it can be dealing with hallucinations in ElevenLabs TTS, especially when using `SentenceTokenizer` or `auto_mode`. While there's no specific magic fix yet, here are a few tips you might find helpful:
• Try preprocessing your text to clear out any ambiguous or super complex phrases before sending it over to the TTS engine.
• Play around with different tokenization methods, or turn off `auto_mode` to have more control over how the text gets chunked.
• Consider adding SSML tags to guide pronunciation or add emphasis where it's needed.

If you're still running into issues, just give me a shout. I'm here to help and can definitely pass this along for a deeper look.
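The pronunciation tip can extend the word-replacement approach from the `tts_node` shown earlier: instead of swapping spellings, wrap known-tricky words in SSML `<phoneme>` tags. The mapping, function name, and Arpabet string below are all illustrative, and whether ElevenLabs honors `<phoneme>` depends on the model, so check their docs before relying on it:

```python
import re

# Hypothetical mapping; the Arpabet pronunciation shown is illustrative only.
PHONEME_MAP = {
    "reveri": '<phoneme alphabet="cmu-arpabet" ph="R EH1 V ER0 IY0">reveri</phoneme>',
}

def add_pronunciation_hints(text: str) -> str:
    # Wrap known-tricky words in SSML phoneme tags before sending to TTS.
    for word, ssml in PHONEME_MAP.items():
        text = re.sub(rf"\b{re.escape(word)}\b", ssml, text, flags=re.IGNORECASE)
    return text
```

This only touches words you've explicitly listed, so it can't mangle the rest of the LLM's prose the way blind chunking can.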