Is there a LiveKit plugin to format text for TTS t...
# ask-ai
r
Is there a LiveKit plugin to format text for TTS to remove markdown? For example, the agent will speak out
*
as "asterisk"
t
Thanks for reaching out, Gary. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
The LiveKit Agents library includes a
filter_markdown
function designed to remove markdown formatting from text before it is processed by Text-to-Speech (TTS) systems. This function is part of the
livekit.agents.voice.transcription.filters
module and can handle various markdown elements such as headers, list markers, block quotes, and formatting for bold, italic, and code blocks. For example, you can use this function in the
tts_node
method of a custom agent to ensure that markdown formatting is removed before the text is sent to the TTS engine:
Copy code
async def tts_node(
    self, text: AsyncIterable[str], model_settings: ModelSettings
) -> AsyncIterable[AudioFrame]:
    filtered_text = filter_markdown(text)
    return super().tts_node(filtered_text, model_settings)
This ensures that the TTS engine receives clean text without markdown symbols interfering with speech synthesis. Sources: GitHub::livekit::agents | tests/test_transcription_filter.py
r
This worked but introduced a new problem. There is no pause now before the next sentence starts. So it sounds as if the first word in a new sentence is linked to the last word of the previous sentence.
t
Hey there! I totally get how jarring it can be when text-to-speech sounds robotic and doesn't have natural pauses. Good news – there's an easy fix! The
new_sentence_delay
parameter in the
TTSSegmentsForwarder
class is like a magic wand for making your speech sound more conversational. Think of it as adding those natural breath moments between sentences. Here's how simple it is to use:
Copy code
TTSSegmentsForwarder(
    room=room,
    participant=participant,
    new_sentence_delay=0.4,  # Default is 0.4 seconds
    # other parameters...
)
By default, it adds a 0.4-second pause between sentences, which helps your text-to-speech sound way more natural and smooth. Feel free to adjust the time to what sounds best to you. Happy speaking! 😊 Sources: livekit.agents.transcription.tts_forwarder API documentation | livekit.agents.transcription.tts_forwarder API documentation
r
Give me an example of how this is used
t
Hey there! Let me walk you through a cool feature that can make your text-to-speech sound more natural - the
new_sentence_delay
parameter.
Copy code
from livekit.agents.voice.transcription.forwarders import TTSSegmentsForwarder

# Example usage
forwarder = TTSSegmentsForwarder(
    room=my_room_instance,
    participant=my_participant_instance,
    new_sentence_delay=0.5,  # Introduce a 0.5-second pause between sentences
    # other parameters as needed
)

# Use the forwarder in your TTS pipeline
forwarder.forward_segments()
See what's happening here? By setting
new_sentence_delay
to 0.5, you're basically telling your text-to-speech to take a tiny breather between sentences. It's like adding a natural rhythm to the speech, making it sound more conversational and less robotic. Feel free to play around with the delay time to find what sounds just right for your project. Pretty neat, right? 😊
r
<error>Import "livekit.agents.voice.transcription.forwarders" could not be resolvedPylancereportMissingImports</error>
t
Hey there! Quick question - have you double-checked that you've got the LiveKit library and all its dependencies installed properly? And while we're at it, are you running the most recent version? If not, no worries - just go ahead and update to the latest one, and then let me know how it goes. I'm here to help you troubleshoot this!
I don't have the answer you're looking for. You could also try asking your question: • in one of the other Slack channels or • to https://deepwiki.com/livekit/livekit_composite which is trained on all LiveKit source code If you find the answer, please post it here to help others!