hey guys how is it going I am using livekit agents v1 and I LiveKit Community #ask-ai

able-book-76021

07/26/2025, 9:16 PM

Hey guys, how is it going? I am using LiveKit Agents v1 and I am wondering why my TTS is mispronouncing numbers, e.g. the bitcoin price is $118,152.31. The model is basically just reading the numbers or sometimes skipping them. I am using the ElevenLabs TTS multilingual v2 model. On my custom STT - LLM - TTS pipeline, even ElevenLabs turbo v2 has no issue with this. I also gave the same sentence to the website, https://elevenlabs.io/app/speech-synthesis/text-to-speech, and it reads it perfectly. Does anyone on the team know why I am having issues with this?

tall-airline-16756

07/26/2025, 9:16 PM

Thanks for reaching out, mackiannii. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.

tall-airline-16756

07/26/2025, 9:18 PM

The issue you're experiencing with ElevenLabs TTS multilingual v2 mispronouncing numbers in LiveKit Agents v1 could be related to how the model processes numerical values. While ElevenLabs turbo v2 and their website handle this correctly, the multilingual v2 model may differ in text normalization or preprocessing in ways that affect number pronunciation. To address this, you can try the following steps:

1. Enable SSML parsing: Ensure that `enable_ssml_parsing` is set to `True` in your configuration. This allows you to use SSML (Speech Synthesis Markup Language) to customize pronunciation.
2. Use SSML phoneme tags: You can use SSML `phoneme` tags to specify the correct pronunciation for numbers or other terms. This is particularly useful for ensuring accuracy in financial values.
3. Preprocess text: Convert numbers into their written-out forms (e.g., "118,152.31" to "one hundred eighteen thousand one hundred fifty-two point three one") before sending the text to the TTS system. This can help improve pronunciation consistency.

If these steps do not resolve the issue, it may be necessary to consult ElevenLabs' technical documentation or support team for model-specific details about why the multilingual v2 model behaves differently in this context.

Sources: Speech & Audio Processing | livekit/livekit_composite | DeepWiki | How to Get Help from LiveKit | Rime integration guide | LiveKit Docs | ElevenLabs TTS integration guide | LiveKit Docs | tools/tts/tts.cpp
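The "preprocess text" suggestion could look roughly like the sketch below. It is plain Python with no LiveKit or ElevenLabs calls; the names `int_to_words` and `speak_prices` are made up for illustration, and it only handles US-dollar amounts written like `$118,152.31`, spoken here as dollars and cents. Treat it as a starting point under those assumptions, not a complete text normalizer.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
SCALES = ["", "thousand", "million", "billion", "trillion"]

# Hypothetical pattern: "$" + digits (optionally comma-grouped) + optional cents.
PRICE_RE = re.compile(r"\$(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{2}))?")


def _under_thousand(n: int) -> str:
    """Spell out an integer in the range 1..999."""
    parts = []
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        word = TENS[n // 10]
        if n % 10:
            word += "-" + ONES[n % 10]
        parts.append(word)
    elif n > 0:
        parts.append(ONES[n])
    return " ".join(parts)


def int_to_words(n: int) -> str:
    """Spell out a non-negative integer below 10**15."""
    if n < 0 or n >= 10 ** 15:
        raise ValueError("value out of supported range")
    if n == 0:
        return "zero"
    groups = []
    scale = 0
    while n > 0:
        n, chunk = divmod(n, 1000)
        if chunk:
            label = SCALES[scale]
            groups.append(_under_thousand(chunk) + (" " + label if label else ""))
        scale += 1
    return " ".join(reversed(groups))


def speak_prices(text: str) -> str:
    """Replace dollar amounts in text with a written-out spoken form."""
    def repl(m: re.Match) -> str:
        dollars = int(m.group(1).replace(",", ""))
        words = int_to_words(dollars) + (" dollar" if dollars == 1 else " dollars")
        if m.group(2) is not None and int(m.group(2)) > 0:
            cents = int(m.group(2))
            words += " and " + int_to_words(cents) + (" cent" if cents == 1 else " cents")
        return words
    return PRICE_RE.sub(repl, text)
```

Only the string passed to the TTS needs to go through `speak_prices`; the original text with the numeric form can still be shown on screen.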

able-book-76021

07/26/2025, 9:19 PM

Well, I am trying to display the numeric form but have the TTS output the spoken form.

tall-airline-16756

07/26/2025, 9:20 PM

I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite, which is trained on all LiveKit source code
If you find the answer, please post it here to help others!

able-book-76021

07/26/2025, 9:24 PM

Could this be because the text is streaming in before the TTS can see the full word?

able-book-76021

07/26/2025, 9:39 PM
