# ask-ai
a
Hey guys, how is it going? I am using LiveKit Agents v1 and I am wondering why my TTS is mispronouncing numbers, e.g. "the Bitcoin price is $118,152.31". The model is basically just reading the numbers or sometimes skipping them. I am using ElevenLabs TTS Multilingual v2. On my custom STT - LLM - TTS pipeline, even ElevenLabs Turbo v2 has no issue with this. I checked on the website https://elevenlabs.io/app/speech-synthesis/text-to-speech, gave it the same sentence, and it handles it perfectly. Does anyone on the team know why I am having issues with this?
t
Thanks for reaching out, mackiannii. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
The issue you're experiencing with ElevenLabs TTS Multilingual v2 mispronouncing numbers in LiveKit Agents v1 could be related to how the model processes numerical values. While ElevenLabs Turbo v2 and their website handle this correctly, the Multilingual v2 model may have differences in text normalization or preprocessing that affect number pronunciation. To address this, you can try the following steps:
1. Enable SSML Parsing: Ensure that `enable_ssml_parsing` is set to `True` in your configuration. This allows you to use SSML (Speech Synthesis Markup Language) to customize pronunciation.
2. Use SSML Phoneme Tags: You can use SSML `phoneme` tags to specify the correct pronunciation for numbers or other terms. This is particularly useful for ensuring accuracy in financial values.
3. Preprocess Text: Convert numbers into their written-out forms (e.g., "118,152.31" to "one hundred eighteen thousand one hundred fifty-two point three one") before sending the text to the TTS system. This can help improve pronunciation consistency.
If these steps do not resolve the issue, it may be necessary to consult ElevenLabs' technical documentation or support team for model-specific details about why the Multilingual v2 model behaves differently in this context.
Sources: Speech & Audio Processing | livekit/livekit_composite | DeepWiki | How to Get Help from LiveKit | Rime integration guide | LiveKit Docs | ElevenLabs TTS integration guide | LiveKit Docs | tools/tts/tts.cpp
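The "Preprocess Text" step above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not part of LiveKit or ElevenLabs: the names `int_to_words` and `normalize_numbers` are hypothetical, it only handles English, and it assumes a leading `$` means US dollars with two-digit cents. A maintained library such as `num2words` would likely be a better fit in production.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
SCALES = ["", "thousand", "million", "billion", "trillion"]

def _below_thousand(n: int) -> str:
    """Spell out an integer in the range 1..999."""
    parts = []
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        word = TENS[n // 10]
        if n % 10:
            word += "-" + ONES[n % 10]
        parts.append(word)
    elif n > 0:
        parts.append(ONES[n])
    return " ".join(parts)

def int_to_words(n: int) -> str:
    """Spell out a non-negative integer below 10**15."""
    if n == 0:
        return "zero"
    groups = []
    scale = 0
    while n > 0:
        n, chunk = divmod(n, 1000)
        if chunk:
            label = SCALES[scale]
            groups.append(_below_thousand(chunk) + (" " + label if label else ""))
        scale += 1
    return " ".join(reversed(groups))

# Optional "$", digits with optional thousands separators, optional decimals.
NUMBER_RE = re.compile(r"(\$)?(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d+))?")

def _spell(match: re.Match) -> str:
    dollar, whole, frac = match.groups()
    digits = whole.replace(",", "")
    if len(digits) > 15:  # beyond SCALES; leave the text untouched
        return match.group(0)
    value = int(digits)
    words = int_to_words(value)
    if dollar:
        words += " dollar" if value == 1 else " dollars"
        if frac:
            cents = int(frac[:2].ljust(2, "0"))  # assumption: two-digit cents
            if cents:
                words += " and " + int_to_words(cents)
                words += " cent" if cents == 1 else " cents"
        return words
    if frac:
        # Plain decimals are read digit by digit after "point".
        words += " point " + " ".join(ONES[int(d)] for d in frac)
    return words

def normalize_numbers(text: str) -> str:
    """Return a spoken-form copy of `text`; the original stays usable for display."""
    return NUMBER_RE.sub(_spell, text)
```

Because `normalize_numbers` returns a new string, the numeric text can still be shown on screen while only the spoken-form copy is sent to the TTS.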
a
Well, I am trying to display the numeric form but have the TTS output the spoken form.
t
I don't have the answer you're looking for. You could also try asking your question: • in one of the other Slack channels or • to https://deepwiki.com/livekit/livekit_composite which is trained on all LiveKit source code If you find the answer, please post it here to help others!
a
Could this be because the text is streaming in before the TTS can see the full word?
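If streaming is the cause (unconfirmed in this thread), one way to test the idea is to hold back any trailing fragment that still looks like part of a number until the next chunk arrives, so the TTS never receives "$118" and ",152" separately. The sketch below is a generic, hypothetical helper (`hold_partial_numbers`) that operates on any iterable of text chunks; it is not a LiveKit API. LiveKit Agents 1.x documents a `tts_node` override on `Agent` where a transform like this could plausibly be applied, but that wiring is an assumption and should be checked against the docs.

```python
import re
from typing import Callable, Iterable, Iterator

# A trailing run that could still grow into a longer number, e.g. "$118" or "152."
_TAIL = re.compile(r"[$\d][\d,.]*$")

def hold_partial_numbers(
    chunks: Iterable[str],
    normalize: Callable[[str], str] = lambda s: s,
) -> Iterator[str]:
    """Re-chunk streamed text so a number is never split across chunks.

    `normalize` is applied to each released piece; by default it is a no-op.
    """
    pending = ""
    for chunk in chunks:
        pending += chunk
        match = _TAIL.search(pending)
        cut = match.start() if match else len(pending)
        ready, pending = pending[:cut], pending[cut:]
        if ready:
            yield normalize(ready)
    if pending:
        # End of stream: whatever was held back is now complete.
        yield normalize(pending)

# Example: the price arrives split across three LLM chunks.
pieces = list(hold_partial_numbers(["The price is $118", ",152", ".31 today."]))
```

The trade-off is a little extra latency whenever a chunk ends in digits or punctuation that could belong to a number, since that tail is only released once the following chunk (or the end of the stream) shows the number is complete.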