# ask-ai
p
Hi folks, we’re building a voice-to-voice AI assistant using Azure for STT, but we’re facing an issue with multilingual support. I’m passing multiple languages as an array of strings, but Azure only seems to consider the first language in the array and doesn’t detect the others. Could you please help me understand why this behavior occurs?
Reference docs:
• https://docs.livekit.io/agents/integrations/stt/azure/
• https://docs.livekit.io/reference/python/v1/livekit/plugins/azure/index.html#livekit.plugins.azure.STT
t
Thanks for reaching out, Vishal. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Just a quick heads-up on how Azure STT handles languages: it takes the first language you list as the main one and uses its auto-detection feature to pick up on any others. Here’s the scoop:
• The first language in your list is the one it actively focuses on.
• Other languages get noticed quietly in the background thanks to auto-detection.
• If it can’t find a match for a detected language, it simply goes back to that first language.
If you want to get the best out of multilingual support, here are a few friendly tips:
• Stick to a single target language when you can.
• Make sure the first language you put in the list is the one you expect most often.
• Keep an eye on how well auto-detection works for your situation.
This setup is designed to strike a nice balance between actively using your main language and letting auto-detection do its thing behind the scenes.
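The tip about putting the most expected language first can be sketched in a few lines. This is a minimal illustration, not part of any plugin: the helper name `order_candidate_languages`, the `history` sample, and the cap of 4 candidates are all assumptions made for the example, so check your provider's documented limit before relying on it.

```python
from collections import Counter


def order_candidate_languages(observed: list[str], max_candidates: int = 4) -> list[str]:
    """Return distinct locale codes, most frequently observed first.

    `observed` is a hypothetical log of the locale spoken in past sessions.
    `max_candidates` is an assumed cap, not a documented limit.
    """
    counts = Counter(observed)
    # most_common() sorts by count, descending; ties keep first-seen order.
    return [code for code, _ in counts.most_common(max_candidates)]


# Hypothetical usage history: English was spoken more often than Hindi.
history = ["hi-IN", "en-US", "en-US", "hi-IN", "en-US"]
languages = order_candidate_languages(history)
print(languages)
```

The resulting list can then be passed as the language list in the STT configuration, so the first entry reflects actual usage rather than a guess.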
p
I set English as the first language and Hindi as the second. Azure STT detects English perfectly, but when I switch mid-utterance to Hindi, it doesn’t detect the Hindi segments at all. Any idea why this happens or how to enable seamless language switching?
Steps to reproduce:
1. Configure STT with languages = ["en-US", "hi-IN"] (English first, Hindi second).
2. Start speaking in English, then switch to Hindi mid-session.
Expected: STT detects both English and Hindi automatically.
Actual: Only English is recognized; Hindi segments are ignored.
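To confirm the "Actual" result objectively, one option is to scan the transcripts collected during a test session for Hindi text. The sketch below assumes Hindi is emitted in Devanagari script and that `sample` stands in for the transcript strings gathered from the session; both function names are hypothetical helpers, not part of any STT plugin.

```python
def contains_devanagari(text: str) -> bool:
    """True if any character falls in the Devanagari Unicode block (U+0900 to U+097F)."""
    return any("\u0900" <= ch <= "\u097f" for ch in text)


def summarize_transcripts(transcripts: list[str]) -> dict[str, int]:
    """Count how many transcripts contain at least one Devanagari character."""
    with_devanagari = sum(1 for t in transcripts if contains_devanagari(t))
    return {"total": len(transcripts), "with_devanagari": with_devanagari}


# Hypothetical transcripts from one test session.
sample = ["Hello, how are you?", "मैं ठीक हूँ", "I am fine"]
print(summarize_transcripts(sample))  # {'total': 3, 'with_devanagari': 1}
```

If `with_devanagari` stays at 0 while Hindi was spoken, the Hindi audio was either dropped or transcribed as English, which matches the behavior described above.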
t
Just a heads-up: Azure STT doesn’t natively support smooth language switching right in the middle of an utterance when you're using multiple languages. But don’t worry! You can totally get this done with the Gladia STT plugin, which is great for handling code-switching. Here’s a quick guide on how to set it up:
1. Set the `languages` parameter to include all the languages you want, like `["en", "hi"]`.
2. Turn on code-switching by setting `code_switching=True`.
Here’s an example configuration to get you started:
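# Assumes the livekit-plugins-gladia package, which exposes the class as STT
from livekit.plugins.gladia import STT as GladiaSTT

gladia_stt = GladiaSTT(
    languages=["en", "hi"],  # candidate languages to transcribe
    code_switching=True,  # allow language switches mid-stream
)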
And if you’re also looking to add translation, you can enable that too:
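# Assumes the livekit-plugins-gladia package, which exposes the class as STT
from livekit.plugins.gladia import STT as GladiaSTT

gladia_stt = GladiaSTT(
    languages=["en", "hi"],
    code_switching=True,
    translation_enabled=True,
    translation_target_languages=["en"],  # Translate to English
)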
This way, your STT setup can smoothly handle multiple languages and switch between them without missing a beat.
p
I want to detect languages like Spanish, English, Korean, Mandarin, Cantonese, and Vietnamese. Does Gladia support these languages?
t
I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite, which is trained on all LiveKit source code.
If you find the answer, please post it here to help others!
The assistant seemed to misunderstand your initial problem. @prehistoric-hospital-42098: are you able to reproduce the issue with Azure STT using the basic agent example? Can you share your configuration?