mysterious-agent-76276
09/13/2025, 9:31 AM
gpt-4o-mini-transcribe is detecting non-English words even though the config specifies the language:
stt=openai.STT(
    api_key=config.stt.openai_api_key,  # type: ignore
    model=config.stt.openai_model,
    language="en",
    use_realtime=True,
)
tall-airline-16756
09/13/2025, 9:31 AM
tall-airline-16756
09/13/2025, 9:32 AM
The gpt-4o-mini-transcribe model might pick up non-English words even when the language is set to "en" because the language parameter acts as a hint rather than a strict rule. Since the model is multilingual, it’s designed to recognize a variety of speech patterns, including cases where speakers switch between languages or mix them.
If you’re aiming for transcription strictly in English, here are a few options you might find useful:
• Try pre-processing your audio to filter out or reduce any non-English speech parts.
• After transcription, you could use language identification tools to spot and either remove or flag any non-English text.
• If possible, use contextual prompts to encourage English-only input, though just a heads-up, this isn’t officially supported for gpt-4o-mini-transcribe.
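As a rough illustration of the post-transcription option above, here is a minimal sketch that flags or drops words written outside the Latin script. It uses only the Python standard library; the function names are hypothetical and are not part of the LiveKit or OpenAI APIs. This is a script-based heuristic, not real language identification, so it will not catch non-English words spelled with Latin letters (for example Spanish or French).

```python
import unicodedata


def is_latin_letter(ch: str) -> bool:
    """Return True if ch is a letter whose Unicode name marks it as Latin script."""
    return ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")


def flag_non_latin_words(transcript: str) -> list[str]:
    """Return the words that contain at least one letter outside the Latin script."""
    flagged = []
    for word in transcript.split():
        letters = [ch for ch in word if ch.isalpha()]
        if any(not is_latin_letter(ch) for ch in letters):
            flagged.append(word)
    return flagged


def strip_non_latin_words(transcript: str) -> str:
    """Drop flagged words and rejoin the remaining words with single spaces."""
    flagged = set(flag_non_latin_words(transcript))
    return " ".join(w for w in transcript.split() if w not in flagged)
```

One way to use this would be to run strip_non_latin_words on each final transcript before it reaches the rest of the agent, or to log the output of flag_non_latin_words to see how often the model drifts away from English.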
Please feel free to reach out if you’d like help with any of these steps—I’m here to support you!
Sources: OpenAI STT integration guide | LiveKit Docs | examples/voice_agents/structured_output.py
mysterious-agent-76276
09/13/2025, 9:33 AM
tall-airline-16756
09/13/2025, 9:34 AM
whisper-1 and gpt-4o-transcribe! You can set the language parameter to "en" to help guide the transcription towards English. That said, because these models are trained on multiple languages, they might still pick up some non-English words occasionally.
If you’re noticing some inaccuracies with English input, it might be due to audio quality or other factors. Here are a few friendly suggestions you could try:
• Give the whisper-1 model a shot. It’s especially designed for speech-to-text tasks.
• Consider improving the audio clarity beforehand, which can make a big difference.
• Look into post-processing the transcription to filter or flag any non-English parts.
Feel free to reach out if you want a hand with any of these steps or if you’d like me to explain anything further. I’m here to help!
Sources: OpenAI integration guide | LiveKit Docs | OpenAI STT integration guide | LiveKit Docs | livekit-plugins/livekit-plugins-fal/livekit/plugins/fal/stt.py