# ask-ai
m
OpenAI's `gpt-4o-mini-transcribe` is detecting non-English words even though the config specifies the language:
```python
stt=openai.STT(
    api_key=config.stt.openai_api_key,  # type: ignore
    model=config.stt.openai_model,
    language="en",
    use_realtime=True,
)
```
t
Thanks for reaching out, Anas. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Thank you for your question! The `gpt-4o-mini-transcribe` model might pick up non-English words even when the language is set to "en" because the `language` parameter acts more as a helpful hint than a strict rule. Since the model is naturally multilingual, it's designed to recognize a variety of speech patterns, including situations where people switch between or mix languages. If you're aiming for transcription strictly in English, here are a few options you might find useful:
• Try pre-processing your audio to filter out or reduce any non-English speech.
• After transcription, use language identification tools to spot and either remove or flag any non-English text.
• If possible, use contextual prompts to encourage English-only input (though, just a heads-up, this isn't officially supported for `gpt-4o-mini-transcribe`).
Please feel free to reach out if you'd like help with any of these steps. I'm here to support you!
Sources: OpenAI STT integration guide | LiveKit Docs | examples/voice_agents/structured_output.py
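For the post-transcription filtering idea above, here is a minimal sketch. It assumes the third-party `langdetect` package (`pip install langdetect`), which is not part of LiveKit or OpenAI, and the helper name `keep_english_only` is hypothetical:

```python
# Minimal sketch of post-transcription language filtering.
# Assumes the third-party `langdetect` package (pip install langdetect);
# this is not part of the LiveKit SDK or the OpenAI API.
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException


def keep_english_only(transcript: str) -> str:
    """Drop sentences that do not classify as English (hypothetical helper)."""
    english_sentences = []
    for sentence in transcript.split("."):
        sentence = sentence.strip()
        if not sentence:
            continue
        try:
            if detect(sentence) == "en":
                english_sentences.append(sentence)
        except LangDetectException:
            # Too little text to classify reliably; keep it rather than drop content.
            english_sentences.append(sentence)
    return ". ".join(english_sentences)
```

For example, `keep_english_only("Hello there. Bonjour tout le monde.")` would keep only the first sentence. Detection on very short strings can be noisy, so treat this as a heuristic rather than a guarantee.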
m
Is there any other OpenAI model that enforces English-only rules? Right now it's detecting non-English even though I'm only speaking English, so it's a matter of inaccuracy.
t
I'm glad you asked about OpenAI's models like `whisper-1` and `gpt-4o-transcribe`! You can set the `language` parameter to "en" to help guide the transcription towards English. That said, because these models are trained on multiple languages, they might still pick up some non-English words occasionally. If you're noticing inaccuracies with English input, it might be due to audio quality or other factors. Here are a few friendly suggestions you could try:
• Give the `whisper-1` model a shot; it's designed specifically for speech-to-text tasks.
• Consider improving the audio clarity beforehand, which can make a big difference.
• Look into post-processing the transcription to filter or flag any non-English parts.
Feel free to reach out if you want a hand with any of these steps or if you'd like me to explain anything further. I'm here to help!
Sources: OpenAI integration guide | LiveKit Docs | OpenAI STT integration guide | LiveKit Docs | livekit-plugins/livekit-plugins-fal/livekit/plugins/fal/stt.py
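To try the first suggestion, here is a minimal sketch that swaps the model in the original snippet. It reuses the same `livekit.plugins.openai.STT` keyword arguments shown in the question (`config` is the same settings object from that snippet); whether `use_realtime` works with `whisper-1` depends on your plugin version, so it is omitted here:

```python
from livekit.plugins import openai

# Sketch only: same constructor as the snippet above, with the model
# swapped to whisper-1. language="en" is still a hint, not a hard
# English-only guarantee.
stt = openai.STT(
    api_key=config.stt.openai_api_key,  # `config` as in the original snippet
    model="whisper-1",
    language="en",
)
```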