Hi guys, we’re facing an issue with LiveKit clou...
# ask-ai
b
Hi guys, we’re facing an issue with LiveKit Cloud for our voice-based chatbot built in React Native. The transcription for both the user and the agent appears all at once. We need real-time word-by-word (or at least sentence-by-sentence) transcription for both. Are we missing some config? We’re on an STT-LLM-TTS flow with OpenAI’s non-realtime models. #transcription #realtime
t
Thanks for reaching out, Nikhil. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
When using LiveKit Cloud with OpenAI’s non-realtime STT models, you will not receive word-by-word or sentence-by-sentence (incremental) transcription. Instead, the transcribed text is delivered all at once after the full utterance is processed. This is a limitation of non-realtime models: interim or streaming transcription (needed for real-time display) is only supported when using a realtime STT provider or OpenAI’s realtime API with `use_realtime=true`.
If you need real-time transcription for both user and agent (word or sentence level), you will need to:
• Switch to a realtime STT model (either via OpenAI with real-time support, or another provider offering streaming transcription)
• Configure your agents framework or client code to handle interim transcription events, which LiveKit emits on an interval when real-time STT is enabled
Your current setup (STT-LLM-TTS with non-realtime models) cannot achieve this without a change to the transcription backend. There are no configuration flags available to make non-realtime models provide incremental transcription results.
Let me know if you’d like guidance on enabling real-time STT for your flow or have questions about compatible providers.
Sources: AgentTranscriptionOptions | LiveKit Agents | livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/real... | Realtime model integrations | LiveKit Docs | AI voice agents | LiveKit Docs
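For the client-side handling of interim transcription events mentioned above, here is a minimal sketch of how a React Native app might merge interim and final segments into the lines it displays. It is not taken from the LiveKit SDK: the `Segment` type is a local stand-in whose fields (`id`, `text`, `final`) are assumed to mirror the segment objects the client SDK delivers, and `applySegments` and `TranscriptLine` are hypothetical names used only for illustration.

```typescript
// Local stand-in for a transcription segment. The field names are an
// assumption modeled on the segments delivered by the client SDK.
interface Segment {
  id: string;     // stable across interim updates of the same utterance
  text: string;   // text recognized so far
  final: boolean; // true once the utterance is complete
}

// One line as rendered in the chat UI (hypothetical shape).
interface TranscriptLine {
  id: string;
  speaker: string;
  text: string;
  final: boolean;
}

// Hypothetical helper: returns a new list where each incoming segment either
// replaces the line with the same id (interim update) or is appended (new utterance).
function applySegments(
  lines: TranscriptLine[],
  speaker: string,
  segments: Segment[],
): TranscriptLine[] {
  const next = [...lines]; // copy so the previous state is left untouched
  for (const seg of segments) {
    const line: TranscriptLine = { id: seg.id, speaker, text: seg.text, final: seg.final };
    const idx = next.findIndex((l) => l.id === seg.id);
    if (idx === -1) {
      next.push(line);
    } else {
      next[idx] = line;
    }
  }
  return next;
}

// Example: two interim updates followed by the final text for one utterance.
let demo: TranscriptLine[] = [];
demo = applySegments(demo, "user", [{ id: "s1", text: "Hello", final: false }]);
demo = applySegments(demo, "user", [{ id: "s1", text: "Hello there", final: false }]);
demo = applySegments(demo, "user", [{ id: "s1", text: "Hello there!", final: true }]);
console.log(demo.map((l) => `${l.speaker}: ${l.text}`).join("\n"));
```

In an app, a helper like this would typically be called from whatever transcription callback your SDK version exposes (for example, a handler for `RoomEvent.TranscriptionReceived` in `livekit-client`), with the result stored in component state. Because the function returns a new array instead of mutating the old one, it fits React's state update model. Note that with a non-streaming STT model only a single final segment arrives per utterance, so the display still updates all at once.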
b
Please list all the models that would support my need. FYI, I have a bias toward Azure-based models.
t
Hey there! Looking for top-notch real-time transcription? LiveKit Cloud has got you covered with some awesome Speech-to-Text (STT) providers that can help you out.
• Microsoft Azure Speech-to-Text (This is a powerhouse for real-time, streaming transcription with incredible accuracy; it even gives you those cool interim results for incremental display)
• AssemblyAI
• Amazon Transcribe
• Clova
• Deepgram
• fal
• Google Cloud STT
• Groq
• OpenAI (just when it's set up with realtime support)
• Speechmatics
Azure STT is our star player and delivers smooth, real-time incremental transcription that works perfectly whether you want word-by-word or sentence-by-sentence display. If you want to use Azure, you'll just need to set up your Azure Speech credentials, and the best part? Real-time streaming is ready to go right out of the box, with no complicated setup needed!
Need more info or want a personalized walkthrough for setting up Azure STT with LiveKit? Just let me know what backend language or environment you're working with, and I'll be happy to guide you through the process step by step.
Sources: Speech-to-text (STT) integrations | LiveKit Docs | Speech & Audio Processing | livekit/livekit_composite | DeepWiki | Azure Speech STT integration guide | LiveKit Docs | livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py | Text and transcriptions | LiveKit Docs