I am using Deepgram for STT For shorter sentences 1 2 words LiveKit Community #ask-ai

lively-horse-90722

07/14/2025, 11:25 AM

I am using Deepgram for STT. For shorter sentences (1-2 words), is_final is sent after ~2 seconds. I have set endpointing_ms=25.

tall-airline-16756

07/14/2025, 11:25 AM

Thanks for reaching out, Deepak. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.

tall-airline-16756

07/14/2025, 11:27 AM

Even with endpointing_ms=25, the delay in receiving is_final for 1-2 word utterances is affected by several factors beyond the Deepgram endpointing threshold:

• Audio chunk size: LiveKit processes audio in chunks (typically 50ms or 100ms), so endpointing values smaller than these may not noticeably reduce delay.
• Turn detection: LiveKit uses its own turn detection system, which can apply added delays after endpointing is detected (e.g., a default minimum delay of 0.5s), so is_final may not emit immediately even after Deepgram signals an endpoint.
• Processing logic: State management, alternative result handling, and whether no_delay is set when using smart_format can influence timing.

For short utterances, these mechanisms may introduce further delay since the system waits to ensure that the phrase is complete.

To minimize delays for short utterances:

• Enable no_delay: true if using smart_format.
• Consider reducing LiveKit's min_endpointing_delay to match your application's latency needs.
• Ensure your endpointing_ms setting is not significantly below your audio chunk size.

These adjustments help, but some delay is expected to avoid false endpointing for very brief utterances. Further tuning might be necessary for your specific use case.

Sources: livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt.py | plugins/deepgram/src/stt.ts | How does end-of-utterance detection work in conversations?
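To make the interaction between chunk size and turn detection concrete, here is a rough back-of-the-envelope sketch in plain Python. It is not LiveKit's or Deepgram's actual logic: the function names are hypothetical, and the model simply assumes that silence can only be noticed at audio chunk boundaries and that the turn detector's minimum delay is added on top.

```python
import math


def effective_endpointing_ms(endpointing_ms: int, chunk_ms: int) -> int:
    """Round the endpointing threshold up to a whole number of audio chunks.

    Assumption (hypothetical model): silence is only evaluated once per chunk,
    so a threshold below the chunk size behaves like one full chunk.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    chunks = max(1, math.ceil(endpointing_ms / chunk_ms))
    return chunks * chunk_ms


def estimated_floor_ms(
    endpointing_ms: int, chunk_ms: int, min_endpointing_delay_s: float
) -> int:
    """Estimate a lower bound on end-of-turn latency under the model above.

    Adds the chunk-rounded endpointing threshold to the turn detector's
    minimum delay. Network and model processing time are ignored.
    """
    return effective_endpointing_ms(endpointing_ms, chunk_ms) + round(
        min_endpointing_delay_s * 1000
    )


if __name__ == "__main__":
    # endpointing_ms=25 with 50ms chunks and a 0.5s minimum turn delay
    print(estimated_floor_ms(25, 50, 0.5))  # 550
    # Same endpointing, larger chunks, shorter turn delay
    print(estimated_floor_ms(25, 100, 0.2))
```

Under these assumptions, lowering endpointing_ms from 50 to 25 changes nothing with 50ms chunks, while lowering min_endpointing_delay shifts the floor directly. The estimate is only a floor, so it does not by itself account for a full ~2 second delay; the remaining time would have to come from the other factors listed above.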
