lively-horse-90722
07/14/2025, 11:25 AM
endpointing_ms=25
tall-airline-16756
07/14/2025, 11:27 AM
With `endpointing_ms=25`, the delay in receiving `is_final` for 1-2 word utterances is affected by several factors beyond the Deepgram endpointing threshold:
• Audio chunk size: LiveKit processes audio in chunks (typically 50ms or 100ms), so endpointing values smaller than these may not noticeably reduce delay.
• Turn detection: LiveKit uses its own turn detection system, which can add delay after endpointing is detected (e.g., a default minimum delay of 0.5s), so the end of the user's turn may not be committed immediately, even after Deepgram signals an endpoint.
• Processing logic: State management, alternative result handling, and whether `no_delay` is set when using `smart_format` can all influence timing. For short utterances, these mechanisms may add further delay because the system waits to confirm that the phrase is complete.
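As a rough illustration of how the first two factors stack up, here is a simplified Python model. It is a sketch, not LiveKit's actual logic: it assumes silence is only observed at chunk boundaries (so the endpointing window rounds up to a whole chunk) and, following the description above, treats the turn-detection minimum delay as added on top of the STT endpoint. Network and model processing time are ignored, and the function name and defaults (50 ms chunks, 500 ms minimum delay) are hypothetical.

```python
import math


def estimate_final_delay_ms(endpointing_ms: int, chunk_ms: int = 50,
                            min_endpointing_delay_ms: int = 500) -> int:
    """Hypothetical, simplified estimate of silence-to-turn-end latency.

    Assumptions (not LiveKit's real implementation):
    - silence reaches the STT service only in whole audio chunks, so the
      endpointing window is rounded up to the next chunk boundary;
    - the turn detector then waits min_endpointing_delay_ms on top of that;
    - network and model processing time are ignored.
    """
    # Number of whole chunks needed to cover the endpointing window.
    chunks_needed = math.ceil(endpointing_ms / chunk_ms)
    observed_endpoint_ms = chunks_needed * chunk_ms
    # Turn-detection delay is modeled as purely additive.
    return observed_endpoint_ms + min_endpointing_delay_ms
```

In this model, any `endpointing_ms` from 1 to 50 yields the same 50 ms endpoint with 50 ms chunks, while lowering the minimum endpointing delay shrinks the total directly. That is the intuition behind the tips that follow.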
To minimize delays for short utterances:
• Enable `no_delay: true` if using `smart_format`.
• Consider reducing LiveKit's `min_endpointing_delay` to match your application's latency needs.
• Ensure your `endpointing_ms` setting is not significantly below your audio chunk size.
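If you call Deepgram's streaming API directly, the Deepgram-side tips map to query parameters on the live endpoint. The sketch below builds such a URL using only the standard library. The query parameter names `endpointing`, `interim_results`, `smart_format`, and `no_delay` are Deepgram's; the helper name and the `nova-3` model default are assumptions for illustration, as is the assumption that the LiveKit plugin's `endpointing_ms` corresponds to Deepgram's `endpointing` value in milliseconds. Inside LiveKit you would set these through the plugin's options instead.

```python
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(endpointing_ms: int, smart_format: bool,
                     model: str = "nova-3") -> str:
    """Hypothetical helper: build a Deepgram live-transcription URL.

    The model default is an illustrative assumption.
    """
    params = {
        "model": model,
        # is_final / interim results only matter when interim results are on.
        "interim_results": "true",
        # Milliseconds of silence before Deepgram signals an endpoint.
        "endpointing": endpointing_ms,
        "smart_format": str(smart_format).lower(),
    }
    if smart_format:
        # Ask Deepgram not to hold results back for smart formatting.
        params["no_delay"] = "true"
    return DEEPGRAM_LISTEN_URL + "?" + urlencode(params)
```

For example, `build_listen_url(25, smart_format=True)` produces a URL with `endpointing=25`, `smart_format=true`, and `no_delay=true`; with `smart_format=False`, `no_delay` is simply omitted.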
These adjustments help, but some delay is expected to avoid false endpointing on very brief utterances. Further tuning may be necessary for your specific use case.
Sources: livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt.py | plugins/deepgram/src/stt.ts | How does end-of-utterance detection work in conversations?