# ask-ai
m
We’ve been noticing intermittent issues in our LiveKit agent setup where user messages are not being captured during conversations. The agent did not respond in certain cases, and upon reviewing the logs, we found no events for STT (speech-to-text) or EOU (end-of-utterance) during those instances. Here’s an overview of our setup:
• Calling setup: phone.com with Vonage numbers.
• LiveKit infrastructure: worker running on Kubernetes.
• Integrations used in the agent: Azure OpenAI LLM, Azure TTS, Deepgram STT, and VAD (voice activity detection) with background noise cancellation enabled.
We’re wondering whether this could be:
• a provider-side issue,
• a VAD-related issue, or
• the background noise cancellation model unintentionally suppressing the user’s audio.
We haven’t logged VAD metrics since they are recurring and might be overwhelming to monitor continuously. However, we’re considering whether adding them could help with debugging the issue.
t
Thanks for reaching out, Gaurav. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
I understand how frustrating it can be when user messages aren’t captured and no STT or EOU events are logged. Let’s work through some potential causes and debugging steps together to help resolve this:
• Background Noise Cancellation: Sometimes, overly aggressive noise cancellation can unintentionally suppress user audio. I recommend enabling Background Voice Cancellation (BVC) in your agent setup and avoiding the use of frontend Krisp noise cancellation at the same time. You can also test audio samples using the noise canceller tool available in LiveKit resources to get a clearer picture.
• VAD/EOU Configuration: Double-check that your VAD and turn detection settings are configured correctly. Try adjusting `minEndpointingDelay` and `maxEndpointingDelay` to the suggested values (0.5s and 6.0s), and make sure the `turnDetectionMode` matches the logic you’re expecting.
• Audio Source Routing: If you have multiple participants, please ensure the agent is processing the right audio stream by specifying a fixed participant in `RoomIO`. This can help avoid confusion in audio routing.
• Logging and Metrics: Turning on detailed logging for VAD, STT, and EOU events using recognition hooks can be really helpful. It allows you to trace the processing flow closely and spot where events might be missing.
A configuration sketch covering the first two points follows this list. I hope these suggestions help you get closer to a solution. If you have any questions or need more guidance on any of these steps, please don’t hesitate to reach out—I’m here to help!
Sources: agents/src/voice/audio_recognition.ts | Community & Support | livekit/livekit_composite | DeepWiki | Enhanced noise cancellation | LiveKit Docs | Docs::Home | Getting Help | livekit/livekit_composite | DeepWiki
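For concreteness, here’s a minimal sketch of that configuration, assuming the Python livekit-agents 1.x API (the option names are camelCase in the Node SDK referenced above). The Deepgram/Silero plugin choices mirror the stack described in this thread, and the instructions string is purely illustrative:
```python
from livekit import agents
from livekit.agents import Agent, AgentSession, RoomInputOptions
from livekit.plugins import deepgram, noise_cancellation, silero

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=deepgram.STT(),
        vad=silero.VAD.load(),
        # Endpointing bounds discussed above, in seconds.
        min_endpointing_delay=0.5,
        max_endpointing_delay=6.0,
    )
    await session.start(
        agent=Agent(instructions="You are a helpful voice assistant."),
        room=ctx.room,
        room_input_options=RoomInputOptions(
            # Agent-side BVC; avoid stacking frontend Krisp on top of this.
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )
```
Pinning a specific participant (the `RoomIO` point above) is configured on the room I/O layer rather than here; the key idea in this sketch is keeping noise cancellation in exactly one place.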
m
How do we get detailed logs to debug this?
t
To help you enable detailed logs for debugging VAD, STT, and EOU events in LiveKit, here’s a simple step-by-step guide you can follow:
1. Set Logging Level to Debug: Start by configuring your application to log at the debug level. This will give you more detailed information. For example:
◦ In Python: `logging.getLogger('livekit').setLevel(logging.DEBUG)`
◦ In Node.js: `log.setLevel('debug')`
◦ In Flutter: `Logger.root.level = Level.FINE`
2. Subscribe to Metrics Events: Next, use the `metrics_collected` event to capture structured metrics for STT and EOU. Here’s a quick example:
```python
from livekit.agents import metrics
from livekit.agents.metrics import STTMetrics

@stt.on("metrics_collected")
def on_stt_metrics(ev: STTMetrics):
    # Use a distinct parameter name so the `metrics` module isn't shadowed.
    metrics.log_metrics(ev)
```
3. Log VAD Events: You can also log VAD events like `START_OF_SPEECH` and `END_OF_SPEECH` by using recognition hooks. For instance:
```python
import logging

logger = logging.getLogger("vad-debug")

def on_start_of_speech(ev):
    logger.debug("VAD: START_OF_SPEECH")
```
4. Review Debug Logs: Once you’ve set this up, your debug logs will include important events such as VAD triggers, STT transcripts, and EOU detection details, giving you a clearer picture. A combined sketch follows below. I hope these steps make it easier for you to capture the detailed logs you need. If you have any questions or need further help, please don’t hesitate to reach out—I’m here for you!
Sources: Logs, metrics, and telemetry | LiveKit Docs | example/lib/main.dart | Capturing metrics | LiveKit Docs | avatars/hedra/education_avatar/education-frontend/hooks/useDebug.ts | How does end-of-utterance detection work in conversations?
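Putting these pieces together, here’s a small sketch, assuming the Python livekit-agents 1.x API, where `session` is the AgentSession from your entrypoint and the `user_state_changed` session event reflects VAD activity. It turns on debug logging and traces whether VAD is firing at all:
```python
import logging

logging.getLogger("livekit").setLevel(logging.DEBUG)
logger = logging.getLogger("agent-debug")

# "user_state_changed" fires when VAD flips the user between
# listening/speaking; if it never fires, audio likely isn't reaching VAD.
@session.on("user_state_changed")
def on_user_state_changed(ev):
    logger.debug("user state: %s -> %s", ev.old_state, ev.new_state)
```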
m
But we’re already using all the settings and logs you’re suggesting. Could there be any other reason?
t
Thanks for checking all the suggested settings and logs! If you’re still missing STT or EOU events, here are a few other things we might want to explore together:
• Turn Detection Mode: Make sure the `turnDetectionMode` is set to either VAD or LiveKit’s turn detector plugin. Just a heads up, server-side turn detection won’t emit EOU metrics.
• Missing Final Transcript: If you’re not seeing a `FINAL_TRANSCRIPT` event from STT, the EOU detection might be skipped. Let’s double-check that transcripts are coming through consistently (see the sketch after this list).
• Manual Turn Detection: When using manual mode, if a user turn is already committed, subsequent STT events might be ignored, which can affect how EOU behaves.
• VAD and STT Synchronization: Sometimes, if the timing between VAD and STT events is off—like missing `END_OF_SPEECH` events or delayed transcripts—it can interrupt EOU triggering.
• Fallback Timer Issues: Some STT engines depend on fallback timers for EOU detection, so any delays or failures there could cause missed events.
• Language/Model Support: If the turn detector model doesn’t support the language detected, EOU prediction might be skipped.
These are some runtime and synchronization factors that might be at play here. Please feel free to reach out if you’d like me to help you dive deeper into any of these areas—I’m here to support you!
Sources: Logs, metrics, and telemetry | LiveKit Docs | agents/src/voice/audio_recognition.ts | livekit-agents/livekit/agents/voice/audio_recognition.py
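One quick way to check the Missing Final Transcript point is to watch the session’s transcription events. A sketch, assuming the Python livekit-agents 1.x `user_input_transcribed` event (field names may vary slightly across versions):
```python
import logging

logger = logging.getLogger("transcript-debug")

# ev.is_final distinguishes interim from final STT results; if you only
# ever see interim results here, EOU detection will be skipped downstream.
@session.on("user_input_transcribed")
def on_user_input_transcribed(ev):
    logger.debug("transcript (final=%s): %s", ev.is_final, ev.transcript)
```
If final transcripts never arrive for the affected calls, that points back at the STT provider or the audio feed rather than at turn detection itself.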