# ask-ai
r
Hi, we are seeing a mismatch in the EOU metrics shown in the logs after we upgraded to 1.2.x. The metrics report `end_of_utterance_delay` as 70s consistently, but when we listen to the recording the responses are instantaneous. We see no issues in the call quality.
t
Thanks for reaching out, Kruthik. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Could you please share some details about your agent's configuration? Specifically:
• Is VAD (Voice Activity Detection) enabled?
• What turn detection mode are you using (e.g., VAD, STT, manual)?
• What values do you have set for `min_endpointing_delay` and `max_endpointing_delay`?
Having this information will really help us understand what's causing the metric discrepancy. Thank you!
Sources: Improving voice AI's turn detection with transformers | Logs, metrics, and telemetry | LiveKit Docs | livekit-agents/livekit/agents/telemetry/trace_types.py | agents/src/metrics/utils.ts
r
VAD is enabled; no turn detection model is used.
```python
min_endpointing_delay=0.2,
max_endpointing_delay=0.4,
```
t
The issue with the EOU metrics is connected to your current configuration. Here is a detailed explanation and some recommendations:

Root Cause:
• When using VAD-only mode (without a turn detection model), the system relies solely on `min_endpointing_delay` (currently set to 0.2s) to determine the end of an utterance. In this setup, `max_endpointing_delay` is not considered.
• This is expected behavior and not a bug. Because of this, the metric should hover around 200ms, as the system triggers EOU events right after `min_endpointing_delay` expires following silence.

Recommendations:
1. Adjust `min_endpointing_delay`: Try increasing it to a more comfortable value like 0.5s to avoid cutting off natural pauses too soon. For example: `min_endpointing_delay=0.5`
2. Integrate a Turn Detection Model: By enabling a semantic turn detection model, you can make use of both `min_endpointing_delay` and `max_endpointing_delay`. This helps the system decide when to wait longer based on the context of the conversation. Here's an example configuration:
```python
from livekit.agents import AgentSession
from livekit.plugins.turn_detector.english import EnglishModel
from livekit.plugins import silero

session = AgentSession(
    turn_detection=EnglishModel(),
    vad=silero.VAD.load(),
    min_endpointing_delay=0.2,
    max_endpointing_delay=0.4,
)
```
Using a turn detection model can really enhance the flow of conversation by minimizing false interruptions. Please feel free to reach out if you'd like any help implementing these changes—I’m here to support you!
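To build intuition for the VAD-only behavior described above, here is a small pure-Python sketch. It is a stand-in simulation, not the LiveKit implementation: it models an agent that declares end of utterance at the first silence check that falls at least `min_endpointing_delay` after speech stops, which is why the reported metric should sit right at that configured delay.

```python
def eou_delay_vad_only_ms(speech_end_ms: int, check_times_ms: list[int],
                          min_delay_ms: int) -> int:
    """Simulated end-of-utterance delay for VAD-only mode (milliseconds).

    End of utterance is declared at the first silence check that occurs
    at least `min_delay_ms` after speech stopped; the returned value is
    what an `end_of_utterance_delay`-style metric would record.
    """
    for t in sorted(check_times_ms):
        if t - speech_end_ms >= min_delay_ms:
            return t - speech_end_ms
    raise ValueError("no silence check observed after the endpointing delay")

# Speech ends at t=10s; VAD reports silence every 50ms afterwards.
checks = [10_000 + 50 * i for i in range(1, 20)]
delay = eou_delay_vad_only_ms(10_000, checks, min_delay_ms=200)
print(delay)  # 200: the metric tracks min_endpointing_delay
```

With a 200ms `min_endpointing_delay`, the simulated metric lands at 200ms, matching the "metric hovers around the configured delay" behavior in healthy operation.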
r
But it was correct in 0.x.
t
Could you please share a bit more about the specific differences you're noticing in the EOU metrics (like end_of_utterance_delay or transcription_delay) between LiveKit 0.x and 1.2.x? Understanding these details will really help us get to the bottom of what's causing the variation. I'm here to help! Sources: Improving voice AI's turn detection with transformers | tests/test_vad.py | agents/src/metrics/base.ts | Voice activity detection (VAD) parameters | LiveKit Docs
r
It reports 70s, 50s, etc., but the call quality is good.
t
The EOU metrics showing delays of 50–70 seconds in LiveKit 1.2.x, despite good call quality, might be linked to infrastructure or configuration issues. Here is a detailed explanation and some troubleshooting steps:

Key Insights:
• The `end_of_utterance_delay` metric tracks the time from when VAD detects the end of speech to when the agent starts processing the user's turn. Delays of 50–70 seconds are definitely unusual and usually point to an underlying issue.
• In LiveKit 1.2.x, the way this metric is calculated stays consistent with 0.x, but changes in infrastructure or configuration can influence the results.

Potential Causes:
1. Infrastructure Throttling: If you're using AWS t3/t4 burstable instances, CPU credits can get used up during VAD inference, causing significant throttling. This can result in those long delays even when call quality seems fine. For smoother performance, consider non-burstable instances like `m5` or `c5` in your production environment.
2. Configuration Issues: Make sure `min_endpointing_delay` and `max_endpointing_delay` are set correctly. When running in VAD-only mode, only `min_endpointing_delay` is used, so setting it to a solid value like 0.5s can help avoid premature interruptions.

Troubleshooting Steps:
1. Check your instance type. If you're on AWS t3/t4, keep an eye on `CPUCreditBalance` in CloudWatch to see if throttling is happening.
2. Double-check your turn detection mode. Make sure `turn_detection` is set to `vad` if you want to rely on EOU metrics.
3. Try logging EOU metrics directly from the STT component to get accurate readings of `end_of_utterance_delay` and `transcription_delay`.

If you happen to be using burstable instances or want some help fine-tuning your configuration, please don't hesitate to reach out. I'm here to guide you through it!
r
How do I do this: "Try logging EOU metrics directly from the STT component"?
t
To log End-of-Utterance (EOU) metrics directly from the STT component, here's a guide to walk you through the process: 1. Initialize the STT Component: First, set up your STT provider (like Deepgram or Gladia) as part of your agent configuration. 2. Subscribe to the `eou_metrics_collected` Event: Attach an event listener to your STT instance to capture those EOU metrics. Here's a simple example in Python:
```python
import asyncio
import logging

from livekit.agents.metrics import EOUMetrics
from livekit.plugins import deepgram

logger = logging.getLogger("eou-metrics")

class MyAgent:
    def __init__(self):
        stt = deepgram.STT()

        # Wrap the callback to avoid blocking the event loop
        def eou_wrapper(metrics: EOUMetrics):
            asyncio.create_task(self.on_eou_metrics_collected(metrics))

        # Subscribe to EOU metrics
        stt.on("eou_metrics_collected", eou_wrapper)

    async def on_eou_metrics_collected(self, metrics: EOUMetrics):
        # Log or process EOU metrics
        logger.info(
            "EOU metrics",
            extra={
                "end_of_utterance_delay": round(metrics.end_of_utterance_delay, 2),
                "transcription_delay": round(metrics.transcription_delay, 2),
                "speech_id": metrics.speech_id,
            },
        )
```
3. Key Metrics Captured:
• `end_of_utterance_delay`: The time from the end of speech (VAD) to turn completion, including transcription delay.
• `transcription_delay`: The time between the end of speech and when the final transcript is ready.
• `speech_id`: A unique identifier for each user turn.
4. Ensure Proper Configuration: EOU metrics are only emitted when `turn_detection` is enabled (for example, using VAD or LiveKit's turn detector).
For a complete working example, you can check out the metrics_stt.py file in the LiveKit examples repository. If you have any questions or need a hand with anything else, please don't hesitate to reach out. I'm here to help!
Sources: Logs, metrics, and telemetry | LiveKit Docs | complex-agents/turn-taking/README.md
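Once the callback is in place, it helps to aggregate the collected values so a 70s outlier stands out immediately. The sketch below uses a hypothetical stand-in dataclass with the same field names as the metrics above (it is not the real `EOUMetrics` class) plus a small aggregator that flags turns whose delay exceeds a threshold:

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical stand-in for livekit.agents.metrics.EOUMetrics
@dataclass
class EOUMetricsStub:
    end_of_utterance_delay: float  # seconds
    transcription_delay: float     # seconds
    speech_id: str

@dataclass
class EOUMetricsAggregator:
    threshold_s: float = 2.0
    samples: list = field(default_factory=list)

    def add(self, m) -> None:
        self.samples.append(m)

    def summary(self) -> dict:
        delays = [m.end_of_utterance_delay for m in self.samples]
        outliers = [m.speech_id for m in self.samples
                    if m.end_of_utterance_delay > self.threshold_s]
        return {
            "avg_s": round(mean(delays), 2),
            "max_s": max(delays),
            "outliers": outliers,
        }

agg = EOUMetricsAggregator()
for sid, d in [("turn-1", 0.21), ("turn-2", 0.24), ("turn-3", 70.0)]:
    agg.add(EOUMetricsStub(d, 0.05, sid))
print(agg.summary())
# {'avg_s': 23.48, 'max_s': 70.0, 'outliers': ['turn-3']}
```

Calling `agg.add(metrics)` from inside `on_eou_metrics_collected` and printing the summary at session end makes it easy to tell whether the 70s readings are a handful of outlier turns or every turn.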