# ask-ai
r
Hi, we are seeing a mismatch in the EOU metrics shown in the logs after we upgraded to 1.2.x. The metrics report `end_of_utterance_delay` as 70s consistently, but when we listen to the recording the responses are instantaneous. We see no issues in the call quality.
t
Thanks for reaching out, Kruthik. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Could you please share some details about your agent's configuration? Specifically:
• Is VAD (Voice Activity Detection) enabled?
• What turn detection mode are you using (e.g., VAD, STT, manual)?
• What values do you have set for `min_endpointing_delay` and `max_endpointing_delay`?
Having this information will really help us understand what's causing the metric discrepancy. Thank you!
Sources: Improving voice AI's turn detection with transformers | Logs, metrics, and telemetry | LiveKit Docs | livekit-agents/livekit/agents/telemetry/trace_types.py | agents/src/metrics/utils.ts
r
VAD is enabled; no turn detection model is used.
```python
min_endpointing_delay=0.2,
max_endpointing_delay=0.4,
```
t
The issue with the EOU metrics is connected to your current configuration. Here is a detailed explanation and some recommendations:

Root Cause:
• When using VAD-only mode (without a turn detection model), the system relies solely on `min_endpointing_delay` (currently set to 0.2s) to determine the end of an utterance. In this setup, `max_endpointing_delay` is not considered.
• This is expected behavior and not a bug. Because of this, the metric should hover around 200ms, as the system triggers EOU events right after `min_endpointing_delay` expires following silence.

Recommendations:
1. Adjust `min_endpointing_delay`: Try increasing it to a more comfortable value like 0.5s to avoid cutting off natural pauses too soon. For example: `min_endpointing_delay=0.5`
2. Integrate a Turn Detection Model: By enabling a semantic turn detection model, you can make use of both `min_endpointing_delay` and `max_endpointing_delay`. This helps the system decide when to wait longer based on the context of the conversation. Here's an example configuration:
```python
from livekit.agents import AgentSession
from livekit.plugins.turn_detector.english import EnglishModel
from livekit.plugins import silero

session = AgentSession(
    turn_detection=EnglishModel(),
    vad=silero.VAD.load(),
    min_endpointing_delay=0.2,
    max_endpointing_delay=0.4,
)
```
Using a turn detection model can really enhance the flow of conversation by minimizing false interruptions. Please feel free to reach out if you'd like any help implementing these changes—I’m here to support you!
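To build intuition for the VAD-only behavior described above, here is a small pure-Python sketch. It is a stand-in simulation, not the LiveKit implementation: it models an agent that declares end of utterance at the first silence check that falls at least `min_endpointing_delay` after speech stops, which is why the reported metric should sit right at that configured delay.

```python
def eou_delay_vad_only_ms(speech_end_ms: int, check_times_ms: list[int],
                          min_delay_ms: int) -> int:
    """Simulated end-of-utterance delay for VAD-only mode (milliseconds).

    End of utterance is declared at the first silence check that occurs
    at least `min_delay_ms` after speech stopped; the returned value is
    what an `end_of_utterance_delay`-style metric would record.
    """
    for t in sorted(check_times_ms):
        if t - speech_end_ms >= min_delay_ms:
            return t - speech_end_ms
    raise ValueError("no silence check observed after the endpointing delay")

# Speech ends at t=10s; VAD reports silence every 50ms afterwards.
checks = [10_000 + 50 * i for i in range(1, 20)]
delay = eou_delay_vad_only_ms(10_000, checks, min_delay_ms=200)
print(delay)  # 200: the metric tracks min_endpointing_delay
```

With a 200ms `min_endpointing_delay`, the simulated metric lands at 200ms, matching the "metric hovers around the configured delay" behavior in healthy operation.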
r
But it was correct in 0.x.
t
Could you please share a bit more about the specific differences you're noticing in the EOU metrics (like end_of_utterance_delay or transcription_delay) between LiveKit 0.x and 1.2.x? Understanding these details will really help us get to the bottom of what's causing the variation. I'm here to help! Sources: Improving voice AI's turn detection with transformers | tests/test_vad.py | agents/src/metrics/base.ts | Voice activity detection (VAD) parameters | LiveKit Docs
r
It reports 70s, 50s, etc., but the call quality is good.
t
The EOU metrics showing delays of 50–70 seconds in LiveKit 1.2.x, despite good call quality, might be linked to infrastructure or configuration issues. Here is a detailed explanation and some troubleshooting steps:

Key Insights:
• The `end_of_utterance_delay` metric tracks the time from when VAD detects the end of speech to when the agent starts processing the user's turn. Delays of 50–70 seconds are definitely unusual and usually point to an underlying issue.
• In LiveKit 1.2.x, the way this metric is calculated stays consistent with 0.x, but changes in infrastructure or configuration can influence the results.

Potential Causes:
1. Infrastructure Throttling: If you're using AWS t3/t4 burstable instances, CPU credits can get used up during VAD inference, causing significant throttling. This can result in those long delays even when call quality seems fine. For smoother performance, consider non-burstable instances like `m5` or `c5` in your production environment.
2. Configuration Issues: Make sure `min_endpointing_delay` and `max_endpointing_delay` are set correctly. When running in VAD-only mode, only `min_endpointing_delay` is used, so setting it to a solid value like 0.5s can help avoid premature interruptions.

Troubleshooting Steps:
1. Check your instance type. If you're on AWS t3/t4, keep an eye on `CPUCreditBalance` in CloudWatch to see if throttling is happening.
2. Double-check your turn detection mode. Make sure `turn_detection` is set to `vad` if you want to rely on EOU metrics.
3. Try logging EOU metrics directly from the STT component to get accurate readings of `end_of_utterance_delay` and `transcription_delay`.

If you happen to be using burstable instances or want some help fine-tuning your configuration, please don't hesitate to reach out. I'm here to guide you through it!
r
How do I do this: "Try logging EOU metrics directly from the STT component"?
t
To log End-of-Utterance (EOU) metrics directly from the STT component, here's a guide to walk you through the process: 1. Initialize the STT Component: First, set up your STT provider (like Deepgram or Gladia) as part of your agent configuration. 2. Subscribe to the `eou_metrics_collected` Event: Attach an event listener to your STT instance to capture those EOU metrics. Here's a simple example in Python:
```python
import asyncio
import logging

from livekit.agents.metrics import EOUMetrics
from livekit.plugins import deepgram

logger = logging.getLogger("eou-metrics")

class MyAgent:
    def __init__(self):
        stt = deepgram.STT()

        # Wrap the callback to avoid blocking the event loop
        def eou_wrapper(metrics: EOUMetrics):
            asyncio.create_task(self.on_eou_metrics_collected(metrics))

        # Subscribe to EOU metrics
        stt.on("eou_metrics_collected", eou_wrapper)

    async def on_eou_metrics_collected(self, metrics: EOUMetrics):
        # Log or process EOU metrics
        logger.info(
            "EOU metrics",
            extra={
                "end_of_utterance_delay": round(metrics.end_of_utterance_delay, 2),
                "transcription_delay": round(metrics.transcription_delay, 2),
                "speech_id": metrics.speech_id,
            },
        )
```
3. Key Metrics Captured:
• `end_of_utterance_delay`: The time from the end of speech (VAD) to turn completion, including transcription delay.
• `transcription_delay`: The time between the end of speech and when the final transcript is ready.
• `speech_id`: A unique identifier for each user turn.
4. Ensure Proper Configuration: EOU metrics are only emitted when `turn_detection` is enabled (for example, using VAD or LiveKit's turn detector).
For a complete working example, you can check out the metrics_stt.py file in the LiveKit examples repository. If you have any questions or need a hand with anything else, please don't hesitate to reach out. I'm here to help!
Sources: Logs, metrics, and telemetry | LiveKit Docs | complex-agents/turn-taking/README.md
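Once the callback is in place, it helps to aggregate the collected values so a 70s outlier stands out immediately. The sketch below uses a hypothetical stand-in dataclass with the same field names as the metrics above (it is not the real `EOUMetrics` class) plus a small aggregator that flags turns whose delay exceeds a threshold:

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical stand-in for livekit.agents.metrics.EOUMetrics
@dataclass
class EOUMetricsStub:
    end_of_utterance_delay: float  # seconds
    transcription_delay: float     # seconds
    speech_id: str

@dataclass
class EOUMetricsAggregator:
    threshold_s: float = 2.0
    samples: list = field(default_factory=list)

    def add(self, m) -> None:
        self.samples.append(m)

    def summary(self) -> dict:
        delays = [m.end_of_utterance_delay for m in self.samples]
        outliers = [m.speech_id for m in self.samples
                    if m.end_of_utterance_delay > self.threshold_s]
        return {
            "avg_s": round(mean(delays), 2),
            "max_s": max(delays),
            "outliers": outliers,
        }

agg = EOUMetricsAggregator()
for sid, d in [("turn-1", 0.21), ("turn-2", 0.24), ("turn-3", 70.0)]:
    agg.add(EOUMetricsStub(d, 0.05, sid))
print(agg.summary())
# {'avg_s': 23.48, 'max_s': 70.0, 'outliers': ['turn-3']}
```

Calling `agg.add(metrics)` from inside `on_eou_metrics_collected` and printing the summary at session end makes it easy to tell whether the 70s readings are a handful of outlier turns or every turn.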