able-leather-48732
07/11/2025, 5:49 PM

tall-airline-16756
07/11/2025, 5:50 PM
total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
This represents the time from when the user finishes speaking to when the agent begins responding with audio. The key sources of latency are:
• End-of-utterance (EOU) delay: How long the system takes to detect that the user has finished speaking.
• LLM Time To First Token (TTFT): How quickly your large language model starts generating a response.
• TTS Time To First Byte (TTFB): How quickly the first audio byte is received from the Text-to-Speech service.
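For example, if EOU detection takes 0.5 s, the LLM's first token arrives after 0.35 s, and the first TTS byte after 0.2 s, the user hears audio roughly 1.05 s after they stop speaking (illustrative numbers, not measured benchmarks).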
You can measure these using LiveKit's Agents metrics system. Integrate the metrics API to capture event-level data, or use the UsageCollector class for aggregated session-level metrics. Focus on monitoring TTFT and TTFB as your key latency indicators. The knowledge base also recommends positioning your agent close to your STT, LLM, and TTS services, and monitoring regularly with the Agents metrics API to identify bottlenecks.
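As a quick illustration of the aggregated path, here's a minimal sketch based on the docs' UsageCollector pattern (it assumes a running AgentSession named session, a JobContext named ctx, and a configured logger):

import logging

from livekit.agents import metrics, MetricsCollectedEvent

logger = logging.getLogger("usage")
usage_collector = metrics.UsageCollector()

@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    metrics.log_metrics(ev.metrics)      # per-event logging (TTFT, TTFB, etc.)
    usage_collector.collect(ev.metrics)  # accumulate for the session summary

async def log_usage():
    summary = usage_collector.get_summary()
    logger.info(f"Session usage summary: {summary}")

# Emit the aggregated summary when the job shuts down.
ctx.add_shutdown_callback(log_usage)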
If you need implementation examples or guidance on integrating metrics into your code, let me know which programming language and agent setup you're using.
Sources: Speech & Audio Processing | livekit/livekit_composite | DeepWiki | README.md | Capturing metrics | LiveKit Docs | Capturing metrics | LiveKit Docs | AI Agents Framework | livekit/livekit_composite | DeepWiki

able-leather-48732
07/14/2025, 1:34 PM

tall-airline-16756
07/14/2025, 1:36 PM
You can track end_of_utterance_delay, llm.ttft, and tts.ttfb, and then add them up to see the total response time.
import logging
import asyncio

from livekit.agents import Agent
from livekit.agents.metrics import EOUMetrics, LLMMetrics, TTSMetrics
from livekit.plugins import deepgram, openai, silero

logger = logging.getLogger("latency-tracker")
logger.setLevel(logging.INFO)

class LatencyTrackingAgent(Agent):
    def __init__(self):
        llm = openai.LLM(model="gpt-4o-mini")
        stt = deepgram.STT()
        tts = openai.TTS()
        vad = silero.VAD.load()
        super().__init__(
            instructions="You are a helpful assistant that demonstrates latency tracking.",
            stt=stt, llm=llm, tts=tts, vad=vad,
        )
        # One slot per pipeline stage; filled in as each metrics event arrives.
        self.metrics = {'end_of_utterance_delay': None, 'llm_ttft': None, 'tts_ttfb': None}
        # Each component emits its own metrics event, so register one handler per stage.
        self.stt.on("eou_metrics_collected", lambda m: asyncio.create_task(self.on_eou(m)))
        self.llm.on("metrics_collected", lambda m: asyncio.create_task(self.on_llm(m)))
        self.tts.on("metrics_collected", lambda m: asyncio.create_task(self.on_tts(m)))

    async def on_eou(self, metrics: EOUMetrics):
        self.metrics['end_of_utterance_delay'] = metrics.end_of_utterance_delay
        await self.try_log_total_latency()

    async def on_llm(self, metrics: LLMMetrics):
        self.metrics['llm_ttft'] = metrics.ttft
        await self.try_log_total_latency()

    async def on_tts(self, metrics: TTSMetrics):
        self.metrics['tts_ttfb'] = metrics.ttfb
        await self.try_log_total_latency()

    async def try_log_total_latency(self):
        # Only log once all three components have been collected for the turn.
        if all(v is not None for v in self.metrics.values()):
            total_latency = sum(self.metrics.values())
            logger.info(
                f"Total Latency: {total_latency:.3f}s "
                f"(EOU: {self.metrics['end_of_utterance_delay']:.3f}, "
                f"LLM TTFT: {self.metrics['llm_ttft']:.3f}, "
                f"TTS TTFB: {self.metrics['tts_ttfb']:.3f})"
            )
            # Reset for the next conversational turn.
            self.metrics = {'end_of_utterance_delay': None, 'llm_ttft': None, 'tts_ttfb': None}
• We'll set up some smart event handlers to catch metrics from different parts of the AI (speech recognition, language model, and text-to-speech).
• As soon as we've measured everything, we'll log a friendly breakdown of how fast each part is working.
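If you'd like to try it end to end, a minimal worker entrypoint might look like this (a sketch of the standard Agents setup; the entrypoint name and structure are illustrative, not part of the snippet above):

from livekit.agents import AgentSession, JobContext, WorkerOptions, cli

async def entrypoint(ctx: JobContext):
    # Connect to the room, then run the latency-tracking agent in a session.
    await ctx.connect()
    session = AgentSession()
    await session.start(room=ctx.room, agent=LatencyTrackingAgent())

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))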
Want to dive deeper? LiveKit has even more tools like the UsageCollector class to help you track performance over time. Pretty neat, right?
Sources: Capturing metrics | LiveKit Docs | livekit-agents/livekit/agents/tts/tts.py

able-leather-48732
07/14/2025, 1:45 PM
@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    try:
        # Log the raw metrics object for debugging
        logger.debug(f"📊 Raw metrics object: {ev.metrics}")
        metrics = ev.metrics
        total_latency = (
            metrics.end_of_utterance_delay + metrics.ttft + metrics.ttfb
        )
        logger.info(f"current total latency - {total_latency}")
        metrics.log_metrics(metrics=ev.metrics, logger=logger)
    except Exception as e:
        logger.error(f"❌ Error processing metrics: {e}", exc_info=True)

# ───────────── Start media pipeline ──────────
await session.start(
    room=ctx.room,
    agent=assistant,
    room_input_options=RoomInputOptions(
        noise_cancellation=noise_cancellation.BVC()
    ),
)
tall-airline-16756
07/14/2025, 1:46 PM
The metrics_collected event is a bit complex - it actually fires with separate metric objects for each pipeline stage. So end_of_utterance_delay, ttft, and ttfb aren't always showing up together. No worries, though - I've got a solution!
• EOU (End-of-utterance) metrics have end_of_utterance_delay
• LLM metrics have ttft (time to first token)
• TTS metrics have ttfb (time to first byte)
Right now, this means your handler might throw an AttributeError since one or more of these fields could be missing. Let me walk you through a better approach.
Here's what you'll want to do:
1. Capture and store each metric type's values using a unique identifier (like speech_id or turn) as the separate events roll in.
2. Once you've got all three components for a turn, go ahead and sum them to calculate and log the total latency.
If you want more details, the official recommendations have a great implementation pattern for matching and aggregating metrics by turn. You've got this!
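For instance, a sketch of that turn-matching idea might look like this (it assumes the EOU, LLM, and TTS metric objects for one turn share a speech_id, per the docs' pattern, and that session and logger are already set up):

from livekit.agents import MetricsCollectedEvent
from livekit.agents.metrics import EOUMetrics, LLMMetrics, TTSMetrics

# Partial latency components, keyed by speech_id.
# Assumption: EOU, LLM, and TTS metrics for one turn share a speech_id.
pending: dict[str, dict] = {}

@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    m = ev.metrics
    if isinstance(m, EOUMetrics):
        pending.setdefault(m.speech_id, {})["eou"] = m.end_of_utterance_delay
    elif isinstance(m, LLMMetrics):
        pending.setdefault(m.speech_id, {})["ttft"] = m.ttft
    elif isinstance(m, TTSMetrics):
        pending.setdefault(m.speech_id, {})["ttfb"] = m.ttfb
    else:
        return  # ignore metric types that don't feed the latency sum
    parts = pending[m.speech_id]
    if len(parts) == 3:  # all three stages reported for this turn
        total = parts["eou"] + parts["ttft"] + parts["ttfb"]
        logger.info(f"turn {m.speech_id}: total latency {total:.3f}s")
        del pending[m.speech_id]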
Sources: Capturing metrics | LiveKit Docs | livekit-agents/livekit/agents/tts/tts.py