# ask-ai
a
how can I calculate the total latency for my voice AI agent in LiveKit?
t
Thanks for reaching out, maleek. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
You can calculate the total latency for your voice AI agent in LiveKit using the following formula:
total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
This represents the time from when the user finishes speaking to when the agent begins responding with audio. The key sources of latency are:
• End-of-utterance (EOU) delay: the time the system takes to detect that the user has finished speaking.
• LLM time to first token (TTFT): how quickly your large language model starts generating a response.
• TTS time to first byte (TTFB): how quickly the first audio byte is received from the text-to-speech service.
You can measure these using LiveKit's Agents metrics system: integrate the metrics API to capture event-level data, or use the `UsageCollector` class for aggregated session-level metrics. Focus on TTFT and TTFB as the key latency indicators. To keep latency down, position your agent close to your STT, LLM, and TTS services, and monitor regularly with the Agents metrics API to identify bottlenecks. If you need implementation examples or guidance on integrating metrics into your code, let me know which programming language and agent setup you're using.
Sources: Speech & Audio Processing | livekit/livekit_composite | DeepWiki | README.md | Capturing metrics | LiveKit Docs | AI Agents Framework | livekit/livekit_composite | DeepWiki
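To get a feel for the numbers, here is the formula above with purely illustrative values (your real values depend on your providers, models, and regions, and should be measured with the metrics API):

```python
# Illustrative values only; measure your own with the metrics API.
eou_delay = 0.45  # end-of-utterance detection delay, seconds
llm_ttft = 0.32   # LLM time to first token, seconds
tts_ttfb = 0.18   # TTS time to first byte, seconds

total_latency = eou_delay + llm_ttft + tts_ttfb
print(f"total: {total_latency:.2f}s")  # total: 0.95s
```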
a
give me example, how can i implement this in my python file?
t
Hey there! Let me walk you through a handy way to track latency for your LiveKit voice AI agent in Python. Think of it as keeping a performance scorecard that shows how snappy your agent is. We'll track three key timings, `end_of_utterance_delay`, `llm.ttft`, and `tts.ttfb`, and then add them up to get the total response time.
import logging
import asyncio
from livekit.agents import Agent
from livekit.agents.metrics import EOUMetrics, LLMMetrics, TTSMetrics
from livekit.plugins import deepgram, openai, silero

logger = logging.getLogger("latency-tracker")
logger.setLevel(logging.INFO)

class LatencyTrackingAgent(Agent):
    def __init__(self):
        llm = openai.LLM(model="gpt-4o-mini")
        stt = deepgram.STT()
        tts = openai.TTS()
        vad = silero.VAD.load()
        super().__init__(instructions="You are a helpful assistant that demonstrates latency tracking.", stt=stt, llm=llm, tts=tts, vad=vad)
        # One slot per pipeline stage; reset after each complete turn is logged
        self.metrics = {'end_of_utterance_delay': None, 'llm_ttft': None, 'tts_ttfb': None}
        # .on() expects synchronous callbacks, so wrap the async handlers in tasks
        self.stt.on("eou_metrics_collected", lambda m: asyncio.create_task(self.on_eou(m)))
        self.llm.on("metrics_collected", lambda m: asyncio.create_task(self.on_llm(m)))
        self.tts.on("metrics_collected", lambda m: asyncio.create_task(self.on_tts(m)))

    async def on_eou(self, metrics: EOUMetrics):
        self.metrics['end_of_utterance_delay'] = metrics.end_of_utterance_delay
        await self.try_log_total_latency()

    async def on_llm(self, metrics: LLMMetrics):
        self.metrics['llm_ttft'] = metrics.ttft
        await self.try_log_total_latency()

    async def on_tts(self, metrics: TTSMetrics):
        self.metrics['tts_ttfb'] = metrics.ttfb
        await self.try_log_total_latency()

    async def try_log_total_latency(self):
        # Only log once all three components for the current turn have arrived
        if all(v is not None for v in self.metrics.values()):
            total_latency = sum(self.metrics.values())
            logger.info(
                f"Total Latency: {total_latency:.3f}s "
                f"(EOU: {self.metrics['end_of_utterance_delay']:.3f}, "
                f"LLM TTFT: {self.metrics['llm_ttft']:.3f}, "
                f"TTS TTFB: {self.metrics['tts_ttfb']:.3f})"
            )
            self.metrics = {'end_of_utterance_delay': None, 'llm_ttft': None, 'tts_ttfb': None}
Here's what the code does:
• It registers event handlers that catch metrics from each part of the pipeline (speech recognition, language model, and text-to-speech).
• As soon as all three measurements for a turn are in, it logs a friendly breakdown of how fast each part is working.
Want to dive deeper? LiveKit has more tools, like the `UsageCollector` class, to help you track performance over time. Pretty neat, right?
Sources: Capturing metrics | LiveKit Docs | livekit-agents/livekit/agents/tts/tts.py
a
so will this code work?
@session.on("metrics_collected")
        def _on_metrics_collected(ev: MetricsCollectedEvent):
            try:
                # Log the raw metrics object for debugging
                logger.debug(f"📊 Raw metrics object: {ev.metrics}")
                
                metrics = ev.metrics
                total_latency = (
                    metrics.end_of_utterance_delay + metrics.ttft + metrics.ttfb
                )
                logger.info(f"current total latency - {total_latency}")
                metrics.log_metrics(metrics=ev.metrics, logger=logger)

            except Exception as e:
                logger.error(f"❌ Error processing metrics: {e}", exc_info=True)

        # ───────────── Start media pipeline ──────────
        await session.start(
            room=ctx.room,
            agent=assistant,
            room_input_options=RoomInputOptions(
                noise_cancellation=noise_cancellation.BVC()
            ),
        )
t
Hey there! I totally get what you're running into, and I can help you sort out this metrics situation. Your current code is likely to be unreliable because the `metrics_collected` event fires separately for each pipeline stage, so `end_of_utterance_delay`, `ttft`, and `ttfb` never show up together on a single metrics object:
• EOU (end-of-utterance) metrics carry `end_of_utterance_delay`
• LLM metrics carry `ttft` (time to first token)
• TTS metrics carry `ttfb` (time to first byte)
As written, your handler will throw an `AttributeError` on every event, since at least two of these three fields are always missing. No worries, though, here's a better approach:
1. As the separate events roll in, capture and store each metric type's value keyed by a unique identifier (such as `speech_id` or a turn counter).
2. Once you've got all three components for a turn, sum them to calculate and log the total latency.
The "Capturing metrics" documentation describes this pattern for matching and aggregating metrics by turn. You've got this!
Sources: Capturing metrics | LiveKit Docs | livekit-agents/livekit/agents/tts/tts.py
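The store-then-sum pattern above can be sketched in plain Python. Note that this aggregator is an illustrative helper, not part of the LiveKit SDK: the field names mirror LiveKit's metric attributes, and `speech_id` stands in for the per-turn identifier mentioned above.

```python
from typing import Dict, Optional

class TurnLatencyAggregator:
    """Hypothetical helper: collects the three per-stage latency components
    for each turn and returns the total once all of them have been seen."""

    REQUIRED = {"end_of_utterance_delay", "ttft", "ttfb"}

    def __init__(self) -> None:
        self._turns: Dict[str, Dict[str, float]] = {}

    def record(self, speech_id: str, field: str, value: float) -> Optional[float]:
        # Store one component; return the summed total when the turn is complete
        turn = self._turns.setdefault(speech_id, {})
        turn[field] = value
        if self.REQUIRED <= turn.keys():
            total = sum(turn[f] for f in self.REQUIRED)
            del self._turns[speech_id]  # free the slot once the turn is reported
            return total
        return None

# Example: the three components for one turn arrive as separate events
agg = TurnLatencyAggregator()
assert agg.record("turn-1", "end_of_utterance_delay", 0.45) is None
assert agg.record("turn-1", "ttft", 0.32) is None
total = agg.record("turn-1", "ttfb", 0.18)
print(f"total latency: {total:.2f}s")  # total latency: 0.95s
```

In a real handler you would dispatch on the type of `ev.metrics` (EOU, LLM, or TTS metrics), pull out the matching field, and pass it to `record` along with the turn's identifier.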