able-leather-48732
07/11/2025, 5:49 PM

tall-airline-16756
07/11/2025, 5:50 PM
total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
This represents the time from when the user finishes speaking to when the agent begins responding with audio. The key sources of latency are:
• End-of-utterance (EOU) delay: How long the system takes to detect that the user has finished speaking.
• LLM Time To First Token (TTFT): How quickly your large language model starts generating a response.
• TTS Time To First Byte (TTFB): How quickly the first audio byte is received from the Text-to-Speech service.
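For example, if EOU detection takes 0.5 s, the LLM's first token arrives after 0.35 s, and the first TTS byte after 0.2 s, the user hears audio roughly 1.05 s after they stop speaking (illustrative numbers, not measured benchmarks).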
You can measure these using LiveKit's Agents metrics system. Integrate the metrics API to capture event-level data, or use the UsageCollector class for aggregated session-level metrics. Focus on monitoring TTFT and TTFB as your key latency indicators. The knowledge base also recommends positioning your agent close to your STT, LLM, and TTS services, and monitoring regularly with the Agents metrics API to identify bottlenecks.
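As a quick illustration of the aggregated path, here's a minimal sketch based on the docs' UsageCollector pattern (it assumes a running AgentSession named session, a JobContext named ctx, and a configured logger):

import logging

from livekit.agents import metrics, MetricsCollectedEvent

logger = logging.getLogger("usage")
usage_collector = metrics.UsageCollector()

@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    metrics.log_metrics(ev.metrics)      # per-event logging (TTFT, TTFB, etc.)
    usage_collector.collect(ev.metrics)  # accumulate for the session summary

async def log_usage():
    summary = usage_collector.get_summary()
    logger.info(f"Session usage summary: {summary}")

# Emit the aggregated summary when the job shuts down.
ctx.add_shutdown_callback(log_usage)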
If you need implementation examples or guidance on integrating metrics into your code, let me know which programming language and agent setup you're using.
Sources: Speech & Audio Processing | livekit/livekit_composite | DeepWiki | README.md | Capturing metrics | LiveKit Docs | Capturing metrics | LiveKit Docs | AI Agents Framework | livekit/livekit_composite | DeepWiki

able-leather-48732
07/14/2025, 1:34 PM

tall-airline-16756
07/14/2025, 1:36 PM
You can track end_of_utterance_delay, llm.ttft, and tts.ttfb, and then add them up to see the total response time.
import logging
import asyncio

from livekit.agents import Agent
from livekit.agents.metrics import EOUMetrics, LLMMetrics, TTSMetrics
from livekit.plugins import deepgram, openai, silero

logger = logging.getLogger("latency-tracker")
logger.setLevel(logging.INFO)

class LatencyTrackingAgent(Agent):
    def __init__(self):
        llm = openai.LLM(model="gpt-4o-mini")
        stt = deepgram.STT()
        tts = openai.TTS()
        vad = silero.VAD.load()
        super().__init__(
            instructions="You are a helpful assistant that demonstrates latency tracking.",
            stt=stt, llm=llm, tts=tts, vad=vad,
        )
        # One slot per pipeline stage; filled in as each metrics event arrives.
        self.metrics = {'end_of_utterance_delay': None, 'llm_ttft': None, 'tts_ttfb': None}
        # Each component emits its own metrics event, so register one handler per stage.
        self.stt.on("eou_metrics_collected", lambda m: asyncio.create_task(self.on_eou(m)))
        self.llm.on("metrics_collected", lambda m: asyncio.create_task(self.on_llm(m)))
        self.tts.on("metrics_collected", lambda m: asyncio.create_task(self.on_tts(m)))

    async def on_eou(self, metrics: EOUMetrics):
        self.metrics['end_of_utterance_delay'] = metrics.end_of_utterance_delay
        await self.try_log_total_latency()

    async def on_llm(self, metrics: LLMMetrics):
        self.metrics['llm_ttft'] = metrics.ttft
        await self.try_log_total_latency()

    async def on_tts(self, metrics: TTSMetrics):
        self.metrics['tts_ttfb'] = metrics.ttfb
        await self.try_log_total_latency()

    async def try_log_total_latency(self):
        # Only log once all three components have been collected for the turn.
        if all(v is not None for v in self.metrics.values()):
            total_latency = sum(self.metrics.values())
            logger.info(
                f"Total Latency: {total_latency:.3f}s "
                f"(EOU: {self.metrics['end_of_utterance_delay']:.3f}, "
                f"LLM TTFT: {self.metrics['llm_ttft']:.3f}, "
                f"TTS TTFB: {self.metrics['tts_ttfb']:.3f})"
            )
            # Reset for the next conversational turn.
            self.metrics = {'end_of_utterance_delay': None, 'llm_ttft': None, 'tts_ttfb': None}
• We'll set up some smart event handlers to catch metrics from different parts of the AI (speech recognition, language model, and text-to-speech).
• As soon as we've measured everything, we'll log a friendly breakdown of how fast each part is working.
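If you'd like to try it end to end, a minimal worker entrypoint might look like this (a sketch of the standard Agents setup; the entrypoint name and structure are illustrative, not part of the snippet above):

from livekit.agents import AgentSession, JobContext, WorkerOptions, cli

async def entrypoint(ctx: JobContext):
    # Connect to the room, then run the latency-tracking agent in a session.
    await ctx.connect()
    session = AgentSession()
    await session.start(room=ctx.room, agent=LatencyTrackingAgent())

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))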
Want to dive deeper? LiveKit has even more tools like the UsageCollector class to help you track performance over time. Pretty neat, right?
Sources: Capturing metrics | LiveKit Docs | livekit-agents/livekit/agents/tts/tts.py

able-leather-48732
07/14/2025, 1:45 PM
@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    try:
        # Log the raw metrics object for debugging
        logger.debug(f"📊 Raw metrics object: {ev.metrics}")
        metrics = ev.metrics
        total_latency = (
            metrics.end_of_utterance_delay + metrics.ttft + metrics.ttfb
        )
        logger.info(f"current total latency - {total_latency}")
        metrics.log_metrics(metrics=ev.metrics, logger=logger)
    except Exception as e:
        logger.error(f"❌ Error processing metrics: {e}", exc_info=True)

# ───────────── Start media pipeline ──────────
await session.start(
    room=ctx.room,
    agent=assistant,
    room_input_options=RoomInputOptions(
        noise_cancellation=noise_cancellation.BVC()
    ),
)
tall-airline-16756
07/14/2025, 1:46 PM
The metrics_collected event is a bit complex - it actually fires with separate metric objects for each pipeline stage. So end_of_utterance_delay, ttft, and ttfb aren't always showing up together. No worries, though - I've got a solution!
• EOU (End-of-utterance) metrics have end_of_utterance_delay
• LLM metrics have ttft (time to first token)
• TTS metrics have ttfb (time to first byte)
Right now, this means your handler might throw an AttributeError since one or more of these fields could be missing. Let me walk you through a better approach.
Here's what you'll want to do:
1. Capture and store each metric type's values using a unique identifier (like speech_id or turn) as the separate events roll in.
2. Once you've got all three components for a turn, go ahead and sum them to calculate and log the total latency.
If you want more details, the official recommendations have a great implementation pattern for matching and aggregating metrics by turn. You've got this!
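For instance, a sketch of that turn-matching idea might look like this (it assumes the EOU, LLM, and TTS metric objects for one turn share a speech_id, per the docs' pattern, and that session and logger are already set up):

from livekit.agents import MetricsCollectedEvent
from livekit.agents.metrics import EOUMetrics, LLMMetrics, TTSMetrics

# Partial latency components, keyed by speech_id.
# Assumption: EOU, LLM, and TTS metrics for one turn share a speech_id.
pending: dict[str, dict] = {}

@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    m = ev.metrics
    if isinstance(m, EOUMetrics):
        pending.setdefault(m.speech_id, {})["eou"] = m.end_of_utterance_delay
    elif isinstance(m, LLMMetrics):
        pending.setdefault(m.speech_id, {})["ttft"] = m.ttft
    elif isinstance(m, TTSMetrics):
        pending.setdefault(m.speech_id, {})["ttfb"] = m.ttfb
    else:
        return  # ignore metric types that don't feed the latency sum
    parts = pending[m.speech_id]
    if len(parts) == 3:  # all three stages reported for this turn
        total = parts["eou"] + parts["ttft"] + parts["ttfb"]
        logger.info(f"turn {m.speech_id}: total latency {total:.3f}s")
        del pending[m.speech_id]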
Sources: Capturing metrics | LiveKit Docs | livekit-agents/livekit/agents/tts/tts.py