bulky-actor-93512
09/11/2025, 3:17 PM
tall-airline-16756
09/11/2025, 3:17 PM
tall-airline-16756
09/11/2025, 3:18 PM
• TTSMetrics: Includes a `segment_id` field that uniquely identifies audio segments, along with useful metrics like `ttfb` (time to first byte) and `duration`.
• STTMetrics: Provides `request_id` and `speech_id` to help correlate transcription results with user utterances.
• EOUMetrics: Captures delays such as `transcription_delay` and `end_of_utterance_delay`, which are really helpful for understanding timing.
You can subscribe to the `metrics_collected` event to log and gather these metrics. For example:
from livekit.agents.metrics import AgentMetrics, TTSMetrics

@agent.on("metrics_collected")
def on_metrics_collected(metrics: AgentMetrics):
    if isinstance(metrics, TTSMetrics) and metrics.segment_id:
        logger.info(f"Segment {metrics.segment_id}: TTFB = {metrics.ttfb}s, Duration = {metrics.duration}s")
To get a full picture across a session, you might want to use the UsageCollector class or summarize the data manually. For end-to-end latency, combining metrics like `end_of_utterance_delay`, `ttft` (LLM), and `ttfb` (TTS) can be very insightful.
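If it helps, here's a minimal sketch of the UsageCollector pattern from the metrics docs (session here stands for your AgentSession):
from livekit.agents import metrics, MetricsCollectedEvent

usage_collector = metrics.UsageCollector()

@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    metrics.log_metrics(ev.metrics)      # log each metric as it arrives
    usage_collector.collect(ev.metrics)  # accumulate for a session-level summary

# Later (for example, in a shutdown callback), summarize the whole session:
summary = usage_collector.get_summary()
print(f"Usage summary: {summary}")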
If you’d like, I’m here to walk you through any part of this or help with the implementation. Just let me know!
Sources: livekit-agents/livekit/agents/voice/audio_recognition.py | Logs, metrics, and telemetry | LiveKit Docs | Capturing metrics | LiveKit Docs | Capturing metrics | LiveKit Docs | Text and transcriptions | LiveKit Docs
tall-airline-16756
09/11/2025, 3:18 PM
The TTSMetrics object includes a `segment_id` field, which uniquely identifies a generated audio segment, along with useful metrics like `ttfb` (time to first byte) and `duration`. Similarly, STTMetrics and EOUMetrics offer timing details for transcription and end-of-utterance delays.
To bring these metrics together, you might use the UsageCollector class or manually collect and summarize the data. For instance, you can subscribe to the `metrics_collected` event and then log or process the metrics based on `segment_id` or `request_id`.
If you'd like, I'm here to provide more details or examples. Just let me know, and I'll gladly guide you through it!
bulky-actor-93512
09/11/2025, 3:19 PM
tall-airline-16756
09/11/2025, 3:20 PM
UsageCollector is actually designed to aggregate usage data like token counts and API usage across sessions, so it doesn't support per-turn latency metrics directly.
But don't worry! You can still capture latency for each turn by subscribing to the `metrics_collected` event and using the `speech_id` field to correlate related metrics. This way, you can group together metrics like LLMMetrics, TTSMetrics, and EOUMetrics for every turn.
To get the total latency per turn, simply add up these values (a short sketch follows the list):
• `end_of_utterance_delay` (EOUMetrics)
• `ttft` (time to first token, LLMMetrics)
• `ttfb` (time to first byte, TTSMetrics)
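Here's a rough sketch of that per-turn grouping. It assumes, as described above, that these metric objects expose a speech_id (read defensively with getattr) and that session is your AgentSession:
from collections import defaultdict

from livekit.agents.metrics import EOUMetrics, LLMMetrics, TTSMetrics

turn_latency = defaultdict(dict)  # illustrative store, keyed by speech_id

@session.on("metrics_collected")
def collect_turn_latency(ev):
    m = ev.metrics
    speech_id = getattr(m, "speech_id", None)  # assumption: present on these metric types
    if not speech_id:
        return
    if isinstance(m, EOUMetrics):
        turn_latency[speech_id]["eou"] = m.end_of_utterance_delay
    elif isinstance(m, LLMMetrics):
        turn_latency[speech_id]["ttft"] = m.ttft
    elif isinstance(m, TTSMetrics):
        turn_latency[speech_id]["ttfb"] = m.ttfb
    turn = turn_latency[speech_id]
    if {"eou", "ttft", "ttfb"} <= turn.keys():
        total = turn["eou"] + turn["ttft"] + turn["ttfb"]
        print(f"Turn {speech_id}: total latency ~ {total:.3f}s")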
If you'd like, I’m here to help with more detailed guidance or examples on how to implement this—just let me know!
Sources: examples/src/cartersia_tts.ts | Capturing metrics | LiveKit Docs | Capturing metrics | LiveKit Docs | Logs, metrics, and telemetry | LiveKit Docs | tests/test_agent_session.py
tall-airline-16756
09/11/2025, 3:20 PM
UsageCollector is mainly designed to aggregate usage data like token counts and API usage across sessions, so it doesn't support per-turn latency metrics directly.
But don't worry! You can still capture latency for each turn by subscribing to the `metrics_collected` event and using the `speech_id` field to correlate the metrics. This way, you can group together important metrics like `end_of_utterance_delay` (EOUMetrics), `ttft` (LLMMetrics), and `ttfb` (TTSMetrics) for every turn.
To find the total latency per turn, you just need to add these up:
• total_latency = end_of_utterance_delay + ttft + ttfb
If you'd like, I'm here to help with more detailed guidance or examples on how to implement this. Just let me know!
bulky-actor-93512
09/11/2025, 3:21 PM
tall-airline-16756
09/11/2025, 3:21 PM
1. Group Metrics by speech_id: Use the `speech_id` field to group related metrics. This field stays consistent across STT, LLM, and TTS events within one user turn, making it a great anchor point.
2. Track Multiple STT Events: When you have multiple STT events for one LLM response, keep a list of STT metrics under the same `speech_id`. You can then aggregate these (like summing durations) to get the total STT processing time.
3. Define a Processing Flow: Setting up a clear processing pipeline can really help. For example:
◦ Start by aggregating STT metrics grouped by `speech_id`.
◦ Then, connect those aggregated STT metrics to the matching LLM metrics using the same `speech_id`.
◦ Finally, link the LLM metrics with the TTS metrics to complete the user turn.
4. Use a Data Structure: Consider storing your metrics in a dictionary or database keyed by `speech_id`. For example:
{
    "speech_id_1": {
        "stt": [list of STT metrics],
        "llm": LLM metrics,
        "tts": TTS metrics
    }
}
5. Aggregate Metrics: Once everything is neatly organized, you can calculate total latency per turn by adding up the relevant metrics (like STT duration + LLM processing time + TTS time).
This method helps keep all your metrics grouped logically and easy to understand. If you'd like, I'm happy to walk you through a more detailed example or assist further. Just let me know!
tall-airline-16756
09/11/2025, 3:22 PM
1. Group Metrics by speech_id: Use the `speech_id` field to link related metrics like STT, LLM, and TTS for each user turn. This way, all the metrics for a specific turn stay nicely grouped together.
2. Track Metric Types: Keep separate collections for each metric type (such as STT, LLM, TTS) within the same `speech_id` group. This makes it easier to spot and manage multiple STT events for the same turn.
3. Define Aggregation Rules: For situations where you have multiple STT events, decide on a clear strategy to aggregate them—whether that’s picking the longest duration, averaging them, or using the final STT event.
4. Implement a Processing Pipeline: Setting up a pipeline to process metrics as they come in can help organize everything into a structured format. Here’s a simple example:
from livekit.agents.metrics import LLMMetrics, STTMetrics, TTSMetrics

metrics_store = {}

@session.on("metrics_collected")
def handle_metrics(event):
    # Not every metric type necessarily carries a speech_id, so read it defensively
    speech_id = getattr(event.metrics, "speech_id", None)
    if not speech_id:
        return
    if speech_id not in metrics_store:
        metrics_store[speech_id] = {"stt": [], "llm": None, "tts": None}
    if isinstance(event.metrics, STTMetrics):
        metrics_store[speech_id]["stt"].append(event.metrics)
    elif isinstance(event.metrics, LLMMetrics):
        metrics_store[speech_id]["llm"] = event.metrics
    elif isinstance(event.metrics, TTSMetrics):
        metrics_store[speech_id]["tts"] = event.metrics

# Aggregate metrics at the end of the session
def aggregate_metrics():
    for speech_id, data in metrics_store.items():
        stt_durations = [stt.duration for stt in data["stt"]]
        total_stt_duration = sum(stt_durations)
        llm_ttft = data["llm"].ttft if data["llm"] else 0
        tts_ttfb = data["tts"].ttfb if data["tts"] else 0
        total_latency = total_stt_duration + llm_ttft + tts_ttfb
        print(f"Speech ID {speech_id}: Total Latency = {total_latency}s")
This method helps keep your metrics organized and aggregated in a clear, systematic way. If anything feels unclear or if you want to dive deeper, just let me know. I'm here to help!
bulky-actor-93512
09/11/2025, 7:17 PM
tall-airline-16756
09/11/2025, 7:17 PM
• ATTR_START_TIME and ATTR_END_TIME help mark the timing boundaries for these operations.
• Word-level Timing: If you enable use_tts_aligned_transcript=True, you can access TimedString objects that give you precise start and end times for each word within a turn (see the sketch below).
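Here's a rough, hedged sketch of the word-level piece. The use_tts_aligned_transcript flag comes from the point above; the transcription_node override and the start_time/end_time attribute names are assumptions drawn from the transcription docs, not verified here:
from livekit.agents import Agent, AgentSession

session = AgentSession(
    # ... your stt/llm/tts configuration ...
    use_tts_aligned_transcript=True,  # emit timed transcript chunks with per-word timing
)

class TimingAwareAgent(Agent):
    async def transcription_node(self, text, model_settings):
        async for chunk in text:
            # Assumed attributes on TimedString chunks: start_time / end_time
            start = getattr(chunk, "start_time", None)
            end = getattr(chunk, "end_time", None)
            if start is not None and end is not None:
                print(f"word {chunk!r}: {start:.2f}s -> {end:.2f}s")
            yield chunk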
These tools make it easier to track and analyze metrics at both the start and end of a turn. If you'd like, I’m here to provide more details or walk you through some examples—just let me know!
Sources: Logs, metrics, and telemetry | LiveKit Docs | livekit-agents/livekit/agents/telemetry/trace_types.py | Text and transcriptions | LiveKit Docs | Text and transcriptions | LiveKit Docs
tall-airline-16756
09/11/2025, 7:18 PM
• ATTR_START_TIME and ATTR_END_TIME help mark the timing boundaries for these operations.
• Word-level Timing: If you enable use_tts_aligned_transcript=True, you can access TimedString objects that give you precise start and end times for each word within a turn.
These tools are designed to help you track and analyze turn metrics smoothly. If you'd like, I can provide more details or examples to guide you through the implementation. Just let me know!
bulky-actor-93512
09/11/2025, 7:47 PM
tall-airline-16756
09/11/2025, 7:48 PM
You can use the `metrics_collected` event emitted by the AgentSession. This event shares valuable metrics at important points, including once the agent finishes its turn.
For instance, the EOUMetrics event is triggered when the agent completes processing and includes details like:
• `end_of_utterance_delay`: The time from when the user stops speaking until the turn is completed.
• `transcription_delay`: How long it takes from the end of speech to when the final transcript is ready.
• `on_user_turn_completed_delay`: The duration to run the on_user_turn_completed callback.
Here’s a simple example to illustrate this:
from livekit.agents import metrics, MetricsCollectedEvent

@session.on("metrics_collected")
def on_metrics_collected(event: MetricsCollectedEvent):
    if event.metrics.type == "eou_metrics":
        print(f"EOU Delay: {event.metrics.end_of_utterance_delay}")
        print(f"Transcription Delay: {event.metrics.transcription_delay}")
        print(f"Callback Delay: {event.metrics.on_user_turn_completed_delay}")
If you’d like, I’m here to help with any further questions or provide more examples—just let me know!
Sources: Capturing metrics | LiveKit Docs | tests/test_agent_session.py | Logs, metrics, and telemetry | LiveKit Docs
tall-airline-16756
09/11/2025, 7:48 PM
You can use the `metrics_collected` event emitted by the AgentSession. This event shares valuable metrics at important stages, including right after the agent's turn ends.
For instance, the EOUMetrics event is emitted when the agent finishes processing and includes:
• `end_of_utterance_delay`: The time from when the user stops speaking to when the turn is completed.
• `transcription_delay`: The time between the end of speech and when the final transcript becomes available.
• `on_user_turn_completed_delay`: The time it takes to execute the on_user_turn_completed callback.
Here’s a simple example to get you started:
from livekit.agents import metrics, MetricsCollectedEvent

@session.on("metrics_collected")
def on_metrics_collected(event: MetricsCollectedEvent):
    if event.metrics.type == "eou_metrics":
        print(f"EOU Delay: {event.metrics.end_of_utterance_delay}")
        print(f"Transcription Delay: {event.metrics.transcription_delay}")
        print(f"Callback Delay: {event.metrics.on_user_turn_completed_delay}")
If you have any questions or would like me to walk you through more examples, just let me know. I'm here to help!
bulky-actor-93512
09/11/2025, 8:04 PM
tall-airline-16756
09/11/2025, 8:05 PM
In LiveKit Agents 0.x, you can subscribe to the agent_started_speaking event like this:
@agent.on("agent_started_speaking")
def agent_started_speaking():
    print("Agent started speaking")
For LiveKit Agents v1.x, this event was updated to agent_state_changed. You can listen for when the state changes to "speaking" like so:
from livekit.agents.voice.events import AgentStateChangedEvent

@session.on("agent_state_changed")
def on_agent_state_changed(ev: AgentStateChangedEvent):
    if ev.new_state == "speaking":
        print("Agent started speaking")
If anything feels unclear or if you’d like me to walk you through it step-by-step, please don’t hesitate to ask. I’m here to help!
Sources: VoicePipelineAgent | LiveKit Docs | VoicePipelineAgent | LiveKit Docs | Working with the MultimodalAgent class | LiveKit Docs | Working with the MultimodalAgent class | LiveKit Docs | Agents 0.x migration guide | LiveKit Docs
bulky-actor-93512
09/11/2025, 9:52 PM
tall-airline-16756
09/11/2025, 9:53 PM
The agent_started_speaking and agent_state_changed events don't include latency metrics. They simply let you know when the agent starts speaking or changes state, but they don't provide detailed timing information.
If you're looking for latency metrics, the metrics_collected event is the way to go. It gives you detailed measurements like these (a short example follows the list):
• `ttft` (Time to First Token)
• `ttfb` (Time to First Byte)
• `end_of_utterance_delay`
• `transcription_delay`
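For instance, a minimal handler that just logs those values might look like this (a sketch; the import path for the metric classes is assumed to be livekit.agents.metrics, and session is your AgentSession):
from livekit.agents import MetricsCollectedEvent
from livekit.agents.metrics import EOUMetrics, LLMMetrics, TTSMetrics

@session.on("metrics_collected")
def log_latency(ev: MetricsCollectedEvent):
    m = ev.metrics
    if isinstance(m, LLMMetrics):
        print(f"ttft: {m.ttft:.3f}s")  # time to first token
    elif isinstance(m, TTSMetrics):
        print(f"ttfb: {m.ttfb:.3f}s")  # time to first byte
    elif isinstance(m, EOUMetrics):
        print(f"end_of_utterance_delay: {m.end_of_utterance_delay:.3f}s")
        print(f"transcription_delay: {m.transcription_delay:.3f}s")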
If you'd like, I'm here to help you set up metrics_collected so you can capture all these details smoothly. Just let me know!
Sources: VoicePipelineAgent | LiveKit Docs | Events and error handling | LiveKit Docs | livekit-agents/livekit/agents/voice/events.py | Logs, metrics, and telemetry | LiveKit Docs