# ask-ai
b
Is there any way to retrieve the latency metrics per transcript segment, and to have them aggregated and organized?
t
Thanks for reaching out, Sebastian. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
I'm happy to help you retrieve latency metrics per transcript segment by using the metrics events emitted during agent sessions. Here's a quick overview:
• `TTSMetrics`: includes a `segment_id` field that uniquely identifies audio segments, along with useful metrics like `ttfb` (time to first byte) and `duration`.
• `STTMetrics`: provides `request_id` and `speech_id` to help correlate transcription results with user utterances.
• `EOUMetrics`: captures delays such as `transcription_delay` and `end_of_utterance_delay`, which are helpful for understanding timing.
You can subscribe to the `metrics_collected` event to log and gather these metrics. For example:
```python
from livekit.agents.metrics import AgentMetrics, TTSMetrics

@agent.on("metrics_collected")
def on_metrics_collected(metrics: AgentMetrics):
    if isinstance(metrics, TTSMetrics) and metrics.segment_id:
        logger.info(f"Segment {metrics.segment_id}: TTFB = {metrics.ttfb}s, Duration = {metrics.duration}s")
```
To get a full picture across a session, you might want to use the `UsageCollector` class or summarize the data manually (see the sketch below). For end-to-end latency, combining metrics like `end_of_utterance_delay`, `ttft` (LLM), and `ttfb` (TTS) can be very insightful. If you'd like, I'm here to walk you through any part of this or help with the implementation. Just let me know!
Sources: livekit-agents/livekit/agents/voice/audio_recognition.py | Logs, metrics, and telemetry | LiveKit Docs | Capturing metrics | LiveKit Docs | Capturing metrics | LiveKit Docs | Text and transcriptions | LiveKit Docs
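For the session-wide roll-up, here's a minimal sketch of that pattern, assuming a v1.x `AgentSession` available as `session`:
```python
from livekit.agents import MetricsCollectedEvent, metrics

usage_collector = metrics.UsageCollector()

@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    # log each metrics event and fold it into the session-wide aggregate
    metrics.log_metrics(ev.metrics)
    usage_collector.collect(ev.metrics)

async def log_usage():
    # e.g. register via ctx.add_shutdown_callback(log_usage) to print the summary at shutdown
    summary = usage_collector.get_summary()
    print(f"Usage summary: {summary}")
```
Note that `UsageCollector` summarizes usage (tokens, audio duration, API calls) rather than per-segment latency, so keep the per-event handler above if you need segment-level numbers.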
b
Does the `UsageCollector` let me see the latency metrics per turn the agent spoke?
t
I understand you're looking to view latency metrics per agent turn. The `UsageCollector` is actually designed to aggregate usage data like token counts and API usage across sessions, so it doesn't support per-turn latency metrics directly. But don't worry! You can still capture latency for each turn by subscribing to the `metrics_collected` event and using the `speech_id` field to correlate related metrics. This way, you can group together metrics like `LLMMetrics`, `TTSMetrics`, and `EOUMetrics` for every turn. To get the total latency per turn, simply add up these values (a minimal sketch follows below):
• `end_of_utterance_delay` (EOUMetrics)
• `ttft` (time to first token, LLMMetrics)
• `ttfb` (time to first byte, TTSMetrics)
If you'd like, I'm here to help with more detailed guidance or examples on how to implement this, just let me know!
Sources: examples/src/cartersia_tts.ts | Capturing metrics | LiveKit Docs | Capturing metrics | LiveKit Docs | Logs, metrics, and telemetry | LiveKit Docs | tests/test_agent_session.py
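For instance, a minimal sketch, assuming the v1.x `AgentSession` events and that each of these metric objects carries a `speech_id`, could collect the three components per turn and log the sum once they have all arrived:
```python
from livekit.agents import MetricsCollectedEvent
from livekit.agents.metrics import EOUMetrics, LLMMetrics, TTSMetrics

# per-turn latency components, keyed by speech_id (helper structure for this sketch)
turn_latency: dict[str, dict[str, float]] = {}

@session.on("metrics_collected")
def _collect_turn_latency(ev: MetricsCollectedEvent):
    m = ev.metrics
    speech_id = getattr(m, "speech_id", None)
    if not speech_id:
        return
    parts = turn_latency.setdefault(speech_id, {})
    if isinstance(m, EOUMetrics):
        parts["eou"] = m.end_of_utterance_delay
    elif isinstance(m, LLMMetrics):
        parts["ttft"] = m.ttft
    elif isinstance(m, TTSMetrics):
        parts["ttfb"] = m.ttfb
    if {"eou", "ttft", "ttfb"}.issubset(parts):
        total = parts["eou"] + parts["ttft"] + parts["ttfb"]
        print(f"Turn {speech_id}: total latency = {total:.3f}s")
```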
b
My issue with this is that the metrics are really unorganized, and they don't follow a flow where it's obvious how to aggregate them. You can have 4 speech-to-text events for one LLM call, at least from what I saw; it's very unorganized. How would you approach that?
t
You're absolutely right that having multiple STT events for a single LLM call can make aggregating metrics tricky. There's a way to handle this by organizing and correlating your metrics deliberately. Here's an approach you might find helpful:
1. Group metrics by session and speech ID: Use the `speech_id` field to link related metrics like STT, LLM, and TTS for each user turn, so all the metrics for a specific turn stay grouped together.
2. Track metric types: Keep separate collections for each metric type (STT, LLM, TTS) within the same `speech_id` group. This makes it easier to spot and manage multiple STT events for the same turn.
3. Define aggregation rules: When you have multiple STT events, decide on a clear strategy to aggregate them, whether that's summing them (as the example below does), picking the longest duration, averaging them, or using only the final STT event.
4. Implement a processing pipeline: Process metrics as they arrive into a structured store. Here's a simple example:
```python
from livekit.agents import MetricsCollectedEvent
from livekit.agents.metrics import LLMMetrics, STTMetrics, TTSMetrics

# metrics grouped per user turn, keyed by speech_id
metrics_store: dict[str, dict] = {}

@session.on("metrics_collected")
def handle_metrics(event: MetricsCollectedEvent):
    # read speech_id defensively in case a metric type doesn't carry one
    speech_id = getattr(event.metrics, "speech_id", None)
    if not speech_id:
        return

    if speech_id not in metrics_store:
        metrics_store[speech_id] = {"stt": [], "llm": None, "tts": None}

    if isinstance(event.metrics, STTMetrics):
        metrics_store[speech_id]["stt"].append(event.metrics)
    elif isinstance(event.metrics, LLMMetrics):
        metrics_store[speech_id]["llm"] = event.metrics
    elif isinstance(event.metrics, TTSMetrics):
        metrics_store[speech_id]["tts"] = event.metrics

# aggregate the stored metrics at the end of the session
def aggregate_metrics():
    for speech_id, data in metrics_store.items():
        total_stt_duration = sum(stt.duration for stt in data["stt"])
        llm_ttft = data["llm"].ttft if data["llm"] else 0
        tts_ttfb = data["tts"].ttfb if data["tts"] else 0
        total_latency = total_stt_duration + llm_ttft + tts_ttfb
        print(f"Speech ID {speech_id}: Total Latency = {total_latency}s")
```
This keeps your metrics organized and aggregated in a clear, systematic way (a sketch for running the aggregation at shutdown follows below). If anything feels unclear or if you want to dive deeper, just let me know, I'm here to help!
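To make sure `aggregate_metrics()` from the sketch above actually runs, one option, assuming you have the `JobContext` available in your entrypoint, is to register it as a shutdown callback:
```python
from livekit.agents import JobContext

async def entrypoint(ctx: JobContext):
    # ... create and start your AgentSession here ...

    async def log_aggregated_metrics():
        # wrap the synchronous helper so the shutdown hook can await it
        aggregate_metrics()

    ctx.add_shutdown_callback(log_aggregated_metrics)
```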
b
Maybe we could retrieve the metrics on the agent end turn. Does the session provide any way to retrieve this?
t
I'm happy to help you retrieve metrics at the start and end of a turn! Here are some useful methods you can use:
• End-of-utterance (EOU) metrics: These are provided when the user finishes speaking and include:
◦ `end_of_utterance_delay`: The time from the end of speech to when the turn completes.
◦ `transcription_delay`: The time between the end of speech and when the final transcript becomes available.
◦ `speech_id`: A unique identifier for the turn, which is great for correlating different events.
• OpenTelemetry tracing: This captures detailed trace-level data for session start, agent turns, and end-of-turn detection. Attributes like `ATTR_START_TIME` and `ATTR_END_TIME` mark the timing boundaries for these operations.
• Word-level timing: If you enable `use_tts_aligned_transcript=True`, you can access `TimedString` objects that give you precise start and end times for each word within a turn (see the sketch below).
These tools make it easier to track and analyze metrics at both the start and end of a turn. If you'd like, I'm here to provide more details or walk you through some examples, just let me know!
Sources: Logs, metrics, and telemetry | LiveKit Docs | livekit-agents/livekit/agents/telemetry/trace_types.py | Text and transcriptions | LiveKit Docs | Text and transcriptions | LiveKit Docs
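For the word-level timing route, here's a rough sketch of reading `TimedString` items from a custom `transcription_node`. The exact import paths for `TimedString` and `ModelSettings` are assumptions on my part, so double-check them against your installed version:
```python
from __future__ import annotations

from typing import AsyncGenerator, AsyncIterable

from livekit.agents import Agent, ModelSettings
from livekit.agents.voice.transcription import TimedString  # import path is an assumption


class TimedTranscriptAgent(Agent):
    async def transcription_node(
        self, text: AsyncIterable[str | TimedString], model_settings: ModelSettings
    ) -> AsyncGenerator[str | TimedString, None]:
        async for chunk in text:
            if isinstance(chunk, TimedString):
                # each aligned chunk carries its own start/end time within the turn
                print(f"'{chunk}' [{chunk.start_time:.2f}s - {chunk.end_time:.2f}s]")
            yield chunk
```
This only yields timing information when `use_tts_aligned_transcript=True` is set on the session and the TTS provider supports aligned transcripts.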
b
I meant once the agent turn is done, sorry.
t
To retrieve metrics right after the agent's turn is completed, you can subscribe to the `metrics_collected` event emitted by the `AgentSession`. This event delivers metrics at important points, including once the agent finishes its turn. For instance, the `EOUMetrics` event is triggered when the agent completes processing and includes details like:
• `end_of_utterance_delay`: The time from when the user stops speaking until the turn is completed.
• `transcription_delay`: How long it takes from the end of speech to when the final transcript is ready.
• `on_user_turn_completed_delay`: The duration of the `on_user_turn_completed` callback.
Here's a simple example to illustrate this:
```python
from livekit.agents import metrics, MetricsCollectedEvent

@session.on("metrics_collected")
def on_metrics_collected(event: MetricsCollectedEvent):
    if event.metrics.type == "eou_metrics":
        print(f"EOU Delay: {event.metrics.end_of_utterance_delay}")
        print(f"Transcription Delay: {event.metrics.transcription_delay}")
        print(f"Callback Delay: {event.metrics.on_user_turn_completed_delay}")
```
If you'd like, I'm here to help with any further questions or provide more examples, just let me know!
Sources: Capturing metrics | LiveKit Docs | tests/test_agent_session.py | Logs, metrics, and telemetry | LiveKit Docs
b
I mean, like here, does it have the data? `agent_started_speaking`
t
I'm happy to help you with this! Yes, LiveKit does provide a way to detect when the agent starts speaking. In LiveKit Agents v0.x, you can use the `agent_started_speaking` event like this:
```python
@agent.on("agent_started_speaking")
def agent_started_speaking():
    print("Agent started speaking")
```
For LiveKit Agents v1.x, this event was updated to `agent_state_changed`. You can listen for when the state changes to `"speaking"` like so:
```python
@session.on("agent_state_changed")
def on_agent_state_changed(ev: AgentStateChangedEvent):
    if ev.new_state == "speaking":
        print("Agent started speaking")
```
If anything feels unclear or if you'd like me to walk you through it step-by-step, please don't hesitate to ask. I'm here to help!
Sources: VoicePipelineAgent | LiveKit Docs | VoicePipelineAgent | LiveKit Docs | Working with the MultimodalAgent class | LiveKit Docs | Working with the MultimodalAgent class | LiveKit Docs | Agents 0.x migration guide | LiveKit Docs
b
Cool, but does this have latency metrics?
t
Thanks for your question! The `agent_started_speaking` and `agent_state_changed` events don't include latency metrics. They simply let you know when the agent starts speaking or changes state, but they don't provide detailed timing information. If you're looking for latency metrics, the `metrics_collected` event is the way to go. It gives you detailed measurements like:
• `ttft` (time to first token)
• `ttfb` (time to first byte)
• `end_of_utterance_delay`
• `transcription_delay`
If you'd like, I'm here to help you set up `metrics_collected` so you can capture all these details smoothly. Just let me know!
Sources: VoicePipelineAgent | LiveKit Docs | Events and error handling | LiveKit Docs | livekit-agents/livekit/agents/voice/events.py | Logs, metrics, and telemetry | LiveKit Docs
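If you also want a wall-clock view of how quickly the agent starts speaking after the user stops, one pattern you could try is to timestamp the state transitions yourself. This is my own sketch rather than a built-in API, and it assumes the v1.x `user_state_changed`/`agent_state_changed` events and that their event classes are importable from the package root:
```python
import time
from typing import Optional

from livekit.agents import AgentStateChangedEvent, UserStateChangedEvent

# wall-clock timestamp of when the user last stopped speaking (helper state for this sketch)
_user_stopped_at: Optional[float] = None

@session.on("user_state_changed")
def _on_user_state_changed(ev: UserStateChangedEvent):
    global _user_stopped_at
    if ev.old_state == "speaking" and ev.new_state == "listening":
        _user_stopped_at = time.perf_counter()

@session.on("agent_state_changed")
def _on_agent_state_changed(ev: AgentStateChangedEvent):
    global _user_stopped_at
    if ev.new_state == "speaking" and _user_stopped_at is not None:
        # rough end-to-end response latency as perceived by the user
        print(f"Agent responded after {time.perf_counter() - _user_stopped_at:.2f}s")
        _user_stopped_at = None
```
Cross-checking this wall-clock number against the `end_of_utterance_delay + ttft + ttfb` sum from `metrics_collected` is a good sanity check that your aggregation is wired up correctly.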