How can I get the latency metrics of a call
# ask-ai
b
How can I get the latency metrics of a call
t
Thanks for reaching out, Sebastian. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
I'm glad you're looking into LiveKit's latency metrics! We offer several ways to access these, like through our client SDKs, the Cloud analytics dashboard, and special tools for AI agents or embedded solutions. If you could share a bit about your tech stack or which method you'd like to use (for example, Android, iOS, Web, or Cloud analytics), I'd be happy to guide you more specifically. Looking forward to helping you out! Sources: Sources/LiveKit/Track/Metrics/MetricsManager.swift | livekit-android-sdk/src/main/java/io/livekit/android/room/metrics/RTCMetric... | Say hello to improved Cloud analytics | livekit/livekit_models.pb.go | Logs, metrics, and telemetry | LiveKit Docs
b
through python code
t
Hi there! To help you get latency metrics using Python code in LiveKit, I’ve put together a simple guide that walks you through the process step-by-step. Let’s dive in together: 1. Import Required Modules: First, you’ll want to bring in the `metrics` module along with the specific metric types from `livekit.agents`.
Copy code
from livekit.agents import metrics
from livekit.agents.metrics import LLMMetrics, STTMetrics, TTSMetrics, EOUMetrics
2. Subscribe to Metrics Events: Next, use the `@agent.on("metrics_collected")` decorator to listen for metrics during your agent session.
Copy code
@agent.on("metrics_collected")
def on_metrics_collected(mtrcs: metrics.AgentMetrics):
    metrics.log_metrics(mtrcs)
3. Access Specific Latency Metrics: Here, you can pull out important latency details like Time to First Token (TTFT), Time to First Byte (TTFB), and End-of-Utterance Delay. In the line below, `eou`, `llm`, and `tts` are the `EOUMetrics`, `LLMMetrics`, and `TTSMetrics` objects collected for a given turn.
Copy code
total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
4. Aggregate Metrics: To get a fuller picture, use the `UsageCollector` class to gather metrics across multiple turns.
Copy code
usage_collector = metrics.UsageCollector()
@agent.on("metrics_collected")
def on_metrics_collected(mtrcs: metrics.AgentMetrics):
    usage_collector.collect(mtrcs)
5. Send Metrics to External Tools: Finally, if you want to forward these metrics to a monitoring tool or dashboard, you can do so using `requests`.
Copy code
import asyncio

import requests

async def send_to_dashboard(metric_type: str, data: dict):
    # requests is blocking, so run it in the default thread pool executor;
    # METRICS_SERVER_URL is a placeholder for your own endpoint
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(
        None,
        lambda: requests.post(f"{METRICS_SERVER_URL}/metrics/{metric_type}", json=data),
    )
If you’d like to see more detailed examples or how everything fits together, I highly recommend checking out the LiveKit Python Agents Examples repository. It’s a great resource to help you along the way!
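The latency arithmetic from step 3 can be condensed into one small helper. Below is a minimal, self-contained sketch: the dataclasses here are stand-ins for the real `LLMMetrics`, `TTSMetrics`, and `EOUMetrics` objects from `livekit.agents.metrics`, modeled only far enough to make the formula concrete.
Copy code

```python
from dataclasses import dataclass

# Stand-ins for the metric classes from livekit.agents.metrics;
# only the fields used in the latency formula are modeled here.
@dataclass
class LLMMetrics:
    ttft: float  # time to first token, seconds

@dataclass
class TTSMetrics:
    ttfb: float  # time to first byte, seconds

@dataclass
class EOUMetrics:
    end_of_utterance_delay: float  # seconds

def total_latency(eou: EOUMetrics, llm: LLMMetrics, tts: TTSMetrics) -> float:
    """Perceived response latency: EOU delay + LLM TTFT + TTS TTFB."""
    return eou.end_of_utterance_delay + llm.ttft + tts.ttfb

print(total_latency(EOUMetrics(0.45), LLMMetrics(0.30), TTSMetrics(0.12)))
```

With real metric objects from the SDK, the call site stays the same; only the inputs change.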
b
Awesome, one more thing: Is there a way to have the aggregated data really organized? What I mean by this is I want the latency per agent's turn. The metrics right now are very unorganized, and I notice it's not organized in a way I need to be able to see the latency per message the agent does. Is there any way to do this? Maybe the usage collector has something? But I don't want only a summary at the end. I want a per-agent message latency.
t
To help you organize latency metrics per agent turn in LiveKit, you can use the `speech_id` field to connect metrics from STT, LLM, TTS, and EOU components. Here’s a friendly guide on how to do this: 1. Capture Metrics by Turn: Try using event listeners to gather metrics for each component and group them by `speech_id`.
Copy code
class LatencyTrackingAgent:
    def __init__(self):
        self.turn_metrics = {}

    def _on_llm_metrics(self, m):
        sid = m.speech_id
        if sid not in self.turn_metrics:
            self.turn_metrics[sid] = {}
        self.turn_metrics[sid]["llm_ttft"] = m.ttft
        self.turn_metrics[sid]["llm_duration"] = m.duration

    def _on_tts_metrics(self, m):
        sid = m.speech_id
        if sid not in self.turn_metrics:
            self.turn_metrics[sid] = {}
        self.turn_metrics[sid]["tts_ttfb"] = m.ttfb
        self.turn_metrics[sid]["tts_duration"] = m.duration

    def _on_eou_metrics(self, m):
        sid = m.speech_id
        if sid not in self.turn_metrics:
            self.turn_metrics[sid] = {}
        self.turn_metrics[sid]["eou_delay"] = m.end_of_utterance_delay
        self.turn_metrics[sid]["transcription_delay"] = m.transcription_delay
2. Compute End-to-End Latency: Then, you can aggregate the metrics for each turn to find the total latency.
total_latency = eou_delay + transcription_delay + llm_ttft + tts_ttfb
3. Optional Tools: If you're interested, OpenTelemetry can be a great help for distributed tracing, or you might consider forwarding metrics to external systems for visualization. This method helps keep your latency metrics well-organized and easy to access for each agent turn. If you’d like, I’m here to provide more examples or help you through any part of this! Sources: tests/test_agent_session.py | metrics/metrics_vad.py | Logs, metrics, and telemetry | LiveKit Docs
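A convenient companion to the per-turn dictionary above is a helper that reports a turn’s latency only once all four components have arrived. This is a plain-Python sketch; the dict shape and key names follow the snippet above, and nothing here is a LiveKit API:
Copy code

```python
from typing import Optional

REQUIRED = ("eou_delay", "transcription_delay", "llm_ttft", "tts_ttfb")

def turn_latency(turn_metrics: dict, sid: str) -> Optional[float]:
    """Total latency for one turn, or None until every component reported."""
    m = turn_metrics.get(sid, {})
    if not all(key in m for key in REQUIRED):
        return None  # still waiting on STT/LLM/TTS/EOU metrics for this turn
    return sum(m[key] for key in REQUIRED)

# Example: one fully populated turn, one unknown turn
turns = {"turn-1": {"eou_delay": 0.4, "transcription_delay": 0.1,
                    "llm_ttft": 0.3, "tts_ttfb": 0.2}}
print(turn_latency(turns, "turn-1"))
print(turn_latency(turns, "turn-2"))
```

Returning `None` for incomplete turns avoids reporting a misleadingly low latency while metrics are still trickling in.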
b
Sorry, the format you outputted is unreadable. Can you try to reformat it in a readable code format?
t
Here’s a reformatted example to organize latency metrics per agent turn in a more readable format:
Copy code
class LatencyTrackingAgent:
    def __init__(self):
        self.turn_metrics = {}

    def _on_llm_metrics(self, m):
        sid = m.speech_id
        if sid not in self.turn_metrics:
            self.turn_metrics[sid] = {}
        self.turn_metrics[sid]["llm_ttft"] = m.ttft
        self.turn_metrics[sid]["llm_duration"] = m.duration
        print(f"LLM metrics for turn {sid}: TTFT={m.ttft:.4f}s")

    def _on_tts_metrics(self, m):
        sid = m.speech_id
        if sid not in self.turn_metrics:
            self.turn_metrics[sid] = {}
        self.turn_metrics[sid]["tts_ttfb"] = m.ttfb
        self.turn_metrics[sid]["tts_duration"] = m.duration
        print(f"TTS metrics for turn {sid}: TTFB={m.ttfb:.4f}s")

    def _on_eou_metrics(self, m):
        sid = m.speech_id
        if sid not in self.turn_metrics:
            self.turn_metrics[sid] = {}
        self.turn_metrics[sid]["eou_delay"] = m.end_of_utterance_delay
        self.turn_metrics[sid]["transcription_delay"] = m.transcription_delay
        print(f"EOU metrics for turn {sid}: EOU delay={m.end_of_utterance_delay:.4f}s")

    async def entrypoint(self, ctx):
        session = AgentSession()

        # Set up metric listeners; the handlers are synchronous, so
        # register them directly rather than wrapping them in tasks
        session.llm.on("metrics_collected", self._on_llm_metrics)
        session.tts.on("metrics_collected", self._on_tts_metrics)
        session.stt.on("eou_metrics_collected", self._on_eou_metrics)

        await session.start(agent=self, room=ctx.room)
This code organizes metrics by `speech_id`, ensuring that all latency data for a single agent turn is grouped together. If you have any questions or want me to walk you through any part of this, just let me know—I’m here to help!
b
hmm I don't like it, is there not a simpler way? Does the usage collector have aggregated data per agent turn?
t
The `UsageCollector` in LiveKit is designed to gather metrics for the entire session, rather than breaking them down by agent turn. This means it gives you a helpful cumulative summary at the end, covering important details like LLM tokens, TTS characters, and STT audio durations. If you're looking to get aggregated data for each agent turn, you might want to create some custom logic to handle and store those metrics turn-by-turn. I'd be more than happy to guide you through setting this up—just let me know! Sources: Logs, metrics, and telemetry | LiveKit Docs | examples/voice_agents/basic_agent.py | examples/src/cartersia_tts.ts | Capturing metrics | LiveKit Docs
b
Yeah I'd like just to have a way to aggregate the data so that we have per agent turn latency details. How could we achieve this?
t
To help you gather detailed data on per-agent-turn latency, you might consider creating a custom solution that processes metrics as they come in and organizes them by `speech_id` (which uniquely identifies each turn). Here’s a friendly step-by-step on how you can do this: 1. Initialize a Data Structure: Start by using a dictionary to keep track of metrics for each turn, using `speech_id` as the key.
turn_metrics = {}
2. Capture Metrics: Set up listeners to catch metrics events from STT, LLM, TTS, and EOU components. For every event, store the relevant information in your dictionary.
Copy code
def on_llm_metrics(m):
    sid = m.speech_id
    if sid not in turn_metrics:
        turn_metrics[sid] = {}
    turn_metrics[sid]['llm_ttft'] = m.ttft
    turn_metrics[sid]['llm_duration'] = m.duration
3. Compute Aggregated Latency: Once you have all the metrics for a turn, you can calculate the total latency like this:
Copy code
total_latency = (turn_metrics[sid]['eou_delay'] +
                 turn_metrics[sid]['transcription_delay'] +
                 turn_metrics[sid]['llm_ttft'] +
                 turn_metrics[sid]['tts_ttfb'])
4. Log or Export Data: After calculating the latency, you can log it or send it off to an external system to visualize the results. This method will give you a clear, detailed view of latency for each agent turn. If you'd like, I’d be happy to provide a full code example or help you further with this—just let me know!
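For step 4, one lightweight option (an illustration, not a LiveKit feature) is to serialize each completed turn as a JSON line, which most logging pipelines and dashboards can ingest directly:
Copy code

```python
import json

def export_turn(sid: str, turn: dict) -> str:
    """Serialize one turn's latency breakdown as a single JSON line."""
    record = {"speech_id": sid, **turn}
    # Missing components default to 0 so partial turns still serialize
    record["total_latency"] = (turn.get("eou_delay", 0)
                               + turn.get("transcription_delay", 0)
                               + turn.get("llm_ttft", 0)
                               + turn.get("tts_ttfb", 0))
    return json.dumps(record)

line = export_turn("turn-1", {"eou_delay": 0.4, "transcription_delay": 0.1,
                              "llm_ttft": 0.3, "tts_ttfb": 0.2})
print(line)
```

Each line can then be appended to a file or POSTed to your metrics endpoint, one record per agent turn.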
b
The code output is unreadable. Remember to format it correctly. And are these listeners really the best way to do it?
t
Here’s a nicely formatted and easy-to-follow example to help you aggregate per-agent-turn latency metrics. Instead of juggling multiple listeners, this method lets you handle all your metrics in one central spot, making things much cleaner and easier to maintain:
Copy code
class TurnMetricsAggregator:
    def __init__(self):
        self.turn_metrics = {}

    def collect_metrics(self, metric_type, metric_data):
        sid = metric_data.speech_id
        if sid not in self.turn_metrics:
            self.turn_metrics[sid] = {}

        # Store metrics based on their type
        if metric_type == 'llm':
            self.turn_metrics[sid]['llm_ttft'] = metric_data.ttft
            self.turn_metrics[sid]['llm_duration'] = metric_data.duration
        elif metric_type == 'tts':
            self.turn_metrics[sid]['tts_ttfb'] = metric_data.ttfb
            self.turn_metrics[sid]['tts_duration'] = metric_data.duration
        elif metric_type == 'eou':
            self.turn_metrics[sid]['eou_delay'] = metric_data.end_of_utterance_delay
            self.turn_metrics[sid]['transcription_delay'] = metric_data.transcription_delay

    def calculate_total_latency(self, sid):
        metrics = self.turn_metrics.get(sid, {})
        return (
            metrics.get('eou_delay', 0) +
            metrics.get('transcription_delay', 0) +
            metrics.get('llm_ttft', 0) +
            metrics.get('tts_ttfb', 0)
        )

# Example usage
aggregator = TurnMetricsAggregator()

# Simulate collecting metrics
aggregator.collect_metrics('llm', llm_metric_data)
aggregator.collect_metrics('tts', tts_metric_data)
aggregator.collect_metrics('eou', eou_metric_data)

# Calculate total latency for a specific turn
sid = 'example_speech_id'
total_latency = aggregator.calculate_total_latency(sid)
print(f"Total latency for turn {sid}: {total_latency:.4f}s")
This way, you avoid the hassle of multiple listeners and keep all your logic nicely organized in one place. If you have any questions or want to explore this further, feel free to ask—I’m here to help!
b
Okay, this looks better, because I noticed that the metrics data is not organized, and we're going to have multiple STTs per LLM. It's kinda like per word instead of per actual transcription. Is there any parameter we can just change to fix that, instead of this high-overhead approach?
t
Thanks for reaching out! I understand you're looking to aggregate STT data per transcription rather than per word. Currently, LiveKit's metrics system doesn’t have a built-in option to do this—it emits STT metrics per request or streaming session, and there isn’t a parameter to adjust that behavior just yet. But don’t worry! If you’d like, I’d be happy to help you create some custom logic to process and group the metrics at the transcription level. Just let me know, and we can work through it together! Sources: Logs, metrics, and telemetry | LiveKit Docs | agents/src/stt/stt.ts | livekit-agents/livekit/agents/stt/stt.py | Capturing metrics | LiveKit Docs
b
hmm ok so there's no way to get what I'm asking for?
t
I understand that you're looking to aggregate STT data per transcription rather than per word, and I’m here to help! Currently, LiveKit doesn’t offer a built-in feature for this, as the metrics system works at the request or session level without an option to change that. But don’t worry—you can definitely create a custom solution to group and process the metrics at the transcription level. This would involve capturing the emitted metrics and organizing them in a way that fits your needs. If you'd like, I’d be happy to walk you through the steps or provide some examples to get you started!
b
Ok sketch the code.
t
Here’s a friendly sketch to help you implement custom logic for aggregating STT data per transcription:
Copy code
class TranscriptionAggregator:
    def __init__(self):
        self.transcriptions = {}

    def collect_stt_metrics(self, stt_data):
        # Group metrics by transcription ID (or another unique identifier)
        transcription_id = stt_data.transcription_id
        if transcription_id not in self.transcriptions:
            self.transcriptions[transcription_id] = {
                'audio_duration': 0,
                'processing_time': 0,
                'word_count': 0
            }

        # Aggregate metrics for the transcription
        self.transcriptions[transcription_id]['audio_duration'] += stt_data.audio_duration
        self.transcriptions[transcription_id]['processing_time'] += stt_data.duration
        self.transcriptions[transcription_id]['word_count'] += len(stt_data.words)

    def get_transcription_summary(self, transcription_id):
        # Return aggregated metrics for a specific transcription
        return self.transcriptions.get(transcription_id, {})

# Example usage
aggregator = TranscriptionAggregator()

# Simulate collecting STT metrics
aggregator.collect_stt_metrics(stt_data_1)
aggregator.collect_stt_metrics(stt_data_2)

# Get summary for a specific transcription
summary = aggregator.get_transcription_summary('example_transcription_id')
print(summary)
This example assumes each STT metric includes a unique `transcription_id` to help group related data together; the `audio_duration`, `duration`, and `words` fields are likewise illustrative. Feel free to tweak the fields and logic to better fit your specific needs. If you have any questions or want to explore more enhancements, I’m here to help!