bulky-actor-93512
09/10/2025, 5:15 PM
tall-airline-16756
09/10/2025, 5:16 PM
• Chat messages: The ChatMessage object includes a timestamp field that tells you exactly when the message was created. This field is available across all SDKs.
• Transcriptions: For speech-to-text transcriptions, you’ll find timestamps at both the segment and word levels. Specifically, the TranscriptionSegment object has start_time and end_time fields to show the beginning and end of each transcribed segment.
A quick sketch follows; if you'd like help accessing these timestamps in your particular SDK or use case, just let me know!
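As a minimal sketch (Python, assuming the livekit-rtc SDK and an already-connected rtc.Room named room; the handler name is illustrative), segment timestamps can be read as transcription events arrive:
from livekit import rtc

# print segment-level timestamps as transcription events arrive
@room.on("transcription_received")
def on_transcription(segments: list[rtc.TranscriptionSegment], participant, publication):
    for seg in segments:
        print(f"[{seg.start_time} - {seg.end_time}] {seg.text} (final={seg.final})")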
Sources: livekit-protocol/livekit/protocol/models.pyi | packages/livekit-rtc/src/transcription.ts | lib/src/proto/livekit_models.pbjson.dart | livekit-ffi/src/livekit.proto.rs | Pipeline nodes and hooks | LiveKit Docs
bulky-actor-93512
09/10/2025, 6:26 PM
tall-airline-16756
09/10/2025, 6:27 PM
• Speech-to-Text (STT): Metrics include duration (the time it takes to create the transcript) and agentsSttTtft (the time until the final transcription is ready).
• Large Language Model (LLM): Metrics include ttft (time to first token) and duration (the total time to generate the full completion).
• Text-to-Speech (TTS): Here, you have ttfb (time to first byte of audio) and duration (how long the TTS request takes to complete).
You can log these metrics in real time, aggregate them over a session, or send them to external monitoring tools (see the sketch below). If you'd like a hand setting this up or accessing these metrics, just let me know!
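As a minimal sketch (assuming a livekit-agents AgentSession named session), you can log each metrics event as it arrives and keep running totals with UsageCollector:
from livekit.agents import metrics, MetricsCollectedEvent

usage_collector = metrics.UsageCollector()

@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    metrics.log_metrics(ev.metrics)      # real-time structured logging
    usage_collector.collect(ev.metrics)  # session-level aggregation

# later, e.g. at shutdown:
# summary = usage_collector.get_summary()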
Sources: Sources/LiveKit/Protos/livekit_metrics.pb.swift | livekit-agents/livekit/agents/metrics/utils.py | agents/src/metrics/utils.ts | Capturing metrics | LiveKit Docs
bulky-actor-93512
09/10/2025, 6:29 PM
tall-airline-16756
09/10/2025, 6:30 PM
• STT (Speech-to-Text): This includes audio_duration, duration (which tells you the time taken to create the transcript), and whether it was processed in streaming mode.
• LLM: This includes ttft (time to first token), duration (the total time for completion), and token usage metrics.
• TTS (Text-to-Speech): Here, you get ttfb (time to first byte) and audioDuration (duration of the generated audio).
These metrics are sent as events during a session, and you can link them to specific transcription messages using fields like requestId and timestamp (see the sketch below). If you'd like help setting this up, just let me know!
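As a minimal sketch (in Python with livekit-agents, where the equivalent field is request_id; assumes an AgentSession named session and an illustrative in-memory store), you could index metrics as they arrive and look them up when the matching transcription lands:
# illustrative correlation store keyed by request_id
metrics_by_request: dict[str, object] = {}

@session.on("metrics_collected")
def _index_metrics(ev):
    rid = getattr(ev.metrics, "request_id", None)
    if rid is not None:
        metrics_by_request[rid] = ev.metrics

# later, given a transcription whose request_id you know:
# matched = metrics_by_request.get(request_id)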
Sources: Capturing metrics | LiveKit Docs | livekit-agents/livekit/agents/metrics/utils.py | Logs, metrics, and telemetry | LiveKit Docs | agents/src/metrics/utils.ts | metrics/metrics_tts.py
bulky-actor-93512
09/10/2025, 7:00 PM
tall-airline-16756
09/10/2025, 7:01 PM
Transcription segments use a segment ID (id) as the unique identifier rather than a request ID. This id remains stable and consistent across both interim updates and final results, which makes it easier for you to track and manage transcription segments.
For instance, if you’re working with the Swift SDK, you’ll find the id field right in the TranscriptionSegment object when handling transcription events. And if you’re using the JavaScript SDK, the id is available in the TranscriptionReceived event.
A quick sketch follows; if you’d like help implementing this in your specific SDK, just let me know!
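As a minimal sketch (Python, assuming the livekit-rtc SDK, which exposes the same id field, and an already-connected rtc.Room named room):
from livekit import rtc

# interim and final results share the same id, so a dict keyed on it
# upserts each segment in place as updates arrive
segments_by_id: dict[str, rtc.TranscriptionSegment] = {}

@room.on("transcription_received")
def on_transcription(segments, participant, publication):
    for seg in segments:
        segments_by_id[seg.id] = seg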
Sources: Sources/LiveKit/Protos/livekit_models.pb.swift | Transcriptions | LiveKit Docs | livekit-agents/livekit/agents/voice/transcription/_utils.py | livekit-protocol/src/livekit.rs | protobufs/livekit_models.proto
bulky-actor-93512
09/10/2025, 7:01 PM
tall-airline-16756
09/10/2025, 7:02 PM
Use segment.id as the unique identifier for each transcription segment. Here’s a simple approach you can follow:
1. In your Python agent, make sure to enable transcription handling by creating a method that processes each transcription segment.
2. For every TranscriptionSegment, gather the id, text, start_time, end_time, and final fields.
3. Save these details in your database, using the id as the primary key. This way, each entry is unique and can be updated when an interim result is replaced by the final one (see the sketch below).
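As a minimal sketch (using SQLite via Python's standard sqlite3 module; the table and column names are illustrative), an upsert keyed on the segment id handles interim-to-final updates cleanly:
import sqlite3

conn = sqlite3.connect("transcripts.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS transcript_segments (
           id TEXT PRIMARY KEY,
           text TEXT,
           start_time INTEGER,
           end_time INTEGER,
           final INTEGER
       )"""
)

def save_segment(seg):
    # upsert keyed on the stable segment id: interim rows are
    # overwritten when the final result for the same id arrives
    conn.execute(
        """INSERT INTO transcript_segments (id, text, start_time, end_time, final)
           VALUES (?, ?, ?, ?, ?)
           ON CONFLICT(id) DO UPDATE SET
               text = excluded.text,
               start_time = excluded.start_time,
               end_time = excluded.end_time,
               final = excluded.final""",
        (seg.id, seg.text, seg.start_time, seg.end_time, int(seg.final)),
    )
    conn.commit()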
If you’d like help integrating this into your database setup, just let me know!
bulky-actor-93512
09/10/2025, 7:03 PM
tall-airline-16756
09/10/2025, 7:04 PM
1. Enable timed transcripts: Set use_tts_aligned_transcript=True in your AgentSession. This will give you access to word-level timestamps through TimedString objects.
2. Capture metrics: Use event listeners to gather STT, LLM, and TTS metrics. These include important fields like speech_id, request_id, and latency values such as ttfb and duration.
3. Correlate metrics: Connect these metrics to transcript segments by matching speech_id or request_id. For example, you can align TimedString timestamps with TTS metrics for detailed analysis.
4. Aggregate data: Use metrics.UsageCollector to gather and summarize latency metrics across your sessions.
A sketch of steps 1-3 follows; if you’d like a fuller walkthrough of the implementation, just let me know!
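As a minimal sketch (Python, livekit-agents; the grouping dict is illustrative, not a library feature), steps 1-3 might look like this:
from collections import defaultdict
from livekit.agents import AgentSession

session = AgentSession(
    # ... your STT/LLM/TTS configuration ...
    use_tts_aligned_transcript=True,  # step 1: TimedString word timing
)

# steps 2-3: bucket each metrics event under its speech_id when present
metrics_by_speech = defaultdict(list)

@session.on("metrics_collected")
def _on_metrics(ev):
    sid = getattr(ev.metrics, "speech_id", None)
    if sid is not None:
        metrics_by_speech[sid].append(ev.metrics)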
Sources: Capturing metrics | LiveKit Docs | Text and transcriptions | LiveKit Docs | tests/test_agent_session.py
bulky-actor-93512
09/10/2025, 7:06 PM
tall-airline-16756
09/10/2025, 7:07 PM
1. Listen to Events: Subscribe to both user_input_transcribed (which gives you the transcript segments) and metrics_collected (which provides the metrics).
2. Correlate Identifiers: Try using shared identifiers like request_id or speech_id to link the metrics with the transcript segments. Just make sure these IDs stay consistent across the events.
3. Manual Mapping: You can create a mapping within your application logic to tie metrics to transcript segments, either by timing or by these identifiers (see the sketch below).
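As a minimal sketch of option 3 (Python, livekit-agents; the timing-based join is pure application logic, not a LiveKit API, and assumes an AgentSession named session):
import time

transcripts = []
metric_events = []

@session.on("user_input_transcribed")
def _on_transcript(ev):
    if ev.is_final:
        transcripts.append({"text": ev.transcript, "at": time.monotonic()})

@session.on("metrics_collected")
def _on_metrics(ev):
    metric_events.append({"metrics": ev.metrics, "at": time.monotonic()})

# post-processing: pair each transcript with the metric events whose
# arrival times fall nearest to it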
If you'd like, I’m here to help you implement this or explore other options together. Just let me know!
Sources: Logs, metrics, and telemetry | LiveKit Docs | metrics/send-metrics-to-3p/metrics_server/README.md | Text and transcriptions | LiveKit Docs
bulky-actor-93512
09/10/2025, 8:17 PM
tall-airline-16756
09/10/2025, 8:18 PM
bulky-actor-93512
09/10/2025, 10:38 PM
tall-airline-16756
09/10/2025, 10:39 PM
• STT (Speech-to-Text): Look at audio_duration (which shows the duration of the audio input) and duration (the time taken for transcription).
• LLM (Large Language Model): Check out ttft (time to first token) and duration (total time for completion).
• TTS (Text-to-Speech): Use ttfb (time to first byte) and audio_duration (duration of the generated audio).
• End-of-Utterance (EOU) Delay: Consider end_of_utterance_delay (time from speech end to LLM start) and transcription_delay (time for the final transcript).
To find the total conversation latency, you can combine these metrics like this:
total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
Don't forget to subscribe to the metrics_collected event so you can capture these metrics in real time; a sketch follows. If you'd like more guidance through the implementation, just let me know!
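As a minimal sketch (Python, livekit-agents; assumes an AgentSession named session, and the one-turn-at-a-time dict is illustrative), you can compute that sum as each turn's metrics arrive:
from livekit.agents.metrics import EOUMetrics, LLMMetrics, TTSMetrics

turn: dict[str, float] = {}

@session.on("metrics_collected")
def _on_metrics(ev):
    m = ev.metrics
    if isinstance(m, EOUMetrics):
        turn["eou_delay"] = m.end_of_utterance_delay
    elif isinstance(m, LLMMetrics):
        turn["llm_ttft"] = m.ttft
    elif isinstance(m, TTSMetrics):
        turn["tts_ttfb"] = m.ttfb
    if len(turn) == 3:  # all three pieces observed for this turn
        print(f"total conversation latency: {sum(turn.values()):.3f}s")
        turn.clear()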
Sources: Capturing metrics | LiveKit Docs | Logs, metrics, and telemetry | LiveKit Docs
bulky-actor-93512
09/10/2025, 10:49 PM
tall-airline-16756
09/10/2025, 10:50 PM
bulky-actor-93512
09/10/2025, 11:11 PM
tall-airline-16756
09/10/2025, 11:12 PM
Happy to explain the llm.duration and tts.duration metrics! These help us understand the time taken during different parts of the process:
• LLM Duration: This shows how long it took for the LLM to generate the full response, starting from when your prompt was sent to the model until the entire completion was ready. Just a heads-up, this doesn’t include any extra time spent on tool calls that the LLM might trigger.
• TTS Duration: This tells us the total time the TTS model needed to create the full audio output from the text you provided. It doesn’t cover the time it takes to actually play back the audio.
So, these metrics give us a clear picture of the processing time for each stage, but it’s important to remember they don’t capture the full end-to-end conversation latency. Other things like speech-to-text (STT), end-of-utterance detection, and network delays also play a role.
Sources: Logs, metrics, and telemetry | LiveKit Docs | Capturing metrics | LiveKit Docs
bulky-actor-93512
09/11/2025, 12:35 AM
tall-airline-16756
09/11/2025, 12:36 AM
Here are example handlers for each metrics type:
from livekit.agents.metrics import LLMMetrics, TTSMetrics, STTMetrics, EOUMetrics
# Handle LLM metrics (e.g., Time to First Token - TTFT)
def on_llm_metrics(metrics: LLMMetrics):
    print(f"LLM TTFT: {metrics.ttft:.4f}s, Duration: {metrics.duration:.4f}s")

# Handle TTS metrics (e.g., Time to First Byte - TTFB)
def on_tts_metrics(metrics: TTSMetrics):
    print(f"TTS TTFB: {metrics.ttfb:.4f}s, Duration: {metrics.duration:.4f}s")

# Handle STT metrics (e.g., transcription duration)
def on_stt_metrics(metrics: STTMetrics):
    print(f"STT Duration: {metrics.duration:.4f}s, Audio Duration: {metrics.audio_duration:.4f}s")

# Handle EOU metrics (e.g., transcription delay)
def on_eou_metrics(metrics: EOUMetrics):
    print(f"Transcription Delay: {metrics.transcription_delay:.4f}s, EOU Delay: {metrics.end_of_utterance_delay:.4f}s")
To capture these metrics in real time, you’ll want to subscribe to the metrics_collected event (see the dispatch sketch below). And if you’re curious about calculating the total conversation latency, you can do it like this:
total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
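As a minimal sketch (assuming an AgentSession named session and the handlers defined above), one way to wire everything up is to route on the metric type:
@session.on("metrics_collected")
def _route_metrics(ev):
    m = ev.metrics
    if isinstance(m, LLMMetrics):
        on_llm_metrics(m)
    elif isinstance(m, TTSMetrics):
        on_tts_metrics(m)
    elif isinstance(m, STTMetrics):
        on_stt_metrics(m)
    elif isinstance(m, EOUMetrics):
        on_eou_metrics(m)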
If you’d like any assistance with implementing this or if you want to forward these metrics to an external system, please don’t hesitate to reach out. I’m here to help!
Sources: Logs, metrics, and telemetry | LiveKit Docs | metrics/metrics_vad.py
bulky-actor-93512
09/11/2025, 12:37 AM
tall-airline-16756
09/11/2025, 12:37 AM
You can override the transcription_node method in your Python agent. When you enable use_tts_aligned_transcript=True in the AgentSession, your transcription outputs will include helpful timing information through the TimedString object. This gives you start_time and end_time for each word or sentence, making it easier to align latency metrics with specific parts of your transcript.
Here’s a simple example to guide you:
async def transcription_node(self, text: AsyncIterable[str | TimedString], model_settings: ModelSettings) -> AsyncGenerator[str | TimedString, None]:
    async for chunk in text:
        if isinstance(chunk, TimedString):
            logger.info(f"TimedString: '{chunk}' ({chunk.start_time} - {chunk.end_time})")
            # Associate this chunk with relevant metrics (e.g., TTS or LLM events)
        yield chunk
You can also combine this approach with metrics like STTMetrics, LLMMetrics, TTSMetrics, and EOUMetrics to create a detailed timeline of your system’s performance for each transcript segment. If you’d like, I’m here to help you through the next steps or clarify anything else!
Sources: metrics/metrics_llm.py | Text and transcriptions | LiveKit Docs | examples/voice_agents/timed_agent_transcript.py
bulky-actor-93512
09/11/2025, 12:38 AM
tall-airline-16756
09/11/2025, 12:39 AM
1. Listen for Metrics Events: Attach a handler to each component's metrics events:
stt.on("metrics_collected", lambda e: self.on_stt_metrics(e))
llm.on("metrics_collected", lambda e: self.on_llm_metrics(e))
tts.on("metrics_collected", lambda e: self.on_tts_metrics(e))
stt.on("eou_metrics_collected", lambda e: self.on_eou_metrics(e))
2. Correlate Metrics: You can use shared identifiers like speech_id or request_id to link the metrics back to specific transcript messages. For example:
async def on_stt_metrics(self, metrics):
    print(f"Transcript for speech '{metrics.speech_id}': Audio duration = {metrics.audio_duration}s")
3. Aggregate Metrics: The UsageCollector class is a great tool to help you track and summarize metrics across the entire session.
4. Calculate Total Latency: Finally, you can combine these metrics to compute the end-to-end latency:
total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
This method will help you connect detailed latency metrics to each transcript message, making performance analysis much clearer. If you have any questions or need a hand with any part of this, just let me know!
Sources: Logs, metrics, and telemetry | LiveKit Docs | Capturing metrics | LiveKit Docs | complex-agents/turn-taking/agent.py | livekit-agents/livekit/agents/metrics/utils.py
tall-airline-16756
09/11/2025, 12:39 AM
1. Listen for Metrics Events: Attach a handler to each component's metrics events:
stt.on("metrics_collected", lambda e: self.on_stt_metrics(e))
llm.on("metrics_collected", lambda e: self.on_llm_metrics(e))
tts.on("metrics_collected", lambda e: self.on_tts_metrics(e))
stt.on("eou_metrics_collected", lambda e: self.on_eou_metrics(e))
2. Correlate Metrics: You can link the metrics to specific transcript messages using shared identifiers like speech_id (which you'll find in EOUMetrics and STTMetrics) or by using timestamps.
3. Aggregate Metrics: The UsageCollector class is a handy tool to track and summarize these metrics throughout your session.
4. Log or Forward Metrics: Finally, you can use the log_metrics helper to either log your metrics or send them to an external system for monitoring.
For a quick example, you might calculate the total conversation latency like this:
total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
If you’d like me to walk you through any part of this or provide more code examples, just let me know!
bulky-actor-93512
09/11/2025, 12:39 AM
tall-airline-16756
09/11/2025, 12:41 AM
Unfortunately, there's no built-in way to directly correlate the id in transcript segments with the speech_id used in metrics like STT, LLM, TTS, and EOU. These IDs are generated separately and aren’t cross-referenced in the system, which can feel a little confusing.
But don’t worry! Here are some helpful workarounds you can try:
1. Use Timing Correlation: You can match the timestamps of transcript segments with the metric events (like STT or EOU) that belong to the same session. This often helps align things nicely (see the sketch after this list).
2. Log Both IDs: Another approach is to capture the speech_id from your metrics and the id from transcript events, then connect them during post-processing.
3. Custom Transcription Forwarding: If you’re comfortable with customization, you might override the default transcription behavior to include the speech_id directly in metadata or logs.
4. Use OpenTelemetry: Leveraging tracing attributes like ATTR_SPEECH_ID can also help align events throughout the agent lifecycle.
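As a minimal sketch of option 1 (pure application logic, not a LiveKit API; it assumes segment start_time/end_time and your recorded metric-event timestamps are on the same clock and in the same units, which you should verify for your setup):
import bisect

def match_segment(segments, event_ts: float):
    # segments must be sorted by start_time; returns the segment whose
    # time window most plausibly contains the metric event's timestamp
    starts = [s.start_time for s in segments]
    i = max(bisect.bisect_right(starts, event_ts) - 1, 0)
    return segments[i]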
If you’d like, I’m here to help you implement any of these options or discuss your specific use case in more detail. Just let me know, and we’ll work through it together!
Sources: Logs, metrics, and telemetry | LiveKit Docs | pipeline-stt/diarization.py | livekit-agents/livekit/agents/voice/transcription/__init__.py | agents/src/metrics/base.ts