# ask-ai
b
I noticed that the JSON transcript does not have timestamps for when the messages were sent. Is there any way to retrieve this?
t
Thanks for reaching out, Sebastian. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
I'm happy to help you with LiveKit JSON transcripts and their timestamps! Here's how you can find them depending on the type of data:
• Chat Messages: The `ChatMessage` object includes a `timestamp` field that tells you exactly when the message was created. This field is available across all SDKs.
• Transcriptions: For speech-to-text transcriptions, you'll find timestamps at both the segment and word levels. Specifically, the `TranscriptionSegment` object has `start_time` and `end_time` fields marking the beginning and end of each transcribed segment (see the sketch below).

If you'd like, I can guide you through accessing these timestamps in your particular SDK or use case. Just let me know!
Sources: livekit-protocol/livekit/protocol/models.pyi | packages/livekit-rtc/src/transcription.ts | lib/src/proto/livekit_models.pbjson.dart | livekit-ffi/src/livekit.proto.rs | Pipeline nodes and hooks | LiveKit Docs
b
Cool, awesome. Is there a way we can know the latency of each step for these messages, divided by STT, LLM, TTS, etc.?
t
LiveKit offers detailed latency metrics for each step in message processing to help you understand performance:
• Speech-to-Text (STT): You'll find metrics like `duration` (the time it takes to create the transcript) and `agentsSttTtft` (the time until the final transcription is ready).
• Large Language Model (LLM): Metrics include `ttft` (time to first token) and `duration` (the total time to generate the full completion).
• Text-to-Speech (TTS): Here, you have `ttfb` (time to first byte of audio) and `duration` (how long the TTS request takes to complete).

You can log these metrics in real time, aggregate them over a session, or send them to external monitoring tools (a sketch follows below). If you'd like a hand with setting this up or accessing these metrics, just let me know!
Sources: Sources/LiveKit/Protos/livekit_metrics.pb.swift | livekit-agents/livekit/agents/metrics/utils.py | agents/src/metrics/utils.ts | Capturing metrics | LiveKit Docs
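As a rough illustration, here is a minimal sketch of splitting those metrics by stage in a Python agent. It assumes an existing `AgentSession` named `session`, and that the `metrics_collected` event delivers an object whose `.metrics` attribute is one of the metric classes shown later in this thread:
```python
from livekit.agents.metrics import LLMMetrics, STTMetrics, TTSMetrics

@session.on("metrics_collected")
def _on_metrics(ev):
    m = ev.metrics
    if isinstance(m, STTMetrics):
        print(f"STT  duration={m.duration:.3f}s audio={m.audio_duration:.3f}s")
    elif isinstance(m, LLMMetrics):
        print(f"LLM  ttft={m.ttft:.3f}s duration={m.duration:.3f}s")
    elif isinstance(m, TTSMetrics):
        print(f"TTS  ttfb={m.ttfb:.3f}s duration={m.duration:.3f}s")
```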
b
Awesome, but could we have this detailed latency metric per transcription message?
t
Absolutely! LiveKit provides detailed latency metrics for each transcription message, which can be really helpful for understanding performance. These metrics cover:
• STT (Speech-to-Text): You'll find `audio_duration`, `duration` (the time taken to create the transcript), and whether it was processed in streaming mode.
• LLM: This includes `ttft` (time to first token), `duration` (the total time for completion), and token usage metrics.
• TTS (Text-to-Speech): Here, you get `ttfb` (time to first byte) and `audioDuration` (duration of the generated audio).

These metrics are sent as events during a session, and you can link them to specific transcription messages using fields like `requestId` and `timestamp` (a small sketch follows below). If you'd like, I'm happy to help you set this up or answer any questions you might have!
Sources: Capturing metrics | LiveKit Docs | livekit-agents/livekit/agents/metrics/utils.py | Logs, metrics, and telemetry | LiveKit Docs | agents/src/metrics/utils.ts | metrics/metrics_tts.py
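For example, a small sketch of keeping metrics indexed by their request ID so a message's metrics can be looked up later. In the Python SDK the field is `request_id`; whether every metric type carries it can vary, so the `getattr` guard and the storage layout are assumptions:
```python
# request_id -> metric object (illustrative in-memory storage)
metrics_by_request: dict[str, object] = {}

def index_metric(m) -> None:
    # STT, LLM, and TTS metric objects expose an ID for the underlying
    # provider request; keep the latest metric seen per ID.
    rid = getattr(m, "request_id", None)
    if rid:
        metrics_by_request[rid] = m
```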
b
Where do I get that request ID?
t
I'm happy to clarify how transcription messages work in LiveKit! For these messages, LiveKit uses a segment ID (`id`) as the unique identifier rather than a request ID. This `id` remains stable across both interim updates and final results, which makes it easier to track and manage transcription segments.

For instance, if you're working with the Swift SDK, you'll find the `id` field right in the `TranscriptionSegment` object when handling transcription events. If you're using the JavaScript SDK, the `id` is available in the `TranscriptionReceived` event. If you'd like, I can guide you through implementing this in your specific SDK. Just let me know!
Sources: Sources/LiveKit/Protos/livekit_models.pb.swift | Transcriptions | LiveKit Docs | livekit-agents/livekit/agents/voice/transcription/_utils.py | livekit-protocol/src/livekit.rs | protobufs/livekit_models.proto
b
Oh, but I wanna save these latencies in the database. I'm asking for this in my Python agent.
t
I'd be happy to help you save transcription messages in your database from a Python agent! A good way to keep things organized is to use the `segment.id` as the unique identifier for each transcription segment. Here's a simple approach you can follow:
1. In your Python agent, enable transcription handling by creating a method that processes each transcription segment.
2. For every `TranscriptionSegment`, gather the `id`, `text`, `start_time`, `end_time`, and `final` fields.
3. Save these details in your database, using the `id` as the primary key. This ensures each entry is unique and can be updated when interim results are replaced by final ones (a sketch of this step follows below).

If you'd like, I can also help you integrate this into your specific database setup. Just let me know!
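Here is a minimal sketch of step 3 using SQLite from the Python standard library. The table name and the way you receive each segment object are assumptions; the columns mirror the fields listed above:
```python
import sqlite3

db = sqlite3.connect("transcripts.db")
db.execute(
    """CREATE TABLE IF NOT EXISTS transcript_segments (
        id TEXT PRIMARY KEY,
        text TEXT,
        start_time INTEGER,
        end_time INTEGER,
        final INTEGER
    )"""
)

def save_segment(seg) -> None:
    # Upsert on the segment id, so an interim result is overwritten by the
    # final result that later arrives with the same id.
    db.execute(
        "INSERT INTO transcript_segments (id, text, start_time, end_time, final) "
        "VALUES (?, ?, ?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET text=excluded.text, "
        "start_time=excluded.start_time, end_time=excluded.end_time, final=excluded.final",
        (seg.id, seg.text, seg.start_time, seg.end_time, int(seg.final)),
    )
    db.commit()
```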
b
Awesome, I already do this. Now, how can I connect the latency metrics to the transcript message or transcript segment?
t
To connect latency metrics to transcript messages in your Python agent, here's a step-by-step guide:
1. Enable TTS-aligned transcripts: Set `use_tts_aligned_transcript=True` in your `AgentSession`. This gives you access to word-level timestamps through `TimedString` objects.
2. Capture metrics: Use event listeners to gather STT, LLM, and TTS metrics. These include fields like `speech_id` and `request_id`, and latency values such as `ttfb` and `duration`.
3. Correlate metrics: Connect these metrics to transcript segments by matching `speech_id` or `request_id`. For example, you can align `TimedString` timestamps with TTS metrics for detailed analysis.
4. Aggregate data: Use `metrics.UsageCollector` to gather and summarize latency metrics across your sessions (see the sketch below).

If you'd like, I'd be happy to provide a code example or walk you through the implementation. Just let me know!
Sources: Capturing metrics | LiveKit Docs | Text and transcriptions | LiveKit Docs | tests/test_agent_session.py
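For step 4, here is a minimal sketch of session-level aggregation with `UsageCollector`, assuming an existing `AgentSession` named `session`; the `collect()`/`get_summary()` usage follows the LiveKit metrics docs, but treat the wiring here as illustrative:
```python
from livekit.agents import metrics

usage_collector = metrics.UsageCollector()

@session.on("metrics_collected")
def _on_metrics_collected(ev):
    # Feed every metrics event into the collector as it arrives.
    usage_collector.collect(ev.metrics)

def log_session_summary() -> None:
    # Call this when the session ends (e.g., from a shutdown callback).
    summary = usage_collector.get_summary()
    print(f"Session usage summary: {summary}")
```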
b
No, not quite. I mean, I have the transcript segments JSON here:
```python
transcript = session.history.to_dict(exclude_timestamp=False)
```
and the collected metrics:
```python
@session.on("metrics_collected")
def _on_metrics_collected(ev):
    m = getattr(ev, "metrics", ev)
    collected_metrics.append(m)
```
Now, how do I connect the latency to their respective transcript segments? Or is there a better way of doing this?
t
I understand you're looking to connect collected metrics to specific transcript segments in LiveKit. There isn't a direct documented method for this yet, but here's an approach you might find helpful:
1. Track Events: Listen for both `user_input_transcribed` (which gives you the transcript segments) and `metrics_collected` (which provides the metrics).
2. Correlate Identifiers: Use shared identifiers like `request_id` or `speech_id` to link the metrics with the transcript segments. Just make sure these IDs stay consistent across the events.
3. Manual Mapping: Create a mapping in your application logic that ties metrics to transcript segments, either by timing or by these identifiers (a sketch of the timing-based option follows below).

If you'd like, I can help you implement this or explore other options together. Just let me know!
Sources: Logs, metrics, and telemetry | LiveKit Docs | metrics/send-metrics-to-3p/metrics_server/README.md | Text and transcriptions | LiveKit Docs
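Given the data you already have (the `session.history.to_dict(exclude_timestamp=False)` dict and your `collected_metrics` list), here is a rough sketch of option 3, matching by time. The history keys used here (`items`, `created_at`) are assumptions about the dict layout, and each collected metric is assumed to carry a `timestamp`, so inspect your actual data and adjust:
```python
def attach_metrics_to_history(transcript: dict, collected_metrics: list, window_s: float = 10.0):
    """Pair each history item with metric events recorded shortly after it."""
    paired = []
    for item in transcript.get("items", []):   # assumed key; check your dict
        item_ts = item.get("created_at")       # assumed per-item timestamp key
        if item_ts is None:
            continue
        nearby = [
            m for m in collected_metrics
            if getattr(m, "timestamp", None) is not None
            and 0 <= m.timestamp - item_ts <= window_s
        ]
        paired.append({"item": item, "metrics": nearby})
    return paired
```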
b
Cool. And do you recommend doing this, or is it better just to collect the metrics and display them instead?
t
Deciding whether to connect metrics to transcript segments or just collect and display them depends on what you're aiming to achieve:
• Connect Metrics to Segments: If you want detailed insight into how each transcription segment is performing, whether for debugging, optimization, or analytics, linking metrics to segments gives you that level of detail. It's especially helpful when you want to pinpoint specific bottlenecks in your pipeline.
• Collect and Display Metrics: If your main goal is to keep an eye on overall system performance or share high-level numbers like average latency or total processing time, collecting and displaying aggregated metrics is a simpler and more efficient approach.

If you're thinking about real-time monitoring or showing metrics to users, aggregated data might be all you need. For deeper analysis or troubleshooting, connecting metrics to segments can really make a difference. Feel free to reach out if you'd like help deciding what's best for your situation!
b
How do you suggest I calculate latency on each part of the call?
t
I'd be happy to help you calculate latency for each part of a call (STT, LLM, TTS) in LiveKit! Here are the useful metrics:
• STT (Speech-to-Text): `audio_duration` (the duration of the audio input) and `duration` (the time taken for transcription).
• LLM (Large Language Model): `ttft` (time to first token) and `duration` (total time for completion).
• TTS (Text-to-Speech): `ttfb` (time to first byte) and `audio_duration` (duration of the generated audio).
• End-of-Utterance (EOU) Delay: `end_of_utterance_delay` (time from speech end to LLM start) and `transcription_delay` (time to the final transcript).

To estimate the total conversation latency, you can combine these metrics like this:
```python
total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
```
Don't forget to subscribe to the `metrics_collected` event so you can capture these metrics in real time (a per-turn sketch follows below). If you'd like, I can guide you through the implementation. Just let me know!
Sources: Capturing metrics | LiveKit Docs | Logs, metrics, and telemetry | LiveKit Docs
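Building on that formula, here is a sketch that computes the total per turn by grouping metric events on `speech_id`. It assumes the EOU, LLM, and TTS metric objects for a given turn all expose a `speech_id` field; check the attributes on your installed version before relying on this:
```python
from livekit.agents.metrics import EOUMetrics, LLMMetrics, TTSMetrics

turns: dict[str, dict] = {}  # speech_id -> {"eou": ..., "llm": ..., "tts": ...}

def on_metric(m) -> None:
    sid = getattr(m, "speech_id", None)
    if not sid:
        return
    turn = turns.setdefault(sid, {})
    if isinstance(m, EOUMetrics):
        turn["eou"] = m
    elif isinstance(m, LLMMetrics):
        turn["llm"] = m
    elif isinstance(m, TTSMetrics):
        turn["tts"] = m
    if all(k in turn for k in ("eou", "llm", "tts")):
        total = turn["eou"].end_of_utterance_delay + turn["llm"].ttft + turn["tts"].ttfb
        print(f"turn {sid}: total latency ~= {total:.3f}s")
```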
b
What is the difference between the STT delay and the end of utterance delay?
t
Thank you for your question! Here's the difference between STT delay and end-of-utterance (EOU) delay:
• STT delay (transcription delay): The time it takes the speech-to-text (STT) model to produce the final transcript after you've finished speaking. It covers only the STT processing.
• End-of-utterance (EOU) delay: The entire time from when Voice Activity Detection (VAD) recognizes that you've stopped talking to when the agent starts the next step (like LLM inference). It includes both the endpointing process and the transcription time.

So, put simply, STT delay is part of the EOU delay, which also includes the time needed to confirm you've truly finished speaking (illustrated in the snippet below).
Sources: Capturing metrics | LiveKit Docs | livekit-agents/livekit/agents/voice/audio_recognition.py
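To make that relationship concrete, here is a tiny illustrative snippet in the style of the formulas above, assuming an `EOUMetrics` object named `eou` with the two fields just mentioned:
```python
# The EOU delay contains the transcription (STT) delay, so the remainder is
# roughly the endpointing / turn-detection time.
stt_delay = eou.transcription_delay
endpointing_time = eou.end_of_utterance_delay - eou.transcription_delay
```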
b
are llm.duration and tts.duration the whole duration? like literally the WHOLE duration?
t
Good question about the `llm.duration` and `tts.duration` metrics! They cover the processing time of each stage:
• LLM Duration: How long the LLM took to generate the full response, from when the prompt was sent to the model until the entire completion was ready. Note that this doesn't include any extra time spent on tool calls the LLM might trigger.
• TTS Duration: The total time the TTS model needed to create the full audio output from the text provided. It doesn't cover the time it takes to actually play back the audio.

So these metrics give a clear picture of the processing time for each stage, but they don't capture the full end-to-end conversation latency. Other factors like speech-to-text (STT), end-of-utterance detection, and network delays also play a role.
Sources: Logs, metrics, and telemetry | LiveKit Docs | Capturing metrics | LiveKit Docs
b
is there any example of latency metrics retrieval?
t
Here's an example of how you can retrieve and log latency metrics in a LiveKit Python agent:
```python
from livekit.agents.metrics import LLMMetrics, TTSMetrics, STTMetrics, EOUMetrics

# Handle LLM metrics (e.g., Time to First Token - TTFT)
def on_llm_metrics(metrics: LLMMetrics):
    print(f"LLM TTFT: {metrics.ttft:.4f}s, Duration: {metrics.duration:.4f}s")

# Handle TTS metrics (e.g., Time to First Byte - TTFB)
def on_tts_metrics(metrics: TTSMetrics):
    print(f"TTS TTFB: {metrics.ttfb:.4f}s, Duration: {metrics.duration:.4f}s")

# Handle STT metrics (e.g., transcription duration)
def on_stt_metrics(metrics: STTMetrics):
    print(f"STT Duration: {metrics.duration:.4f}s, Audio Duration: {metrics.audio_duration:.4f}s")

# Handle EOU metrics (e.g., transcription delay)
def on_eou_metrics(metrics: EOUMetrics):
    print(f"Transcription Delay: {metrics.transcription_delay:.4f}s, EOU Delay: {metrics.end_of_utterance_delay:.4f}s")
```
To capture these metrics in real time, subscribe to the `metrics_collected` event. If you're curious about the total conversation latency, you can calculate it like this:
```python
total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
```
If you'd like help implementing this or forwarding these metrics to an external system, please don't hesitate to reach out. I'm here to help!
Sources: Logs, metrics, and telemetry | LiveKit Docs | metrics/metrics_vad.py
b
I'm confused. I just want the latency metrics by transcript message, any way of doing that?
t
I'm glad you're looking into associating latency metrics with individual transcript messages! A great way to do this is the `transcription_node` method in your Python agent. When you enable `use_tts_aligned_transcript=True` in the `AgentSession`, your transcription output includes timing information through `TimedString` objects. These give you `start_time` and `end_time` for each word or sentence, making it easier to align latency metrics with specific parts of your transcript. Here's a simple example to guide you:
```python
import logging

logger = logging.getLogger("agent")

async def transcription_node(self, text: AsyncIterable[str | TimedString], model_settings: ModelSettings) -> AsyncGenerator[str | TimedString, None]:
    async for chunk in text:
        if isinstance(chunk, TimedString):
            logger.info(f"TimedString: '{chunk}' ({chunk.start_time} - {chunk.end_time})")
            # Associate this chunk with relevant metrics (e.g., TTS or LLM events)
        yield chunk
```
You can also combine this approach with metrics like `STTMetrics`, `LLMMetrics`, `TTSMetrics`, and `EOUMetrics` to build a detailed timeline of your system's performance for each transcript segment (a snippet showing how to enable the session option follows below). If you'd like, I can help you with the next steps or clarify anything else!
Sources: metrics/metrics_llm.py | Text and transcriptions | LiveKit Docs | examples/voice_agents/timed_agent_transcript.py
b
Awesome, but yeah, I'm talking more about the different metrics for every part of the session: STT, LLM, TTS, EOU. For each transcript message, how can we do that?
t
To associate STT, LLM, TTS, and EOU metrics with each transcript message, here's a step-by-step guide:
1. Subscribe to Metrics Events: Attach event listeners to capture metrics for each component (STT, LLM, TTS, and EOU). For example:
```python
stt.on("metrics_collected", lambda e: self.on_stt_metrics(e))
llm.on("metrics_collected", lambda e: self.on_llm_metrics(e))
tts.on("metrics_collected", lambda e: self.on_tts_metrics(e))
stt.on("eou_metrics_collected", lambda e: self.on_eou_metrics(e))
```
2. Correlate Metrics: Use shared identifiers like `speech_id` (found in `EOUMetrics` and `STTMetrics`) or `request_id`, or fall back to timestamps, to link the metrics back to specific transcript messages. For example:
```python
async def on_stt_metrics(self, metrics):
    print(f"Transcript for speech '{metrics.speech_id}': Audio duration = {metrics.audio_duration}s")
```
3. Aggregate Metrics: The `UsageCollector` class can track and summarize metrics across the entire session, and the `log_metrics` helper can log them or forward them to an external monitoring system.
4. Calculate Total Latency: Combine the metrics to compute the end-to-end latency:
```python
total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
```
This connects detailed latency metrics to each transcript message, making performance analysis much clearer. If you have any questions or need a hand with any part of this, just let me know!
Sources: Logs, metrics, and telemetry | LiveKit Docs | Capturing metrics | LiveKit Docs | complex-agents/turn-taking/agent.py | livekit-agents/livekit/agents/metrics/utils.py
b
But my transcript doesn't have a speech_id, it has an id.
t
I understand, and it is a bit tricky right now: there isn't a direct way to link the `id` in transcript segments with the `speech_id` used in metrics like STT, LLM, TTS, and EOU. These IDs are generated separately and aren't cross-referenced in the system. Here are some workarounds you can try:
1. Use Timing Correlation: Match the timestamps of transcript segments with the metric events (like STT or EOU) from the same session. This often aligns things well.
2. Log Both IDs: Capture the `speech_id` from your metrics and the `id` from transcript events, then connect them during post-processing.
3. Custom Transcription Forwarding: If you're comfortable with customization, you can override the default transcription behavior to include the `speech_id` directly in metadata or logs.
4. Use OpenTelemetry: Tracing attributes like `ATTR_SPEECH_ID` can help align events throughout the agent lifecycle.

If you'd like, I can help you implement any of these options or discuss your specific use case in more detail. Just let me know, and we'll work through it together!
Sources: Logs, metrics, and telemetry | LiveKit Docs | pipeline-stt/diarization.py | livekit-agents/livekit/agents/voice/transcription/__init__.py | agents/src/metrics/base.ts