gentle-dream-14103
09/11/2025, 8:41 AMtall-airline-16756
09/11/2025, 8:41 AMtall-airline-16756
09/11/2025, 8:42 AMtotal_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
Let me explain what each part means:
• eou.end_of_utterance_delay: This is the time from when the voice activity stops until the LLM starts processing.
• llm.ttft: The time the LLM takes to generate the very first token of its response.
• tts.ttfb: The time the TTS system takes to produce the first audio byte.
To gather these numbers, you’ll want to subscribe to the metrics_collected
event during your session and capture the relevant metrics (EOUMetrics
, LLMMetrics
, TTSMetrics
). Once you have all these, you can easily compute the total latency.
If you’d like, I’m more than happy to share some code examples or dive deeper into any part of this—just let me know!