brief-vase-33757
07/12/2025, 5:47 AM tall-airline-16756
07/12/2025, 5:48 AMsearch_engine' coro
in handler.py
) includes only the time spent on LLM inference—that is, the time it takes the language model to generate its response (from start, through the first token, to completion). It does not include the time it takes to execute any function calls or tool calls that may be triggered by the LLM's response. Each function call's response time is measured and handled separately in the framework.
This means the 2.43-second timing shown covers only LLM model evaluation and response generation, not the downstream tool execution.
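To make the distinction concrete, here is a minimal self-contained sketch of the idea (the `timed` helper and the `fake_llm_inference` / `fake_tool_call` functions are illustrative stand-ins, not LiveKit APIs): the LLM duration and each tool call's duration are timed separately, and end-to-end latency is their sum.

```python
import time


def timed(fn, *args):
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Stand-in for model inference: the measured duration ends when the
# response (including any tool-call request) has been fully generated.
def fake_llm_inference(prompt):
    time.sleep(0.05)  # pretend the model takes ~50 ms
    return {"tool_call": "lookup_weather", "args": {"city": "Oslo"}}


# Stand-in for executing the tool the LLM asked for; timed separately.
def fake_tool_call(name, args):
    time.sleep(0.02)  # pretend the tool takes ~20 ms
    return {"temp_c": 7}


reply, llm_s = timed(fake_llm_inference, "What's the weather in Oslo?")
tool_result, tool_s = timed(fake_tool_call, reply["tool_call"], reply["args"])

# The LLM metric excludes the tool call; end-to-end latency is the sum.
print(f"llm={llm_s:.3f}s tool={tool_s:.3f}s total={llm_s + tool_s:.3f}s")
```

In the same way, the 2.43 seconds above corresponds only to the first measurement, not to the total a caller would observe end to end.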
• Source: This information is based on the documented LLMMetrics measurement behavior and its implementation in the LiveKit Agents framework.
Let me know if you want details on how tool execution time is tracked or how to measure end-to-end latency.
Sources: Capturing metrics | LiveKit Docs | livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/experimental/realti...