It seems likely that the `metrics_collected` event for a multimodal agent, such as the Gemini agent, includes metrics similar to those other LiveKit agents provide. These typically cover speech-to-text (STT), language model (LLM), and text-to-speech (TTS) processing. According to the LiveKit documentation, common metrics include:
1. STT Metrics:
- `audio_duration`: Duration of the audio input.
- `duration`: Total time the connection has been open with the STT provider.
2. LLM Metrics:
- `ttft`: Time to first token.
- `input_tokens`: Number of tokens in the prompt.
- `output_tokens`: Number of tokens generated in the completion.
- `tokens_per_second`: Rate of token generation.
3. TTS Metrics:
- `ttfb`: Time to first byte.
- `audio_duration`: Duration of the audio output.
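
As a rough illustration of how a handler might consume these fields, here is a minimal sketch in plain Python. The dataclasses `STTMetrics`, `LLMMetrics`, and `TTSMetrics` and the `SessionUsage` aggregator are hypothetical stand-ins that mirror only the field names listed above; they are not the LiveKit classes, and the real event payload for the Gemini agent may differ.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins: only the field names come from the list above.
@dataclass
class STTMetrics:
    audio_duration: float  # seconds of audio input
    duration: float        # seconds, as described in the list above


@dataclass
class LLMMetrics:
    ttft: float            # seconds until the first token
    input_tokens: int
    output_tokens: int
    tokens_per_second: float


@dataclass
class TTSMetrics:
    ttfb: float            # seconds until the first byte of audio
    audio_duration: float  # seconds of audio output


@dataclass
class SessionUsage:
    """Accumulates per-session totals from individual metric objects."""
    stt_audio_seconds: float = 0.0
    tts_audio_seconds: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0
    ttft_samples: list = field(default_factory=list)

    def on_metrics_collected(self, m) -> None:
        # Dispatch on the metric type and update the matching totals.
        if isinstance(m, STTMetrics):
            self.stt_audio_seconds += m.audio_duration
        elif isinstance(m, LLMMetrics):
            self.input_tokens += m.input_tokens
            self.output_tokens += m.output_tokens
            self.ttft_samples.append(m.ttft)
        elif isinstance(m, TTSMetrics):
            self.tts_audio_seconds += m.audio_duration
        # Unknown metric types are ignored rather than raising.

    def mean_ttft(self) -> float:
        # Average time to first token; 0.0 if no LLM metrics were seen.
        if not self.ttft_samples:
            return 0.0
        return sum(self.ttft_samples) / len(self.ttft_samples)


usage = SessionUsage()
for metric in [
    STTMetrics(audio_duration=2.5, duration=0.0),
    LLMMetrics(ttft=0.25, input_tokens=100, output_tokens=40, tokens_per_second=50.0),
    LLMMetrics(ttft=0.75, input_tokens=20, output_tokens=10, tokens_per_second=40.0),
    TTSMetrics(ttfb=0.5, audio_duration=3.0),
]:
    usage.on_metrics_collected(metric)
```

In a real agent, a method like `on_metrics_collected` would be registered as the callback for the `metrics_collected` event, and a multimodal agent may emit a different set of metric types than the three sketched here.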
These metrics provide insights into the performance and usage of the agent during a session. However, specific differences for the Gemini agent compared to the OpenAI version are not detailed in the available data sources. Would you like more detailed information on any specific metric or functionality?
Sources:
- LiveKit documentation