# ask-ai
**User:**
What does an llm ttft metric of -1 mean?
**AI:**
Thanks for reaching out, Ed. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
In the context of the RealtimeModelMetrics class, the "ttft" metric represents "Time to first audio token in seconds." A value of -1 specifically indicates that "no audio token was sent" during the response generation. Sources: GitHub::livekit::agents | livekit-plugins/livekit-plugins-openai/livekit/pl...
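For reference, here is a minimal sketch of how you might watch for this sentinel in a metrics handler. It assumes the livekit-agents 1.x Python API (a `metrics_collected` session event delivering a `MetricsCollectedEvent`); verify the names against your installed version.

```python
from livekit.agents import MetricsCollectedEvent

def on_metrics_collected(ev: MetricsCollectedEvent) -> None:
    m = ev.metrics
    ttft = getattr(m, "ttft", None)
    if ttft == -1:
        # Sentinel value: the generation never emitted a single (audio) token.
        print(f"no token emitted for this generation: {m!r}")
    elif ttft is not None:
        print(f"time to first token: {ttft:.3f}s")

# Registered on an AgentSession elsewhere in your agent code, e.g.:
# session.on("metrics_collected", on_metrics_collected)
```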
**User:**
so practically, what does that mean? give me a scenario example
**AI:**
Sure, here's a scenario. Your agent sends a request to the model, but no audio tokens ever come back - maybe the request errored out, or it was cancelled before the model produced anything. Since there is no "first token" to time, TTFT is reported as the sentinel value -1 instead of a real duration. Think of it like ordering a coffee where the barista never starts brewing: there is no "time until first sip" to measure. Sources: livekit.agents.llm.llm API documentation | GitHub::livekit::agents | livekit-agents/livekit/agents/llm/llm.py
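To make the mechanics concrete, here is a simplified, self-contained sketch of that timing logic (an illustration, not the actual livekit-agents implementation): the sentinel is set up front and only overwritten when a first token actually arrives.

```python
import asyncio
import time

async def measure_ttft(token_stream) -> float:
    """Return time-to-first-token in seconds, or -1 if nothing was emitted."""
    started_at = time.perf_counter()
    ttft = -1.0  # sentinel: no token was ever emitted
    async for _token in token_stream:
        if ttft == -1.0:
            ttft = time.perf_counter() - started_at  # first token arrived
        # ... consume the rest of the stream ...
    return ttft

async def empty_stream():
    """Simulates a generation that fails or is cancelled before any token."""
    return
    yield  # unreachable; makes this function an async generator

print(asyncio.run(measure_ttft(empty_stream())))  # prints -1.0
```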
**User:**
so something prevented the LLM generation from working, but we don't know what
**AI:**
Exactly. A TTFT of -1 tells you the generation never got as far as producing its first token, but the metric alone doesn't say why. The request may have failed, been cancelled midway, or hit an upstream error. Treat it as a "generation never started emitting" signal rather than a diagnosis, and look at the surrounding logs or error events for the actual cause.
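Since the metric itself carries no reason, the usual next step is to raise log verbosity and look at the surrounding error output. A minimal sketch using standard Python logging; the logger names match the `name` fields seen in typical agent logs (e.g. `livekit.agents`, `livekit.plugins.silero`):

```python
import logging

# Turn up verbosity on the agent and plugin loggers to surface the underlying
# error (cancellation, upstream API failure, etc.) behind a -1 TTFT.
logging.basicConfig(level=logging.INFO)
logging.getLogger("livekit.agents").setLevel(logging.DEBUG)
logging.getLogger("livekit.plugins").setLevel(logging.DEBUG)
```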
**User:**
I'm seeing these warnings:
```json
{
  "message": "inference is slower than realtime",
  "level": "WARNING",
  "name": "livekit.plugins.silero",
  "delay": 0.23960410502180454,
  "pid": 2603,
  "job_id": "AJ_N9ezZkEhdZ2Y",
  "timestamp": "2025-07-09T18:57:35.187367+00:00"
}
```
and
```json
{
  "message": "Error in _bounce_eou_task\nTraceback (most recent call last):\n File \"/usr/local/lib/python3.11/site-packages/livekit/agents/utils/log.py\", line 16, in async_fn_logs\n return await fn(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/local/lib/python3.11/site-packages/livekit/agents/voice/audio_recognition.py\", line 339, in _bounce_eou_task\n end_of_turn_probability = await turn_detector.predict_end_of_turn(chat_ctx)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/local/lib/python3.11/site-packages/livekit/plugins/turn_detector/base.py\", line 220, in predict_end_of_turn\n assert result is not None, \"end_of_utterance prediction should always returns a result\"\n ^^^^^^^^^^^^^^^^^^\nAssertionError: end_of_utterance prediction should always returns a result",
  "level": "ERROR",
  "name": "livekit.agents",
  "pid": 21116,
  "job_id": "AJ_Qpid3jKtrQyF",
  "timestamp": "2025-07-09T19:20:12.360522+00:00"
}
```
I've been told that both of these might be happening because of not enough CPU or memory. However, my service is deployed in ECS, and I don't see any concerning CPU or memory usage on any ECS tasks. Would you happen to know what else might have resource constraints?
**AI:**
I don't have the answer you're looking for. You could also try asking your question:
- in one of the other Slack channels, or
- at https://deepwiki.com/livekit/livekit_composite, which is trained on all LiveKit source code

If you find the answer, please post it here to help others!