# ask-ai
Is it possible to add a timeout to LLM calls so they get re-run if they go over a certain limit? E.g., p99 latency is <5 seconds TTFT. I'd expect anything above 10 seconds to be a fatal error and I'd like to timeout then and retry the call.
Thanks for reaching out, Jake. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Absolutely! You can add timeouts to your LLM calls and retry them if they take too long. This is handled by the `FallbackAdapter` class, which lets you customize timeout and retry behavior to fit your needs. Here's what you can do:

- **Timeout configuration:** use the `attempt_timeout` parameter to set how long each LLM attempt may run before timing out.
- **Retry configuration:** tune retries with options like:
  - `max_retry_per_llm`: how many times to retry each LLM.
  - `retry_interval`: how long to wait between retry attempts.
  - `retry_on_chunk_sent`: whether to retry even after some chunks have already been sent.

Here's a quick example to get you started:
```python
# Assumes the LiveKit Agents SDK with the OpenAI plugin installed
from livekit.agents import llm
from livekit.plugins import openai

fallback_llm = llm.FallbackAdapter(
    llm=[openai.LLM(model="gpt-4")],
    attempt_timeout=5.0,        # fail an attempt after 5 seconds
    max_retry_per_llm=1,        # retry each LLM once before moving on
    retry_interval=0.5,         # wait 0.5 s between retry attempts
    retry_on_chunk_sent=False,  # never retry once output has started streaming
)
```
Plus, the `FallbackAdapter` comes with built-in retry logic that uses exponential backoff for recoverable errors, and it can switch to other LLMs in your list if the primary one runs into trouble or times out.

Sources: livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/llm.py | LLM Integrations | livekit/livekit_composite | DeepWiki | Agents Core Architecture | livekit/livekit_composite | DeepWiki
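If you'd rather see the underlying pattern without the LiveKit abstraction, it boils down to a per-attempt timeout wrapped in a bounded retry loop. Here's a minimal stand-alone sketch using plain `asyncio`; the helper name and parameters are illustrative, not part of any library API:

```python
import asyncio

async def call_with_timeout(fn, *, attempt_timeout=5.0, max_retries=1, retry_interval=0.5):
    """Run an async call, retrying it if an attempt exceeds attempt_timeout seconds."""
    for attempt in range(max_retries + 1):
        try:
            # Cancel the in-flight attempt if it runs past the deadline
            return await asyncio.wait_for(fn(), timeout=attempt_timeout)
        except asyncio.TimeoutError:
            if attempt == max_retries:
                raise  # out of retries: surface the timeout as a fatal error
            await asyncio.sleep(retry_interval)

# Stand-in for a real LLM call: responds well under the timeout
async def fake_llm_call():
    await asyncio.sleep(0.01)
    return "response"

result = asyncio.run(call_with_timeout(fake_llm_call, attempt_timeout=1.0))
```

This mirrors what `attempt_timeout`, `max_retry_per_llm`, and `retry_interval` control in the adapter, so you can reason about the settings even before wiring them up.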