# ask-for-help
j
This happened to me when I enabled batching on my service as well, and it was related to the `max_latency_ms` being too short to allow the model to respond. Increasing this to some ridiculous amount got rid of the ServiceUnavailable errors, but obviously it also increased latency.
I never could get it to work right
s
What's your max latency set to now? The way this works is that there's a linear model for the time we expect batches to complete in; if the current time plus the expected completion time for the batch exceeds the max latency, those requests are canceled. It probably makes sense at this point to add an option that disables that max latency, but also a configuration that limits the amount of time waited between batches. The current issue is that if max latency is set very high, the scheduler will naturally wait a little longer between requests (the scheduler currently waits for the average amount of time requests spent in the queue before starting execution of the next batch).
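Roughly, the cancellation rule looks like this (a minimal sketch; `Request`, `predict_batch_time`, and the constants are illustrative names and values, not the actual scheduler code):

```python
import time
from dataclasses import dataclass, field

MAX_LATENCY_S = 10.0  # max_latency_ms / 1000; illustrative value


@dataclass
class Request:
    enqueued_at: float = field(default_factory=time.monotonic)
    cancelled: bool = False

    def cancel(self) -> None:
        # In a real server this would surface as a ServiceUnavailable error.
        self.cancelled = True


def predict_batch_time(batch_size: int, a: float = 0.05, b: float = 0.01) -> float:
    # Linear model: expected batch completion time as a function of batch size.
    return a + b * batch_size


def build_batch(queue: list[Request]) -> list[Request]:
    """Cancel requests that would miss the deadline; batch the rest."""
    now = time.monotonic()
    expected = predict_batch_time(len(queue))
    batch = []
    for req in queue:
        # Time already spent in the queue plus expected execution time.
        if (now - req.enqueued_at) + expected > MAX_LATENCY_S:
            req.cancel()
        else:
            batch.append(req)
    return batch
```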
t
We are not currently defining the max latency, so it is set to the default value
s
For now I think the only answer we can give is to increase max latency; most of the latency increase from doing that simply comes from no longer canceling requests that would have taken too long.
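Something like this, wherever your batching options live (the exact option names here are assumptions, since the framework isn't named in this thread):

```python
# Hypothetical batching config; the key names are assumptions.
BATCHING_CONFIG = {
    "enabled": True,
    "max_batch_size": 100,
    # Set max_latency_ms well above the slowest expected batch so the
    # deadline check stops canceling requests (the ServiceUnavailable
    # errors above), at the cost of higher worst-case latency.
    "max_latency_ms": 60_000,
}
```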
j
Initially I had thought that `max_latency_ms` was simply the window used to batch requests... "I have 100 req/s and a 100 ms max latency, so my batches will probably be around size 10."
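i.e., I was doing this math:

```python
# The arithmetic behind that (mistaken) reading: if max_latency_ms were a
# collection window, batch size would be roughly arrival_rate * window.
arrival_rate = 100   # requests per second
window_s = 0.100     # 100 ms, read (incorrectly) as a batching window

print(arrival_rate * window_s)  # 10.0 requests per batch

# Per the explanation above, max_latency_ms is actually a deadline:
# requests predicted to exceed it get canceled instead.
```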
s
We definitely need to document that better, yeah.