# ask-for-help
j
This happened to me when I enabled batching on my service as well, and it was related to the `max_latency_ms` being too short to allow the model to respond. Increasing this to some ridiculous amount got rid of the ServiceUnavailable errors, but obviously it also increased latency.
I never could get it to work right
s
What's your max latency set to now? The way this works is that there's a linear model for the time we expect batches to complete in; if the current time plus the expected completion time for the batch exceeds the max latency, those requests are canceled. It probably makes sense at this point to add an option that disables that max latency, but also a configuration that limits the amount of time waited between batches. The current issue is that if max latency is set very high, the scheduler will naturally wait a little longer between requests (the scheduler currently waits for the average amount of time requests spent in the queue before starting execution of the next batch).
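Roughly, the cancellation rule looks like this (a minimal sketch; `Request`, `predict_batch_time`, and the constants are illustrative names and values, not the actual scheduler code):

```python
import time
from dataclasses import dataclass, field

MAX_LATENCY_S = 10.0  # max_latency_ms / 1000; illustrative value


@dataclass
class Request:
    enqueued_at: float = field(default_factory=time.monotonic)
    cancelled: bool = False

    def cancel(self) -> None:
        # In a real server this would surface as a ServiceUnavailable error.
        self.cancelled = True


def predict_batch_time(batch_size: int, a: float = 0.05, b: float = 0.01) -> float:
    # Linear model: expected batch completion time as a function of batch size.
    return a + b * batch_size


def build_batch(queue: list[Request]) -> list[Request]:
    """Cancel requests that would miss the deadline; batch the rest."""
    now = time.monotonic()
    expected = predict_batch_time(len(queue))
    batch = []
    for req in queue:
        # Time already spent in the queue plus expected execution time.
        if (now - req.enqueued_at) + expected > MAX_LATENCY_S:
            req.cancel()
        else:
            batch.append(req)
    return batch
```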
t
We are not currently defining the max latency, so it is set to the default value
s
For now I think the only answer we can give is to increase max latency; most of the latency increase from doing that simply comes from no longer canceling requests that would have taken too long.
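Something like this, wherever your batching options live (the exact option names here are assumptions, since the framework isn't named in this thread):

```python
# Hypothetical batching config; the key names are assumptions.
BATCHING_CONFIG = {
    "enabled": True,
    "max_batch_size": 100,
    # Set max_latency_ms well above the slowest expected batch so the
    # deadline check stops canceling requests (the ServiceUnavailable
    # errors above), at the cost of higher worst-case latency.
    "max_latency_ms": 60_000,
}
```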
j
Initially I had thought that `max_latency_ms` was simply the window used to batch requests... "I have 100 req/s and a 100 ms max latency, so my batches will probably be around size 10."
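i.e., I was doing this math:

```python
# The arithmetic behind that (mistaken) reading: if max_latency_ms were a
# collection window, batch size would be roughly arrival_rate * window.
arrival_rate = 100   # requests per second
window_s = 0.100     # 100 ms, read (incorrectly) as a batching window

print(arrival_rate * window_s)  # 10.0 requests per batch

# Per the explanation above, max_latency_ms is actually a deadline:
# requests predicted to exceed it get canceled instead.
```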
s
We definitely need to document that better, yeah.