# ask-for-help
Also, I have another question: when we use `async_run` instead of `run` in the previous BentoML version, it seems to process only one request at a time. Is that right?
Hi Sangeon, what does "processes the single request at a single time" mean here? Do you mean there's no batching?
@sauyon could you help with the `max_latency` setting? Thanks!
`max_latency_ms` being under `batching` is misleading, as it now affects all workloads (including unbatched ones); we're looking to fix that in 1.1.
Yes, we are not using adaptive batching at the moment, and if there are multiple requests at the same time, the BentoML container returns the error message above.
I believe you can just raise `runners.batching.max_latency_ms`, and that should solve the problem!
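For reference, a minimal sketch of where that option lives in a BentoML `configuration.yaml` (the filename and the `10000` value here are illustrative examples, not recommendations from this thread):

```yaml
# configuration.yaml -- point BentoML at it with:
#   BENTOML_CONFIG=./configuration.yaml bentoml serve ...
runners:
  batching:
    enabled: true
    # Maximum time (in ms) a request may wait in the queue before the
    # server gives up on it; raising this avoids rejecting requests
    # that pile up when several arrive at the same time.
    max_latency_ms: 10000
```

This applies the setting to all runners; BentoML also allows overriding it per runner under a `runners.<runner_name>` block if only one model needs the larger budget.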