Is it a 2-second average latency? Could you share a bit more about how you set up the tests, and how long it takes your model to run one batch of inference jobs?
Sean
05/17/2023, 8:16 PM
It takes the first few requests to train the batching size and window. The adaptive batching algorithm optimizes for the average latency of requests. While high-percentile latency, especially during the initial training phase, can be higher than `max_latency_ms`, the overall average should stay lower.
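A rough numeric sketch of that averaging claim (the latency values are hypothetical, and this is not BentoML's actual batching algorithm): a few slow requests while the batch size/window is being tuned can still leave the mean under the target.

```python
# Sketch: the average latency can stay under max_latency_ms even when
# the first few "training" requests exceed it.
max_latency_ms = 100

# Hypothetical per-request latencies: the first requests, used to tune
# the batch size and window, run long; later batched requests are fast.
latencies_ms = [180, 150, 120] + [60] * 17

avg = sum(latencies_ms) / len(latencies_ms)

print(f"avg={avg:.1f}ms, max={max(latencies_ms)}ms")

# Tail latency exceeded the target during training...
assert max(latencies_ms) > max_latency_ms
# ...but the overall average is still below it.
assert avg < max_latency_ms
```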
Mikel Menta
05/20/2023, 11:09 AM
Hi, I’m sorry about the delay. I have finally found a bit of time to prepare some code to show what I was saying, and I have also added another question on top. I have asked both questions in a new message in #support: https://bentoml.slack.com/archives/CKRANBHPH/p1684580851220889