Is it a 2-second average latency? Could you share a bit more about how you set up the tests, and how long it takes your model to run one batch of inference jobs?
Sean
05/17/2023, 8:16 PM
It takes the first few requests to train the batching size and window. The adaptive batching algorithm optimizes for the average latency of requests. While high-percentile latency, especially during the initial training phase, can be higher than `max_latency_ms`, the overall average should stay lower.
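A rough numeric sketch of that averaging claim (the latency values are hypothetical, and this is not BentoML's actual batching algorithm): a few slow requests while the batch size/window is being tuned can still leave the mean under the target.

```python
# Sketch: the average latency can stay under max_latency_ms even when
# the first few "training" requests exceed it.
max_latency_ms = 100

# Hypothetical per-request latencies: the first requests, used to tune
# the batch size and window, run long; later batched requests are fast.
latencies_ms = [180, 150, 120] + [60] * 17

avg = sum(latencies_ms) / len(latencies_ms)

print(f"avg={avg:.1f}ms, max={max(latencies_ms)}ms")

# Tail latency exceeded the target during training...
assert max(latencies_ms) > max_latency_ms
# ...but the overall average is still below it.
assert avg < max_latency_ms
```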
Mikel Menta
05/20/2023, 11:09 AM
Hi, I’m sorry about the delay. I have finally found a bit of time to prepare some code to show what I was saying, and I have also added another question on top. I have asked both questions in a new message in #support: https://bentoml.slack.com/archives/CKRANBHPH/p1684580851220889