• processing time is the amount of time the runner takes to actually run the runner code on the inputs it contains;
• wait time is the amount of time the runner scheduler will wait before dispatching the next batch;
• latency is simply the end-to-end latency of the service.
I think that statement is wrong: in adaptive batching mode we're mostly optimizing for throughput, not latency, though arguably the two are linked.
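Putting the definitions above together, a simplified model of per-request latency can be sketched as follows (this is an assumption about how the pieces combine, not the scheduler's exact accounting; the linear processing-time model matches the regression described below):

```python
def estimated_latency(batch_size: int, wait_time: float,
                      o_a: float, o_b: float) -> float:
    """Toy latency model for a batched request.

    Processing time is modeled as linear in batch size:
    roughly o_a * batch_size + o_b (slope and intercept from the
    optimizer's regression). End-to-end latency is then the time
    spent waiting for the batch to be dispatched plus the time to
    actually process it.
    """
    processing_time = o_a * batch_size + o_b
    return wait_time + processing_time
```

This makes the throughput/latency tension concrete: a longer wait time allows larger batches (better throughput via amortized o_b overhead) at the cost of higher per-request latency.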
Re: the optimizer's debug logs:
• o_w is the average wait time observed for requests that are served (with some decay); this is used to calculate the wait time.
• o_a and o_b are simply the slope and intercept of the regression that we run for processing time.
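A minimal sketch of how these quantities might be maintained — the class name, decay factor, and least-squares refit are assumptions for illustration, not the actual optimizer code:

```python
class BatchingOptimizerState:
    """Toy model of the optimizer's state: o_w is a decayed average
    of observed wait times; o_a and o_b are the slope and intercept
    of a least-squares fit of processing time against batch size."""

    def __init__(self, decay: float = 0.95):
        self.decay = decay   # assumed exponential-decay factor
        self.o_w = 0.0       # decayed average observed wait time
        self.o_a = 0.0       # slope: marginal processing time per input
        self.o_b = 0.0       # intercept: fixed per-batch overhead
        self.samples: list[tuple[int, float]] = []  # (batch_size, proc_time)

    def observe_wait(self, wait_time: float) -> None:
        # Exponentially decayed average: recent waits count more.
        self.o_w = self.decay * self.o_w + (1 - self.decay) * wait_time

    def observe_batch(self, batch_size: int, processing_time: float) -> None:
        # Refit the linear model: processing_time ~ o_a * batch_size + o_b.
        self.samples.append((batch_size, processing_time))
        n = len(self.samples)
        sx = sum(s for s, _ in self.samples)
        sy = sum(t for _, t in self.samples)
        sxx = sum(s * s for s, _ in self.samples)
        sxy = sum(s * t for s, t in self.samples)
        denom = n * sxx - sx * sx
        if denom:  # need at least two distinct batch sizes
            self.o_a = (n * sxy - sx * sy) / denom
            self.o_b = (sy - self.o_a * sx) / n
```

For example, feeding in batches whose processing time is exactly 2 ms per input plus 1 ms of overhead would drive o_a toward 2 and o_b toward 1, which the scheduler could then use to predict the cost of the next batch.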