• processing time is the amount of time the runner takes to actually run the runner code on the inputs it contains;
• wait time is the amount of time the runner scheduler will wait before dispatching the next batch;
• latency is simply the end-to-end latency of the service.
I think that statement is wrong: in adaptive batching mode we're mostly optimizing for throughput, not latency, though arguably the two are linked.
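Putting the definitions above together, a simplified model of per-request latency can be sketched as follows (this is an assumption about how the pieces combine, not the scheduler's exact accounting; the linear processing-time model matches the regression described below):

```python
def estimated_latency(batch_size: int, wait_time: float,
                      o_a: float, o_b: float) -> float:
    """Toy latency model for a batched request.

    Processing time is modeled as linear in batch size:
    roughly o_a * batch_size + o_b (slope and intercept from the
    optimizer's regression). End-to-end latency is then the time
    spent waiting for the batch to be dispatched plus the time to
    actually process it.
    """
    processing_time = o_a * batch_size + o_b
    return wait_time + processing_time
```

This makes the throughput/latency tension concrete: a longer wait time allows larger batches (better throughput via amortized o_b overhead) at the cost of higher per-request latency.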
Re: the optimizer's debug logs:
• o_w is the average wait time observed for requests that are served (with some decay); this is used to calculate the wait time.
• o_a and o_b are simply the slope and intercept of the regression that we run for processing time.
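A minimal sketch of how these quantities might be maintained — the class name, decay factor, and least-squares refit are assumptions for illustration, not the actual optimizer code:

```python
class BatchingOptimizerState:
    """Toy model of the optimizer's state: o_w is a decayed average
    of observed wait times; o_a and o_b are the slope and intercept
    of a least-squares fit of processing time against batch size."""

    def __init__(self, decay: float = 0.95):
        self.decay = decay   # assumed exponential-decay factor
        self.o_w = 0.0       # decayed average observed wait time
        self.o_a = 0.0       # slope: marginal processing time per input
        self.o_b = 0.0       # intercept: fixed per-batch overhead
        self.samples: list[tuple[int, float]] = []  # (batch_size, proc_time)

    def observe_wait(self, wait_time: float) -> None:
        # Exponentially decayed average: recent waits count more.
        self.o_w = self.decay * self.o_w + (1 - self.decay) * wait_time

    def observe_batch(self, batch_size: int, processing_time: float) -> None:
        # Refit the linear model: processing_time ~ o_a * batch_size + o_b.
        self.samples.append((batch_size, processing_time))
        n = len(self.samples)
        sx = sum(s for s, _ in self.samples)
        sy = sum(t for _, t in self.samples)
        sxx = sum(s * s for s, _ in self.samples)
        sxy = sum(s * t for s, t in self.samples)
        denom = n * sxx - sx * sx
        if denom:  # need at least two distinct batch sizes
            self.o_a = (n * sxy - sx * sy) / denom
            self.o_b = (sy - self.o_a * sx) / n
```

For example, feeding in batches whose processing time is exactly 2 ms per input plus 1 ms of overhead would drive o_a toward 2 and o_b toward 1, which the scheduler could then use to predict the cost of the next batch.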