# ask-for-help
s
Hm, I'm not sure, do you have batching enabled?
j
I am using the default BentoML configuration, so batching is enabled by default.
Maybe I'll try disabling batching entirely and see whether it helps.
s
Batching shouldn't be enabled by default anymore; it's only enabled if you explicitly enable it on the model.
(it's enabled in the configuration, but that shouldn't apply to the model unless you saved the model with `batchable=True`)
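For reference, the docs example looks roughly like this. A minimal sketch, assuming the PyTorch framework module and a hypothetical model name, with `model` being your trained module; the `signatures` argument is the part that matters:

```python
import bentoml

# Saving a model with a batchable signature is what opts it into
# adaptive batching on the server side.
bentoml.pytorch.save_model(
    "my_model",  # hypothetical model name
    model,
    signatures={
        "__call__": {
            "batchable": True,
            "batch_dim": 0,  # inputs are batched along the first dimension
        }
    },
)
```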
j
Oh, I did save the model with `batchable=True`, as I just copied the model saving code from the documentation. If this is the case, do I need to re-save the model without that line, or will disabling it from the configuration YAML do?
s
For testing, disabling batching in the server should be fine.
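Something like this in a `bentoml_configuration.yaml` should do it (a sketch assuming BentoML 1.x runner configuration keys; check the configuration docs for your version):

```yaml
# bentoml_configuration.yaml
runners:
  batching:
    enabled: false
    # max_latency_ms: 10000  # alternatively, tune this instead of disabling
```

Then point the server at it via the `BENTOML_CONFIG` environment variable, e.g. `BENTOML_CONFIG=bentoml_configuration.yaml bentoml serve service:svc` (where `service:svc` is a placeholder for your service).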
j
The inference time has stabilized after disabling batching. Thanks for the pointer! If I want to deploy to production, should I save the model without the batchable signature?
s
Awesome! You can always disable batching on a per-deployment basis or lower the `max-latency` option, but if you want to disable it globally without having to think about it, setting `batchable=False` is probably the easiest option.
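i.e., something like this (same assumptions as the earlier snippet):

```python
import bentoml

# Re-saving with batchable=False (or simply omitting `signatures`,
# since batchable defaults to False) disables batching for this model
# regardless of server configuration.
bentoml.pytorch.save_model(
    "my_model",  # hypothetical model name
    model,
    signatures={"__call__": {"batchable": False}},
)
```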
j
Aight got it, thanks for your help!