# ask-for-help
u
hi everyone, i have a question: when i use the batchable api, how does max_latency_ms work? With my current max_latency_ms setting, I keep getting a "service busy" error.
@bentoml.Runnable.method(batchable=True)
def infer_batch(self, data):
    # `data` arrives as a batched pandas DataFrame assembled by the adaptive batcher
    records = data.to_dict(orient="records")
    infer_input = [d.get("image_url") for d in records]
    # inference_urls and model are defined elsewhere in our code
    inferred = inference_urls(model, infer_input)

    return inferred

shot_runner = bentoml.Runner(
    ShotRunnable,
    runnable_init_params={"domain": domain},
    name=f"Runner",
    max_batch_size=32,
    max_latency_ms=10000,
)
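
my rough understanding so far (please correct me if it's wrong): max_latency_ms looks like a budget covering the time a request sits in the runner's batching queue plus the expected execution time, and when the dispatcher estimates a request can't be served inside that budget it returns the busy error instead of queueing it further. a simplified sketch of that check, just to show what i mean (not BentoML's actual dispatcher code, numbers are made up):

import time

MAX_LATENCY_MS = 10_000  # same value as in the Runner above

def would_exceed_budget(enqueued_at: float, expected_exec_ms: float) -> bool:
    """Reject if queue wait + expected batch execution time exceeds the latency budget."""
    waited_ms = (time.time() - enqueued_at) * 1000
    return waited_ms + expected_exec_ms > MAX_LATENCY_MS

# e.g. a request that has already waited ~9.5s and needs ~700ms of inference
# would be turned away with the "service busy" response under this model
print(would_exceed_budget(time.time() - 9.5, 700))  # True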