Sean
10/15/2022, 12:46 AM
The input to `predict_2()` is expected to be in a batchable format as well. Instead of using `InputFeatures`, could you please try `list[InputFeatures]`? The runner will then batch the lists into a single list.

Shihgian Lee
10/15/2022, 3:24 PM
I tried `list[InputFeatures]`, but my `input_data` still came through as a single item and not a list. I have a couple of questions about `list[InputFeatures]`:
1. Will `input_data` be turned into a list automatically by BentoML?
2. If not, do I need to add `[]` to `domain_request` in the `run` call, e.g. `my_runner_2.predict.run([domain_request])`?

This is the last hurdle I need to overcome before I deploy BentoML 1.0 to our cluster for load testing.

Sean
10/16/2022, 1:53 AM
1. `input_data` will not be automatically turned into a list. The typing `list[InputFeatures]` is more of a type hint.
2. Yes, you are expected to add `[]` to the input.

Shihgian Lee
10/16/2022, 3:38 AM
1. If we never receive a list for `input_data` in the service, why do we need the `list[InputFeatures]` type hint? That doesn't seem correct from a programming perspective and causes more confusion for users.
2. I passed `[domain_request]` to the run method, and the custom runner is happy and returns a list of `PredictionResult` data classes to the service. Per our last conversation, I thought the runner would "unbatch" the list and return a single item to the service. But the service gets a list back instead. Please see the JSON output below. Did I misunderstand you?

[
  {
    "rate": 0.169609017,
    ...
  }
]
Sean
10/16/2022, 11:08 PM
Batching works on batchable types such as `list`, `ndarray`, and `dataframe`. Using the following `numpy.ndarray` as an example, suppose two requests were sent to the API server.

Request 1: np.array([[1, 2, 3], [4, 5, 6]])
Request 2: np.array([[7, 8, 9]])

When these two requests are batched in the runner, what the runner sees is the following:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Say the runner processes the batch and returns np.array([0, 1, 2]). The responses to requests 1 and 2 then look like the following:

Response 1: np.array([0, 1])
Response 2: np.array([2])
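Sean's batch-then-split example can be reproduced with plain NumPy. This is only an illustrative sketch of the mechanism; `model_predict` and the variable names are stand-ins, not BentoML APIs:

```python
import numpy as np

# Stand-in for the model inside the runner: one output per input row.
def model_predict(batch: np.ndarray) -> np.ndarray:
    return np.arange(len(batch))

# Two independent requests arriving at the API server.
request_1 = np.array([[1, 2, 3], [4, 5, 6]])
request_2 = np.array([[7, 8, 9]])

# The runner concatenates the requests along the batch axis (axis 0) ...
batch = np.concatenate([request_1, request_2], axis=0)
results = model_predict(batch)  # array([0, 1, 2])

# ... and splits the results back by each request's original length.
response_1, response_2 = np.split(results, [len(request_1)])
print(response_1)  # [0 1]
print(response_2)  # [2]
```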
Sean
10/16/2022, 11:09 PM
The same applies to `list`.

Shihgian Lee
10/17/2022, 3:52 PM
1. On line# 8, PyCharm complains that `input_data` is a list but `to_domain_2` accepts a single item, because I know we will be dealing with a single item in the service (`predict_2`). If the `List[InputFeatures]` type hint is not relevant, can we remove it? But you pointed out that this is how we tell the API server that batching is enabled. This is the contradiction I am trying to reconcile. Can you help? On line# 9, I pass in a list of `domain_request` manually.
2. Since the runner always returns a list, do I always return `results[0]` (line# 10), since we know the service always deals with a single item at a time? This is because the upstream service doesn't expect a list.

My frustration is due to the lack of explanation of request batching at the boundary between API services and runner batching. The batching documentation could be improved by providing code examples for the different cases.
Chaoyu
10/17/2022, 9:51 PM
When multiple API server processes call `runner.predict.run`, the runner instance will receive those `run` calls and batch their execution. E.g.:

API server process #1 has: runner_a.predict.run([[0,0,0,0]])
API server process #2 has: runner_a.predict.run([[1,1,1,1]])

The runner_a instance will aggregate those calls and execute the predict function with the input [[0,0,0,0], [1,1,1,1]]. Let's say the return from this function call is [0, 1]; process #1 will receive a return value of 0 and process #2 will receive a return value of 1.
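Chaoyu's two-process example can be simulated with plain Python lists. This is an illustrative sketch of the aggregate-then-route behavior, not the BentoML implementation; all names here are made up:

```python
# Stand-in for the batched predict function inside the runner:
# it returns exactly one output per input row.
def predict(batch):
    return list(range(len(batch)))

# run() calls arriving from two API server processes.
call_1 = [[0, 0, 0, 0]]  # from process #1
call_2 = [[1, 1, 1, 1]]  # from process #2

# The runner aggregates the calls into one batch ...
merged = call_1 + call_2
outputs = predict(merged)  # [0, 1]

# ... then routes each slice of the output back to its caller.
reply_1 = outputs[:len(call_1)]  # [0] goes back to process #1
reply_2 = outputs[len(call_1):]  # [1] goes back to process #2
```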
Chaoyu
10/17/2022, 9:53 PM
By default, runner methods use `batch_axes=0`, which tells BentoML how to batch multiple input requests and split the responses.

Shihgian Lee
10/17/2022, 10:01 PM
Did you see my `predict_2` above? Yes, I am using the default `batch_axes=0` to keep things simple. Currently, I only send 1 request from my local Swagger to `predict_2`. However, `results` on line# 11 above returns me a list, e.g. `[0]`, and NOT `0`. What am I missing?

Chaoyu
10/17/2022, 10:03 PM
The return will be `[0]`, not `0`.
Chaoyu
10/17/2022, 10:05 PM
API server process #1 has: runner_a.predict.run([[0,0,0,0]])
API server process #2 has: runner_a.predict.run([[1,1,1,1],[2,2,2,2]])

The runner_a instance will aggregate those calls and execute the predict function with the input [[0,0,0,0], [1,1,1,1], [2,2,2,2]], returning [0,1,2]. So the return will be [0] in process #1 and [1,2] in process #2.

Shihgian Lee
10/17/2022, 10:06 PM
So do I always `return results[0]` (line# 12 above) if the upstream service is expecting a single item returned to it?

Chaoyu
10/17/2022, 10:07 PM
Yes, the returned value should always have the same length as the input data to the runner, when batching is enabled.

Shihgian Lee
10/17/2022, 10:10 PM
> yes, the returned value should always have the same length as the input data to the runner, when batching is enabled

@Chaoyu Got it. So it is up to the user (me) to extract the item from the single-element list to return to the client, if we know the client always sends one request at a time and expects one item returned at a time. Am I correct?
Shihgian Lee
10/17/2022, 10:17 PM
@svc.api(input=input_spec, output=JSON())
def predict(input_data: List[InputFeatures], ctx: bentoml.Context):
    ...

Sean pointed out that I need to provide the `List[InputFeatures]` type hint to enable batching. Do I need to do that if the client always sends one request at a time?

Chaoyu
10/17/2022, 11:40 PM
> Got it. So it is up to the user (me) to extract the item from the single-element list to return to the client, if we know the client always sends one request at a time and expects one item returned at a time. Am I correct?

Yes, that's correct. This is also the most natural fit for most ML frameworks' inference APIs; e.g., in scikit-learn you can directly map the model inference call to `runner.run`:

sklearn_model.predict([[1,1,1,1],[2,2,2,2]])     # => [1, 2]
👇
model_runner.predict.run([[1,1,1,1],[2,2,2,2]])  # => [1, 2]
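The invariant behind this mapping can be checked with a stand-in batch predictor (no scikit-learn or BentoML required; `predict` below is illustrative):

```python
# Stand-in for a batch predict call: one output per input row, in order.
def predict(batch):
    return [row[0] for row in batch]

inputs = [[1, 1, 1, 1], [2, 2, 2, 2]]
outputs = predict(inputs)  # [1, 2]

# The batching invariant: output length always matches input length.
assert len(outputs) == len(inputs)
```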
Chaoyu
10/17/2022, 11:41 PM
I think the type hint here is not relevant to batching behavior. If the client is only sending one `InputFeatures`, you don't need to make that a list.

Shihgian Lee
10/17/2022, 11:44 PM
> I think the type hint here is not relevant to batching behavior. If the client is only sending one InputFeatures, you don't need to make that a list.

@Chaoyu I assume you are referring to the type hinting. If so, agreed! I will remove the `List[]` type hint and only have `InputFeatures` as the input to the service.