# ask-for-help
c
cc @Jiang @Sean
j
@Yakir Saadia Could you please share the service definition here?
y
What do you mean by service definition?
j
The source code where you defined the bentoml Service
y
Do you mean this?
svc = bentoml.Service("AppManager", runners=[r1, r2, r3, r4])
Or the endpoint definition:
```python
@svc.api(input=File_IO(mime_type="image/jpeg"), output=JSON_IO())
async def predict_example4(input_file):
```
j
And also the place where you called the runner
Plus the full error log
y
This is where I called the runners:
```python
pred1, pred2 = await asyncio.gather(
    runner1.predict.async_run(img_tensors),
    runner2.predict.async_run(img_tensors)
)
```
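For reference, the fan-out pattern above can be exercised without BentoML. A minimal sketch in which the two runner calls are replaced by hypothetical stand-in coroutines (`fake_runner1`/`fake_runner2` are not real BentoML APIs, just simulations of `runner.predict.async_run`):

```python
import asyncio

# Hypothetical stand-ins for runner1.predict.async_run / runner2.predict.async_run;
# the real calls dispatch to BentoML runner processes, these only simulate latency.
async def fake_runner1(x):
    await asyncio.sleep(0.01)
    return f"r1:{x}"

async def fake_runner2(x):
    await asyncio.sleep(0.01)
    return f"r2:{x}"

async def predict(img_tensors):
    # Both calls are scheduled concurrently; gather awaits both and
    # returns the results in call order.
    pred1, pred2 = await asyncio.gather(
        fake_runner1(img_tensors),
        fake_runner2(img_tensors),
    )
    return pred1, pred2

print(asyncio.run(predict("batch")))  # ('r1:batch', 'r2:batch')
```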
And I don't have at the moment the full error log
@Jiang?
j
How did you serve it
bentoml serve ...
or
bentoml serve --production
?
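For context, the two serve modes differ in process layout. A sketch of both invocations, assuming the Service is defined as `svc` in `service.py` (the module path is an assumption):

```shell
# Development mode: single process with hot reload, runners run in-process
bentoml serve service:svc

# Production mode: separate api_server and runner worker processes
bentoml serve service:svc --production
```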
y
@Jiang with --production
I want to highlight that it happens only under a certain load (the server itself still has plenty of resources). That makes me think the runner behaves unexpectedly once a certain number of requests is being handled
j
Did you include runner1 in the
svc = Service(runners=...
y
Yes. All runners are included in the service definition and my runners are usually working. As I highlighted, it happens only at a certain load
j
I believe I need the full log here
https://github.com/bentoml/BentoML/issues/2271 Since you are using async endpoints (which is recommended, of course), I believe the situation is different from that issue
y
Because of this post, I should mention that I have aiohttp version 3.8.1
@Jiang I have also encountered this exception. Can you advise?
j
This is basically saying that the connection was already closed for some reason
Did you see any exception from the runner component?
(the `[api_server]` at the start of the log line basically means this exception happened in the api_server component)
y
I didn't see an exception from the runner. What I have shared in the image is the only exception seen
j
I will try to reproduce this issue. Would you be comfortable trying to do the same test in a docker environment?
y
Can't at the moment. In the meantime I will try to get you the full log for the main issue in this thread
@Jiang Here is the traceback to the exception (the main issue in this thread)
@Jiang I think I solved it. From the traceback I was able to figure out that the cause of this problem was that I tried running the inference with `async_run` from an async endpoint
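Editor's note: the sync-vs-async distinction at the heart of this thread can be illustrated without BentoML. A minimal sketch (handler names are hypothetical stand-ins, not BentoML APIs) of why a blocking call inside an async endpoint stalls the event loop, whereas an awaited call lets requests overlap:

```python
import asyncio
import time

async def blocking_handler():
    # Stand-in for a synchronous call (e.g. runner.run()) made inside an
    # async endpoint: it blocks the event loop for its whole duration.
    time.sleep(0.1)

async def awaiting_handler():
    # Stand-in for an awaited call (e.g. runner.async_run()): the event
    # loop stays free to serve other requests while this one waits.
    await asyncio.sleep(0.1)

async def serve_five(handler):
    # Time five "requests" issued concurrently against the same handler.
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(5)))
    return time.perf_counter() - start

blocking = asyncio.run(serve_five(blocking_handler))
overlapping = asyncio.run(serve_five(awaiting_handler))
# The blocking variant takes roughly 5x longer: its calls are serialized
# on the event loop, while the awaited ones run concurrently.
print(f"blocking: {blocking:.2f}s, overlapping: {overlapping:.2f}s")
```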