# ask-for-help
b
Hello @Yakir Saadia, I suggest you move to BentoML v1.0 or a later version. Are there any constraints that prevent you from doing that? Feel free to DM me.
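For context, a minimal sketch of what a batch-enabled service looks like in BentoML 1.x; the model tag `my_model` and the service/endpoint names here are assumptions for illustration, and batching is opted into via the `batchable` signature when the model is saved:

```python
import bentoml
from bentoml.io import NumpyNdarray

# Assumes a model was saved with a batchable signature, e.g.:
#   bentoml.sklearn.save_model(
#       "my_model", model,
#       signatures={"predict": {"batchable": True, "batch_dim": 0}},
#   )
runner = bentoml.sklearn.get("my_model:latest").to_runner()

svc = bentoml.Service("my_service", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def predict(input_array):
    # Concurrent calls to this endpoint are grouped into batches
    # by the runner's adaptive batching layer.
    return await runner.predict.async_run(input_array)
```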
s
What batch sizes do you see? Is the server able to serve all incoming requests successfully?
y
It serves all the requests successfully, but it never forms a batch larger than 3.
s
Your requests are most likely very fast, and the server can handle each request independently without batching. Try increasing the request throughput and pushing the server closer to its limit.
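A minimal load-generation sketch, assuming a local server at `http://localhost:3000/predict` (hypothetical endpoint and payload); keeping many requests in flight gives the adaptive batcher a queue to draw from, whereas one request at a time keeps batches tiny:

```python
import asyncio
import aiohttp

URL = "http://localhost:3000/predict"  # hypothetical endpoint
CONCURRENCY = 64                       # requests kept in flight at once
TOTAL = 1000

async def worker(session, sem):
    async with sem:
        async with session.post(URL, json={"input": [0.0]}) as resp:
            await resp.read()

async def main():
    sem = asyncio.Semaphore(CONCURRENCY)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(worker(session, sem) for _ in range(TOTAL)))

asyncio.run(main())
```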
y
But the server takes too long to start handling each request, so I don't get the requests per second I expect.
s
Do you mean the server takes time to warm up? The model may take some time to load into memory. You may want to set up some warm-up requests to get the server into a ready state.
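A minimal warm-up sketch, assuming the same hypothetical endpoint and payload; a handful of dummy requests after startup forces any lazy model loading to happen before real traffic arrives:

```python
import time
import requests

URL = "http://localhost:3000/predict"  # hypothetical endpoint

for _ in range(5):
    try:
        requests.post(URL, json={"input": [0.0]}, timeout=30)
    except requests.RequestException:
        time.sleep(1)  # server may still be coming up; wait and try again
```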
y
But it doesn't only happen when the server starts; it happens throughout the lifetime of the app.
b
@Yakir Saadia can you share more information on this finding?
What's your configuration for max latency and max batch size?
@Yakir Saadia actually, do you have time for office hours? We can have a more productive meeting over Zoom.
y
@Bo As I mentioned in the main message in this thread, the max batch size is 30 and the max latency is 1000000. I would be happy to have a Zoom call about it.
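For reference, a sketch of how those two values map onto the pre-1.0 micro-batching API (`mb_max_batch_size` / `mb_max_latency` on `@api`); the service class and handler body here are hypothetical:

```python
import bentoml
from bentoml import api, env, BentoService
from bentoml.adapters import JsonInput

@env(infer_pip_packages=True)
class MyService(BentoService):  # hypothetical service name

    @api(input=JsonInput(), batch=True,
         mb_max_batch_size=30,     # the batch size cap mentioned above
         mb_max_latency=1000000)   # the latency budget mentioned above
    def predict(self, parsed_json_list):
        # With batch=True the handler receives a list of requests and
        # must return one result per request, so the micro-batcher can
        # split the outputs back out to the original callers.
        return [{"ok": True} for _ in parsed_json_list]
```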