# support

Yakir Saadia

09/22/2022, 8:46 AM
Hello everyone, I am using BentoML v0.13 on 4 GPU-based servers, each with 32 GB RAM, 8 CPUs, and a 16 GB GPU. I am experiencing slow batch formation (it takes time to start handling the requests), and it doesn't form large batches. I am testing it under a load of 1600 requests at 20 requests per second, using BentoML with 6 workers, a max batch size of 30, and a max latency of 1000000. Has anyone else experienced this? Does anyone have an improvement for it? My servers are underutilized at the moment.
👀 1
🍱 1
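For context, in BentoML 0.13 the micro-batching parameters live on the API decorator. A minimal sketch of a service configured as described above (the service class, artifact, and payload shape are hypothetical; the batching parameters match the values in this thread):

```python
# Sketch of a BentoML 0.13 service with micro-batching enabled.
# Service/artifact names and payload shape are hypothetical.
from typing import List

import bentoml
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact


@bentoml.env(infer_pip_packages=True)
@bentoml.artifacts([SklearnModelArtifact("model")])
class MyService(bentoml.BentoService):
    @bentoml.api(
        input=JsonInput(),
        batch=True,               # handler receives a list of requests
        mb_max_batch_size=30,     # cap micro-batches at 30 requests
        mb_max_latency=1000000,   # max queueing latency (ms)
    )
    def predict(self, inputs: List[dict]):
        # one prediction per request in the micro-batch
        features = [d["features"] for d in inputs]  # hypothetical payload key
        return self.artifacts.model.predict(features).tolist()
```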

Bo

09/23/2022, 4:48 PM
hello @Yakir Saadia I suggest you move to BentoML v1.0 or a later version. Are there any constraints that prevent you from doing that? Feel free to DM me.
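For reference, in BentoML 1.0 adaptive batching is enabled per model signature at save time, and the batch limits move into the runner configuration. A minimal sketch (the framework, model name, and config values are illustrative):

```python
# Sketch: enabling adaptive batching when saving a model in BentoML 1.0.
# The framework (sklearn) and model name are illustrative.
import bentoml
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])

bentoml.sklearn.save_model(
    "demo_model",
    model,
    signatures={"predict": {"batchable": True, "batch_dim": 0}},
)

# Runtime batch limits then go in bentoml_configuration.yaml, e.g.:
#   runners:
#     batching:
#       enabled: true
#       max_batch_size: 30
#       max_latency_ms: 10000
```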

Sean

09/25/2022, 7:51 AM
What batch sizes do you see? Is it able to serve all the requests successfully?

Yakir Saadia

09/25/2022, 10:03 AM
It serves all the requests successfully, but it never forms a batch larger than 3.

Sean

09/25/2022, 10:26 AM
Your requests are most likely very fast, and the server is capable of handling them individually without batching. You can try increasing the throughput to push your server closer to its limit.
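One way to push the server harder is to raise the number of in-flight requests rather than holding a fixed request rate; a minimal load-generation sketch (the endpoint URL and payload are hypothetical):

```python
# Minimal load-generation sketch: 1600 requests with 100 in flight
# at a time, instead of a fixed 20 requests/second.
# The endpoint URL and payload are hypothetical.
import concurrent.futures

import requests

URL = "http://localhost:5000/predict"
PAYLOAD = {"features": [1.0, 2.0, 3.0]}

def send_one(_):
    return requests.post(URL, json=PAYLOAD).status_code

with concurrent.futures.ThreadPoolExecutor(max_workers=100) as pool:
    codes = list(pool.map(send_one, range(1600)))

# summary of response codes, e.g. {200: 1600}
print({code: codes.count(code) for code in set(codes)})
```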

Yakir Saadia

09/25/2022, 10:30 AM
But the server takes too long to start handling the requests, so I don't get the requests per second I should be getting.

Sean

09/26/2022, 12:09 AM
Do you mean the server takes time to warm up? The model may take some time to load into memory. You may want to set up some warm-up requests to get the server into a ready state.
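A warm-up can be as simple as retrying a dummy request until the server answers, so the model is loaded before real traffic arrives; a minimal sketch (the URL and payload are hypothetical):

```python
# Sketch: send warm-up requests until the server responds, so the
# model is loaded into memory before real traffic arrives.
# The endpoint URL and dummy payload are hypothetical.
import time

import requests

URL = "http://localhost:5000/predict"
DUMMY = {"features": [0.0, 0.0, 0.0]}

for _ in range(10):
    try:
        requests.post(URL, json=DUMMY, timeout=5).raise_for_status()
        print("server is warm")
        break
    except Exception:
        time.sleep(2)  # not ready yet; retry
```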

Yakir Saadia

09/27/2022, 1:29 PM
But it doesn't only happen when the server first starts; it happens throughout the lifetime of the app.

Bo

09/28/2022, 2:48 AM
@Yakir Saadia can you share more information on this finding?
What's your configuration for the max latency and max batch size?
@Yakir Saadia actually, do you have time for office hours? We can have a more productive meeting over Zoom.

Yakir Saadia

09/28/2022, 9:34 PM
@Bo As I mentioned in the main message of this thread, the max batch size is 30 and the max latency is 1000000. I would be happy to have a Zoom call about it.