I'm pretty new to BentoML and to the ML area in general, so sorry if I'm missing something basic.
Benjamin Tan
12/04/2022, 3:08 AM
if they are not parallelizable, it would probably make sense to package everything into one runner first.
Bela Bur
12/05/2022, 10:40 AM
Thank you for the tip!
You mean encapsulating the feature hydration in the runner too?
My thinking is that it will behave the same as not using a runner, except there will be data exchange between the web service process and the worker (runner). Plus it runs extra processes.
I'll give it a try and measure it; maybe I'm missing something.
Benjamin Tan
12/05/2022, 1:28 PM
Yeah, I'd pack everything into one runner first. The nice thing about runners is that you can have multiple of them.
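For reference, a minimal sketch of what "packaging everything into one runner" could look like on BentoML 1.0.x. The HydrateAndScore class and the hydrate/score helpers are hypothetical stand-ins for the feature hydration and the custom pandas/numpy "model" discussed in this thread, not anything from the actual service:

```python
import bentoml
import pandas as pd


def hydrate(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for the feature-hydration step.
    return df


def score(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for the custom pandas/numpy "model".
    return df


class HydrateAndScore(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("cpu",)
    SUPPORTS_CPU_MULTI_THREADING = True

    @bentoml.Runnable.method(batchable=False)  # no adaptive micro-batching, as in this thread
    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
        # Both hydration and scoring happen inside the runner worker.
        return score(hydrate(df))


runner = bentoml.Runner(HydrateAndScore, name="hydrate_and_score")
svc = bentoml.Service("feature_service", runners=[runner])


@svc.api(input=bentoml.io.PandasDataFrame(), output=bentoml.io.PandasDataFrame())
async def predict(df: pd.DataFrame) -> pd.DataFrame:
    # The DataFrame is serialized and shipped to the runner process over IPC here.
    return await runner.predict.async_run(df)
```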
Bela Bur
12/06/2022, 12:21 PM
I gave it a try, but performance didn't really improve, so I kept the solution without runners to keep things simpler.
My understanding is that the same thing happens with runners, except there's some small overhead because the inter-process communication has to exchange the unhydrated features. (Please note that we're not batching requests, and our "model" is some custom pandas/numpy operation.)
BTW I'm using version 1.0.8.
Maybe I'm missing something. Are you aware of anything about runners that could help performance in this case?
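And, for comparison, a sketch of the runner-less variant Bela describes keeping, using the same hypothetical hydrate/score helpers as above. Everything executes inside the API server process, so no DataFrame crosses an IPC boundary, which is one plausible reading of why the two setups measured similarly here:

```python
import bentoml
import pandas as pd


def hydrate(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical feature-hydration step (same stand-in as in the runner sketch).
    return df


def score(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical pandas/numpy "model" (same stand-in as in the runner sketch).
    return df


svc = bentoml.Service("feature_service")  # no runners registered


@svc.api(input=bentoml.io.PandasDataFrame(), output=bentoml.io.PandasDataFrame())
def predict(df: pd.DataFrame) -> pd.DataFrame:
    # Runs entirely in the API server worker; no serialization to a runner process.
    return score(hydrate(df))
```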
Benjamin Tan
12/07/2022, 4:19 AM
When you measure performance, is it throughput, latency, or both?
Bela Bur
12/07/2022, 3:52 PM
I measured both; I attached some screenshots. Again, maybe I'm missing something basic about BentoML, but are the runner processes somehow different from the API server processes if we're neither micro-batching nor using CPU threading optimizations?