# ask-for-help
I'm pretty new to BentoML and to the ML area in general, so sorry if I'm missing something basic.
If they are not parallelizable, it would probably make sense to package everything into one runner first.
Thank you for the tip! You mean encapsulating the feature hydration into the runner too? My thinking is that it will act the same as not using a runner, except there will be a data exchange between the web service process and the worker (runner), plus it runs extra processes. I will try to measure it; maybe I'm missing something.
Yeah, I'd pack everything into one runner first. The nice thing about runners is that you can have multiple ones.
I gave it a try, but the performance was not really improved, so I kept the solution without workers to keep it simpler. In my understanding, the same thing happens with runners, except there is some small overhead because the inter-process communication needs to exchange the unhydrated features. (Please note that we're not batching requests, and our "model" is a custom pandas/numpy operation.) BTW, I'm using version 1.0.8. Maybe I'm missing something. Are you aware of anything about runners that could help performance in this case?
When you measured performance, was it throughput, latency, or both?
I measured both; I attached some screenshots. Again, maybe I'm missing some basic thing about BentoML, but are the runner processes somehow different from the API server processes if we're not micro-batching and not using CPU threading optimizations?