# ask-for-help
b
Could it be:
```python
SUPPORTED_RESOURCES = ("nvidia.com/gpu")  # <-- this is a string
```
Should be:
```python
SUPPORTED_RESOURCES = ("nvidia.com/gpu",)  # <-- this is a tuple
```
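This is the classic Python single-element tuple gotcha: parentheses alone are just grouping, and only the trailing comma makes a tuple. A quick illustration:
```python
# Parentheses without a comma are just grouping, so this is a str:
a = ("nvidia.com/gpu")
# The trailing comma is what makes a one-element tuple:
b = ("nvidia.com/gpu",)
print(type(a), type(b))  # <class 'str'> <class 'tuple'>
```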
g
tried to make it a tuple, it didn’t work though
also, I tried serving the model natively with FastAPI today, and the latency is 50x lower while handling 4x the RPS
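For reference, the "native" FastAPI setup presumably looked something like this minimal sketch; the request schema and `run_model` are hypothetical stand-ins, since the actual code wasn't posted in the thread:
```python
# Hypothetical minimal FastAPI serving sketch; run_model stands in
# for the real inference call, which wasn't shared in the thread.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    inputs: list[float]

def run_model(inputs: list[float]) -> list[float]:
    return inputs  # placeholder for the actual model

@app.post("/predict")
async def predict(req: PredictRequest):
    return {"outputs": run_model(req.inputs)}

# e.g. uvicorn app:app --workers 4
```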
b
so if I understand you right, you're saying that the GPU runs slower?
g
no, I'm saying that the CPU gets fully utilized at a very low RPS, even without running any inference.
for some reason, the runners (with no computation in them) drain all the CPUs when I get to around 25-30 RPS
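A no-op runner like the one g describes might look like this sketch in BentoML 1.x (the names and JSON I/O are assumptions; the actual service wasn't shared). Even with an empty `predict`, every request still pays the API-server-to-runner dispatch cost, which is where the CPU would be going in this scenario:
```python
import bentoml
from bentoml.io import JSON

# Sketch of a runner that does no real work, to isolate
# dispatch overhead from actual inference cost.
class NoOpRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("nvidia.com/gpu", "cpu")  # tuple, per the fix above
    SUPPORTS_CPU_MULTI_THREADING = True

    @bentoml.Runnable.method(batchable=False)
    def predict(self, payload):
        return payload  # no computation at all

noop_runner = bentoml.Runner(NoOpRunnable, name="noop_runner")
svc = bentoml.Service("noop_service", runners=[noop_runner])

@svc.api(input=JSON(), output=JSON())
async def predict(payload):
    # Each request still crosses the API-server -> runner boundary.
    return await noop_runner.predict.async_run(payload)
```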
@Benjamin Tan - are you using Yatai when running bentos in production? I currently containerize the models using bento and then deploy them on k8s without using Yatai.
b
Yeah.
Do you have the BentoDeployment manifest?
Have you verified that the GPU is being used?
g
I'm not using Yatai currently since I've seen that it needs S3 as a dependency, and I don't work with AWS
I did see that the GPU is in use, though it looks like it doesn't constantly get input to run inference on because of the CPU bottleneck
so you only see high spikes on the GPU once in a while