# ask-for-help
There shouldn't be a problem with loading the model outside the runner, but that means the model will be loaded more times than it needs to be.
I see, thanks - in my case the Docker image seemed to cause some kind of memory overflow when loading the models outside the runner, and the system froze. I put them back into the runner. I still haven't solved the problem that the container doesn't do inference, even though the bento does it with no problems. It seems the NVIDIA drivers are not installed in the container. Is this possible even though it works in the bento?
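(Editor's note: a quick way to check whether the GPU is actually visible inside a container is to run `nvidia-smi` in it. This assumes the NVIDIA Container Toolkit is installed on the host; the image tag below is a placeholder, not from this thread.)

```shell
# If this prints the GPU table, the driver is reachable inside the container
docker run --rm --gpus all my-bento-image:latest nvidia-smi
```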
Yeah, if you load the model outside the runner, it will be loaded many more times than necessary, leading to memory issues if you don't have enough memory. Are you using the CUDA base image?
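(Editor's note: for anyone following along, here is a minimal sketch of the pattern being discussed, assuming BentoML 1.x and a PyTorch model saved under the hypothetical tag `my_model:latest`; all names are placeholders, not from this thread.)

```python
import bentoml
from bentoml.io import NumpyNdarray

# Anti-pattern: loading weights at module import time means every API
# worker process gets its own copy, multiplying memory use:
#   model = torch.load("weights.pt")  # one copy per worker -> OOM risk

# Preferred: wrap the saved model in a runner, so the weights load once,
# in the dedicated runner process.
runner = bentoml.pytorch.get("my_model:latest").to_runner()

svc = bentoml.Service("my_service", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def predict(arr):
    # Inference is delegated to the runner process instead of running
    # in each API worker.
    return await runner.async_run(arr)
```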
I specify a cuda_version in the bentofile.yaml; is this sufficient? I have created this issue here: https://github.com/bentoml/BentoML/issues/3387
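(Editor's note: a minimal bentofile.yaml sketch showing where `cuda_version` goes in BentoML 1.x; the service path, packages, and version string are illustrative placeholders.)

```yaml
service: "service:svc"      # hypothetical import path to the Service object
include:
  - "*.py"
python:
  packages:
    - torch
docker:
  cuda_version: "11.6.2"    # selects a CUDA-enabled base image at build time
```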
Oh right! Let me just respond to that really quickly.