Slackbot
02/06/2023, 4:10 PM

Suhas
02/06/2023, 5:50 PM
cuda:0, does it fix it for you?

Yilun Zhang
02/06/2023, 6:08 PM
[0, 1], but if I move the model to cuda:0 in init, then if a request goes to GPU 1, there will be a tensors-on-different-devices error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:6! (when checking argument for argument index in method wrapper__index_select)
So I have to be able to specify this GPU allocation at the bentoml server level, and I think I'm stuck here.
Note: from the error message, the conflict is between cpu and cuda, so it seems like the tokenized inputs aren't passed to the GPU correctly?
Update: same error even if I specify to have the runners on the same device (i.e. [0, 0]).
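
(For reference, a minimal sketch of the kind of custom runnable setup being described. The model id, class name, and method name are placeholders, and it assumes each runner worker only sees its assigned GPU, e.g. via CUDA_VISIBLE_DEVICES, so a plain "cuda" device resolves to the right card. The key line is moving the tokenized inputs onto the model's device, which is what the cpu/cuda mismatch above points at.)

```python
import torch
import bentoml
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "bert-base-uncased"  # placeholder; stands in for the real model


class EncoderRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("nvidia.com/gpu",)
    SUPPORTS_CPU_MULTI_THREADING = False

    def __init__(self):
        # Assumption: the runner worker only sees its assigned GPU, so "cuda"
        # points at that card instead of hard-coding cuda:0.
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
        self.model = AutoModel.from_pretrained(MODEL_ID).to(self.device).eval()

    @bentoml.Runnable.method(batchable=False)
    def encode(self, text: str):
        # The tokenizer returns CPU tensors; moving them onto self.device is
        # what avoids the "cpu and cuda" index_select mismatch above.
        inputs = self.tokenizer(text, return_tensors="pt").to(self.device)
        with torch.no_grad():
            return self.model(**inputs).last_hidden_state.cpu()
```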

Suhas
02/06/2023, 6:58 PM

Yilun Zhang
02/06/2023, 7:02 PM

Suhas
02/06/2023, 8:05 PM

Yilun Zhang
02/06/2023, 8:07 PM

Yilun Zhang
02/06/2023, 9:14 PM
• It turns out I can't just set p.device = xxx (where p is a pipeline object); rather, I need to specify the device at creation time via p = pipeline(xxx, device=xxx).
• This makes inference on a single GPU work, but when trying to use the pipeline in bentoml, it still fails due to GPU allocation. The issue is that I can't specify device=torch.device("cuda"), since it runs into: ValueError: Expected a torch.device with a specified index or an integer, but got: cuda. The only thing that works is to specify a specific GPU id (integer). But since I wanted that to be controlled by the config file (i.e. in config I can have [6, 6], which assigns 2 runners of this pipeline to GPU 6), I can't specify a GPU number there.
• So I think there's some runner initialization issue with transformer pipelines when using GPU.
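
(For concreteness, a sketch of the two creation paths being contrasted above; the task name and GPU id are placeholders mirroring the xxx and [6, 6] in the message.)

```python
import torch
from transformers import pipeline

# Per the message above, setting p.device after creation is not enough;
# the device has to be chosen when the pipeline is built.

# Reportedly fails under bentoml with:
#   ValueError: Expected a torch.device with a specified index or an integer, but got: cuda
# p = pipeline("text-classification", device=torch.device("cuda"))

# Works, but pins the pipeline to a specific card, which is exactly what the
# bentoml config (e.g. [6, 6]) was supposed to decide.
p = pipeline("text-classification", device=6)
```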

Yilun Zhang
02/06/2023, 9:54 PM
For now I'm using bentoml.transformers.get(xxx).to_runner() directly (and then runner.async_run() in the api code). I think there's some incompatibility between transformers pipelines and bentoml custom runnable classes with respect to GPU allocation that maybe the team can look into.
However, this makes retrieving metadata from the bentoml model unclear. I will do some digging to see if I can retrieve it from somewhere.
One option I found is to save my_model = bentoml.transformers.get(xxx) as a global variable, then do my_runner = my_model.to_runner(), and then in the service api section use my_model.info.metadata to retrieve it. Is this the proper way of doing this?
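
(A sketch of the layout described above, with a placeholder model tag and service name; the runner and the metadata both come from the same module-level bentoml.transformers.get(...) handle.)

```python
import bentoml

# Placeholder tag; in the thread this is just xxx.
my_model = bentoml.transformers.get("my_pipeline:latest")
my_runner = my_model.to_runner()

svc = bentoml.Service("my_service", runners=[my_runner])


@svc.api(input=bentoml.io.Text(), output=bentoml.io.JSON())
async def predict(text: str) -> dict:
    result = await my_runner.async_run(text)
    return {
        "result": result,
        # Metadata recorded when the model was saved, read from the global handle.
        "metadata": my_model.info.metadata,
    }
```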