# ask-for-help
s
This message was deleted.
s
I am having the same issue, but could you try passing `cuda:0`? Does that fix it for you?
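(A minimal sketch of what "passing `cuda:0`" could look like outside BentoML; the checkpoint name is a placeholder, not from this thread. The point is that both the model and the tokenized inputs have to end up on the same device.)
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint, for illustration only.
name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).to("cuda:0")

# The tokenized inputs must be moved to the same device as the model.
inputs = tokenizer("some example text", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits)
```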
y
The problem with this is that even though I specified in the BentoML config that the model runner(s) should be on different GPUs, doing this won't work. For example, I have the runners configured to run on GPUs `[0, 1]`, but if I move the model to `cuda:0` in `__init__`, then whenever a request goes to GPU 1 there is a tensors-on-different-devices error:
```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:6! (when checking argument for argument index in method wrapper__index_select)
```
So I have to be able to specify this GPU allocation at the BentoML server level, and I think I'm stuck here. Note that, from the error message, the conflict is between cpu and cuda, so it seems like the tokenized inputs aren't being moved to the GPU correctly? Update: same error even if I specify the runners to be on the same device (i.e. `[0, 0]`).
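(For readers following along, a rough sketch of the kind of custom runnable setup being described, assuming BentoML 1.x; the model and runner names are placeholders. The hard-coded device in `__init__` is what conflicts with the GPU list in the BentoML configuration.)
```python
import bentoml
from transformers import pipeline

class MyPipelineRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("nvidia.com/gpu",)
    SUPPORTS_CPU_MULTI_THREADING = False

    def __init__(self):
        # Hard-coding GPU 0 here is the problem described above: the BentoML
        # config (e.g. nvidia.com/gpu: [0, 1]) may schedule this runner
        # instance onto a different GPU, so the model can end up on a device
        # other than the one BentoML assigned, producing the mismatch error
        # quoted in the thread.
        self.pipe = pipeline(
            "text-classification",
            model="distilbert-base-uncased-finetuned-sst-2-english",  # placeholder
            device=0,
        )

    @bentoml.Runnable.method(batchable=False)
    def predict(self, text: str):
        return self.pipe(text)

my_runner = bentoml.Runner(MyPipelineRunnable, name="my_pipeline_runner")
```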
s
I mean the pipeline should handle putting both the model and the inputs on the same device. At least for me, I could do inference locally, but when I build it into a Bento it gives me a CUDA error.
y
I have the same issue with local testing as well (when setting the device to cuda); the error is the same: tensors on different devices.
s
y
It seems like I'm having different issues from yours. I will probably create another ticket for my issues specifically.
Updates
• It seems I can't update the pipeline device using `p.device = xxx` (where `p` is a pipeline object); rather, I need to specify it at creation time via `p = pipeline(xxx, device=xxx)` (sketch below).
• This makes inference on a single GPU work, but when trying to use the pipeline in BentoML, it still fails due to GPU allocation. The issue is that I can't specify `device=torch.device("cuda")`, since that runs into `ValueError: Expected a torch.device with a specified index or an integer, but got:cuda`. The only thing that works is to specify a concrete GPU id (integer). But since I wanted that to be controlled by the config file (i.e. in the config I can have `[6, 6]`, which assigns 2 runners of this pipeline to GPU 6), I can't hard-code a GPU number there.
• So I think there's some runner initialization issue with transformers pipelines when using the GPU.
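(A short sketch of the pipeline device behavior described in the first two bullets; the checkpoint is a placeholder, and the exact accepted device types may vary with the transformers version.)
```python
import torch
from transformers import pipeline

# The device has to be given at creation time; assigning p.device afterwards
# does not move the model (as observed above).
p = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # placeholder
    device=0,  # a concrete integer GPU id works
)

# As reported above, an index-less device raised:
#   ValueError: Expected a torch.device with a specified index or an integer, but got:cuda
# p = pipeline(..., device=torch.device("cuda"))
# Per that error message, an explicit index avoids it, e.g.
# device=torch.device("cuda", 0) or a plain integer id.
```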
Update: I was able to make this work without using a custom defined runner class, by using `bentoml.transformers.get(xxx).to_runner()` directly (and then `runner.async_run()` in the API code). I think there's some incompatibility between transformers pipelines and BentoML custom runnable classes with respect to GPU allocation that maybe the team can look into. However, this makes retrieving metadata from the BentoML model unclear. I will do some digging to see if I can retrieve it from somewhere. One option I found is to save `my_model = bentoml.transformers.get(xxx)` as a global variable, then do `my_runner = my_model.to_runner()`, and then in the service API section use `my_model.info.metadata` to retrieve it. Is this the proper way of doing this?
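(For completeness, a minimal sketch of the pattern described in that last update, assuming BentoML 1.x; the model tag, service name, and endpoint are placeholders, not from the thread.)
```python
import bentoml
from bentoml.io import JSON, Text

# Keep a handle to the stored model so its metadata stays accessible.
my_model = bentoml.transformers.get("my_pipeline_model:latest")  # placeholder tag
my_runner = my_model.to_runner()

svc = bentoml.Service("my_service", runners=[my_runner])

@svc.api(input=Text(), output=JSON())
async def predict(text: str) -> dict:
    result = await my_runner.async_run(text)
    # Metadata saved with the model is still reachable via the model handle.
    return {"result": result, "metadata": my_model.info.metadata}
```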