# ask-for-help
s
This message was deleted.
s
I am having the same issue, but could you try passing `cuda:0`? Does that fix it for you?
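(A minimal sketch of what "passing `cuda:0`" could look like outside BentoML; the checkpoint name is a placeholder, not from this thread. The point is that both the model and the tokenized inputs have to end up on the same device.)
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint, for illustration only.
name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).to("cuda:0")

# The tokenized inputs must be moved to the same device as the model.
inputs = tokenizer("some example text", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits)
```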
y
The problem with this is that even though I specified in the BentoML config that the model runner(s) should be on different GPUs, doing this won't work. For example, I have the runners configured to run on GPUs `[0, 1]`, but if I move the model to `cuda:0` in `__init__`, then whenever a request goes to GPU 1 there is a tensors-on-different-devices error:
```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:6! (when checking argument for argument index in method wrapper__index_select)
```
So I have to be able to specify this GPU allocation at the BentoML server level, and I think I'm stuck here. Note that, from the error message, the conflict is between cpu and cuda, so it seems like the tokenized inputs aren't being moved to the GPU correctly? Update: same error even if I specify the runners to be on the same device (i.e. `[0, 0]`).
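(For readers following along, a rough sketch of the kind of custom runnable setup being described, assuming BentoML 1.x; the model and runner names are placeholders. The hard-coded device in `__init__` is what conflicts with the GPU list in the BentoML configuration.)
```python
import bentoml
from transformers import pipeline

class MyPipelineRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("nvidia.com/gpu",)
    SUPPORTS_CPU_MULTI_THREADING = False

    def __init__(self):
        # Hard-coding GPU 0 here is the problem described above: the BentoML
        # config (e.g. nvidia.com/gpu: [0, 1]) may schedule this runner
        # instance onto a different GPU, so the model can end up on a device
        # other than the one BentoML assigned, producing the mismatch error
        # quoted in the thread.
        self.pipe = pipeline(
            "text-classification",
            model="distilbert-base-uncased-finetuned-sst-2-english",  # placeholder
            device=0,
        )

    @bentoml.Runnable.method(batchable=False)
    def predict(self, text: str):
        return self.pipe(text)

my_runner = bentoml.Runner(MyPipelineRunnable, name="my_pipeline_runner")
```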
s
I mean the pipeline should handle putting both the model and the inputs on the same device. At least for me, I could do inference locally, but when I build it into a Bento it gives me a CUDA error.
y
I have the same issue with local testing as well (when setting the device to cuda); the error is the same: tensors on different devices.
s
y
It seems like I'm having different issues from yours. I will probably create another ticket for my issues specifically.
Updates
• It seems I can't update the pipeline device using `p.device = xxx` (where `p` is a pipeline object); rather, I need to specify it at creation time via `p = pipeline(xxx, device=xxx)` (sketch below).
• This makes inference on a single GPU work, but when trying to use the pipeline in BentoML, it still fails due to GPU allocation. The issue is that I can't specify `device=torch.device("cuda")`, since that runs into `ValueError: Expected a torch.device with a specified index or an integer, but got:cuda`. The only thing that works is to specify a concrete GPU id (integer). But since I wanted that to be controlled by the config file (i.e. in the config I can have `[6, 6]`, which assigns 2 runners of this pipeline to GPU 6), I can't hard-code a GPU number there.
• So I think there's some runner initialization issue with transformers pipelines when using the GPU.
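(A short sketch of the pipeline device behavior described in the first two bullets; the checkpoint is a placeholder, and the exact accepted device types may vary with the transformers version.)
```python
import torch
from transformers import pipeline

# The device has to be given at creation time; assigning p.device afterwards
# does not move the model (as observed above).
p = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # placeholder
    device=0,  # a concrete integer GPU id works
)

# As reported above, an index-less device raised:
#   ValueError: Expected a torch.device with a specified index or an integer, but got:cuda
# p = pipeline(..., device=torch.device("cuda"))
# Per that error message, an explicit index avoids it, e.g.
# device=torch.device("cuda", 0) or a plain integer id.
```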
Update: I was able to make this work without using a custom defined runner class, by using `bentoml.transformers.get(xxx).to_runner()` directly (and then `runner.async_run()` in the API code). I think there's some incompatibility between transformers pipelines and BentoML custom runnable classes with respect to GPU allocation that maybe the team can look into. However, this makes retrieving metadata from the BentoML model unclear. I will do some digging to see if I can retrieve it from somewhere. One option I found is to save `my_model = bentoml.transformers.get(xxx)` as a global variable, then do `my_runner = my_model.to_runner()`, and then in the service API section use `my_model.info.metadata` to retrieve it. Is this the proper way of doing this?
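(For completeness, a minimal sketch of the pattern described in that last update, assuming BentoML 1.x; the model tag, service name, and endpoint are placeholders, not from the thread.)
```python
import bentoml
from bentoml.io import JSON, Text

# Keep a handle to the stored model so its metadata stays accessible.
my_model = bentoml.transformers.get("my_pipeline_model:latest")  # placeholder tag
my_runner = my_model.to_runner()

svc = bentoml.Service("my_service", runners=[my_runner])

@svc.api(input=Text(), output=JSON())
async def predict(text: str) -> dict:
    result = await my_runner.async_run(text)
    # Metadata saved with the model is still reachable via the model handle.
    return {"result": result, "metadata": my_model.info.metadata}
```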