# ask-for-help
Actually I got the prediction result, but it's still using the CPU. I want to use CUDA.
How do I save the GPU version of the Whisper model?
update model info.
A few things here. First, you are not supposed to call `init_local` during serving time; `init_local` should only be used for local debugging and testing. If you wish to use the GPU for the runner, refer to the BentoML configuration: https://docs.bentoml.org/en/latest/guides/configuration.html#configuration The inference code should also be put under a `@svc.api` decorator. What you are doing is eagerly loading the code, which is an anti-pattern here:
```python
@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text())
def transcribe(input_file):
    # Load the audio from the request input, not from a hard-coded url
    ds = run_load_file(input_file)
    inputs = proccessor_runner.run(
        ds["array"], sampling_rate=16000, return_tensors="pt"
    ).input_features
    predicted_ids = model_runner.generate.run(inputs)
    return proccessor_runner.batch_decode.run(
        predicted_ids, skip_special_tokens=True
    )
```
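To put the runner on GPU, a minimal configuration sketch, assuming BentoML 1.x and the `nvidia.com/gpu` resource key described in the configuration guide linked above (the filename `bentoml_configuration.yaml` is just an example; point the `BENTOML_CONFIG` environment variable at it):

```yaml
# bentoml_configuration.yaml — a sketch, not a definitive setup.
# Assigns one NVIDIA GPU as a resource for the runners.
runners:
  resources:
    nvidia.com/gpu: 1
```
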
Hey there, do you still run into problems?