# ask-for-help
Actually I got the prediction result, but it's still using the CPU. I want to use CUDA.
How do I save the GPU version of the Whisper model?
update model info.
A few things here. First, you are not supposed to call `init_local` during serving time; `init_local` should only be used for local debugging and testing. If you wish to use the GPU for the runner, refer to the BentoML configuration: https://docs.bentoml.org/en/latest/guides/configuration.html#configuration The inference code should also be put under a `@svc.api` decorator. What you are doing is eagerly loading the code, which is an anti-pattern here:
```python
@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text())
def transcribe(input_file):
    # Load the audio from the request input, not from a hard-coded url
    ds = run_load_file(input_file)
    inputs = proccessor_runner.run(
        ds["array"], sampling_rate=16000, return_tensors="pt"
    ).input_features
    predicted_ids = model_runner.generate.run(inputs)
    return proccessor_runner.batch_decode.run(
        predicted_ids, skip_special_tokens=True
    )
```
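To put the runner on GPU, a minimal configuration sketch, assuming BentoML 1.x and the `nvidia.com/gpu` resource key described in the configuration guide linked above (the filename `bentoml_configuration.yaml` is just an example; point the `BENTOML_CONFIG` environment variable at it):

```yaml
# bentoml_configuration.yaml — a sketch, not a definitive setup.
# Assigns one NVIDIA GPU as a resource for the runners.
runners:
  resources:
    nvidia.com/gpu: 1
```
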
Hey there, do you still run into problems?