Slackbot
06/18/2023, 12:11 PM
Lucas Wei
06/18/2023, 12:12 PM
Lucas Wei
06/18/2023, 12:13 PM
Lucas Wei
06/18/2023, 12:27 PM
Aaron Pham
06/18/2023, 10:10 PM
init_local during serving time. init_local should only be used for debugging and testing, not for serving.
If you want the runner to use the CPU, refer to the BentoML configuration guide: https://docs.bentoml.org/en/latest/guides/configuration.html#configuration
This should also be put under a @svc.api decorator. What you are doing is eagerly loading the code, which is an anti-pattern here.
@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text())
def transcribe(input_file):
    # Everything in this body runs per request, not when the module is imported.
    ds = run_load_file(input_file)
    inputs = proccessor_runner.run(
        ds["array"], sampling_rate=16000, return_tensors="pt"
    ).input_features
    predicted_ids = model_runner.generate.run(inputs)
    return proccessor_runner.batch_decode.run(
        predicted_ids, skip_special_tokens=True
    )
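To make the eager-loading point concrete, here is a minimal sketch that does not use BentoML. FakeService, its api decorator, and load_audio are hypothetical stand-ins, not BentoML's API. The sketch only shows that code placed inside a decorated handler is deferred until a request arrives, whereas the same code at module level would run at import time.

```python
# Stand-in for a service object. This is NOT BentoML's API; it only mimics
# "register a handler now, call it later" so the timing is visible.
calls = []


class FakeService:
    def __init__(self):
        self.handlers = {}

    def api(self, func):
        # Registration stores the function; its body does not run yet.
        self.handlers[func.__name__] = func
        return func

    def handle(self, name, payload):
        # Simulates an incoming request being routed to the handler.
        return self.handlers[name](payload)


def load_audio(path):
    # Hypothetical expensive step (download, decoding, and so on).
    calls.append(path)
    return path.upper()


svc = FakeService()


@svc.api
def transcribe(input_file):
    # The loading happens here, per request, instead of at module level.
    audio = load_audio(input_file)
    return f"transcript of {audio}"


calls_at_import = list(calls)  # snapshot taken before any request is handled
result = svc.handle("transcribe", "clip.wav")
```

Had load_audio been called at module level instead, calls would already be populated before any request was handled, which is the eager-loading behaviour described above.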
Aaron Pham
06/22/2023, 8:59 AM