# ask-for-help
m
My favoured approach is to create a custom runner with 🤗 Optimum ONNX Runtime. Or are there better alternatives?
s
Using a custom runner is currently the best approach to using 🤗 Optimum. We may add deeper integration through `bentoml.optimum` in the future.
🙏 1
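(For context, a minimal sketch of what such a custom runner could look like, assuming the ONNX model has already been exported to a local directory; the directory name, class name, and service wiring are illustrative, not an official BentoML/Optimum integration.)

```python
import bentoml
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer, pipeline

MODEL_DIR = "gpt2-onnx"  # placeholder: directory containing the exported ONNX model


class ORTTextGenerationRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("cpu",)
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        # Load the ONNX export with Optimum's ORT class rather than AutoModelForCausalLM.
        model = ORTModelForCausalLM.from_pretrained(MODEL_DIR)
        tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
        self.pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

    @bentoml.Runnable.method(batchable=False)
    def generate(self, prompt: str) -> str:
        return self.pipe(prompt)[0]["generated_text"]


runner = bentoml.Runner(ORTTextGenerationRunnable, name="ort_gpt2")
svc = bentoml.Service("completion", runners=[runner])
```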
a
Hi there, are there any errors that you are currently running into?
m
Thanks for the input! A custom runner does work flawlessly. On the error: even without it, I was able to import the ONNX model into the Bento model store as a `bentoml.transformers` model. So far so good, everything gets saved properly in the folder. It only fails at runtime when trying to load the model. This is expected, since 🤗 Optimum has its own classes for handling the models, e.g. `ORTModelForCausalLM` instead of `AutoModelForCausalLM`. That's why the error trace below occurs. It should be enough to add `ORT_SUPPORTED_TASKS` (via `from optimum.pipelines import ORT_SUPPORTED_TASKS`) to the already present `SUPPORTED_TASKS` from the regular pipeline, and to load Optimum in bento's `transformers.py` `load_model`?! I would be happy to supply a PR for that 🥳.
2023-03-17T10:12:10+0100 [ERROR] [dev_api_server:completion] Traceback (most recent call last):
  File "/Users/malte/miniconda3/envs/bento/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 290, in init_local
    self._init(LocalRunnerRef)
  File "/Users/malte/miniconda3/envs/bento/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 137, in _init
    object_setattr(self, "_runner_handle", handle_class(self))
  File "/Users/malte/miniconda3/envs/bento/lib/python3.10/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 24, in __init__
    self._runnable = runner.runnable_class(**runner.runnable_init_params)  # type: ignore
  File "/Users/malte/miniconda3/envs/bento/lib/python3.10/site-packages/bentoml/_internal/frameworks/transformers.py", line 474, in __init__
    self.pipeline = load_model(bento_model, **kwargs)
  File "/Users/malte/miniconda3/envs/bento/lib/python3.10/site-packages/bentoml/_internal/frameworks/transformers.py", line 235, in load_model
    return transformers.pipeline(task=task, model=bento_model.path, **extra_kwargs)
  File "/Users/malte/miniconda3/envs/bento/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 776, in pipeline
    framework, model = infer_framework_load_model(
  File "/Users/malte/miniconda3/envs/bento/lib/python3.10/site-packages/transformers/pipelines/base.py", line 271, in infer_framework_load_model
    raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
ValueError: Could not load model /Users/malte/bentoml/bentos/completion/iml4dpweuo6u6mto/models/text-generation-pipeline-gpt2/yxjg2owea6qeimto with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel'>).
I think that the ability to use the 🤗 ONNX implementation via Optimum would be a very valuable addition, as it significantly reduces the complexity of using production-ready models. And this IMHO fits the 🍱 vision perfectly.
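(To make the failure mode above concrete: a rough sketch of the two load paths, with a placeholder path; the commented-out call mirrors what a plain `transformers.pipeline(task, model=path)` load does, while the working path uses Optimum's ORT class explicitly.)

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer, pipeline

MODEL_PATH = "path/to/onnx-model"  # placeholder for the model directory in the store

# Loading via the plain pipeline only tries the vanilla classes
# (AutoModelForCausalLM, GPT2LMHeadModel) and cannot read the ONNX export,
# which raises the ValueError shown in the traceback above:
# pipe = pipeline(task="text-generation", model=MODEL_PATH)

# The Optimum route: instantiate the ORT model first, then build the pipeline.
model = ORTModelForCausalLM.from_pretrained(MODEL_PATH)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
```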
a
Hi @Malte, https://github.com/bentoml/BentoML/pull/3684 will add support for saving any arbitrary transformers model that follows their spec (i.e. implementing `save_pretrained` and `from_pretrained`), so loading custom models should now work. I haven't given this much thought yet, but I think for Optimum what we could do is have a `bentoml.optimum` that basically reuses the same logic as `transformers`, similar to `bentoml.diffusers`.
❤️ 1
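(A short sketch of why the spec mentioned above matters here: Optimum's ORT models implement the same `save_pretrained`/`from_pretrained` contract as regular transformers models. Note that depending on the Optimum version, the export flag is `export=True` or the older `from_transformers=True`.)

```python
from optimum.onnxruntime import ORTModelForCausalLM

# Export GPT-2 to ONNX (flag name varies by Optimum version, see note above).
ort_model = ORTModelForCausalLM.from_pretrained("gpt2", export=True)

# Same persistence contract as any transformers model.
ort_model.save_pretrained("gpt2-onnx")
reloaded = ORTModelForCausalLM.from_pretrained("gpt2-onnx")
```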
m
Thanks a lot! I will have a look 🙂. And yes, this should work, as it is in principle just another import, since `optimum` does follow the 🤗 API closely.