# ask-for-help
a
I don’t think there is a generate function for pipeline. The pipeline entrypoint has `__call__` and `predict` (which is a proxy to `__call__`) for inference. I believe the `generate` function comes from the GPT2Model class itself.
b
I’m able to run `predict.run(str)`
a
Previously we did have support for Transformers models, but it brought a lot of issues and maintenance burden.
yes, so for an object that has `__call__` as its entrypoint, the runner will convert it to `.run` and `.async_run`. You can also call `runner.__call__.run`.
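To illustrate the idea (this is a conceptual sketch, not BentoML’s actual implementation): an object whose entrypoint is `__call__` can be wrapped so the same function is reachable as `.run`, `.async_run`, and `runner.__call__.run`. The `TextPipeline`, `RunnerMethod`, and `Runner` names here are made up for the example.

```python
import asyncio

class TextPipeline:
    """Stands in for a transformers pipeline: a callable used for inference."""
    def __call__(self, text: str) -> str:
        return text.upper()

class RunnerMethod:
    """Wraps one entrypoint with sync and async invocation styles."""
    def __init__(self, fn):
        self._fn = fn

    def run(self, *args, **kwargs):
        return self._fn(*args, **kwargs)

    async def async_run(self, *args, **kwargs):
        return self._fn(*args, **kwargs)

class Runner:
    """Exposes the wrapped object's __call__ as .run / .async_run,
    and also as runner.__call__.run, mirroring the description above."""
    def __init__(self, obj):
        method = RunnerMethod(obj.__call__)
        self.__call__ = method            # enables runner.__call__.run(...)
        self.run = method.run             # enables runner.run(...)
        self.async_run = method.async_run # enables await runner.async_run(...)

runner = Runner(TextPipeline())
print(runner.run("hello"))                     # HELLO
print(runner.__call__.run("hello"))            # HELLO
print(asyncio.run(runner.async_run("hello")))  # HELLO
```

All three call paths dispatch to the same underlying `__call__`, which is the mapping the runner conversion describes.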
b
Okay, I think I’m following. I’ll explore that. Would I need to change anything when I save the model?
a
no. you can try this
```python
bento_model = bentoml.transformers.save_model("gpt2-pipeline", pipe)

runner = bentoml.transformers.get(bento_model.tag).to_runner()

runner.init_local()

runner.run("hello world")
```
b
That worked!
```python
>>> gpt2.run("hi", max_new_tokens=100)
```
Thank you! I really appreciate it.