# ask-for-help
a
You can create a setup script `download_weights.py` similar to https://github.com/bentoml/BentoML/blob/main/examples/custom_runner/nltk_pretrained_model/download_nltk_models.py, and then pass it via `docker.setup_script=/path/to/download_weights.py` to include the cache folder in the container.
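For illustration, a minimal sketch of what such a setup script could look like, assuming a Stable Diffusion pipeline from the Hugging Face Hub (the model id is an example, not something from this thread):

```python
#!/usr/bin/env python
# download_weights.py -- illustrative sketch: pre-download the weights at
# image build time so the Hugging Face cache ships inside the container.
from diffusers import StableDiffusionPipeline

if __name__ == "__main__":
    # The model id is an assumption; use whichever model you actually deploy.
    StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
```

And the corresponding bentofile.yaml entry (the path is illustrative):

```yaml
docker:
  setup_script: "./download_weights.py"
```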
Regarding the float16 support, cc @larme (shenyang) for more information, but AFAIK https://github.com/bentoml/BentoML/pull/3823 is extension work to keep up with the rapid changes in diffusers.
s
Thanks! Would it make sense to specify a model path inside a bento.yaml file to include it automatically? Don't the models need to be in the "Bento format", or can they be used just as they are loaded by the Hugging Face library? I could go the classical route of just writing a Dockerfile and loading everything manually, but I wonder if it's more efficient to do it the Bento way.
a
For Hugging Face models we let the library manage its own cache; it doesn't make sense to also save them into the BentoML model store, since the weights would then be stored twice. A Bento can also exclude the model files.
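As a sketch, excluding local weights from the Bento can be done with the `exclude` field in bentofile.yaml; the patterns below are hypothetical:

```yaml
# bentofile.yaml (excerpt) -- keep multi-GB local weights out of the Bento
# and rely on the Hugging Face cache instead.
exclude:
  - "models/"          # hypothetical local weights directory
  - "*.safetensors"    # hypothetical weight-file pattern
```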
s
I see. Would this also be true for transformer LLMs? I want to deploy a solution that already has the model files (Stable Diffusion) inside the Docker container or in some persistent Docker volume, so I would not even use the Hugging Face cache directory, just local model files that can be several GB each. Is there a way to support persistent Docker volumes?
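For reference, a sketch of the setup this question describes, loading a pipeline from local files rather than the Hugging Face cache; the `/models` path and the volume mount are assumptions, not a BentoML convention:

```python
# load_local.py -- illustrative: load Stable Diffusion from a local directory,
# e.g. a persistent Docker volume mounted at /models.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "/models/stable-diffusion",  # weights previously saved via save_pretrained()
    torch_dtype=torch.float16,   # optional, ties into the float16 point above
)
```

The directory would be mounted at run time with something like `docker run -v /host/models:/models ...`.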