# ask-for-help
s
@Aaron Pham maybe?
a
You can pass bfloat16 into the model and save it with it
y
So can I understand it as: if the model in memory is in `bfloat16`, then the saved model will also be in that format? It may also be on the transformers side that, even though `config.json` specifies `bfloat16`, the loaded model is still `float32`. I will look further into this.
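For context, `from_pretrained` instantiates weights in `float32` by default, regardless of the `torch_dtype` declared in `config.json`, unless you pass `torch_dtype` explicitly (or `torch_dtype="auto"` to honour the config). A quick check like this should show the mismatch (the model id is just a placeholder):

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "some-org/some-model"  # placeholder, not a real checkpoint

# What config.json declares, e.g. torch.bfloat16
config = AutoConfig.from_pretrained(model_id)
print(config.torch_dtype)

# Without an explicit torch_dtype, the weights are loaded in float32
model = AutoModelForCausalLM.from_pretrained(model_id)
print(model.dtype)  # torch.float32
```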
s
@Aaron Pham is that correct? I read your message to mean you need to pass an option into `save_model`, but I'm not exactly sure.
y
I was able to resolve it by simply passing `torch_dtype=torch.bfloat16` while loading the model in memory. The saved model is now only 13GB 😄
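Roughly what that looks like with plain transformers (the model id is a placeholder; the same idea applies whichever save path you use):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/some-13b-model"  # placeholder

# Load the weights directly in bfloat16 instead of the float32 default
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Saving now writes bfloat16 weights, so the checkpoint on disk is roughly half the size
model.save_pretrained("./model-bf16")
tokenizer.save_pretrained("./model-bf16")
```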
a
For loading, you can pass it via `kwargs` to `load_model`; `torch_dtype` should work there as well.
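Something along these lines, assuming `load_model` here is `bentoml.transformers.load_model` and that extra keyword arguments are forwarded to the underlying loader (worth double-checking):

```python
import torch
import bentoml

# The tag is a placeholder; torch_dtype asks the loader for bfloat16 weights
# instead of the float32 default.
model = bentoml.transformers.load_model(
    "my-model:latest",
    torch_dtype=torch.bfloat16,
)
```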
s
Great that you got it resolved!