Sean
07/24/2023, 9:14 PM
We are excited to announce the release of BentoML v1.1.0, our first minor version update since the milestone v1.0.
• Backward Compatibility: Rest assured that this release maintains full API backward compatibility with v1.0.
• Official gRPC Support: We’ve transitioned gRPC support in BentoML from experimental to official status, expanding your toolkit for high-performance, low-latency services.
• Ray Integration: Ray is a popular open-source compute framework that makes it easy to scale Python workloads. BentoML integrates natively with Ray Serve, enabling users to deploy Bento applications to a Ray cluster without modifying code or configuration (sketch after this list).
• Enhanced Hugging Face Transformers and Diffusers Support: All Hugging Face Diffusers models and pipelines can be seamlessly imported into BentoML applications through the Transformers and Diffusers framework libraries (example after this list).
• Enhanced Model Version Management: Improved model version management enables flexible configuration and synchronization of model versions with your remote model store (example after this list).
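A minimal sketch of the Ray integration, assuming `bentoml.ray.deployment` as the entry point and a pre-built Bento tagged `fraud_detector:latest` (both names are illustrative, not confirmed API):

```python
import bentoml
from ray import serve

# Wrap an existing Bento as a Ray Serve deployment -- the helper name
# bentoml.ray.deployment and the Bento tag are assumptions for this sketch.
app = bentoml.ray.deployment("fraud_detector:latest")

# Run the application on the connected Ray cluster; no changes to the
# service code or configuration are needed.
serve.run(app)
```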
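Similarly, a sketch of importing a Hugging Face Diffusers pipeline through the Diffusers framework library; the store name "sd2.1" is illustrative:

```python
import bentoml

# Download a Stable Diffusion pipeline from the Hugging Face Hub and save
# it to the local BentoML model store under the name "sd2.1".
bentoml.diffusers.import_model(
    "sd2.1",                            # name in the BentoML model store
    "stabilityai/stable-diffusion-2-1", # Hugging Face repo id
)

# Load the imported pipeline as a runner for use inside a Bento service.
runner = bentoml.diffusers.get("sd2.1:latest").to_runner()
```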
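And for keeping local and remote model versions in sync, a sketch assuming the Python counterparts of the `bentoml models push`/`pull` CLI commands (the tag is illustrative):

```python
import bentoml

# Push a locally saved model version to the configured remote model store.
bentoml.models.push("sd2.1:latest")

# On another machine, pull the same version back down from the remote store.
bentoml.models.pull("sd2.1:latest")
```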
🦾 We are also excited to announce the launch of OpenLLM v0.2.0, featuring support for Llama 2 models.
• GPU and CPU Support: Llama 2 can run on both GPU and CPU.
• Model variations and parameter sizes: All Llama model weights and parameter sizes on Hugging Face are supported. Users can use any weights on Hugging Face (e.g. TheBloke/Llama-2-13B-chat-GPTQ), custom weights from a local path (e.g. /path/to/llama-1), or fine-tuned weights, as long as they adhere to LlamaModelForCausalLM. Run openllm models --show-available to learn more (sketch after this list).
• Stay tuned for fine-tuning capabilities in OpenLLM: Fine-tuning for the various Llama 2 models will be added in a future release. In the meantime, try the experimental QLoRA fine-tuning script in the OpenLLM playground: python -m openllm.playground.llama2_qlora --help.
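A minimal sketch of running Llama 2 via the OpenLLM Python API, based on the `openllm.Runner` interface; the model id is illustrative, and `init_local()` is the standard BentoML runner method for in-process use:

```python
import openllm

# Create a runner for Llama; model_id can point to any compatible
# Hugging Face weights, a GPTQ-quantized variant, or a local path.
llm_runner = openllm.Runner(
    "llama",
    model_id="meta-llama/Llama-2-7b-chat-hf",  # illustrative weights
)

# Download weights and initialize in-process (works on GPU or CPU).
llm_runner.init_local()

# Generate a completion for a prompt.
result = llm_runner.generate.run("Explain what BentoML does in one sentence.")
print(result)
```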