Sean
07/24/2023, 9:14 PM
We are excited to announce the release of BentoML v1.1.0, our first minor version update since the milestone v1.0.
• Backward Compatibility: Rest assured that this release maintains full API backward compatibility with v1.0.
• Official gRPC Support: We’ve transitioned gRPC support in BentoML from experimental to official status, expanding your toolkit for high-performance, low-latency services.
• Ray Integration: Ray is a popular open-source compute framework that makes it easy to scale Python workloads. BentoML integrates natively with Ray Serve, enabling users to deploy Bento applications to a Ray cluster without modifying code or configuration (sketch after this list).
• Enhanced Hugging Face Transformers and Diffusers Support: All Hugging Face Diffusers models and pipelines can be seamlessly imported into BentoML applications through the Transformers and Diffusers framework libraries (example after this list).
• Enhanced Model Version Management: Improved model version management enables flexible configuration and synchronization of model versions with your remote model store (example after this list).
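A minimal sketch of the Ray integration, assuming `bentoml.ray.deployment` as the entry point and a pre-built Bento tagged `fraud_detector:latest` (both names are illustrative, not confirmed API):

```python
import bentoml
from ray import serve

# Wrap an existing Bento as a Ray Serve deployment -- the helper name
# bentoml.ray.deployment and the Bento tag are assumptions for this sketch.
app = bentoml.ray.deployment("fraud_detector:latest")

# Run the application on the connected Ray cluster; no changes to the
# service code or configuration are needed.
serve.run(app)
```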
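Similarly, a sketch of importing a Hugging Face Diffusers pipeline through the Diffusers framework library; the store name "sd2.1" is illustrative:

```python
import bentoml

# Download a Stable Diffusion pipeline from the Hugging Face Hub and save
# it to the local BentoML model store under the name "sd2.1".
bentoml.diffusers.import_model(
    "sd2.1",                            # name in the BentoML model store
    "stabilityai/stable-diffusion-2-1", # Hugging Face repo id
)

# Load the imported pipeline as a runner for use inside a Bento service.
runner = bentoml.diffusers.get("sd2.1:latest").to_runner()
```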
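And for keeping local and remote model versions in sync, a sketch assuming the Python counterparts of the `bentoml models push`/`pull` CLI commands (the tag is illustrative):

```python
import bentoml

# Push a locally saved model version to the configured remote model store.
bentoml.models.push("sd2.1:latest")

# On another machine, pull the same version back down from the remote store.
bentoml.models.pull("sd2.1:latest")
```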
🦾 We are also excited to announce the launch of OpenLLM v0.2.0, featuring support for Llama 2 models.
• GPU and CPU Support: Llama 2 can run on both GPU and CPU.
• Model variations and parameter sizes: All Llama model weights and parameter sizes on Hugging Face are supported. Users can use any weights on Hugging Face (e.g. TheBloke/Llama-2-13B-chat-GPTQ), custom weights from a local path (e.g. /path/to/llama-1), or fine-tuned weights, as long as they adhere to LlamaModelForCausalLM. Run openllm models --show-available to learn more (sketch after this list).
• Stay tuned for fine-tuning capabilities in OpenLLM: Fine-tuning for the various Llama 2 models will be added in a future release. In the meantime, try the experimental QLoRA fine-tuning script in the OpenLLM playground: python -m openllm.playground.llama2_qlora --help.
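A minimal sketch of running Llama 2 via the OpenLLM Python API, based on the `openllm.Runner` interface; the model id is illustrative, and `init_local()` is the standard BentoML runner method for in-process use:

```python
import openllm

# Create a runner for Llama; model_id can point to any compatible
# Hugging Face weights, a GPTQ-quantized variant, or a local path.
llm_runner = openllm.Runner(
    "llama",
    model_id="meta-llama/Llama-2-7b-chat-hf",  # illustrative weights
)

# Download weights and initialize in-process (works on GPU or CPU).
llm_runner.init_local()

# Generate a completion for a prompt.
result = llm_runner.generate.run("Explain what BentoML does in one sentence.")
print(result)
```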