# announcements
<!channel> 🍱 We’re thrilled to announce the release of BentoML `v1.1.0`, our first minor version update since the milestone v1.0.
• Backward Compatibility: Rest assured that this release maintains full API backward compatibility with v1.0.
• Official gRPC Support: gRPC support in BentoML has graduated from experimental to official status, expanding your toolkit for high-performance, low-latency services.
• Ray Integration: Ray is a popular open-source compute framework that makes it easy to scale Python workloads. BentoML now integrates natively with Ray Serve, so you can deploy Bento applications to a Ray cluster without modifying code or configuration.
• Enhanced Hugging Face Transformers and Diffusers Support: All Hugging Face Diffusers models and pipelines can be seamlessly imported into BentoML applications through the Transformers and Diffusers framework libraries.
• Enhanced Model Version Management: Enjoy greater flexibility with improved model version management, enabling flexible configuration and synchronization of model versions with your remote model store.

🦾 We are also excited to announce the launch of OpenLLM v0.2.0, featuring support for Llama 2 models.
• GPU and CPU Support: Running Llama 2 is supported on both GPU and CPU.
• Model Variations and Parameter Sizes: All Llama model weights and parameter sizes on Hugging Face are supported. You can use any weights on Hugging Face (e.g. `TheBloke/Llama-2-13B-chat-GPTQ`), custom weights from a local path (e.g. `/path/to/llama-1`), or fine-tuned weights, as long as they adhere to `LlamaModelForCausalLM`. Run `openllm models --show-available` to learn more.
• Stay Tuned for Fine-Tuning Capabilities in OpenLLM: Fine-tuning for the various Llama 2 models will be added in a future release. In the meantime, try the experimental QLoRA fine-tuning script in the OpenLLM playground: `python -m openllm.playground.llama2_qlora --help`.
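For those who want to try Llama 2 right away, here is a rough CLI sketch of the workflow described above. Treat it as a usage fragment, not a definitive reference: the `openllm start llama` form and the `--model-id` flag are assumptions based on the v0.2.x CLI, and the GPTQ model id is just one example weight — verify flags locally with `openllm --help`.

```shell
# List the Llama variants and weights OpenLLM knows about
# (this command is quoted from the announcement above)
openllm models --show-available

# Start a Llama 2 server — assumed CLI shape; the model id below is
# illustrative and can be any Hugging Face repo or local path whose
# weights adhere to LlamaModelForCausalLM
openllm start llama --model-id TheBloke/Llama-2-13B-chat-GPTQ

# Explore the experimental QLoRA fine-tuning playground script
python -m openllm.playground.llama2_qlora --help
```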