# announcements
🍱 BentoML `v1.0.19` is released with enhanced GPU utilization and expanded ML framework support.
• Optimized GPU resource utilization: Enabled scheduling of multiple instances of the same runner using the `workers_per_resource` scheduling strategy configuration. The following configuration allows scheduling 2 instances of the “iris” runner per GPU instance. `workers_per_resource` is 1 by default.
runners:
  iris:
    resources:
      nvidia.com/gpu: 1
    workers_per_resource: 2
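As a rough illustration of the arithmetic behind the configuration above, the sketch below estimates how many runner workers result from a given GPU count and `workers_per_resource` value. The helper name `estimated_worker_count` is hypothetical and not part of BentoML's API. It assumes the worker count is the resource amount multiplied by `workers_per_resource`, rounded up, which may not match BentoML's actual scheduling logic in every case.

```python
import math


def estimated_worker_count(gpu_count: float, workers_per_resource: float = 1) -> int:
    """Hypothetical helper (not a BentoML function).

    Assumption: workers = resource amount * workers_per_resource, rounded up.
    """
    if gpu_count <= 0 or workers_per_resource <= 0:
        raise ValueError("gpu_count and workers_per_resource must be positive")
    return math.ceil(gpu_count * workers_per_resource)


# The configuration above: 1 GPU with workers_per_resource set to 2.
print(estimated_worker_count(1, 2))  # 2

# The default workers_per_resource of 1 gives one worker per GPU.
print(estimated_worker_count(1))  # 1
```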
• New ML framework support: We’ve added support for EasyOCR and Detectron2 to our growing list of supported ML frameworks.
• Enhanced runner communication: Implemented PEP 574 out-of-band pickling to improve runner communication by avoiding extra memory copies, resulting in better performance and efficiency.
• Backward compatibility for Hugging Face Transformers: Resolved compatibility issues with Hugging Face Transformers versions prior to `v4.18`, ensuring a seamless experience for users with older versions.

⚙️ With the release of Kubeflow 1.7, BentoML now has native integration with Kubeflow, allowing developers to leverage BentoML’s cloud-native components. Previously, developers were limited to exporting and deploying a Bento as a single container. With this integration, models trained in Kubeflow can easily be packaged, containerized, and deployed to a Kubernetes cluster as microservices. This architecture lets individual models run in their own pods, using the hardware best suited to their tasks and scaling independently.

💡 With each release, we update our blog, documentation, and examples to help the community get the most out of BentoML.
• Learn more about scheduling strategy to get better resource utilization.
• Learn more about model monitoring and drift detection in BentoML and integration with various monitoring frameworks.
• Learn more about using Nvidia Triton Inference Server as a runner to improve your application’s performance and throughput.
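For readers unfamiliar with the PEP 574 out-of-band pickling mentioned above, here is a minimal sketch using only Python's standard `pickle` module (protocol 5). It is not BentoML's runner code; the `payload` bytearray simply stands in for a large tensor or array, and the variable names are illustrative.

```python
import pickle

# A large binary payload standing in for a tensor or array sent to a runner.
payload = bytearray(b"x" * 1_000_000)

# With protocol 5, buffers wrapped in PickleBuffer are handed to
# buffer_callback instead of being copied into the pickle stream.
buffers = []
data = pickle.dumps(
    pickle.PickleBuffer(payload),
    protocol=5,
    buffer_callback=buffers.append,
)

# The in-band stream holds only a small header; the payload travels separately.
print(f"in-band bytes: {len(data)}, out-of-band buffers: {len(buffers)}")

# The receiver supplies the same buffers when unpickling.
restored = pickle.loads(data, buffers=buffers)
restored_view = memoryview(restored)
print(f"restored bytes: {restored_view.nbytes}")
```

In a real transport, the out-of-band buffers would be sent alongside the small pickle stream (for example over shared memory or a socket) rather than kept in a local list.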