Sean
05/10/2023, 1:28 AM
v1.0.19 is released with enhanced GPU utilization and expanded ML framework support.
• Optimized GPU resource utilization: Multiple instances of the same runner can now be scheduled on the same resource using the workers_per_resource scheduling strategy configuration, which defaults to 1. The following configuration schedules 2 instances of the “iris” runner per GPU:
runners:
iris:
resources:
      nvidia.com/gpu: 1
    workers_per_resource: 2
• New ML framework support: We’ve added support for EasyOCR and Detectron2 to our growing list of supported ML frameworks.
• Enhanced runner communication: Implemented PEP 574 out-of-band pickling to improve runner communication by eliminating memory copying, resulting in better performance and efficiency.
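As a sketch of the mechanism (not BentoML’s internal implementation), the standard library’s pickle protocol 5 lets large buffers travel out-of-band instead of being copied into the serialized stream:

```python
import pickle

# A large writable buffer that we want to transfer without copying it
# into the pickle payload itself.
blob = bytearray(b"x" * 1_000_000)

buffers = []
payload = pickle.dumps(
    pickle.PickleBuffer(blob),       # opt in to out-of-band transfer (PEP 574)
    protocol=5,
    buffer_callback=buffers.append,  # raw buffer is captured here, not copied
)

# The payload stays tiny: it carries metadata, while the megabyte of data
# is handed over separately via the buffers list.
restored = pickle.loads(payload, buffers=buffers)
```

In a multi-process setup, the out-of-band buffers can be placed in shared memory or sent over a zero-copy transport, which is what eliminates the extra memory copies between runners.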
• Backward compatibility for Hugging Face Transformers: Resolved compatibility issues with Hugging Face Transformers versions prior to v4.18, ensuring a seamless experience for users on older versions.
⚙️ With the release of Kubeflow 1.7, BentoML now has native integration with Kubeflow, allowing developers to leverage BentoML’s cloud-native components. Previously, developers were limited to exporting and deploying a Bento as a single container. With this integration, models trained in Kubeflow can easily be packaged, containerized, and deployed to a Kubernetes cluster as microservices. This architecture lets each model run in its own pod, using the hardware best suited to its task and scaling independently.
💡 With each release, we consistently update our blog, documentation, and examples to empower the community in harnessing the full potential of BentoML.
• Learn more about the scheduling strategy to get better resource utilization.
• Learn more about model monitoring and drift detection in BentoML and its integration with various monitoring frameworks.
• Learn more about using Nvidia Triton Inference Server as a runner to improve your application’s performance and throughput.