Sean
05/10/2023, 1:28 AM
v1.0.19 is released with enhanced GPU utilization and expanded ML framework support.
• Optimized GPU resource utilization: Multiple instances of the same runner can now be scheduled on the same resource using the workers_per_resource scheduling strategy configuration, which defaults to 1. The following configuration schedules 2 instances of the “iris” runner per GPU:
runners:
iris:
resources:
      nvidia.com/gpu: 1
    workers_per_resource: 2
• New ML framework support: We’ve added support for EasyOCR and Detectron2 to our growing list of supported ML frameworks.
• Enhanced runner communication: Implemented PEP 574 out-of-band pickling to improve runner communication by eliminating memory copying, resulting in better performance and efficiency.
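As a sketch of the mechanism (not BentoML’s internal implementation), the standard library’s pickle protocol 5 lets large buffers travel out-of-band instead of being copied into the serialized stream:

```python
import pickle

# A large writable buffer that we want to transfer without copying it
# into the pickle payload itself.
blob = bytearray(b"x" * 1_000_000)

buffers = []
payload = pickle.dumps(
    pickle.PickleBuffer(blob),       # opt in to out-of-band transfer (PEP 574)
    protocol=5,
    buffer_callback=buffers.append,  # raw buffer is captured here, not copied
)

# The payload stays tiny: it carries metadata, while the megabyte of data
# is handed over separately via the buffers list.
restored = pickle.loads(payload, buffers=buffers)
```

In a multi-process setup, the out-of-band buffers can be placed in shared memory or sent over a zero-copy transport, which is what eliminates the extra memory copies between runners.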
• Backward compatibility for Hugging Face Transformers: Resolved compatibility issues with Hugging Face Transformers versions prior to v4.18, ensuring a seamless experience for users on older versions.
⚙️ With the release of Kubeflow 1.7, BentoML now has native integration with Kubeflow, allowing developers to leverage BentoML’s cloud-native components. Previously, developers were limited to exporting and deploying a Bento as a single container. With this integration, models trained in Kubeflow can easily be packaged, containerized, and deployed to a Kubernetes cluster as microservices. This architecture lets each model run in its own pod, using the hardware best suited to its task and scaling independently.
💡 With each release, we consistently update our blog, documentation, and examples to empower the community in harnessing the full potential of BentoML.
• Learn more about the scheduling strategy to get better resource utilization.
• Learn more about model monitoring and drift detection in BentoML and its integration with various monitoring frameworks.
• Learn more about using Nvidia Triton Inference Server as a runner to improve your application’s performance and throughput.