Bo
09/28/2022, 6:00 PM
Sean
10/03/2022, 1:05 AM
v1.0.7 is released as a patch to quickly fix a critical module import issue introduced in v1.0.6. The import error manifests in the import of any modules under io.* or models.*. The following is an example of a typical error message and traceback. Please upgrade to v1.0.7 to address this import issue.
packages/anyio/_backends/_asyncio.py", line 21, in <module>
from io import IOBase
ImportError: cannot import name 'IOBase' from 'bentoml.io'
Sean
10/17/2022, 7:06 PM
The v1.0.0 release of Yatai is here! Yatai (屋台) is the Japanese word for a food stall where bentos 🍱 can be served (yes, pun intended 😛). If you are not already a user, Yatai is a production-first platform that brings collaborative BentoML workflows to Kubernetes, helps run model serving at scale, and simplifies model management and deployment across teams.
• Scale BentoML to its full potential on a distributed system, optimized for cost-saving and performance.
• Manage deployment lifecycle to deploy, update, or roll back via API or Web UI.
• Centralized registry providing the foundation for CI/CD via artifact management APIs, labeling, and WebHooks for custom integration.
☁️ Improved compatibility with major cloud providers (AWS, GCP, and Azure)
• Improved AWS EKS installation documentation for yatai and yatai-deployment.
• Enhanced Kaniko image builder support to address the permission issues seen with Google Kubernetes Engine (GKE).
👩‍💻 Enhanced DevOps experience with better Kubernetes-native CRD workflows and observability support.
• Kubernetes-native workflow via BentoDeployment CRD (Custom Resource Definition), which can easily fit into an existing GitOps workflow.
• Native integration with Grafana stack for observability.
◦ Follow the metrics collection guide for setting up Prometheus and Grafana dashboards for BentoDeployment metrics.
◦ Follow the log collection guide for setting up Loki for BentoDeployment log collection, storage, and querying.
• Support for traffic control with Istio.
⚠️ For users of Yatai v0.4.6, version v1.0.0 introduces a few breaking changes.
• Split Yatai into two components, yatai and yatai-deployment, for better modularization and separation of concerns; see architecture.
• Updated container image building trigger from bento push event to bento deployment event. Image building status can be viewed in the console logs UI.
• Removed all previous component operators, e.g., metrics and logging, for more standard integration with the ecosystem.
• See the complete migration guide for upgrading Yatai from v0.4.6 to v1.0.0.
Bo
10/19/2022, 6:57 PM
Sean
11/01/2022, 12:51 AM
v1.0.8 is released with a list of improvements we hope you’ll find useful.
• Introduced Bento Client for easy access to the BentoML service over HTTP. Both sync and async calls are supported. See the Bento Client Guide for more details.
import numpy as np
from bentoml.client import Client

client = Client.from_url("http://localhost:3000")
# Sync call
response = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))
# Async call
response = await client.async_classify(np.array([[4.9, 3.0, 1.4, 0.2]]))
• Introduced custom metrics support for easy instrumentation of custom metrics over Prometheus. See Metrics Guide for more details. Full Prometheus style syntax is supported for instrumenting custom metrics inside API and Runner definitions.
# Histogram metric
inference_duration = bentoml.metrics.Histogram(
    name="inference_duration",
    documentation="Duration of inference",
    labelnames=["nltk_version", "sentiment_cls"],
)

# Counter metric
polarity_counter = bentoml.metrics.Counter(
    name="polarity_total",
    documentation="Count total number of analysis by polarity scores",
    labelnames=["polarity"],
)

# Histogram
inference_duration.labels(
    nltk_version=nltk.__version__, sentiment_cls=self.sia.__class__.__name__
).observe(time.perf_counter() - start)

# Counter
polarity_counter.labels(polarity=is_positive).inc()
• Improved health checking to also cover the status of runners to avoid returning a healthy status before runners are ready.
• Added SSL/TLS support to gRPC serving.
bentoml serve-grpc --ssl-certfile=credentials/cert.pem --ssl-keyfile=credentials/key.pem --production --enable-reflection
• Added channelz support for easy debugging of gRPC serving.
• Allowed nested requirements with the -r syntax.
# requirements.txt
-r nested/requirements.txt
pydantic
Pillow
fastapi
• Improved the adaptive batching dispatcher’s auto-tuning ability to avoid sporadic request failures due to batching at the beginning of the runner lifecycle.
• Fixed a bug where runners would raise a TypeError when overloaded. An HTTP 503 Service Unavailable is now returned when the runner is overloaded.
File "python3.9/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 188, in async_run_method
return tuple(AutoContainer.from_payload(payload) for payload in payloads)
TypeError: 'Response' object is not iterable
💡 We continue to update the documentation and examples on every release to help the community unlock the full power of BentoML.
• Check out the updated PyTorch Framework Guide on how to use external_modules to save classes or utility functions required by the model.
• See the Metrics Guide on how to add custom metrics to your API and custom Runners.
• Learn more about how to use the Bento Client to easily call your BentoML service with Python.
• Check out the latest blog post on why model serving over gRPC matters to data scientists.
🥂 We’d like to thank the community for your continued support and engagement.
• Shout out to @judahrand for multiple contributions to BentoML and bentoctl.
• Shout out to @phildamore-phdata, @quandollar, @2JooYeon, and @fortunto2 for their first contribution to BentoML.
🦄 After years of work, we’re proud to announce that next week, we’ll be launching Yatai 1.0. Sign up for the launch event at https://app.livestorm.co/bentoml/yatai-10-launch?type=detailed 🎉
Bo
11/02/2022, 9:38 PM
Tim Liu
11/04/2022, 6:15 PM
Sean
11/09/2022, 2:39 AM
v1.0.10 is released to address a recurring broken pipe error reported by the community. Also included in this release is a list of improvements we’d like to share with the community.
• Fixed an aiohttp.client_exceptions.ClientOSError caused by asymmetrical keep-alive timeout settings between the API Server and Runner.
aiohttp.client_exceptions.ClientOSError: [Errno 32] Broken pipe
• Added multi-output support for ONNX and TensorFlow frameworks.
• Added from_sample support to all IO Descriptors, in addition to just bentoml.io.NumpyNdarray, and the sample is reflected in the Swagger UI.
# Pandas Example
@svc.api(
    input=PandasDataFrame.from_sample(
        pd.DataFrame([1, 2, 3, 4])
    ),
    output=PandasDataFrame(),
)
def predict_df(input_df):  # illustrative handler for the decorated API
    ...

# JSON Example
@svc.api(
    input=JSON.from_sample(
        {"foo": 1, "bar": 2}
    ),
    output=JSON(),
)
def predict_json(input_data):  # illustrative handler for the decorated API
    ...
💡 We continue to update the documentation and examples on every release to help the community unlock the full power of BentoML.
• Check out the updated multi-model inference graph guide and example to learn how to compose multiple models in the same Bento service.
• Did you know BentoML supports OpenTelemetry tracing out of the box? Check out the Tracing guide for tracing support for OTLP, Jaeger, and Zipkin.
🦄 After years of work, we’re proud to announce that next week, we’ll be launching Yatai 1.0. Sign up for the launch event here.
Bo
11/11/2022, 7:38 PM
Bo
11/28/2022, 7:48 PM
Sean
12/07/2022, 8:37 PM
v1.0.11 is here, featuring the introduction of an inference data collection and model monitoring API that can be easily integrated with any model monitoring framework.
• Introduced the bentoml.monitor API for monitoring any features, predictions, and target data in numerical, categorical, and numerical sequence types.
import numpy as np

import bentoml
from bentoml.io import Text
from bentoml.io import NumpyNdarray

CLASS_NAMES = ["setosa", "versicolor", "virginica"]

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(
    input=NumpyNdarray.from_sample(np.array([4.9, 3.0, 1.4, 0.2], dtype=np.double)),
    output=Text(),
)
async def classify(features: np.ndarray) -> str:
    with bentoml.monitor("iris_classifier_prediction") as mon:
        mon.log(features[0], name="sepal length", role="feature", data_type="numerical")
        mon.log(features[1], name="sepal width", role="feature", data_type="numerical")
        mon.log(features[2], name="petal length", role="feature", data_type="numerical")
        mon.log(features[3], name="petal width", role="feature", data_type="numerical")

        results = await iris_clf_runner.predict.async_run([features])
        result = results[0]
        category = CLASS_NAMES[result]

        mon.log(category, name="pred", role="prediction", data_type="categorical")
    return category
• Enabled monitoring data collection through log file forwarding using any forwarders (fluentbit, filebeat, logstash) or OTLP exporter implementations.
◦ Configuration for monitoring data collection through log files.
monitoring:
  enabled: true
  type: default
  options:
    log_path: path/to/log/file
◦ Configuration for monitoring data collection through an OTLP exporter.
monitoring:
  enabled: true
  type: otlp
  options:
    endpoint: http://localhost:5000
    insecure: true
    credentials: null
    headers: null
    timeout: 10
    compression: null
    meta_sample_rate: 1.0
• Supported third-party monitoring data collector integrations through BentoML Plugins. See bentoml/plugins repository for more details.
🐳 Improved containerization SDK and CLI options, read more in #3164.
• Added support for multiple backend builder options (Docker, nerdctl, Podman, Buildah, Buildx) in addition to buildctl (standalone buildkit builder).
• Improved Python SDK for containerization with different backend builder options.
import bentoml
bentoml.container.build("iris_classifier:latest", backend="podman", features=["grpc","grpc-reflection"], **kwargs)
• Improved CLI to include the newly added options.
• Standardized the generated Dockerfile in bentos to be compatible with all build tools for use cases that require building from a Dockerfile directly.
💡 We continue to update the documentation and examples on every release to help the community unlock the full power of BentoML.
• Learn more about inference data collection and model monitoring capabilities in BentoML.
• Learn more about the default metrics that come out-of-the-box and how to add custom metrics in BentoML.
Bo
12/12/2022, 9:12 PM
Bo
12/15/2022, 5:34 PM
ABC
01/06/2023, 6:26 AM
Sean
01/20/2023, 3:58 AM
v1.0.13 is released featuring a preview of batch inference with Spark.
• Run the batch inference job using the bentoml.batch.run_in_spark() method. This method takes the bento, the API name, the Spark DataFrame containing the input data, and the Spark session itself as parameters, and it returns a DataFrame containing the results of the batch inference job.
import bentoml
# Import the bento from a repository or get the bento from the bento store
bento = bentoml.import_bento("s3://bentoml/quickstart")
# Run the run_in_spark function with the bento, API name, and Spark session
results_df = bentoml.batch.run_in_spark(bento, "classify", df, spark)
• Internally, what happens when you run run_in_spark is as follows:
◦ First, the bento is distributed to the cluster. Note that if the bento has already been distributed, i.e. you have already run a computation with that bento, this step is skipped.
◦ Next, a process function is created, which starts a BentoML server on each of the Spark workers, then uses a client to process all the data. This is done so that the workers take advantage of the batch processing features of the BentoML server. PySpark pickles this process function and dispatches it, along with the relevant data, to the workers.
◦ Finally, the function is evaluated on the given dataframe. Once all methods that the user defined in the script have been executed, the data is returned to the master node.
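Below is a minimal sketch of preparing the Spark session and input DataFrame that bentoml.batch.run_in_spark() consumes; the schema, column names, and input path here are illustrative assumptions rather than part of the release.
from pyspark.sql import SparkSession
from pyspark.sql.types import FloatType, StructField, StructType

# Create (or reuse) the Spark session passed to run_in_spark
spark = SparkSession.builder.appName("bentoml-batch-inference").getOrCreate()

# Hypothetical schema matching the inputs expected by the "classify" API
schema = StructType([StructField(f"feature_{i}", FloatType(), False) for i in range(4)])

# Hypothetical input location; any Spark-readable source works here
df = spark.read.csv("s3://bucket/path/to/input.csv", schema=schema, header=True)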
⚠️ The bentoml.batch API may undergo incompatible changes until general availability is announced in a later minor version release.
🥂 Shout out to jeffthebear, KimSoungRyoul, Robert Fernandez, Marco Vela, Quan Nguyen, and y1450 from the community for their contributions in this release.
Sean
02/17/2023, 5:00 PM
The v1.0.15 release is here featuring the introduction of the bentoml.diffusers framework.
• Learn more about the capabilities of the bentoml.diffusers framework in the Creating Stable Diffusion 2.0 Services With BentoML And Diffusers blog and the BentoML Diffusers example project.
• Import a diffusion model with the bentoml.diffusers.import_model API.
bentoml.diffusers.import_model(
"sd2",
"stabilityai/stable-diffusion-2",
)
• Create a text2img service using a Stable Diffusion 2.0 model runner with the familiar to_runner API from the bentoml.diffusers framework.
import torch
from diffusers import StableDiffusionPipeline
import bentoml
from bentoml.io import Image, JSON, Multipart
bento_model = bentoml.diffusers.get("sd2:latest")
stable_diffusion_runner = bento_model.to_runner()
svc = bentoml.Service("stable_diffusion_v2", runners=[stable_diffusion_runner])
@svc.api(input=JSON(), output=Image())
def txt2img(input_data):
    images, _ = stable_diffusion_runner.run(**input_data)
    return images[0]
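For reference, a hypothetical client call against the service above, assuming it is served locally on port 3000 and that the JSON payload mirrors the pipeline’s keyword arguments:
from bentoml.client import Client

client = Client.from_url("http://localhost:3000")
# The returned PIL image can be saved or post-processed further
image = client.txt2img({"prompt": "a bento box on a wooden table"})
image.save("output.png")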
⭐ Fixed an incompatibility introduced in starlette==0.25.0 that resulted in the type MultiPartMessage not being found in starlette.formparsers.
ImportError: cannot import name 'MultiPartMessage' from 'starlette.formparsers' (/opt/miniconda3/envs/bentoml/lib/python3.10/site-packages/starlette/formparsers.py)
Sean
02/22/2023, 10:55 PM
Sean
03/14/2023, 9:21 PM
The v1.0.16 release is here featuring the introduction of the bentoml.triton framework. With this integration, BentoML now supports running NVIDIA Triton Inference Server as a Runner. See the Triton Inference Server documentation to learn more!
• Triton Inference Server can be configured as a Runner in BentoML with its model repository and CLI arguments specified as parameters.
import bentoml
triton_runner = bentoml.triton.Runner(
"triton_runner",
model_repository="<s3://bucket/path/to/model_repository>",
cli_args=["--load-model=torchscrip_yolov5s", "--model-control-mode=explicit"],
)
• Models served by the Triton Inference Server Runner can be called as a method on the runner handle both synchronously and asynchronously.
import typing as t

import numpy as np
from numpy.typing import NDArray
from PIL.Image import Image

@svc.api(
    input=bentoml.io.Image.from_sample("./data/0.png"), output=bentoml.io.NumpyNdarray()
)
async def bentoml_torchscript_mnist_infer(im: Image) -> NDArray[t.Any]:
    arr = np.array(im) / 255.0
    arr = np.expand_dims(arr, (0, 1)).astype("float32")
    InferResult = await triton_runner.torchscript_mnist.async_run(arr)
    return InferResult.as_numpy("OUTPUT__0")
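For reference, a minimal sketch of the equivalent synchronous call on the same runner handle, reusing the arr input and output tensor name from the example above:
InferResult = triton_runner.torchscript_mnist.run(arr)
output_arr = InferResult.as_numpy("OUTPUT__0")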
• Build bentos and containerize images with Triton Runners by specifying the nvcr.io/nvidia/tritonserver base image in bentofile.yaml.
service: service:svc
include:
- /model_repository
- /data/*.png
- /*.py
exclude:
- /__pycache__
- /venv
- /train.py
- /build_bento.py
- /containerize_bento.py
python:
  packages:
    - bentoml[triton]
docker:
  base_image: nvcr.io/nvidia/tritonserver:22.12-py3
💡 If you are an existing Triton user, the integration provides simpler ways to add custom logic in Python, deploy distributed multi-model inference graphs, unify model management across different ML frameworks and workflows, and standardize model packaging format with versioning and collaboration features. If you are an existing BentoML user, the integration improves runner efficiency and throughput under high load thanks to Triton’s efficient C++ runtime.
Sean
04/06/2023, 8:59 PM
We are excited to announce the release of v1.0.17, which includes support for 🤗 Hugging Face Transformers pre-trained instances. Prior to this release, only pipelines could be saved and loaded using the bentoml.transformers APIs. However, based on the community’s demand to work with pre-trained models, tokenizers, preprocessors, etc., without pipelines, we have expanded our capabilities in the bentoml.transformers APIs. With this release, all pre-trained instances can be saved and loaded into either built-in Transformers framework runners or custom runners. This update opens up new possibilities for users to work with pre-trained models, and we are thrilled to see what the community will create using this feature. To learn more, visit the BentoML Transformers framework documentation.
• Pre-trained models and instances, such as tokenizers, preprocessors, and feature extractors, can also be saved as standalone models using the bentoml.transformers.save_model API.
import bentoml
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
bentoml.transformers.save_model("speecht5_tts_processor", processor)
bentoml.transformers.save_model("speecht5_tts_model", model, signatures={"generate_speech": {"batchable": False}})
bentoml.transformers.save_model("speecht5_tts_vocoder", vocoder)
• Pre-trained models and instances can be run either independently as Transformers framework runners or jointly in a custom runner. To use pre-trained models and instances as individual framework runners, simply get the model references and convert them to runners using the to_runner method.
import bentoml
import torch
from bentoml.io import Text, NumpyNdarray
from datasets import load_dataset

processor_runner = bentoml.transformers.get("speecht5_tts_processor").to_runner()
model_runner = bentoml.transformers.get("speecht5_tts_model").to_runner()
vocoder_runner = bentoml.transformers.get("speecht5_tts_vocoder").to_runner()

embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)

svc = bentoml.Service("text2speech", runners=[processor_runner, model_runner, vocoder_runner])

@svc.api(input=Text(), output=NumpyNdarray())
def generate_speech(inp: str):
    inputs = processor_runner.run(text=inp, return_tensors="pt")
    speech = model_runner.generate_speech.run(input_ids=inputs["input_ids"], speaker_embeddings=speaker_embeddings, vocoder=vocoder_runner.run)
    return speech.numpy()
• To use the pre-trained models and instances together in a custom runner, use the bentoml.transformers.get API to get the model references and load them in a custom runner. The pre-trained instances can then be used for inference in the custom runner.
import bentoml
import torch
from datasets import load_dataset

processor_ref = bentoml.models.get("speecht5_tts_processor:latest")
model_ref = bentoml.models.get("speecht5_tts_model:latest")
vocoder_ref = bentoml.models.get("speecht5_tts_vocoder:latest")

class SpeechT5Runnable(bentoml.Runnable):
    def __init__(self):
        self.processor = bentoml.transformers.load_model(processor_ref)
        self.model = bentoml.transformers.load_model(model_ref)
        self.vocoder = bentoml.transformers.load_model(vocoder_ref)
        self.embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
        self.speaker_embeddings = torch.tensor(self.embeddings_dataset[7306]["xvector"]).unsqueeze(0)

    @bentoml.Runnable.method(batchable=False)
    def generate_speech(self, inp: str):
        inputs = self.processor(text=inp, return_tensors="pt")
        speech = self.model.generate_speech(inputs["input_ids"], self.speaker_embeddings, vocoder=self.vocoder)
        return speech.numpy()

text2speech_runner = bentoml.Runner(SpeechT5Runnable, name="speecht5_runner", models=[processor_ref, model_ref, vocoder_ref])
svc = bentoml.Service("talk_gpt", runners=[text2speech_runner])

@svc.api(input=bentoml.io.Text(), output=bentoml.io.NumpyNdarray())
async def generate_speech(inp: str):
    return await text2speech_runner.generate_speech.async_run(inp)
Sean
04/14/2023, 4:00 PM
v1.0.18 brings a new way of creating the server and client natively from Python.
• Start an HTTP or gRPC server and client asynchronously with a context manager.
import numpy as np
from bentoml import HTTPServer

server = HTTPServer("iris_classifier:latest", production=True, port=3000)
# Start the server in a separate process and connect to it using a client
with server.start() as client:
    res = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))
• Start an HTTP or gRPC server synchronously.
server = HTTPServer("iris_classifier:latest", production=True, port=3000)
server.start(blocking=True)
• As always, a client can be created and connected to a running server.
client = Client.from_url("http://localhost:3000")
res = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))
Sean
05/10/2023, 1:28 AM
v1.0.19 is released with enhanced GPU utilization and expanded ML framework support.
• Optimized GPU resource utilization: Enabled scheduling of multiple instances of the same runner using the workers_per_resource scheduling strategy configuration. The following configuration allows scheduling 2 instances of the “iris” runner per GPU instance. workers_per_resource is 1 by default.
runners:
  iris:
    resources:
      nvidia.com/gpu: 1
    workers_per_resource: 2
• New ML framework support: We’ve added support for EasyOCR and Detectron2 to our growing list of supported ML frameworks.
• Enhanced runner communication: Implemented PEP 574 out-of-band pickling to improve runner communication by eliminating memory copying, resulting in better performance and efficiency.
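For illustration only (plain Python, not BentoML internals), a minimal sketch of PEP 574 out-of-band pickling, where large buffers are captured separately instead of being copied into the pickle stream:
import pickle

import numpy as np

arr = np.zeros((1024, 1024), dtype=np.float32)
buffers = []
# Protocol 5 with a buffer_callback collects the array's memory out-of-band
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)
# The receiver reconstructs the object from the payload plus the zero-copy buffers
restored = pickle.loads(payload, buffers=buffers)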
• Backward compatibility for Hugging Face Transformers: Resolved compatibility issues with Hugging Face Transformers versions prior to v4.18, ensuring a seamless experience for users with older versions.
⚙️ With the release of Kubeflow 1.7, BentoML now has native integration with Kubeflow, allowing developers to leverage BentoML’s cloud-native components. Previously, developers were limited to exporting and deploying Bento as a single container. With this integration, models trained in Kubeflow can easily be packaged, containerized, and deployed to a Kubernetes cluster as microservices. This architecture enables the individual models to run in their own pods, utilizing the most optimal hardware for their respective tasks and enabling independent scaling.
💡 With each release, we consistently update our blog, documentation and examples to empower the community in harnessing the full potential of BentoML.
• Learn more about the scheduling strategy to get better resource utilization.
• Learn more about model monitoring and drift detection in BentoML and integration with various monitoring frameworks.
• Learn more about using NVIDIA Triton Inference Server as a runner to improve your application’s performance and throughput.
Sean
05/10/2023, 1:29 AM
v1.0.20 is released with improved usability and compatibility features.
• Production Mode by Default: The bentoml serve command will now run with the --production option by default. This change is made to simulate production behavior during development. The --reload option will continue to work as expected. To achieve the previous serving behavior, use --development instead.
• Optional Dependency for OpenTelemetry Exporter: The opentelemetry-exporter-otlp-proto-http dependency has been moved from a required dependency to an optional one to address a protobuf dependency incompatibility issue. ⚠️ If you are currently using the Model Monitoring and Inference Data Collection feature, you must install the package with the monitor-otlp option from this release onwards to include the necessary dependency.
pip install "bentoml[monitor-otlp]"
• OpenTelemetry Trace ID Configuration Option: A new configuration option has been added to return the OpenTelemetry Trace ID in the response. This feature is particularly helpful when tracing has not been initialized in the upstream caller, but the caller still wishes to log the Trace ID in case of an error.
api_server:
  http:
    response:
      trace_id: True
• Start from a Service: Added the ability to start a server from a bentoml.Service object. This is helpful for troubleshooting a project in a development environment where no Bentos have been built yet.
import bentoml
# import the Service defined in `/clip_api_service/service.py` file
from clip_api_service.service import svc
if __name__ == "__main__":
    # start a server:
    server = bentoml.HTTPServer(svc)
    server.start(blocking=False)
    client = server.get_client()
    client.predict(..)
Tim Liu
05/31/2023, 8:01 PM
Sean
06/12/2023, 8:49 PM
The v1.0.22 release has brought a list of well-anticipated updates.
• Added support for Pydantic 2 for better validation performance.
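A minimal sketch of pairing a Pydantic model with the JSON IO descriptor, which benefits from the faster Pydantic 2 validation; the service and field names below are illustrative assumptions.
import bentoml
from bentoml.io import JSON
from pydantic import BaseModel

class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

svc = bentoml.Service("pydantic_validation_demo")

@svc.api(input=JSON(pydantic_model=IrisFeatures), output=JSON())
def validate(features: IrisFeatures) -> dict:
    # Input is validated against IrisFeatures before reaching this function
    return features.model_dump()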
• Added support for CUDA 12 versions in builds and containerization.
• Introduced service lifecycle events, allowing custom logic to be added on on_deployment, on_startup, and on_shutdown. State can be managed using the context ctx variable during the on_startup and on_shutdown events and during request serving in the API.
@svc.on_deployment
def on_deployment():
    pass

@svc.on_startup
def on_startup(ctx: bentoml.Context):
    ctx.state["object_key"] = create_object()

@svc.on_shutdown
def on_shutdown(ctx: bentoml.Context):
    cleanup_state(ctx.state["object_key"])

@svc.api
def predict(input_data, ctx):
    object = ctx.state["object_key"]
    pass
• Added support for traffic control for both API Server and Runners. Timeout and maximum concurrency can now be configured through configuration.
api_server:
  traffic:
    timeout: 10  # API Server request timeout in seconds
    max_concurrency: 32  # Maximum concurrent requests in the API Server
runners:
  iris:
    traffic:
      timeout: 10  # Runner request timeout in seconds
      max_concurrency: 32  # Maximum concurrent requests in the Runner
• Improved bentoml push performance for large Bentos.
🚀 One more thing: the team is delighted to unveil our latest endeavor, OpenLLM. This innovative project allows you to effortlessly build with state-of-the-art open-source or fine-tuned Large Language Models.
• Supports all variants of Flan-T5, Dolly V2, StarCoder, Falcon, StableLM, and ChatGLM out of the box. Fully customizable with model-specific arguments.
openllm start [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
• Exposes the familiar BentoML APIs and transforms LLMs seamlessly into Runners.
llm_runner = openllm.Runner("dolly-v2")
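A rough sketch of wiring that runner into a BentoML service; the service name, endpoint, and output handling are assumptions, so consult the OpenLLM documentation for the exact generated output shape.
import bentoml
import openllm
from bentoml.io import Text

llm_runner = openllm.Runner("dolly-v2")
svc = bentoml.Service("llm-service", runners=[llm_runner])

@svc.api(input=Text(), output=Text())
async def prompt(input_text: str) -> str:
    # The runner exposes the model's generation method; output handling is illustrative
    generated = await llm_runner.generate.async_run(input_text)
    return str(generated)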
• Builds LLM application into the Bento format that can be deployed to BentoCloud or containerized into OCI images.
openllm build [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
Our dedicated team is working hard to pioneer more integrations of advanced models for upcoming releases of OpenLLM. Stay tuned for the unfolding developments.
Sean
07/24/2023, 9:14 PM
We are excited to announce the release of v1.1.0, our first minor version update since the milestone v1.0.
• Backward Compatibility: Rest assured that this release maintains full API backward compatibility with v1.0.
• Official gRPC Support: We’ve transitioned gRPC support in BentoML from experimental to official status, expanding your toolkit for high-performance, low-latency services.
• Ray Integration: Ray is a popular open-source compute framework that makes it easy to scale Python workloads. BentoML integrates natively with Ray Serve to enable users to deploy Bento applications in a Ray cluster without modifying code or configuration.
• Enhanced Hugging Face Transformers and Diffusers Support: All Hugging Face Diffuser models and pipelines can be seamlessly imported and integrated into BentoML applications through the Transformers and Diffusers framework libraries.
• Enhanced Model Version Management: Enjoy greater flexibility with the improved model version management, enabling flexible configuration and synchronization of model versions with your remote model store.
🦾 We are also excited to announce the launch of OpenLLM v0.2.0 featuring the support of Llama 2 models.
• GPU and CPU Support: Running Llama 2 is supported on both GPU and CPU.
• Model variations and parameter sizes: Supports all model weights and parameter sizes on Hugging Face. Users can use any weights on Hugging Face (e.g. TheBloke/Llama-2-13B-chat-GPTQ), custom weights from a local path (e.g. /path/to/llama-1), or fine-tuned weights as long as they adhere to LlamaModelForCausalLM. Use openllm models --show-available to learn more.
• Stay tuned for fine-tuning capabilities in OpenLLM: Fine-tuning various Llama 2 models will be added in a future release. Try the experimental script for fine-tuning Llama-2 with QLoRA under the OpenLLM playground, python -m openllm.playground.llama2_qlora --help.
Sean
08/31/2023, 6:59 PM
BentoML v1.1.4 and OpenLLM v0.2.27 are released. See an example service definition for SSE streaming with Llama2.
• Added response streaming through SSE to the bentoml.io.Text IO Descriptor type.
• Added async generator support to both API Server and Runner to yield incremental text responses, as sketched below.
• Added native SSE streaming support to ☁️ BentoCloud.
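A minimal sketch of the async generator pattern referenced above, with an illustrative service that streams text chunks over SSE (a real service would yield tokens from an LLM runner):
import bentoml
from bentoml.io import Text

svc = bentoml.Service("stream_demo")

@svc.api(input=Text(), output=Text())
async def generate(prompt: str):
    # Each yielded chunk is streamed to the client incrementally via SSE
    for word in ("echoing", "your", "prompt:", prompt):
        yield word + " "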
🦾 OpenLLM added token streaming capabilities to support streaming responses from LLMs.
• Added a /v1/generate_stream endpoint for streaming responses from LLMs.
curl -N -X 'POST' 'http://0.0.0.0:3000/v1/generate_stream' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
"prompt": "### Instruction:\n What is the definition of time (200 words essay)?\n\n### Response:",
"llm_config": {
"use_llama2_prompt": false,
"max_new_tokens": 4096,
"early_stopping": false,
"num_beams": 1,
"num_beam_groups": 1,
"use_cache": true,
"temperature": 0.89,
"top_k": 50,
"top_p": 0.76,
"typical_p": 1,
"epsilon_cutoff": 0,
"eta_cutoff": 0,
"diversity_penalty": 0,
"repetition_penalty": 1,
"encoder_repetition_penalty": 1,
"length_penalty": 1,
"no_repeat_ngram_size": 0,
"renormalize_logits": false,
"remove_invalid_values": false,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"encoder_no_repeat_ngram_size": 0,
"n": 1,
"best_of": 1,
"presence_penalty": 0.5,
"frequency_penalty": 0,
"use_beam_search": false,
"ignore_eos": false
},
"adapter_name": null
}'
Chaoyu
10/06/2023, 6:14 PM
Jian Shen Yap
01/22/2024, 4:53 AM
Sean
02/20/2024, 4:00 PM
We are thrilled to announce the release of BentoML v1.2, the biggest release since the launch of v1.0. This release includes improvements from all the learning and feedback from our community over the past year. We invite you to read our release blog post for a comprehensive overview of the new features and the motivations behind their development.
Here are a few key points to note before we delve into the new features:
• v1.2 ensures complete backward compatibility, meaning that Bentos built with v1.1 will continue to function seamlessly with this release.
• We remain committed to supporting v1.1. Critical bug fixes and security updates will be backported to the v1.1 branch.
• BentoML documentation has been updated with examples and guides for v1.2. More guides are being added every week.
• BentoCloud is fully equipped to handle deployments from both v1.1 and v1.2 releases of BentoML.
⛏️ Introduced a simplified service SDK to empower developers with greater control and flexibility.
• Simplified the service and API interfaces as Python classes, allowing developers to add custom logic and use third party libraries flexibly with ease.
• Introduced @bentoml.service and @bentoml.api decorators to customize the behaviors of services and APIs.
• Moved configuration from YAML files to the service decorator @bentoml.service next to the class definition.
• See the vLLM example demonstrating the flexibility of the service API by initializing a vLLM AsyncEngine in the service constructor and run inference with continuous batching in the service API.
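A minimal sketch of the new service SDK described above, using an illustrative model tag and endpoint (not the vLLM example itself):
import bentoml
import numpy as np

@bentoml.service(resources={"cpu": "2"}, traffic={"timeout": 10})
class IrisClassifier:
    def __init__(self) -> None:
        # Custom initialization logic (e.g. loading a model) lives in the constructor
        self.model = bentoml.sklearn.load_model("iris_clf:latest")

    @bentoml.api
    def classify(self, input_series: np.ndarray) -> np.ndarray:
        return self.model.predict(input_series)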
🔭 Revamped IO descriptors with more familiar input and output types.
• Enable use of Pythonic types directly, without the need for additional IO descriptor definitions or decorations.
• Integrated with Pydantic to leverage its robust validation capabilities and wide array of supported types.
• Expanded support to ML and Generative AI specific IO types.
📦 Updated model saving and loading API to be more generic to enable integration with more ML frameworks.
• Allow flexible saving and loading of models using the bentoml.models.create API instead of framework-specific APIs, e.g. bentoml.pytorch.save_model, bentoml.tensorflow.save_model.
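A minimal sketch of the generic saving flow, assuming an arbitrary artifact produced by your own training code:
import bentoml

trained_weights = b"\x00" * 16  # placeholder bytes standing in for a real artifact

# Any files written into the model directory are versioned in the model store
with bentoml.models.create("my_custom_model") as model_ref:
    with open(model_ref.path_of("weights.bin"), "wb") as f:
        f.write(trained_weights)

print(model_ref.tag)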
🚚 Streamlined the deployment workflow to allow more rapid development iterations and a faster time to production.
• Enabled direct deployment to production through CLI and Python API from Git projects.
🎨 Improved API development experience with generated web UI and rich Python client.
• All bentos are now accompanied by a custom-generated UI in the BentoCloud Playground, tailored to their API definitions.
• BentoClient offers a Pythonic way to invoke the service endpoint, allowing parameters to be supplied in native Python format and letting the client efficiently handle the necessary serialization while ensuring compatibility and performance.
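A brief sketch of that Pythonic invocation, assuming a v1.2 service exposing a classify endpoint served locally on port 3000:
import bentoml
import numpy as np

client = bentoml.SyncHTTPClient("http://localhost:3000")
# Parameters are passed as native Python objects; the client handles serialization
result = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))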
🎭 We’ve learned that the best way to showcase what BentoML can do is not through dry, conceptual documentation but through real-world examples. Check out our current list of examples, and we’ll continue to publish new ones to the gallery as exciting new models are released.
• BentoVLLM
• BentoControlNet
• BentoSDXLTurbo
• BentoWhisperX
• BentoXTTS
• BentoCLIP
🙏 Thank you for your continued support!
Chaoyu
02/27/2024, 7:55 PM