# announcements
• Bo · 11/02/2022, 9:38 PM
    Hello folks! We’re excited to have Hamza Tahir chat with our community on Tuesday, Nov 8th, 9-10am PST/12-1pm EST for our AMA session on all things MLOps. Hamza is the co-founder of ZenML, an open-source MLOps framework for creating reproducible machine learning pipelines. A recurring struggle in machine learning is the large gap between the training part of ML development and the post-training/deployment phase. ZenML aims to bridge that gap by building a simple, open-source pipeline framework aimed at data scientists, for creating ML workflows that can be taken to production with minimum effort. For this session, please join us and ask questions about:
    • MLOps orchestration
    • The emerging MLOps toolchain
    • What’s the missing gap in MLOps
    If you can’t attend this live AMA, you can leave your questions in this Slack thread and I will post them for you.
    👀 1 · 🔥 1 · 👍 1 · 🚀 1 · 🎉 1 · 💯 1
• Bo · 11/11/2022, 7:38 PM
    We’re excited to chat with Alessya Visnjic on November 17th, 1-2 pm PST/4-5 pm EST for our AMA session on all things MLOps. Alessya Visnjic is the CEO and co-founder of WhyLabs, the AI Observability company on a mission to build the interface between AI and human operators. Prior to WhyLabs, Alessya was a CTO-in-residence at the Allen Institute for AI (AI2), where she evaluated the commercial potential of the latest advancements in AI research. Earlier in her career, Alessya spent 9 years at Amazon leading machine learning adoption and tooling efforts. She was a founding member of Amazon’s first ML research center in Berlin, Germany. Alessya is also the founder of Rsqrd AI, a global community of 1,000+ AI practitioners who are committed to making AI technology robust and responsible. For this session, join us to ask questions about:
    • The emerging MLOps toolchain
    • Productionalizing your model
    • Troubleshooting model issues in real time
    • Improving your overall model performance
    • Open-source approaches to MLOps
    If you can’t attend this live AMA, you can leave your questions in this Slack thread and I will post them for you.
    👍 10 · 🚀 8 · 🔥 7 · ❤️ 5
• Bo · 11/28/2022, 7:48 PM
    We’re excited to chat with Kevin Kho on December 8th, 1-2 pm PST/4-5 pm EST for our AMA session on all things MLOps. Kevin Kho is a maintainer of the Fugue project, an abstraction layer for distributed computing. Previously, he was an Open Source Community Engineer at Prefect, a workflow orchestration management system. Before working on data tooling, he was a data scientist for 4 years. For this session, join us to ask questions about the following:
    • The emerging MLOps toolchain
    • Distributed computing and ML
    • The present and future of distributed computing
    If you can’t attend this live AMA, you can leave your questions in this Slack thread, and I will post them for you.
    👏 1 · 👍 2 · 🔥 5
• Sean · 12/07/2022, 8:37 PM
    🍱 @channel BentoML `v1.0.11` is here, featuring the introduction of an inference collection and model monitoring API that can be easily integrated with any model monitoring framework.
    • Introduced the `bentoml.monitor` API for monitoring any features, predictions, and target data in numerical, categorical, and numerical sequence types.
    ```python
    import numpy as np

    import bentoml
    from bentoml.io import NumpyNdarray, Text

    CLASS_NAMES = ["setosa", "versicolor", "virginica"]

    iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
    svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

    @svc.api(
        input=NumpyNdarray.from_sample(np.array([4.9, 3.0, 1.4, 0.2], dtype=np.double)),
        output=Text(),
    )
    async def classify(features: np.ndarray) -> str:
        with bentoml.monitor("iris_classifier_prediction") as mon:
            mon.log(features[0], name="sepal length", role="feature", data_type="numerical")
            mon.log(features[1], name="sepal width", role="feature", data_type="numerical")
            mon.log(features[2], name="petal length", role="feature", data_type="numerical")
            mon.log(features[3], name="petal width", role="feature", data_type="numerical")

            results = await iris_clf_runner.predict.async_run([features])
            result = results[0]
            category = CLASS_NAMES[result]

            mon.log(category, name="pred", role="prediction", data_type="categorical")
        return category
    ```
    • Enabled monitoring data collection through log file forwarding using any forwarder (Fluent Bit, Filebeat, Logstash) or OTLP exporter implementations.
    ◦ Configuration for monitoring data collection through log files:
    ```yaml
    monitoring:
      enabled: true
      type: default
      options:
        log_path: path/to/log/file
    ```
    ◦ Configuration for monitoring data collection through an OTLP exporter:
    ```yaml
    monitoring:
      enabled: true
      type: otlp
      options:
        endpoint: http://localhost:5000
        insecure: true
        credentials: null
        headers: null
        timeout: 10
        compression: null
        meta_sample_rate: 1.0
    ```
    • Supported third-party monitoring data collector integrations through BentoML Plugins. See the bentoml/plugins repository for more details.
    🐳 Improved containerization SDK and CLI options, read more in #3164.
    • Added support for multiple backend builder options (Docker, nerdctl, Podman, Buildah, Buildx) in addition to buildctl (standalone BuildKit builder).
    • Improved Python SDK for containerization with different backend builder options:
    ```python
    import bentoml

    bentoml.container.build(
        "iris_classifier:latest",
        backend="podman",
        features=["grpc", "grpc-reflection"],
        # additional builder-specific options can be passed as keyword arguments
    )
    ```
    • Improved CLI to include the newly added options:
    ```bash
    # equivalent CLI invocation (a sketch; exact flags may differ, see #3164)
    bentoml containerize iris_classifier:latest --backend podman
    ```
    • Standardized the generated Dockerfile in Bentos to be compatible with all build tools, for use cases that require building from a Dockerfile directly.
    💡 We continue to update the documentation and examples on every release to help the community unlock the full power of BentoML.
    • Learn more about inference data collection and model monitoring capabilities in BentoML.
    • Learn more about the default metrics that come out of the box and how to add custom metrics in BentoML.
    🎉 12 · ❤️ 7 · 🚀 9 · 🙌 4 · 💯 6 · 🔥 4 · 🍱 13
• Bo · 12/12/2022, 9:12 PM
    Hello everyone, we’re excited to chat with Doris Xin on Dec 19th, 1-2 pm PST/4-5 pm EST for our next AMA session on all things MLOps. Doris Xin is the founder of LineaPy. LineaPy is a Python package that helps data scientists capture, analyze, and automate their workflows. It traces code execution to understand the code and its context, and provides tools to help data scientists bring their work to production more easily. LineaPy can be integrated into a data science workflow with just two lines of code. For this session, join us to ask questions about the following topics:
    • The emerging MLOps toolchain
    • The challenges and opportunities of building an open-source project
    • The challenges of implementing MLOps in practice
    • The challenges and opportunities that LineaPy is facing
    Join us next Monday to ask Doris your questions and get real-time responses in the #C03U2PT7UUQ channel! If you can’t attend, you can write your questions in this thread, and I will ask them for you!
    ❤️ 6
• Bo · 12/15/2022, 5:34 PM
    BentoML and Arize AI are thrilled to announce our partnership that streamlines the machine learning development lifecycle to supercharge production ML. This integration enables users to create, ship, troubleshoot, and improve models in real-time. Learn more about how Arize AI and BentoML help continuously ship new models and improve model performance in our latest blog, co-published with Arize AI. https://modelserving.com/blog/supercharge-production-ml-with-bentoml-and-arize-ai
    🚀 4 · 👀 2 · 👍 11 · 🔥 2 · 🍱 8 · ❤️ 2 · 💯 3
• Bo · 01/10/2023, 7:46 PM
    We’re excited to chat with Hangfei Lin on January 19th, 1-2pm PST/4-5pm EST for our AMA session on all things MLOps. Hangfei is a staff software engineer at LinkedIn, where he works on machine learning infrastructure and systems, building feature store and feature management platforms that scale and power machine learning development and applications at LinkedIn. Hangfei is a main contributor and driver of the open-source Feathr (https://github.com/feathr-ai/feathr), a very popular feature store library that has been used at LinkedIn for many years. Feathr is scalable, highly customizable, and provides a native Python API for easy workflows. For this session, join us to ask questions about:
    • The emerging MLOps toolchain
    • Feature stores in ML workflows
    • Best practices for using feature stores in ML workflows
• Sean · 02/17/2023, 5:00 PM
    🍱 @channel The BentoML `v1.0.15` release is here, featuring the introduction of the `bentoml.diffusers` framework.
    • Learn more about the capabilities of the `bentoml.diffusers` framework in the Creating Stable Diffusion 2.0 Services With BentoML And Diffusers blog post and the BentoML Diffusers example project.
    • Import a diffusion model with the `bentoml.diffusers.import_model` API:
    ```python
    import bentoml

    bentoml.diffusers.import_model(
        "sd2",
        "stabilityai/stable-diffusion-2",
    )
    ```
    • Create a `text2img` service using a Stable Diffusion 2.0 model runner with the familiar `to_runner` API from the `bentoml.diffusers` framework:
    ```python
    import torch
    from diffusers import StableDiffusionPipeline

    import bentoml
    from bentoml.io import Image, JSON, Multipart

    bento_model = bentoml.diffusers.get("sd2:latest")
    stable_diffusion_runner = bento_model.to_runner()

    svc = bentoml.Service("stable_diffusion_v2", runners=[stable_diffusion_runner])

    @svc.api(input=JSON(), output=Image())
    def txt2img(input_data):
        images, _ = stable_diffusion_runner.run(**input_data)
        return images[0]
    ```
    ⭐ Fixed an incompatibility introduced in `starlette==0.25.0` that resulted in the type `MultiPartMessage` not being found in `starlette.formparsers`:
    ```
    ImportError: cannot import name 'MultiPartMessage' from 'starlette.formparsers' (/opt/miniconda3/envs/bentoml/lib/python3.10/site-packages/starlette/formparsers.py)
    ```
    👏 1 · 🎉 1 · ✅ 1
• Sean · 02/22/2023, 10:55 PM
    🍱 @here Hello Chefs! The BentoML team is currently hiring for multiple remote-first positions in software and community engineering. If you or someone you know is interested in joining our team, please don’t hesitate to send me a private message. Thank you for your continued support in making our community better! https://angel.co/company/bentoml
    party parrot 7 · ❤️ 5 · 🚀 5 · 🔥 4 · 💯 4 · 👍 4 · 👏 1
• Sean · 05/10/2023, 1:28 AM
    🍱 BentoML `v1.0.19` is released with enhanced GPU utilization and expanded ML framework support.
    • Optimized GPU resource utilization: enabled scheduling of multiple instances of the same runner using the `workers_per_resource` scheduling strategy configuration. The following configuration allows scheduling 2 instances of the “iris” runner per GPU instance; `workers_per_resource` is 1 by default:
    ```yaml
    runners:
      iris:
        resources:
          nvidia.com/gpu: 1
        workers_per_resource: 2
    ```
    • New ML framework support: we’ve added support for EasyOCR and Detectron2 to our growing list of supported ML frameworks.
    • Enhanced runner communication: implemented PEP 574 out-of-band pickling to improve runner communication by eliminating memory copying, resulting in better performance and efficiency (see the sketch below).
    • Backward compatibility for Hugging Face Transformers: resolved compatibility issues with Hugging Face Transformers versions prior to `v4.18`, ensuring a seamless experience for users with older versions.
    ⚙️ With the release of Kubeflow 1.7, BentoML now has native integration with Kubeflow, allowing developers to leverage BentoML’s cloud-native components. Previously, developers were limited to exporting and deploying a Bento as a single container. With this integration, models trained in Kubeflow can easily be packaged, containerized, and deployed to a Kubernetes cluster as microservices. This architecture enables the individual models to run in their own pods, utilizing the most optimal hardware for their respective tasks and enabling independent scaling.
    💡 With each release, we consistently update our blog, documentation, and examples to empower the community in harnessing the full potential of BentoML.
    • Learn more about the scheduling strategy to get better resource utilization.
    • Learn more about model monitoring and drift detection in BentoML and integration with various monitoring frameworks.
    • Learn more about using the NVIDIA Triton Inference Server as a runner to improve your application’s performance and throughput.
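    For context on the out-of-band pickling item above: pickle protocol 5 (PEP 574) lets large buffers travel alongside the pickle stream instead of being copied into it. A minimal standalone sketch of the mechanism (illustrative only, not BentoML’s internal code):
    ```python
    import pickle

    import numpy as np

    arr = np.zeros(1_000_000)

    # Protocol 5 hands large buffers to buffer_callback instead of
    # serializing a copy of them inline in the pickle stream.
    buffers = []
    payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

    # The receiver reattaches the buffers, avoiding an extra memory copy.
    restored = pickle.loads(payload, buffers=buffers)
    assert (restored == arr).all()
    ```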
    🎉 2
• Tim Liu · 05/31/2023, 8:01 PM
    Hello! We're hosting a last-minute MLOps meetup in Atlanta if there's anyone in the area: https://www.meetup.com/atlanta-mlops-community/events/293886684/ As a reminder, BentoML would love to sponsor food for any event where you may be talking about BentoML! :)
    🍱 5 · 🍕 2
• Sean · 07/24/2023, 9:14 PM
    @channel 🍱 We’re thrilled to announce the release of BentoML `v1.1.0`, our first minor version update since the milestone v1.0.
    • Backward compatibility: rest assured that this release maintains full API backward compatibility with v1.0.
    • Official gRPC support: we’ve transitioned gRPC support in BentoML from experimental to official status, expanding your toolkit for high-performance, low-latency services.
    • Ray integration: Ray is a popular open-source compute framework that makes it easy to scale Python workloads. BentoML integrates natively with Ray Serve to enable users to deploy Bento applications in a Ray cluster without modifying code or configuration.
    • Enhanced Hugging Face Transformers and Diffusers support: all Hugging Face Diffusers models and pipelines can be seamlessly imported and integrated into BentoML applications through the Transformers and Diffusers framework libraries.
    • Enhanced model version management: enjoy greater flexibility with the improved model version management, enabling flexible configuration and synchronization of model versions with your remote model store.
    🦾 We are also excited to announce the launch of OpenLLM v0.2.0, featuring support for Llama 2 models.
    • GPU and CPU support: running Llama 2 is supported on both GPU and CPU.
    • Model variations and parameter sizes: supports all model weights and parameter sizes on Hugging Face. Users can use any weights on Hugging Face (e.g. `TheBloke/Llama-2-13B-chat-GPTQ`), custom weights from a local path (e.g. `/path/to/llama-1`), or fine-tuned weights, as long as they adhere to LlamaModelForCausalLM. Use `openllm models --show-available` to learn more.
    • Stay tuned for fine-tuning capabilities in OpenLLM: fine-tuning various Llama 2 models will be added in a future release. Try the experimental script for fine-tuning Llama 2 with QLoRA under the OpenLLM playground: `python -m openllm.playground.llama2_qlora --help`.
    🙏 4 · 🚀 3 · 🎉 3 · 👀 3 · 🍱 7 · 👏 14 · party parrot 14
• Sean · 08/31/2023, 6:59 PM
    @channel 🍱 To better support LLM serving through response streaming, we are proud to introduce experimental server-sent events (SSE) streaming in this release of BentoML `v1.1.4` and OpenLLM `v0.2.27`. See an example service definition for SSE streaming with Llama 2.
    • Added response streaming through SSE to the `bentoml.io.Text` IO descriptor type.
    • Added async generator support to both the API Server and Runner to `yield` incremental text responses (a minimal sketch follows the example below).
    • Added support in ☁️ BentoCloud to natively handle SSE streaming.
    🦾 OpenLLM added token streaming capabilities to support streaming responses from LLMs.
    • Added the `/v1/generate_stream` endpoint for streaming responses from LLMs:
    ```bash
    curl -N -X 'POST' 'http://0.0.0.0:3000/v1/generate_stream' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
      "prompt": "### Instruction:\n What is the definition of time (200 words essay)?\n\n### Response:",
      "llm_config": {
        "use_llama2_prompt": false,
        "max_new_tokens": 4096,
        "early_stopping": false,
        "num_beams": 1,
        "num_beam_groups": 1,
        "use_cache": true,
        "temperature": 0.89,
        "top_k": 50,
        "top_p": 0.76,
        "typical_p": 1,
        "epsilon_cutoff": 0,
        "eta_cutoff": 0,
        "diversity_penalty": 0,
        "repetition_penalty": 1,
        "encoder_repetition_penalty": 1,
        "length_penalty": 1,
        "no_repeat_ngram_size": 0,
        "renormalize_logits": false,
        "remove_invalid_values": false,
        "num_return_sequences": 1,
        "output_attentions": false,
        "output_hidden_states": false,
        "output_scores": false,
        "encoder_no_repeat_ngram_size": 0,
        "n": 1,
        "best_of": 1,
        "presence_penalty": 0.5,
        "frequency_penalty": 0,
        "use_beam_search": false,
        "ignore_eos": false
      },
      "adapter_name": null
    }'
    ```
    (attached: openllm_sse.mp4)
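    As a reference for the async generator support mentioned above, a minimal sketch of a streaming text endpoint (hypothetical service and token list; the Llama 2 example in the release notes is the authoritative version):
    ```python
    import bentoml
    from bentoml.io import Text

    svc = bentoml.Service("sse_demo")

    @svc.api(input=Text(), output=Text())
    async def stream(prompt: str):
        # An async generator: each yielded chunk is sent to the client
        # incrementally over SSE instead of as one buffered response.
        for token in ["streamed ", "tokens ", "for: ", prompt]:
            yield token
    ```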
    👍 12 · 🍱 11 · 🚀 11 · 👏 12 · 🔥 7 · party parrot 4 · 💪 2 · 💡 2 · 🤙 2 · 🙌 3 · 🏄 2
• Chaoyu · 10/06/2023, 6:14 PM
    Hi everyone! Next Thursday we will have an in-person workshop in San Francisco on building and evaluating LLM apps! We are hosting a hands-on workshop together with Arize, where you’ll learn how to build and deploy LLM apps with BentoML OpenLLM, as well as how to troubleshoot, evaluate, and trace your LLM app with Arize Phoenix. Register here: https://lu.ma/lk2hsmgl
    🍱 3 · 🎉 2 · 🚀 2 · 🏁 2
• Jian Shen Yap · 01/22/2024, 4:53 AM
    @channel Happy New Year to everyone from the BentoML team! 🍱 🎊 We recently revamped our Slack channels to better facilitate the growth of our community. What's new? We've updated our channels to enhance your experience and streamline our discussions:
    • #introductions: This is where your BentoML journey begins. Tell us about yourself! What do you do, where are you from, and what interesting projects or experiences have you gathered? It's the perfect spot to meet fellow members and spark collaborations.
    • #ask-for-help (formerly #support): Got questions about BentoML or BentoCloud? This is your go-to channel for assistance. No query is too big or too small; our community thrives on supporting each other.
    • #ai-ml-everything (formerly #machine-learning-potluck): Consider this the heart of our community. A place for all discussions, be it AI, ML, or the latest tech trends. Think of it as your new #general channel.
    • #i-made-this: Showcase alert! Share your BentoML projects, discover what others are building, and get inspired. It's our curated gallery of innovation and creativity.
    • #job-posting: On the lookout for new opportunities or seeking talented ML professionals? This channel connects job seekers with recruiters. Dive in for your next career move or to find your next team member.
    Get involved! Your participation makes our community richer. Whether you're here to learn, share, or collaborate, every contribution counts. We believe in the power of open source and the incredible impact we can make together with BentoML. Dive deeper into our world 🤿 :
    • Explore our blog to learn more about our projects, success stories, and insights from our team: https://www.bentoml.com/blog
    • Check out BentoCloud, the easiest way to deploy BentoML, optimized for performance, scalability, and cost-efficiency: https://www.bentoml.com/cloud
    💯 6 · ❤️ 6 · 🍱 1 · 👍 1 · 👀 1
• Sean · 02/20/2024, 4:00 PM
    @channel 🍱 We are excited to share that we have released BentoML `v1.2`, the biggest release since the launch of `v1.0`. This release includes improvements from all the learning and feedback from our community over the past year. We invite you to read our release blog post for a comprehensive overview of the new features and the motivations behind their development. Here are a few key points to note before we delve into the new features:
    • `v1.2` ensures complete backward compatibility, meaning that Bentos built with `v1.1` will continue to function seamlessly with this release.
    • We remain committed to supporting `v1.1`. Critical bug fixes and security updates will be backported to the `v1.1` branch.
    • BentoML documentation has been updated with examples and guides for `v1.2`. More guides are being added every week.
    • BentoCloud is fully equipped to handle deployments from both `v1.1` and `v1.2` releases of BentoML.
    ⛏️ Introduced a simplified service SDK to empower developers with greater control and flexibility (a sketch follows below).
    • Simplified the service and API interfaces as Python classes, allowing developers to add custom logic and use third-party libraries flexibly and with ease.
    • Introduced the `@bentoml.service` and `@bentoml.api` decorators to customize the behaviors of services and APIs.
    • Moved configuration from YAML files to the service decorator `@bentoml.service`, next to the class definition.
    • See the vLLM example demonstrating the flexibility of the service API by initializing a vLLM AsyncEngine in the service constructor and running inference with continuous batching in the service API.
    🔭 Revamped IO descriptors with more familiar input and output types.
    • Enabled use of Pythonic types directly, without the need for additional IO descriptor definitions or decorations.
    • Integrated with Pydantic to leverage its robust validation capabilities and wide array of supported types.
    • Expanded support to ML- and generative-AI-specific IO types.
    📦 Updated the model saving and loading API to be more generic, enabling integration with more ML frameworks.
    • Allow flexible saving and loading of models using the `bentoml.models.create` API instead of framework-specific APIs, e.g. `bentoml.pytorch.save_model`, `bentoml.tensorflow.save_model`.
    🚚 Streamlined the deployment workflow to allow more rapid development iterations and a faster time to production.
    • Enabled direct deployment to production through the CLI and Python API from Git projects.
    🎨 Improved the API development experience with a generated web UI and rich Python client.
    • All Bentos are now accompanied by a custom-generated UI in the BentoCloud Playground, tailored to their API definitions.
    • BentoClient offers a Pythonic way to invoke the service endpoint, allowing parameters to be supplied in native Python format and letting the client efficiently handle the necessary serialization while ensuring compatibility and performance.
    🎭 We’ve learned that the best way to showcase what BentoML can do is not through dry, conceptual documentation but through real-world examples. Check out our current list of examples, and we’ll continue to publish new ones to the gallery as exciting new models are released.
    • BentoVLLM
    • BentoControlNet
    • BentoSDXLTurbo
    • BentoWhisperX
    • BentoXTTS
    • BentoCLIP
    🙏 Thank you for your continued support!
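    A minimal sketch of the new class-based service SDK described above (the service name and logic are illustrative, not from the release notes):
    ```python
    import bentoml

    # Configuration moves out of YAML and into the decorator,
    # right next to the class definition.
    @bentoml.service(resources={"cpu": "2"}, traffic={"timeout": 30})
    class Summarizer:
        @bentoml.api
        def summarize(self, text: str) -> str:
            # Plain Pythonic input/output types; no IO descriptors needed.
            return f"Summary: {text[:100]}"
    ```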
    🚀 8 · ❤️ 9 · 🙌 11 · 🎉 4 · 🍱 12 · 🦜 12 · bentoml 6 · 🎯 5 · 🔥 18
• Chaoyu · 02/27/2024, 7:55 PM
    Hi everyone! For those interested in scaling BentoML deployments on Kubernetes: I’m excited to share that we have started a proposal for Yatai version 2.0. Here are some highlights:
    • Simplified setup for DevOps teams
    • Moving from Elastic License 2.0 to Apache License 2.0
    • Support for the Distributed Service deployment mode in BentoML 1.2
    • Simplified advanced customization and integration with cloud-native tools
    • Call for contributions: this is the perfect time to get involved!
    More details can be found in https://github.com/bentoml/Yatai/issues/504. And please join #yatai if you’re interested in following the progress or helping with contributions.
    🙌 5 · 🚀 5
• Sherlock Xu · 07/12/2024, 7:37 AM
    @channel Hi everyone! We are thrilled to announce the release of OpenLLM 0.6 🚀, which marks a significant shift in our project's philosophy. This release introduces breaking changes to the codebase, reflecting our renewed focus on streamlining cloud deployment for LLMs. In previous releases, our goal was to give users the ability to fully customize their LLM deployments. However, we realized that the customization support in OpenLLM led to scope creep, deviating from our core focus on making LLM deployment simple. With the rise of open-source LLMs and the growing emphasis on LLM-focused application development, we have decided to concentrate on what OpenLLM does best: simplifying LLM deployment. As such, we have completely revamped the architecture to make OpenLLM a tool that simplifies running LLMs as an API endpoint, prioritizing ease of use and performance. This means that 0.6 breaks away from many of the old Python APIs provided in 0.5, establishing itself as an easy-to-use CLI tool with cross-platform compatibility for deploying open-source LLMs. Some of the coolest features and capabilities include:
    • Broad LLM support: supports a wide variety of open-source LLMs, including those fine-tuned with your own data or enhanced through advanced quantization.
    • OpenAI-compatible endpoints: serve your LLMs with endpoints fully compatible with OpenAI standards, ensuring ease of integration (see the sketch below).
    • Enhanced decoding speed: accelerated LLM decoding powered by a state-of-the-art inference backend.
    • Interactive chat UI: chat with different models using a built-in chat user interface.
    • Enterprise-grade cloud deployment: optionally deploy to BentoCloud with a single command for an enterprise-grade LLM API endpoint.
    To learn more, visit the OpenLLM repository. 🤝 We invite you to explore the new release, provide feedback, and join us in our mission to make cloud deployment of LLMs accessible and efficient for everyone. 🙏 Thank you for your continued support and trust in OpenLLM. We look forward to seeing the incredible applications you will build with the tool!
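    Because the endpoints follow OpenAI standards, the official OpenAI Python client can be pointed at an OpenLLM server. A sketch, assuming a local server on port 3000 and an illustrative model id:
    ```python
    from openai import OpenAI

    # Point the standard OpenAI client at the local OpenLLM server.
    client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3-8B-Instruct",  # hypothetical model id
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(response.choices[0].message.content)
    ```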
    🍱 3 · 👍 3 · ❤️ 3 · 🎉 5 · party parrot 1
• Sherlock Xu · 07/19/2024, 12:48 PM
    @channel Hi everyone! We are excited to announce the release of BentoML 1.3! Following the feedback received since the launch of 1.2 earlier this year, we are introducing a host of new features and enhancements in 1.3. Below are the key highlights of 1.3; stay tuned for an upcoming blog post, where we'll provide a detailed exploration of the new features and the driving forces behind their development.
    🕙 Implemented BentoML task execution (a sketch follows below)
    ◦ Introduced the `@bentoml.task` decorator to set a task endpoint for executing long-running workloads (such as batch processing or video generation).
    ◦ Added the `.submit()` method to both the sync and async clients, which can submit task inputs via the task endpoint, while dedicated worker processes constantly monitor task queues for new work to perform.
    ◦ Full compatibility with BentoCloud to run Bentos defined with task endpoints.
    ◦ See the Services and Clients docs for examples of a Service API that initializes a long-running task in the Service constructor, creates clients to call the endpoint, and retrieves task status.
    🚀 Optimized the build cache to accelerate the build process
    ◦ Enhanced build speed for `bentoml build` and `containerize` through pre-installed large packages like `torch`
    ◦ Switched to `uv` as the installer and resolver, replacing `pip`
    🔨 Supported concurrency-based autoscaling on BentoCloud
    ◦ Added the `concurrency` configuration to the `@bentoml.service` decorator to set the ideal number of simultaneous requests a Service is designed to handle.
    ◦ Added the `external_queue` configuration to the `@bentoml.service` decorator to queue excess requests until they can be processed within the defined `concurrency` limits.
    ◦ See the documentation to configure concurrency and external queues.
    🔒 Secure data handling with secrets in BentoCloud
    ◦ You can now create and manage credentials, such as Hugging Face tokens and AWS secrets, securely on BentoCloud and easily apply them across multiple Deployments.
    ◦ Added secret subcommands to the BentoML CLI for secret management. Run `bentoml secret -h` to learn more.
    🗒️ Added streamed logs for Bento image deployment
    ◦ Makes it easier to troubleshoot build issues and enables faster development iterations
    🙏 Thank you for your continued support! Feel free to try 1.3 now!
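    A minimal sketch of the task workflow described above, based on the Services and Clients docs (the service name, workload, and concurrency values are illustrative):
    ```python
    import bentoml

    @bentoml.service(traffic={"concurrency": 32, "external_queue": True})
    class BatchProcessor:
        # A task endpoint for long-running work; clients poll for the result.
        @bentoml.task
        def process(self, text: str) -> str:
            return text.upper()  # stand-in for a long-running workload

    # Client side (assumes the service is already running, e.g. via `bentoml serve`):
    client = bentoml.SyncHTTPClient("http://localhost:3000")
    task = client.process.submit(text="hello")
    print(task.get_status())
    print(task.get())  # blocks until the task completes
    ```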
    👍 1 · 🎉 25 · 🍱 5 · 🚀 10
• Sherlock Xu · 02/20/2025, 1:38 PM
    @channel Hello everyone! We are thrilled to announce the release of BentoML 1.4! This version introduces several new features and improvements to accelerate your iteration cycle and enhance the overall developer experience. Below are the key highlights of 1.4; you can find more details in the release blog post.
    🚀 20x faster iteration with Codespaces
    ◦ Introduced BentoML Codespaces, a development platform built on BentoCloud
    ◦ Added the `bentoml code` command for creating a Codespace
    ◦ Auto-sync of local changes to the cloud environment
    ◦ Access to a variety of powerful cloud GPUs
    ◦ Real-time logs and debugging through the cloud dashboard
    ◦ Eliminates dependency headaches and ensures consistency between dev and prod environments
    🐍 New Python SDK for runtime configurations (a sketch follows below)
    ◦ Added `bentoml.images.PythonImage` for defining the Bento runtime environment in Python instead of using `bentofile.yaml` or `pyproject.toml`
    ◦ Supports customizing runtime configurations (e.g., Python version, system packages, and dependencies) directly in the `service.py` file
    ◦ Introduced the context-sensitive `run()` method for running custom build commands
    ◦ Backward compatible with existing `bentofile.yaml` and `pyproject.toml` configurations
    ⚡ Accelerated model loading
    ◦ Implemented build-time model downloads and parallel loading of model weights using safetensors to reduce cold start time and improve scaling performance. See the documentation to learn more.
    ◦ Added `bentoml.models.HuggingFaceModel` for loading models from Hugging Face. It supports private model repositories and custom endpoints.
    ◦ Added `bentoml.models.BentoModel` for loading models from BentoCloud and the Model Store
    🌍 External deployment dependencies
    ◦ Extended `bentoml.depends()` to support external deployments
    ◦ Added support for calling BentoCloud Deployments via name or URL
    ◦ Added support for calling self-hosted HTTP AI services outside BentoCloud
    ⚠️ Legacy Service API deprecation
    ◦ The legacy `bentoml.Service` API (with runners) is now officially deprecated and scheduled for removal in a future release. We recommend you use the `@bentoml.service` decorator. Note that:
    • `1.4` remains fully compatible with Bentos created by `1.3`.
    • The BentoML documentation has been updated with examples and guides for `1.4`.
    🙏 As always, we appreciate your continued support!
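    A sketch of the new runtime-configuration SDK described above (the package list and build command are illustrative assumptions, not from the release notes):
    ```python
    import bentoml

    # Define the Bento runtime environment in Python, in service.py,
    # instead of bentofile.yaml or pyproject.toml.
    image = (
        bentoml.images.PythonImage(python_version="3.11")
        .run("echo 'building my Bento image'")  # context-sensitive build command
        .python_packages("torch", "transformers")
    )

    @bentoml.service(image=image)
    class MyService:
        @bentoml.api
        def ping(self) -> str:
            return "pong"
    ```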
    🎉 11 · 🍱 4 · ✅ 2 · ❤️ 17
• Sean · 04/22/2025, 4:00 PM
    Hi Bento Community, we’re writing to make you aware of two recent security advisories involving unsafe deserialization of Python pickle data in BentoML that could enable remote code execution (RCE) when a request is sent with the `Content-Type: application/vnd.bentoml+pickle` header.
    CVE-2025-27520:
    • Scope: insecure pickle deserialization in the entry service
    • Affected versions: BentoML ≥ 1.3.4 and < 1.4.3
    • Action: upgrade to v1.4.3 or later.
    CVE-2025-32375:
    • Scope: insecure pickle deserialization in dependent (runner) services
    • Affected versions: BentoML ≤ v1.4.8
    • Exposure: only when runners are launched explicitly with `bentoml start-runner-server`.
    ◦ Deployments started with the standard `bentoml serve` and containerized via `bentoml containerize` are not exposed, because runner ports are not published.
    ◦ As of v1.4.8, the `start-runner-server` subcommand has been removed, fully closing this attack vector.
    • Action: upgrade to v1.4.8 or later.
    Recommended next steps:
    1. Upgrade immediately to the minimum safe version listed above (or any newer release).
    2. If pickle support is truly required, audit ingress rules to ensure only intended content types are accepted; otherwise, consider disabling pickle inputs altogether (an illustrative sketch follows below).
    If you have questions or need assistance, please open an issue or reach out in our community Slack. Stay safe, The BentoML Team
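    For step 2, a hypothetical example of what blocking the pickle content type can look like: a dependency-free ASGI middleware wrapped around an app at the ingress layer (illustrative only, not an official mitigation):
    ```python
    class BlockPickleMiddleware:
        """Reject requests that declare the BentoML pickle content type."""

        def __init__(self, app):
            self.app = app

        async def __call__(self, scope, receive, send):
            if scope["type"] == "http":
                headers = dict(scope.get("headers", []))
                ctype = headers.get(b"content-type", b"").decode("latin-1")
                if ctype.startswith("application/vnd.bentoml+pickle"):
                    # Short-circuit with 415 before the request reaches the service.
                    await send({
                        "type": "http.response.start",
                        "status": 415,
                        "headers": [(b"content-type", b"text/plain")],
                    })
                    await send({"type": "http.response.body", "body": b"unsupported media type"})
                    return
            await self.app(scope, receive, send)
    ```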
    ❤️ 6 · 👍 1