GitHub
06/06/2025, 3:33 AM
Legacy APIs are being moved to bentoml.legacy while keeping the references in bentoml for a while. Any reference to bentoml.<legacy_api> will emit a deprecation warning to let users migrate ASAP.
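A minimal sketch of how such a forwarding shim could work, assuming a PEP 562 module-level __getattr__ in bentoml/__init__.py; the relocated names and the mechanism here are illustrative guesses, not BentoML's actual implementation:

# bentoml/__init__.py -- illustrative sketch only
import importlib
import warnings

_LEGACY_NAMES = {"Service", "Runner"}  # hypothetical set of relocated legacy APIs

def __getattr__(name: str):
    # Forward old top-level names to bentoml.legacy and warn the caller.
    if name in _LEGACY_NAMES:
        warnings.warn(
            f"bentoml.{name} is deprecated; import it from bentoml.legacy instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(importlib.import_module("bentoml.legacy"), name)
    raise AttributeError(f"module 'bentoml' has no attribute {name!r}")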
bentoml/BentoML
GitHub
06/09/2025, 12:59 AM
<https://github.com/bentoml/BentoML/tree/main|main>
by frostming
<https://github.com/bentoml/BentoML/commit/17a160cd059141852b390795f8f39da40deeda93|17a160cd>
- refactor: move legacy APIs to a separate module (#5381)
bentoml/BentoML
GitHub
06/13/2025, 12:23 AM
<https://github.com/bentoml/BentoML/tree/main|main>
by frostming
<https://github.com/bentoml/BentoML/commit/913549cc2595f690fcfd31025cfb1969f30dce7e|913549cc>
- feat: support custom service start command (#5382)
bentoml/BentoML
GitHub
06/13/2025, 6:44 AM
<https://github.com/bentoml/BentoML/tree/main|main>
by frostming
<https://github.com/bentoml/BentoML/commit/409a9d8829ab9f4ca02f1852bf21ec66b2ad82c4|409a9d88>
- fix: better way to set service name (#5383)
bentoml/BentoML
GitHub
06/17/2025, 9:59 PM
<https://github.com/bentoml/BentoML/tree/main|main>
by aarnphm
<https://github.com/bentoml/BentoML/commit/dce5d3adbb79dd9f5beafdc21b11557f1db911d4|dce5d3ad>
- chore(config): export accelerator literal type (#5384)
bentoml/BentoML
GitHub
06/19/2025, 5:09 AM
<https://github.com/bentoml/BentoML/tree/main|main>
by aarnphm
<https://github.com/bentoml/BentoML/commit/0fc57118d2a3b6bc9de8c286fe71cab0d97de9b1|0fc57118>
- fix: accept bento type as the bento argument for deployment APIs (#5385)
bentoml/BentoML
GitHub
06/19/2025, 3:22 PM
[2025/06/19 14:19:35] [error] [input:prometheus_scrape:prometheus_scrape.0] error decoding Prometheus Text format
The issue seems to come from the order of the histogram metrics:
all the _sum keys are at the beginning of the metric, followed by the _bucket and _count keys.
### To reproduce
1. Deploy a basic BentoML container with metrics enabled
2. Install fluent-bit (brew install fluent-bit on macOS)
3. Create a basic configuration: fluent-bit.conf
[SERVICE]
    Flush            2
    Log_level        debug
    Daemon           off
    HTTP_Server      on
    HTTP_Listen      0.0.0.0
    HTTP_PORT        2020
[INPUT]
    Name             prometheus_scrape
    Tag              local_metrics
    Scrape_interval  2s
    Host             localhost
    Port             8080
    Metrics_path     /test-metrics.txt
[OUTPUT]
    Name             stdout
    Match            *
    Format           json_lines
4. Create a test-metrics.txt file with the content of the metrics below
5. Launch a basic HTTP server: python3 -m http.server 8080
6. Launch fluent-bit: fluent-bit -c fluent-bit.conf
Content of the test-metrics.txt file (working):
# HELP prediction_time_seconds Time taken for predictions
# TYPE prediction_time_seconds histogram
prediction_time_seconds_sum{company_id="96a16b00-d289-45e6-856c-b45d7b83a09d",endpoint="predict_process_collection_and_costs"} 56.312395095825195
prediction_time_seconds_sum{company_id="be545906-1849-4c10-a331-6fffc88aa3ba",endpoint="predict_process_collection_and_costs"} 2.419936180114746
prediction_time_seconds_sum{company_id="c0cc5509-1249-4b1a-958b-af1dac4af697",endpoint="predict_process_collection_and_costs"} 0.5229167938232422
prediction_time_seconds_sum{company_id="1a3dd9b6-ba28-408d-aa7f-bb27e2d00f46",endpoint="predict_process_collection_and_costs"} 4.157390356063843
prediction_time_seconds_sum{company_id="62b6fe8f-6dce-407c-9e6b-8c588a2d9501",endpoint="predict_process_collection_and_costs"} 8.153648376464844
prediction_time_seconds_sum{company_id="1b700fec-6f92-484e-8243-7cb1a47e7afc",endpoint="predict_process_collection_and_costs"} 0.32573604583740234
prediction_time_seconds_sum{company_id="e3874ca4-3ea0-46d7-8e8c-359065b0fab9",endpoint="predict_process_collection_and_costs"} 1.031454086303711
prediction_time_seconds_bucket{company_id="96a16b00-d289-45e6-856c-b45d7b83a09d",endpoint="predict_process_collection_and_costs",le="0.1"} 0.0
prediction_time_seconds_bucket{company_id="96a16b00-d289-45e6-856c-b45d7b83a09d",endpoint="predict_process_collection_and_costs",le="0.5"} 219.0
prediction_time_seconds_bucket{company_id="96a16b00-d289-45e6-856c-b45d7b83a09d",endpoint="predict_process_collection_and_costs",le="1.0"} 220.0
prediction_time_seconds_bucket{company_id="96a16b00-d289-45e6-856c-b45d7b83a09d",endpoint="predict_process_collection_and_costs",le="2.0"} 220.0
prediction_time_seconds_bucket{company_id="96a16b00-d289-45e6-856c-b45d7b83a09d",endpoint="predict_process_collection_and_costs",le="5.0"} 220.0
prediction_time_seconds_bucket{company_id="96a16b00-d289-45e6-856c-b45d7b83a09d",endpoint="predict_process_collection_and_costs",le="10.0"} 220.0
prediction_time_seconds_bucket{company_id="96a16b00-d289-45e6-856c-b45d7b83a09d",endpoint="predict_process_collection_and_costs",le="30.0"} 220.0
prediction_time_seconds_bucket{company_id="96a16b00-d289-45e6-856c-b45d7b83a09d",endpoint="predict_process_collection_and_costs",le="60.0"} 220.0
prediction_time_seconds_bucket{company_id="96a16b00-d289-45e6-856c-b45d7b83a09d",endpoint="predict_process_collection_and_costs",le="+Inf"} 220.0
prediction_time_seconds_count{company_id="96a16b00-d289-45e6-856c-b45d7b83a09d",endpoint="predict_process_collection_and_costs"} 220.0
prediction_time_seconds_bucket{company_id="be545906-1849-4c10-a331-6fffc88aa3ba",endpoint="predict_process_collection_and_costs",le="0.1"} 0.0
prediction_time_seconds_bucket{company_id="be545906-1849-4c10-a331-6fffc88aa3ba",endpoint="predict_process_collection_and_costs",le="0.5"} 8.0
prediction_time_seconds_bucket{company_id="be545906-1849-4c10-a331-6fffc88aa3ba",endpoint="predict_process_collection_and_costs",le="1.0"} 8.0
prediction_time_seconds_bucket{company_id="be545906-1849-4c10-a331-6fffc88aa3ba",endpoint="predict_process_collection_and_costs",le="2.0"} 8.0
prediction_time_seconds_bucket{company_id="be545906-1849-4c10-a331-6fffc88aa3ba",endpoint="predict_process_collection_and_costs",le="5.0"} 8.0
prediction_time_seconds_bucket{company_id="be545906-1849-4c10-a331-6fffc88aa3ba",endpoint="predict_process_collection_and_costs",le="10.0"} 8.0
prediction_time_seconds_bucket{company_id="be545906-1849-4c10-a331-6fffc88aa3ba",endpoint="predict_process_collection_and_costs",le="30.0"} 8.0
prediction_time_seconds_bucket{company_id="be545906-1849-4c10-a331-6fffc88aa3ba",endpoint="predict_process_collection_and_costs",le="60.0"} 8.0
prediction_time_seconds_bucket{company_id="be545906-1849-4c10-a331-6fffc88aa3ba",endpoint="predict_process_collection_and_costs",le="+Inf"} 8.0
prediction_time_seconds_count{company_id="be545906-1849-4c10-a331-6fffc88aa3ba",endpoint="predict_process_collection_and_costs"} 8.0
prediction_time_seconds_bucket{company_id="c0cc5509-1249-4b1a-958b-af1dac4af697",endpoint="predict_process_collection_and_costs",le="0.1"} 0.0
prediction_time_seconds_bucket{company_id="c0cc5509-1249-4b1a-958b-af1dac4af697",endpoint="predict_process_collection_and_costs",le="0.5"} 2.0
prediction_time_seconds_bucket{company_id="c0cc5509-1249-4b1a-958b-af1dac4af697",endpoint="predict_process_collection_and_costs",le="1.0"} 2.0
prediction_time_seconds_bucket{company_id="c0cc5509-1249-4b1a-958b-af1dac4af697",endpoint="predict_process_collection_and_costs",le="2.0"} 2.0
prediction_time_seconds_bucket{company_id="c0cc5509-1249-4b1a-958b-af1dac4af697",endpoint="predict_process_collection_and_costs",le="5.0"} 2.0
prediction_time_seconds_bucket{company_id="c0cc5509-1249-4b1a-958b-af1dac4af697",endpoint="predict_process_collection_and_costs",le="10.0"} 2.0
prediction_time_seconds_bucket{company_id="c0cc5509-1249-4b1a-958b-af1dac4af697",endpoint="predict_process_collection_and_costs",le="30.0"} 2.0
prediction_time_seconds_bucket{company_id="c0cc5509-1249-4b1a-958b-af1dac4af697",endpoint="predict_process_collection_and_costs",le="60.0"} 2.0
prediction_time_seconds_bucket{company_id="c0cc5509-1249-4b1a-958b-af1dac4af697",endpoint="predict_process_collection_and_costs",le="+Inf"} 2.0
prediction_time_seconds_count{company_id="c0cc5509-1249-4b1a-958b-af1dac4af697",endpoint="predict_process_collection_and_costs"} 2.0
prediction_time_seconds_bucket{company_id="1a3dd9b6-ba28-408d-aa7f-bb27e2d00f46",endpoint="predict_process_collection_and_costs",le="0.1"} 0.0
prediction_time_seconds_bucket{company_id="1a3dd9b6-ba28-408d-aa7f-bb27e2d00f46",endpoint="predict_process_collection_and_costs",le="0.5"} 15.0
prediction_time_seconds_bucket{company_id="1a3dd9b6-ba28-408d-aa7f-bb27e2d00f46",endpoint="predict_process_collection_and_costs",le="1.0"} 15.0
prediction_time_seconds_bucket{company_id="1a3dd9b6-ba28-408d-aa7f-bb27e2d00f46",endpoint="predict_process_collection_and_costs",le="2.0"} 15.0
prediction_time_seconds_bucket{company_id="1a3dd9b6-ba28-408d-aa7f-bb27e2d00f46",endpoint="predict_process_collection_and_costs",le="5.0"} 15.0
prediction_time_seconds_bucket{company_id="1a3dd9b6-ba28-408d-aa7f-bb27e2d00f46",endpoint="predict_process_collection_and_costs",le="10.0"} 15.0
prediction_time_seconds_bucket{company_id="1a3dd9b6-ba28-408d-aa7f-bb27e2d00f46",endpoint="predict_process_collection_and_costs",le="30.0"} 15.0
prediction_time_seconds_bucket{company_id="1a3dd9b6-ba28-408d-aa7f-bb27e2d00f46",endpoint="predict_process_collection_and_costs",le="60.0"} 15.0
prediction_time_seconds_bucket{company_id="1a3dd9b6-ba28-408d-aa7f-bb27e2d00f46",endpoin…
bentoml/BentoML
GitHub
06/20/2025, 12:23 AM
<https://github.com/bentoml/BentoML/tree/main|main>
by aarnphm
<https://github.com/bentoml/BentoML/commit/d606ffc2ba37a895e66e65d65165f7b21201d97f|d606ffc2>
- chore: return early python_packages (#5387)
bentoml/BentoML
GitHub
06/23/2025, 7:52 AM
<https://github.com/bentoml/BentoML/tree/main|main>
by Sherlock113
<https://github.com/bentoml/BentoML/commit/66ec5cfe430e063cdf208af87bed7bec18fe7fec|66ec5cfe>
- docs: Update adaptive batching example (#5388)
bentoml/BentoML
GitHub
06/23/2025, 3:27 PM
<https://github.com/bentoml/BentoML/tree/main|main>
by aarnphm
<https://github.com/bentoml/BentoML/commit/da137d23c98016941f2566d7ff6268e787f646e7|da137d23>
- feat: reading bento args from a YAML file (#5389)
bentoml/BentoML
GitHub
06/24/2025, 7:33 AM
<https://github.com/bentoml/BentoML/tree/main|main>
by Sherlock113
<https://github.com/bentoml/BentoML/commit/94a3728d1db5b66f4e2231dec3608d8ddec0f2e4|94a3728d>
- docs: Add --arg-file flag (#5391)
bentoml/BentoML
GitHub
06/24/2025, 9:33 PM
<https://github.com/bentoml/BentoML/tree/main|main>
by aarnphm
<https://github.com/bentoml/BentoML/commit/1b81ddd7da6b41fb3466a2f80ce0aa8cea6bb251|1b81ddd7>
- fix: Adjust keras version comparison (#5392)
bentoml/BentoML
GitHub
06/25/2025, 2:49 PM
…self to access the service
return f"Hello {self.name}"
However, bigger FastAPI applications can be modularized into multiple routers, as described here.
I'm not sure if it's possible, but it would be nice to bind a BentoML service to a specific router. Something like:
import bentoml
from fastapi import APIRouter, Depends, HTTPException

from my_app.auth import get_token_header

router = APIRouter(
    prefix="/inference",
    tags=["inference"],
    dependencies=[Depends(get_token_header)],
)

@bentoml.service
@bentoml.asgi_app_router
class MyService:
    name = "MyService"

    @router.get('/hello')
    def hello(self):  # Inside service class, use self to access the service
        return f"Hello {self.name}"
Then, you add it to the main app like this:
from fastapi import Depends, FastAPI
from my_app.routers import inference
app = FastAPI()
app.include_router(inference.router)
Thanks!
### Motivation
With @bentoml.asgi_app it is currently possible to integrate BentoML with ASGI applications. However, it is difficult to modularize the code and bind BentoML services to specific routers.
### Other
No response
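(For context, not part of the request: a rough sketch of what already works today, mounting a FastAPI app that includes a router onto a service via the existing @bentoml.asgi_app decorator. The path argument and decorator order are assumptions from memory of the docs, and the router handlers here cannot access self, which is exactly the gap the proposal above addresses.)

import bentoml
from fastapi import APIRouter, FastAPI

# Hypothetical router module, mirroring the proposal above.
router = APIRouter(prefix="/inference", tags=["inference"])

@router.get("/hello")
def hello() -> str:
    # No access to the BentoML service instance (self) from here today.
    return "Hello from the inference router"

fastapi_app = FastAPI()
fastapi_app.include_router(router)

@bentoml.asgi_app(fastapi_app, path="/v1")  # assumed signature: app plus mount path
@bentoml.service
class MyService:
    @bentoml.api
    def predict(self, text: str) -> str:
        return text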
bentoml/BentoML
GitHub
06/25/2025, 11:23 PM
<https://github.com/bentoml/BentoML/tree/main|main>
by sauyon
<https://github.com/bentoml/BentoML/commit/ad0d7142db572a65c1af1651690a4141f15908ab|ad0d7142>
- chore: update AWS cloudformation template (#5394)
bentoml/BentoML
GitHub
06/26/2025, 7:07 AM
bento: jtest:ums23fazbcsawiru
name: jtest-lp3b
access_authorization: false
secrets: []
envs: []
services:
  Jtest:
    instance_type: cpu.small
    envs: []
    scaling:
      min_replicas: 0
      max_replicas: 1
      policy:
        scale_up_stabilization_window: 0
        scale_down_stabilization_window: 600
    config_overrides:
      traffic:
        timeout: 60
        external_queue: false
    deployment_strategy: RollingUpdate
  A:
    instance_type: cpu.small
    envs: []
    scaling:
      min_replicas: 0
      max_replicas: 1
      policy:
        scale_up_stabilization_window: 0
        scale_down_stabilization_window: 600
    config_overrides:
      traffic:
        timeout: 60
        external_queue: false
    deployment_strategy: RollingUpdate
cluster: default
canary:
  route_type: header
  route_by: X-Header
  versions:
    A:
      bento: jtest:fglup3qfm6hseiru
      weight: 50
      services:
        Jtest:
          instance_type: cpu.small
          envs: []
          scaling:
            min_replicas: 1
            max_replicas: 1
            policy:
              scale_up_stabilization_window: 0
              scale_down_stabilization_window: 60
          config_overrides:
            traffic:
              timeout: 60
              external_queue: false
          deployment_strategy: RollingUpdate
        A:
          instance_type: cpu.small
          envs: []
          scaling:
            min_replicas: 1
            max_replicas: 1
            policy:
              scale_up_stabilization_window: 0
              scale_down_stabilization_window: 60
          config_overrides:
            traffic:
              timeout: 60
              external_queue: false
          deployment_strategy: RollingUpdate
Fixes #(issue)
## Before submitting:
• Does the Pull Request follow the Conventional Commits naming specification? Here is GitHub's guide on how to create a pull request.
• Does the code follow BentoML's code style, and has the pre-commit run -a script passed (instructions)?
• Did you read through the contribution guidelines and follow the development guidelines?
• Did your changes require updates to the documentation? Have you updated those accordingly? Here are the documentation guidelines and tips on writing docs.
• Did you write tests to cover your changes?
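(Usage note, not part of the PR: a config like the one above would typically be applied through the deployment API. A hedged sketch follows, assuming bentoml.deployment.create accepts a config file path; the config_file parameter name is an assumption to verify against the deployment API docs.)

import bentoml

# Sketch: create a BentoCloud deployment from the canary config above.
# config_file is an assumed parameter name; check the deployment API docs.
deployment = bentoml.deployment.create(config_file="deployment.yaml")
print(deployment)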
bentoml/BentoML
GitHub
06/26/2025, 9:31 AM
<https://github.com/bentoml/BentoML/tree/main|main>
by aarnphm
<https://github.com/bentoml/BentoML/commit/b3f2a1cbc7ba52a0d9a2caa58193d8a12ec661a3|b3f2a1cb>
- docs: Add canary deployment (#5390)
bentoml/BentoML
GitHub
06/27/2025, 7:00 AM
<https://github.com/bentoml/BentoML/tree/main|main>
by Sherlock113
<https://github.com/bentoml/BentoML/commit/ac41a6ff35ef5aa768ee22e31f671804db3447aa|ac41a6ff>
- docs: Add BentoML Sandboxes (#5396)
bentoml/BentoML