# ask-for-help
  • Matěj Šmíd
    04/30/2025, 7:26 PM
    I've advanced a bit further, but now I'm running into issues with packages that have compiled extensions. The packages are mainly git repos. uv seems to make source tarballs out of the git repos and just install the Python sources without building them. The wheel directory no longer seems to be supported. How should I proceed with packages that have compiled extensions?
  • Liu Muzhou
    05/07/2025, 1:25 AM
    Hi, I just want to know if the https://github.com/bentoml/BentoSGLang example can support auto scaling?
  • Kevin Cui (Black-Hole)
    05/07/2025, 3:47 AM
    How does BentoML charge after deployment? Does the billing start as soon as it is deployed (for GPU and CPU), or is there no charge if it is not in use after deployment, with billing only starting when GPU or CPU resources are utilized?
  • Vincent Lu
    05/07/2025, 4:03 AM
    I added input and output nodes to my ComfyUI workflow. After that I tried deploying it, but then I got an error about a pattern mismatch. Where do I find out what the pattern mismatch refers to?
  • Vincent Lu
    05/07/2025, 4:04 AM
    Screenshot 2025-05-07 at 12.02.25 AM.png
  • Chris
    05/07/2025, 5:27 AM
    Hello! I set up a simple image service with the "@service" and "@api" decorators. Is there a way, during bentoml serve, to get the current bento tag of the service? I want to store a result.json file and also put the service tag in it 🙂
  • Kevin Cui (Black-Hole)
    05/07/2025, 1:31 PM
    Is it expected behavior that `bentoml.models.HuggingFaceModel` currently does not support setting `repo_type`? In our scenario, we need to use the `lukbl/LaTeX-OCR` model (`repo_type="space"`), but currently there is no way to modify it. My current approach is to manually upload it using `bentoml models push` so that the service can access it.
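    For reference, a sketch of that manual route: downloading the Space with `huggingface_hub.snapshot_download` (which does accept `repo_type="space"`) and importing the files into the local model store with `bentoml.models.create`, so `bentoml models push` can upload them afterwards. The model name "latex-ocr" is a placeholder, and the exact `bentoml.models.create` usage is an assumption rather than a confirmed recipe:
    ```python
    import shutil
    from pathlib import Path

    import bentoml
    from huggingface_hub import snapshot_download

    # Download the Space repo manually, since repo_type cannot currently be set
    # on bentoml.models.HuggingFaceModel.
    local_dir = snapshot_download(repo_id="lukbl/LaTeX-OCR", repo_type="space")

    # Copy the files into the local BentoML model store; `bentoml models push`
    # can then upload the result so the service can reference it by tag.
    with bentoml.models.create(name="latex-ocr") as model:
        for item in Path(local_dir).iterdir():
            dest = Path(model.path) / item.name
            if item.is_dir():
                shutil.copytree(item, dest)
            else:
                shutil.copy2(item, dest)
    ```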
  • Kevin Cui (Black-Hole)
    05/08/2025, 6:36 AM
    I noticed a strange issue: when my instance count is 0, the first request I send always returns “input parameter is missing.” However, once there is at least one instance, this problem does not occur. After the first request is sent, the instance count scales up (from 0 to 1). The second request is sent immediately after the first one completes (at that point the instance has not scaled down yet, so there is still one instance).
  • Arnault Chazareix
    05/09/2025, 2:48 PM
    Hi 🙂 I am seeing that bentoml serve asks for a built bento rather than a path or other such inputs
    ```
    BENTO is the serving target, it can be the import as:
    - the import path of a 'bentoml.Service' instance
    - a tag to a Bento in local Bento store
    - a folder containing a valid 'bentofile.yaml' build file with a 'service' field, which provides the import path of a 'bentoml.Service' instance
    - a path to a built Bento (for internal & debug use only)
    
    Serve from a bentoml.Service instance source code (for development use only): 'bentoml serve fraud_detector.py:svc'
    ```
    What is the risk of serving from an import path to a bentoml.Service when the app is properly containerized in a Dockerfile? Thanks for your help!
  • Jonathan Markland
    05/12/2025, 4:32 PM
    Hey guys 👋 I'm using the README here to install Yatai locally. I was able to install Yatai; however, when installing yatai-image-builder I keep hitting the error in the screenshot. Could I have some help please? Thank you!
  • k1nd0ne
    05/13/2025, 4:20 PM
    Hello! I am beginning to use OpenLLM to benchmark multiple LLMs. I have 4x 24GB NVIDIA GPUs. When executing openllm hello with the defaults, all of the GPUs are detected but only one is used to determine (and run) which models are compatible. I looked online but didn't find a working way to pass multiple GPUs to the vLLM backend. I'm kinda stuck here if anyone can help 🙏 It works well when using the vLLM backend standalone.
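    Not an OpenLLM answer, but for reference the vLLM-side knob that spreads one model across several GPUs is `tensor_parallel_size`; a minimal standalone sketch (the model id is a placeholder), which may help pin down what the OpenLLM layer needs to forward:
    ```python
    # Standalone vLLM sketch: shard one model across 4 GPUs with tensor parallelism.
    # The model id is a placeholder; use any model that fits across your cards.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
        tensor_parallel_size=4,                    # spread across all 4 x 24 GB GPUs
    )

    outputs = llm.generate(["Hello, how are you?"], SamplingParams(max_tokens=64))
    print(outputs[0].outputs[0].text)
    ```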
  • Rajiv Abraham
    05/16/2025, 12:43 AM
    Hi, BentoML is a cool idea. Thanks for making it! I just wanted to understand this sentence better: "The `bentoml.importing()` context manager is used to handle import statements for dependencies required during serving but may not be available in other situations." (ref: https://docs.bentoml.com/en/latest/get-started/hello-world.html). I'm not clear on what "but may not be available in other situations" means.
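    For context, a minimal sketch of the pattern from the hello-world guide: imports wrapped in `bentoml.importing()` are only needed when the service actually runs, so code paths that merely import the module (for example while building the bento) do not require those packages to be installed; `torch` below is just an example dependency.
    ```python
    import bentoml

    # Serving-only dependencies go inside the context manager, so this module can
    # still be imported (e.g. while building the bento) on machines where those
    # packages are not installed.
    with bentoml.importing():
        import torch  # example serving-time dependency

    @bentoml.service
    class HelloWorld:
        @bentoml.api
        def cuda_available(self) -> bool:
            # torch is available here because the endpoint only runs at serving time
            return torch.cuda.is_available()
    ```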
  • Rajiv Abraham
    05/16/2025, 1:06 AM
    Also, is there any way of passing a context object into the constructor, or any other hooks into the class? The idea is to use this as a key object store and also to provide common abstractions to the object.
    ```python
    import bentoml

    EXAMPLE_INPUT = "..."  # placeholder example text

    @bentoml.service
    class Summarization:
        def __init__(self, context) -> None:
            self.context = context  # <===================== desired injection point
            self.model = ...  # load the model here

        @bentoml.api
        def summarize(self, text: str = EXAMPLE_INPUT) -> str:
            # logger is just an example; it could differ between local and prod
            logger = self.context.logger()
            logger.info("In Summarize")
            return self.model.predict()
    ```
    The idea is to construct common objects like monitoring, logging, and feature stores through this context object and to pass it to the `Summarization` service.
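    Since `@bentoml.service` instantiates the class itself with no constructor arguments, one possible workaround (a sketch, not an official BentoML hook; the `Context` class and `APP_ENV` variable below are hypothetical) is to build the context inside `__init__` from configuration such as an environment variable:
    ```python
    import logging
    import os

    import bentoml

    class Context:
        """Hypothetical context object bundling environment-specific helpers."""

        def __init__(self, env: str) -> None:
            self._logger = logging.getLogger(f"summarization.{env}")

        def logger(self) -> logging.Logger:
            return self._logger

    @bentoml.service
    class Summarization:
        def __init__(self) -> None:
            # BentoML instantiates the service class itself (no constructor args),
            # so the context is assembled here from configuration instead.
            self.context = Context(os.getenv("APP_ENV", "local"))

        @bentoml.api
        def summarize(self, text: str) -> str:
            self.context.logger().info("In Summarize")
            return text[:100]  # placeholder for a real model call
    ```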
  • Zuyang Liu
    05/16/2025, 11:24 PM
    Is it possible to support the HEIC format when using image inputs? Usually this is done with `pillow_heif` and `register_heif_opener()`, but even when doing that, we are still getting:
    ```
    Traceback (most recent call last):
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/_bentoml_impl/server/app.py", line 640, in api_endpoint_wrapper
        resp = await self.api_endpoint(name, request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/_bentoml_impl/server/app.py", line 704, in api_endpoint
        input_data = await method.input_spec.from_http_request(request, serde)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/_bentoml_sdk/io_models.py", line 213, in from_http_request
        return await serde.parse_request(request, t.cast(t.Type[IODescriptor], cls))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/_bentoml_impl/serde.py", line 227, in parse_request
        return cls.model_validate(data)
               ^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/pydantic/main.py", line 703, in model_validate
        return cls.__pydantic_validator__.validate_python(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/_bentoml_sdk/validators.py", line 70, in decode
        return PILImage.open(obj.file, formats=formats)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/PIL/Image.py", line 3551, in open
        im = _open_core(fp, filename, prefix, formats)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/PIL/Image.py", line 3533, in _open_core
        factory, accept = OPEN[i]
                          ~~~~^^^
    KeyError: 'HEIC'
    ```
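    One possible workaround sketch (not verified against the validator shown in the traceback): accept the upload as a `pathlib.Path` instead of `PILImage.Image`, and open it yourself, so decoding goes through the opener registered by `pillow_heif` rather than a restricted `formats=` list:
    ```python
    from pathlib import Path

    import bentoml
    from PIL import Image as PILImage
    from pillow_heif import register_heif_opener

    # Register the HEIF/HEIC plugin once at import time.
    register_heif_opener()

    @bentoml.service
    class ImageStagePredict:
        @bentoml.api
        def predict(self, image: Path) -> str:
            # Opening the file ourselves lets Pillow use any registered opener,
            # including HEIC, instead of a fixed list of formats.
            img = PILImage.open(image).convert("RGB")
            return f"received image of size {img.size}"
    ```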
  • Mohamed Meftah
    05/20/2025, 6:38 PM
    Hello, I'm kind of confused. I assumed the reason for saving the model to BentoML and then pushing it is that I can load it from my services with the tag, but it seems that's always returning `None`. Do I need to bundle the models in my bento? My workflow is that I have a script for saving the models; I run that and push them to BentoML Cloud. If I do `bentoml serve`, it works locally, but when I push the service, importing the model with `BentoModel("tag")` fails, returning `None`.
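    A sketch of the pattern the docs suggest for this, assuming `BentoModel` lives in `bentoml.models` (alongside the `HuggingFaceModel` mentioned earlier) and a model saved as "my_model": declaring it as a class-level attribute records it as a dependency of the bento, so it is also pulled on BentoCloud rather than only found in the local store:
    ```python
    import bentoml
    from bentoml.models import BentoModel  # assumed import path, see note above

    @bentoml.service
    class MyService:
        # Declared at class level so the model is recorded with the bento and
        # downloaded on BentoCloud, not just looked up in the local model store.
        # "my_model:latest" is a placeholder tag.
        model_ref = BentoModel("my_model:latest")

        def __init__(self) -> None:
            # Resolve the on-disk path via the model store at startup; the
            # framework-specific loading (torch, joblib, ...) would go here.
            self.model_path = bentoml.models.get("my_model:latest").path

        @bentoml.api
        def predict(self, text: str) -> str:
            return f"model files live at {self.model_path}"
    ```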
  • Amit Gelber
    05/21/2025, 1:13 PM
    Hi BentoML team! Is it possible to perform actions after returning a response? For example: switching to another model in memory, rather than in the middle of handling the response.
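    Not an official BentoML hook, but one generic pattern in an async endpoint is to schedule the follow-up work on the event loop just before returning, so it runs after the response has been produced; the model-swap method below is purely hypothetical.
    ```python
    import asyncio

    import bentoml

    @bentoml.service
    class ModelService:
        def __init__(self) -> None:
            self.active_model = "model-a"  # placeholder state

        async def _swap_model(self) -> None:
            # Hypothetical follow-up action that should not block the response.
            await asyncio.sleep(0)
            self.active_model = "model-b"

        @bentoml.api
        async def predict(self, text: str) -> str:
            result = f"{self.active_model}: {text[:32]}"
            # Keep a reference to the task; it runs on the event loop after this
            # coroutine returns, i.e. outside the response handling itself.
            self._swap_task = asyncio.create_task(self._swap_model())
            return result
    ```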
  • Pierre Buyle
    05/21/2025, 3:36 PM
    What's the `_result_store` (a `Sqlite3Store`) on a `ServiceAppFactory`? Or more generally, why is BentoML storing data locally in a SQLite database? Is this needed to run a service, and can we disable it?
  • Dan Fairs
    05/22/2025, 10:29 AM
    Hi. I'm struggling to build a Bento which is in a monorepo structure. We have a Python tree like this:
    ```
    py/
      model-1/
        pyproject.toml
      model-2/
        pyproject.toml
      common/
        pyproject.toml
    ```
    I'm struggling to figure out how to get this to work, i.e. for `model-1` to depend on `common` when `common` is in the same repo. I've tried with `uv add ../common`, and with creating a symlink from inside `model-1` to `common`, e.g. `ln -s ../common common`. I've also added `include = ["common/"]` to `[tool.bentoml.build]` as per https://docs.bentoml.com/en/latest/reference/bentoml/bento-build-options.html#include. What's the correct recipe here? Thanks!
  • Mohamed Meftah
    05/22/2025, 2:07 PM
    When I am trying to push the bento, I'm getting this error: `[bentos] push failed: request failed with status code 400: {"error":"model size limit reached, size: 73349Mi, limit: 32Gi"}`. This is the service I'm building:
    ```python
    @bentoml.service(image=image, resources={"gpu": 1})
    class MultiView:
        DEVICE = "cuda"
        DTYPE = torch.float16
        NUM_VIEWS = 6
    
        BASE_MODEL = HuggingFaceModel("stabilityai/stable-diffusion-xl-base-1.0")
        VAE_MODEL = HuggingFaceModel("madebyollin/sdxl-vae-fp16-fix")
        ADAPTER_MODEL = HuggingFaceModel("huanngzh/mv-adapter")
        BIREFNET_MODEL = HuggingFaceModel("ZhengPeng7/BiRefNet")
    ....
    ```
    Is there a way to bypass that?
  • Joseph Obeid
    05/22/2025, 9:54 PM
    Hi all, we have a BentoML service with an endpoint that submits a long-running Bento task, along with an async function that periodically updates a database. The issue we're running into is that when Bento scales down, the task continues running (as seen in the logs), but we get the repeating error below and the database stops receiving updates, leading to inconsistent behavior. We found that increasing the `scale_down_stabilization_window` to longer than the task's max duration (~1920s) seems to prevent the issue. However, setting this parameter under the scaling policy in a YAML or JSON file appears to be ignored when deploying via the CLI; it always defaults to 600s. We can change it manually through the UI, but we need this setting to be configured automatically as part of our CI/CD pipeline. Here's the error that occurs when Bento scales down in the middle of a task:
    ```
    2025-05-22T21:06:52Z [Service: Algorithm][Replica: 92cdv]
    [ERROR] [cli] Exception in callback <bound method Arbiter.manage_watchers of <bentoml._internal.utils.circus.Arbiter object at 0x7f0cbdfbcbd0>>
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/tornado/ioloop.py", line 945, in _run
        val = self.callback()
      File "/usr/local/lib/python3.11/site-packages/circus/util.py", line 1038, in wrapper
        raise ConflictError("arbiter is already running arbiter_stop command")
    circus.exc.ConflictError: arbiter is already running arbiter_stop command
    ```
    We're on BentoML version 1.3.14. Is this a known issue? Is there a workaround to ensure `scale_down_stabilization_window` is applied automatically via the CLI? Thanks!
  • Jonathan Markland
    05/23/2025, 12:21 PM
    Hey all, when is Yatai 2.0 going live?
  • Mattia Bradascio
    06/03/2025, 2:51 PM
    Hey! What is the best way to deploy BentoML on Kubernetes? I saw Yatai offers an operator, but as Jonathan mentioned above, it seems like 2.0 still isn't available.
  • Toke Emil Heldbo Reines
    06/05/2025, 5:30 AM
    The simplest API that expects an image and a string fails when provided a string with a number in it. Sample API:
    ```python
    from typing import Any
    import bentoml
    from PIL import Image as PILImage

    @bentoml.service
    class Service:
        @bentoml.api
        def classify(self, input_image: PILImage.Image, uid: str) -> Any:
            print(uid)
    ```
    Call it in the Swagger docs with any number and it fails. Call it with curl with the uid being an explicit string and it still fails.
    ```bash
    curl -X 'POST' \
      'http://localhost:3000/classify' \
      -H 'accept: application/json' \
      -H 'Content-Type: multipart/form-data' \
      -F 'input_image=@sample_image.png;type=image/png' \
      -F 'uid="1231231231232131313212312312321123123213312";type=application/json'
    ```
    How do I fix this so the uid is treated as an actual string in all cases, with no typecasting cutting off decimals, etc.?
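    One thing that may help (a sketch, not verified against this exact BentoML version): since API parameters are validated with pydantic, annotating the field as `pydantic.StrictStr` should cause bare JSON numbers to be rejected instead of silently coerced, forcing clients to send a real string:
    ```python
    from typing import Any

    import bentoml
    from PIL import Image as PILImage
    from pydantic import StrictStr

    @bentoml.service
    class Service:
        @bentoml.api
        def classify(self, input_image: PILImage.Image, uid: StrictStr) -> Any:
            # With StrictStr, a bare JSON number for `uid` fails validation with a
            # clear error instead of being converted (and possibly mangled).
            print(uid)
            return {"uid": uid}
    ```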
  • Noah
    06/06/2025, 2:05 PM
    Hey! I was just invited to an organization on BentoCloud by a coworker. I clicked the link, signed up for an account, and was guided through the steps. I chose to authenticate my account by creating an API token in the web browser, and it made me create an organization, etc. How do I access the organization I was invited to? I can't seem to find it or the associated members and ML models hosted on the platform.
  • xiongfeng
    06/12/2025, 8:38 AM
    Hey! The acceleration shown in this blog (https://www.bentoml.com/blog/cold-starting-llms-on-kubernetes-in-under-30-seconds) is very exciting. I am very interested in the direct writing to GPU memory introduced in "Step 3: Load models directly into GPU memory". Where can I find a more detailed introduction?
  • Jabali
    06/16/2025, 4:48 PM
    I'm suddenly unable to deploy on any hardware at all on BentoCloud. My quota seems to have gone down to 0 for every GPU and CPU type and I can't figure out why. Because of this I also can't make any updates to my existing deployment.
  • Phirum Peang
    06/16/2025, 5:20 PM
    Where can I override these settings for the OpenLLM chat UI? The Service Configuration panel says this model was configured with the following settings:
    ```json
    {
      "enable_auto_tool_choice": true,
      "max_model_len": 3192,
      "tensor_parallel_size": 1,
      "tool_call_parser": "llama3_json"
    }
    ```
    I want to change max_model_len to a higher number, but I don't know where the configuration file is located.
  • Jeff Spurlock
    06/20/2025, 5:06 PM
    Hello, I'm trying to get started, but when I run `bentoml cloud login` and get prompted for a token, if I create a new one I get an axios error in the browser, and while the token does get created, my terminal stays in the 'waiting for authentication...' state. Note that it does actually create the token in the admin panel. So if I cancel this login command, run it again, and say I want to paste in an existing token, there doesn't seem to be a way in the admin panel to fetch the token value I just created so I can manually paste it into the terminal.
  • Remy
    06/25/2025, 8:31 AM
    Hello. I'm using BentoML with the Keras framework (through TensorFlow). When using it, I get this warning:
    ```
    BentoMLDeprecationWarning: `bentoml.keras` is deprecated since v1.4 and will be removed in a future version.
    ```
    I couldn't find any information about the Keras deprecation, and as far as I remember, this warning has been showing since BentoML v1.3 a few months ago. Is Keras support going to be effectively removed? For reference, the Keras page in the BentoML docs: https://docs.bentoml.com/en/latest/reference/bentoml/frameworks/keras.html (without anything about deprecation).
  • Rehan Shah
    06/25/2025, 9:13 AM
    Hi! I'm planning to use BentoML to deploy an inpainting model (the Flux Fill Dev variant) but wasn't able to find any documentation for it. I checked the FLUX.1 documentation as well, but nothing seemed relevant. Is there a preset pipeline for deploying Flux Fill Dev, or would I need to build a custom one?