# ask-for-help
  • Matěj Šmíd
    04/30/2025, 7:26 PM
    I've advanced a bit further, but now I'm running into issues with packages that have compiled extensions. The packages are mainly git repos. uv seems to make source tarballs out of the git repos and just install the Python sources without building them. The wheel directory no longer seems to be supported. How should I proceed with packages that have compiled extensions?
  • Liu Muzhou
    05/07/2025, 1:25 AM
    Hi, I just want to know if the https://github.com/bentoml/BentoSGLang example can support auto scaling?
  • Kevin Cui (Black-Hole)
    05/07/2025, 3:47 AM
    How does BentoML charge after deployment? Does the billing start as soon as it is deployed (for GPU and CPU), or is there no charge if it is not in use after deployment, with billing only starting when GPU or CPU resources are utilized?
  • Vincent Lu
    05/07/2025, 4:03 AM
    I added input and output nodes to my ComfyUI workflow. After that I tried deploying it, but then I got an error about a pattern mismatch. Where do I find out what the pattern mismatch refers to?
  • Vincent Lu
    05/07/2025, 4:04 AM
    Screenshot 2025-05-07 at 12.02.25 AM.png
  • Chris
    05/07/2025, 5:27 AM
    Hello! I set up a simple image service with the "@service" and "@api" decorators. Is there a way, during bentoml serve, to get the current bento tag of the service? I want to store a result.json file and also put the service tag in it 🙂
  • Kevin Cui (Black-Hole)
    05/07/2025, 1:31 PM
    Is it expected behavior that `bentoml.models.HuggingFaceModel` currently does not support setting `repo_type`? In our scenario, we need to use the `lukbl/LaTeX-OCR` model (`repo_type="space"`), but currently there is no way to modify it. My current approach is to manually upload it using `bentoml models push` so that the service can access it.
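    For reference, a sketch of that manual route: downloading the Space with `huggingface_hub.snapshot_download` (which does accept `repo_type="space"`) and importing the files into the local model store with `bentoml.models.create`, so `bentoml models push` can upload them afterwards. The model name "latex-ocr" is a placeholder, and the exact `bentoml.models.create` usage is an assumption rather than a confirmed recipe:
    ```python
    import shutil
    from pathlib import Path

    import bentoml
    from huggingface_hub import snapshot_download

    # Download the Space repo manually, since repo_type cannot currently be set
    # on bentoml.models.HuggingFaceModel.
    local_dir = snapshot_download(repo_id="lukbl/LaTeX-OCR", repo_type="space")

    # Copy the files into the local BentoML model store; `bentoml models push`
    # can then upload the result so the service can reference it by tag.
    with bentoml.models.create(name="latex-ocr") as model:
        for item in Path(local_dir).iterdir():
            dest = Path(model.path) / item.name
            if item.is_dir():
                shutil.copytree(item, dest)
            else:
                shutil.copy2(item, dest)
    ```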
  • Kevin Cui (Black-Hole)
    05/08/2025, 6:36 AM
    I noticed a strange issue: when my instance count is 0, the first request I send always returns “input parameter is missing.” However, once there is at least one instance, this problem does not occur. After the first request is sent, the instance count scales up (from 0 to 1). The second request is sent immediately after the first one completes (at that point the instance has not scaled down yet, so there is still one instance).
  • Arnault Chazareix
    05/09/2025, 2:48 PM
    Hi 🙂 I am seeing that bentoml serve asks for a built bento rather than a path or other such inputs
    ```
    BENTO is the serving target, it can be the import as:
    - the import path of a 'bentoml.Service' instance
    - a tag to a Bento in local Bento store
    - a folder containing a valid 'bentofile.yaml' build file with a 'service' field, which provides the import path of a 'bentoml.Service' instance
    - a path to a built Bento (for internal & debug use only)
    
    Serve from a bentoml.Service instance source code (for development use only): 'bentoml serve fraud_detector.py:svc'
    ```
    What is the risk of serving from an import path to a bentoml.Service when the app is properly containerized in a Dockerfile? Thanks for your help!
  • Jonathan Markland
    05/12/2025, 4:32 PM
    Hey guys 👋 I'm using the README here to install Yatai locally. I was able to install Yatai; however, when installing yatai-image-builder I keep hitting the error in the screenshot. Could I have some help please? Thank you!
  • k1nd0ne
    05/13/2025, 4:20 PM
    Hello! I am beginning to use OpenLLM to benchmark multiple LLMs. I have 4x 24GB NVIDIA GPUs. When executing openllm hello with the defaults, all of the GPUs are detected but only one is used to determine (and run) which models are compatible. I looked online but didn't find a working way to pass multiple GPUs to the vLLM backend. I'm kinda stuck here if anyone can help 🙏 It works well when using the vLLM backend standalone.
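    Not an OpenLLM answer, but for reference the vLLM-side knob that spreads one model across several GPUs is `tensor_parallel_size`; a minimal standalone sketch (the model id is a placeholder), which may help pin down what the OpenLLM layer needs to forward:
    ```python
    # Standalone vLLM sketch: shard one model across 4 GPUs with tensor parallelism.
    # The model id is a placeholder; use any model that fits across your cards.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
        tensor_parallel_size=4,                    # spread across all 4 x 24 GB GPUs
    )

    outputs = llm.generate(["Hello, how are you?"], SamplingParams(max_tokens=64))
    print(outputs[0].outputs[0].text)
    ```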
  • Rajiv Abraham
    05/16/2025, 12:43 AM
    Hi, BentoML is a cool idea. Thanks for making it! I just wanted to understand this sentence better: "The `bentoml.importing()` context manager is used to handle import statements for dependencies required during serving but may not be available in other situations." (ref: https://docs.bentoml.com/en/latest/get-started/hello-world.html). I'm not clear on what "but may not be available in other situations" means.
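    For context, a minimal sketch of the pattern from the hello-world guide: imports wrapped in `bentoml.importing()` are only needed when the service actually runs, so code paths that merely import the module (for example while building the bento) do not require those packages to be installed; `torch` below is just an example dependency.
    ```python
    import bentoml

    # Serving-only dependencies go inside the context manager, so this module can
    # still be imported (e.g. while building the bento) on machines where those
    # packages are not installed.
    with bentoml.importing():
        import torch  # example serving-time dependency

    @bentoml.service
    class HelloWorld:
        @bentoml.api
        def cuda_available(self) -> bool:
            # torch is available here because the endpoint only runs at serving time
            return torch.cuda.is_available()
    ```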
  • Rajiv Abraham
    05/16/2025, 1:06 AM
    Also, is there any way of passing a context object into the constructor, or any other hooks into the class? The idea is to use this as a key object store and also to provide common abstractions to the object.
    ```python
    import bentoml

    EXAMPLE_INPUT = "..."  # placeholder example text

    @bentoml.service
    class Summarization:
        def __init__(self, context) -> None:
            self.context = context  # <===================== desired injection point
            self.model = ...  # load the model here

        @bentoml.api
        def summarize(self, text: str = EXAMPLE_INPUT) -> str:
            # logger is just an example; it could differ between local and prod
            logger = self.context.logger()
            logger.info("In Summarize")
            return self.model.predict()
    ```
    The idea is to construct common objects like monitoring, logging, and feature stores through this context object and to pass it to the `Summarization` service.
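    Since `@bentoml.service` instantiates the class itself with no constructor arguments, one possible workaround (a sketch, not an official BentoML hook; the `Context` class and `APP_ENV` variable below are hypothetical) is to build the context inside `__init__` from configuration such as an environment variable:
    ```python
    import logging
    import os

    import bentoml

    class Context:
        """Hypothetical context object bundling environment-specific helpers."""

        def __init__(self, env: str) -> None:
            self._logger = logging.getLogger(f"summarization.{env}")

        def logger(self) -> logging.Logger:
            return self._logger

    @bentoml.service
    class Summarization:
        def __init__(self) -> None:
            # BentoML instantiates the service class itself (no constructor args),
            # so the context is assembled here from configuration instead.
            self.context = Context(os.getenv("APP_ENV", "local"))

        @bentoml.api
        def summarize(self, text: str) -> str:
            self.context.logger().info("In Summarize")
            return text[:100]  # placeholder for a real model call
    ```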
  • Zuyang Liu
    05/16/2025, 11:24 PM
    Is it possible to support the HEIC format when using image inputs? Usually this is done with `pillow_heif` and `register_heif_opener()`, but even when doing that, we are still getting:
    ```
    Traceback (most recent call last):
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/_bentoml_impl/server/app.py", line 640, in api_endpoint_wrapper
        resp = await self.api_endpoint(name, request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/_bentoml_impl/server/app.py", line 704, in api_endpoint
        input_data = await method.input_spec.from_http_request(request, serde)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/_bentoml_sdk/io_models.py", line 213, in from_http_request
        return await serde.parse_request(request, t.cast(t.Type[IODescriptor], cls))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/_bentoml_impl/serde.py", line 227, in parse_request
        return cls.model_validate(data)
               ^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/pydantic/main.py", line 703, in model_validate
        return cls.__pydantic_validator__.validate_python(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/_bentoml_sdk/validators.py", line 70, in decode
        return PILImage.open(obj.file, formats=formats)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/PIL/Image.py", line 3551, in open
        im = _open_core(fp, filename, prefix, formats)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/PIL/Image.py", line 3533, in _open_core
        factory, accept = OPEN[i]
                          ~~~~^^^
    KeyError: 'HEIC'
    ```
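    One possible workaround sketch (not verified against the validator shown in the traceback): accept the upload as a `pathlib.Path` instead of `PILImage.Image`, and open it yourself, so decoding goes through the opener registered by `pillow_heif` rather than a restricted `formats=` list:
    ```python
    from pathlib import Path

    import bentoml
    from PIL import Image as PILImage
    from pillow_heif import register_heif_opener

    # Register the HEIF/HEIC plugin once at import time.
    register_heif_opener()

    @bentoml.service
    class ImageStagePredict:
        @bentoml.api
        def predict(self, image: Path) -> str:
            # Opening the file ourselves lets Pillow use any registered opener,
            # including HEIC, instead of a fixed list of formats.
            img = PILImage.open(image).convert("RGB")
            return f"received image of size {img.size}"
    ```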
  • Mohamed Meftah
    05/20/2025, 6:38 PM
    Hello, I'm kind of confused. I assumed the reason for saving the model to BentoML and then pushing it is that I can load it from my services with the tag, but it seems that's always returning `None`. Do I need to bundle the models in my bento? My workflow is that I have a script for saving the models; I run that and push them to BentoML Cloud. If I do `bentoml serve`, it works locally, but when I push the service, importing the model with `BentoModel("tag")` fails, returning `None`.
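    A sketch of the pattern the docs suggest for this, assuming `BentoModel` lives in `bentoml.models` (alongside the `HuggingFaceModel` mentioned earlier) and a model saved as "my_model": declaring it as a class-level attribute records it as a dependency of the bento, so it is also pulled on BentoCloud rather than only found in the local store:
    ```python
    import bentoml
    from bentoml.models import BentoModel  # assumed import path, see note above

    @bentoml.service
    class MyService:
        # Declared at class level so the model is recorded with the bento and
        # downloaded on BentoCloud, not just looked up in the local model store.
        # "my_model:latest" is a placeholder tag.
        model_ref = BentoModel("my_model:latest")

        def __init__(self) -> None:
            # Resolve the on-disk path via the model store at startup; the
            # framework-specific loading (torch, joblib, ...) would go here.
            self.model_path = bentoml.models.get("my_model:latest").path

        @bentoml.api
        def predict(self, text: str) -> str:
            return f"model files live at {self.model_path}"
    ```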
  • Amit Gelber
    05/21/2025, 1:13 PM
    Hi BentoML team! Is it possible to perform actions after returning a response? For example: switching to another model in memory, rather than in the middle of handling the response.
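    Not an official BentoML hook, but one generic pattern in an async endpoint is to schedule the follow-up work on the event loop just before returning, so it runs after the response has been produced; the model-swap method below is purely hypothetical.
    ```python
    import asyncio

    import bentoml

    @bentoml.service
    class ModelService:
        def __init__(self) -> None:
            self.active_model = "model-a"  # placeholder state

        async def _swap_model(self) -> None:
            # Hypothetical follow-up action that should not block the response.
            await asyncio.sleep(0)
            self.active_model = "model-b"

        @bentoml.api
        async def predict(self, text: str) -> str:
            result = f"{self.active_model}: {text[:32]}"
            # Keep a reference to the task; it runs on the event loop after this
            # coroutine returns, i.e. outside the response handling itself.
            self._swap_task = asyncio.create_task(self._swap_model())
            return result
    ```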
  • Pierre Buyle
    05/21/2025, 3:36 PM
    What's the `_result_store` (a `Sqlite3Store`) on a `ServiceAppFactory`? Or more generally, why is BentoML storing data locally in a SQLite database? Is this needed to run a service, and can we disable it?
  • Dan Fairs
    05/22/2025, 10:29 AM
    Hi. I'm struggling to build a Bento which is in a monorepo structure. We have a Python tree like this:
    ```
    py/
      model-1/
        pyproject.toml
      model-2/
        pyproject.toml
      common/
        pyproject.toml
    ```
    I'm struggling to figure out how to get this to work, i.e. for `model-1` to depend on `common` when `common` is in the same repo. I've tried with `uv add ../common`, and with creating a symlink from inside `model-1` to `common`, e.g. `ln -s ../common common`. I've also added `include = ["common/"]` to `[tool.bentoml.build]` as per https://docs.bentoml.com/en/latest/reference/bentoml/bento-build-options.html#include. What's the correct recipe here? Thanks!
  • Mohamed Meftah
    05/22/2025, 2:07 PM
    When I am trying to push the bento, I'm getting this error: `[bentos] push failed: request failed with status code 400: {"error":"model size limit reached, size: 73349Mi, limit: 32Gi"}`. This is the service I'm building:
    ```python
    @bentoml.service(image=image, resources={"gpu": 1})
    class MultiView:
        DEVICE = "cuda"
        DTYPE = torch.float16
        NUM_VIEWS = 6
    
        BASE_MODEL = HuggingFaceModel("stabilityai/stable-diffusion-xl-base-1.0")
        VAE_MODEL = HuggingFaceModel("madebyollin/sdxl-vae-fp16-fix")
        ADAPTER_MODEL = HuggingFaceModel("huanngzh/mv-adapter")
        BIREFNET_MODEL = HuggingFaceModel("ZhengPeng7/BiRefNet")
    ....
    ```
    Is there a way to bypass that?
  • Joseph Obeid
    05/22/2025, 9:54 PM
    Hi all, we have a BentoML service with an endpoint that submits a long-running Bento task, along with an async function that periodically updates a database. The issue we're running into is that when Bento scales down, the task continues running (as seen in the logs), but we get the repeating error below and the database stops receiving updates, leading to inconsistent behavior. We found that increasing the `scale_down_stabilization_window` to longer than the task's max duration (~1920s) seems to prevent the issue. However, setting this parameter under the scaling policy in a YAML or JSON file appears to be ignored when deploying via the CLI; it always defaults to 600s. We can change it manually through the UI, but we need this setting to be configured automatically as part of our CI/CD pipeline. Here's the error that occurs when Bento scales down in the middle of a task:
    ```
    2025-05-22T21:06:52Z [Service: Algorithm][Replica: 92cdv]
    [ERROR] [cli] Exception in callback <bound method Arbiter.manage_watchers of <bentoml._internal.utils.circus.Arbiter object at 0x7f0cbdfbcbd0>>
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/tornado/ioloop.py", line 945, in _run
        val = self.callback()
      File "/usr/local/lib/python3.11/site-packages/circus/util.py", line 1038, in wrapper
        raise ConflictError("arbiter is already running arbiter_stop command")
    circus.exc.ConflictError: arbiter is already running arbiter_stop command
    ```
    We're on BentoML version 1.3.14. Is this a known issue? Is there a workaround to ensure `scale_down_stabilization_window` is applied automatically via the CLI? Thanks!
  • Jonathan Markland
    05/23/2025, 12:21 PM
    Hey all, when is Yatai 2.0 going live?
  • Mattia Bradascio
    06/03/2025, 2:51 PM
    Hey! What is the best way to deploy BentoML on Kubernetes? I saw Yatai offers an operator, but as Jonathan mentioned above, it seems like 2.0 still isn't available.
  • Toke Emil Heldbo Reines
    06/05/2025, 5:30 AM
    The simplest API that expects an image and a string fails when provided a string with a number in it. Sample API:
    ```python
    from typing import Any
    import bentoml
    from PIL import Image as PILImage

    @bentoml.service
    class Service:
        @bentoml.api
        def classify(self, input_image: PILImage.Image, uid: str) -> Any:
            print(uid)
    ```
    Call it in the Swagger docs with any number and it fails. Call it with curl with the uid being an explicit string and it still fails.
    ```bash
    curl -X 'POST' \
      'http://localhost:3000/classify' \
      -H 'accept: application/json' \
      -H 'Content-Type: multipart/form-data' \
      -F 'input_image=@sample_image.png;type=image/png' \
      -F 'uid="1231231231232131313212312312321123123213312";type=application/json'
    ```
    How do I fix this so the uid is treated as an actual string in all cases, with no typecasting cutting off decimals, etc.?
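    One thing that may help (a sketch, not verified against this exact BentoML version): since API parameters are validated with pydantic, annotating the field as `pydantic.StrictStr` should cause bare JSON numbers to be rejected instead of silently coerced, forcing clients to send a real string:
    ```python
    from typing import Any

    import bentoml
    from PIL import Image as PILImage
    from pydantic import StrictStr

    @bentoml.service
    class Service:
        @bentoml.api
        def classify(self, input_image: PILImage.Image, uid: StrictStr) -> Any:
            # With StrictStr, a bare JSON number for `uid` fails validation with a
            # clear error instead of being converted (and possibly mangled).
            print(uid)
            return {"uid": uid}
    ```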
  • Noah
    06/06/2025, 2:05 PM
    Hey! I was just invited to an organization on BentoCloud by a coworker. I clicked the link, signed up for an account, and was guided through the steps. I chose to authenticate my account by creating an API token in the web browser, and it made me create an organization, etc. How do I access the organization I was invited to? I can't seem to find it or the associated members and ML models hosted on the platform.
  • xiongfeng
    06/12/2025, 8:38 AM
    Hey! The acceleration shown in this blog (https://www.bentoml.com/blog/cold-starting-llms-on-kubernetes-in-under-30-seconds) is very exciting. I am very interested in the direct writing to GPU memory introduced in "Step 3: Load models directly into GPU memory". Where can I find a more detailed introduction?
  • Jabali
    06/16/2025, 4:48 PM
    I'm suddenly unable to deploy on any hardware at all on BentoCloud. My quota seems to have gone down to 0 for every GPU and CPU type and I can't figure out why. Because of this I also can't make any updates to my existing deployment.
  • Phirum Peang
    06/16/2025, 5:20 PM
    Where can I override these settings for the OpenLLM chat UI? The Service Configuration panel says this model was configured with the following settings:
    ```json
    {
      "enable_auto_tool_choice": true,
      "max_model_len": 3192,
      "tensor_parallel_size": 1,
      "tool_call_parser": "llama3_json"
    }
    ```
    I want to change max_model_len to a higher number, but I don't know where the configuration file is located.
  • Jeff Spurlock
    06/20/2025, 5:06 PM
    Hello, I'm trying to get started, but when I run `bentoml cloud login` and get prompted for a token, if I create a new one I get an axios error in the browser, and while the token does get created, my terminal stays in the 'waiting for authentication...' state. Note that it does actually create the token in the admin panel. So if I cancel this login command, run it again, and say I want to paste in an existing token, there doesn't seem to be a way in the admin panel to fetch the token value I just created so I can manually paste it into the terminal.
  • Remy
    06/25/2025, 8:31 AM
    Hello. I'm using BentoML with the Keras framework (through TensorFlow). When using it, I get this warning:
    ```
    BentoMLDeprecationWarning: `bentoml.keras` is deprecated since v1.4 and will be removed in a future version.
    ```
    I couldn't find any information about the Keras deprecation, and as far as I remember, this warning has been showing since BentoML v1.3 a few months ago. Is Keras support going to be effectively removed? For reference, the Keras page in the BentoML docs: https://docs.bentoml.com/en/latest/reference/bentoml/frameworks/keras.html (without anything about deprecation).
  • Rehan Shah
    06/25/2025, 9:13 AM
    Hi! I'm planning to use BentoML to deploy an inpainting model (the Flux Fill Dev variant) but wasn't able to find any documentation for it. I checked the FLUX.1 documentation as well, but nothing seemed relevant. Is there a preset pipeline for deploying Flux Fill Dev, or would I need to build a custom one?