Matěj Šmíd
04/30/2025, 7:26 PM
Liu Muzhou
05/07/2025, 1:25 AM
Kevin Cui (Black-Hole)
05/07/2025, 3:47 AM
Vincent Lu
05/07/2025, 4:03 AM
Vincent Lu
05/07/2025, 4:04 AM
Chris
05/07/2025, 5:27 AM
Kevin Cui (Black-Hole)
05/07/2025, 1:31 PM
bentoml.models.HuggingFaceModel currently does not support setting repo_type? In our scenario, we need to use the lukbl/LaTeX-OCR model (repo_type="space"), but currently there is no way to modify it. My current approach is to manually upload it using bentoml models push so that the service can access it.
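Roughly what that workaround looks like (just a sketch, not tested; the model name here is made up, and snapshot_download does accept repo_type="space"):

import bentoml
from huggingface_hub import snapshot_download

# Download the Space repo into a new entry in the local BentoML model store,
# then push it so the deployed service can pull it by tag.
with bentoml.models.create(name="latex-ocr") as model_ref:
    snapshot_download(
        repo_id="lukbl/LaTeX-OCR",
        repo_type="space",  # the part HuggingFaceModel cannot express today
        local_dir=model_ref.path,
    )

# afterwards: bentoml models push latex-ocr:latest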
Kevin Cui (Black-Hole)
05/08/2025, 6:36 AM
Arnault Chazareix
05/09/2025, 2:48 PM
BENTO is the serving target, it can be the import as:
- the import path of a 'bentoml.Service' instance
- a tag to a Bento in local Bento store
- a folder containing a valid 'bentofile.yaml' build file with a 'service' field, which provides the import path of a 'bentoml.Service' instance
- a path to a built Bento (for internal & debug use only)
Serve from a bentoml.Service instance source code (for development use only): 'bentoml serve fraud_detector.py:svc'
What is the risk of serving from an import path to a bentoml.Service when the app is properly containerized with a Dockerfile?
Thanks for your help!
Jonathan Markland
05/12/2025, 4:32 PM
k1nd0ne
05/13/2025, 4:20 PM
Rajiv Abraham
05/16/2025, 12:43 AM
The bentoml.importing() context manager "is used to handle import statements for dependencies required during serving but may not be available in other situations" (ref: https://docs.bentoml.com/en/latest/get-started/hello-world.html).
I'm not clear what "but may not be available in other situations" means.
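For context, the documented usage is roughly this (a minimal sketch; torch here is just a stand-in for a heavy, serve-time-only dependency). Imports placed inside the context are only required when the service actually runs, so building or loading the Bento on a machine without those packages installed still works:

import bentoml

with bentoml.importing():
    # only needed in the serving process, not at build/load time
    import torch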
Rajiv Abraham
05/16/2025, 1:06 AM
@bentoml.service
class Summarization:
    def __init__(self, context) -> None:
        self.context = context  # <=====================
        self.model = ... model ...

    @bentoml.api
    def summarize(self, text: str = EXAMPLE_INPUT) -> str:
        logger = self.context.logger()  # <===== logger is just an example. It could be a logger which is different depending on local, prod
        logger.info("In Summarize")
        return self.model.predict()
The idea is to construct common objects like monitoring, logging, and feature stores through this context object and pass it to the Summarization service.
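One possible way to wire this up, just as a sketch: as far as I can tell the service class is constructed with no arguments, so the context could be built from the environment at import time rather than injected through __init__:

import logging
import os

import bentoml

class AppContext:
    """Bundles cross-cutting concerns (logging, monitoring, feature store clients)."""

    def logger(self) -> logging.Logger:
        # e.g. verbose logging locally, INFO in prod
        level = logging.DEBUG if os.getenv("ENV", "local") == "local" else logging.INFO
        log = logging.getLogger("summarization")
        log.setLevel(level)
        return log

@bentoml.service
class Summarization:
    def __init__(self) -> None:
        self.context = AppContext()
        self.model = ...  # load the model here

    @bentoml.api
    def summarize(self, text: str) -> str:
        self.context.logger().info("In Summarize")
        return self.model.predict()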
Zuyang Liu
05/16/2025, 11:24 PM
We use pillow_heif and register_heif_opener(), but even when doing that, we are still getting:
Traceback (most recent call last):
File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/_bentoml_impl/server/app.py", line 640, in api_endpoint_wrapper
resp = await self.api_endpoint(name, request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/_bentoml_impl/server/app.py", line 704, in api_endpoint
input_data = await method.input_spec.from_http_request(request, serde)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/_bentoml_sdk/io_models.py", line 213, in from_http_request
return await serde.parse_request(request, t.cast(t.Type[IODescriptor], cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/_bentoml_impl/serde.py", line 227, in parse_request
return cls.model_validate(data)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/pydantic/main.py", line 703, in model_validate
return cls.__pydantic_validator__.validate_python(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/_bentoml_sdk/validators.py", line 70, in decode
return PILImage.open(obj.file, formats=formats)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/PIL/Image.py", line 3551, in open
im = _open_core(fp, filename, prefix, formats)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zuyang/Documents/mlops/BentoML/ImageStagePredict/.venv/lib/python3.12/site-packages/PIL/Image.py", line 3533, in _open_core
factory, accept = OPEN[i]
~~~~^^^
KeyError: 'HEIC'
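For reference, this is roughly how we register it (a sketch of service.py). Our guess from the trace is that register_heif_opener() has to run in the API server process, at module import time, because BentoML's validator hands the upload to PIL.Image.open before any endpoint code runs:

import bentoml
from PIL import Image as PILImage
from pillow_heif import register_heif_opener

# Register at module import time so every worker process has HEIF support
# before the first request is decoded.
register_heif_opener()

@bentoml.service
class ImageStagePredict:  # name taken from the project path above
    @bentoml.api
    def predict(self, input_image: PILImage.Image) -> str:
        return str(input_image.size)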
Mohamed Meftah
05/20/2025, 6:38 PM
None. Do I need to bundle the models in my bento?
My workflow is that I have a script for saving the models. I run that and push them to BentoML Cloud. If I do bentoml serve, it works locally, but when I push the service, importing the model with BentoModel("tag") fails, returning None.
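For what it's worth, the pattern I've seen in the docs is to declare the model as a class attribute so it gets recorded in the bento at build time and pulled on the cloud; a rough sketch (tag and file name are made up, and the loader depends on the framework):

import bentoml
from bentoml.models import BentoModel

@bentoml.service
class MyService:
    # declared at class level so `bentoml build` records it as a model dependency
    model_ref = BentoModel("my_model:latest")  # hypothetical tag

    def __init__(self) -> None:
        # resolve the model files from the store at runtime
        model_path = self.model_ref.path_of("saved_model.pkl")
        ...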
Amit Gelber
05/21/2025, 1:13 PM
Pierre Buyle
05/21/2025, 3:36 PM
_result_store (a Sqlite3Store) on a ServiceAppFactory?
Or more generally, why is BentoML storing data locally in a SQLite database? Is this needed to run a service, and can we disable it?
Dan Fairs
05/22/2025, 10:29 AM
py/
  model-1/
    pyproject.toml
  model-2/
    pyproject.toml
  common/
    pyproject.toml
I'm struggling to figure out how to get this to work: for model-1 to depend on common when common is in the same repo. I've tried uv add ../common, and creating a symlink from inside model-1 to common, e.g. ln -s ../common common. I've also added include = ["common/"] to [tool.bentoml.build] as per https://docs.bentoml.com/en/latest/reference/bentoml/bento-build-options.html#include. What's the correct recipe here? Thanks!
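For concreteness, roughly what I have at the moment (a sketch with simplified names; the [tool.uv.sources] path dependency is standard uv syntax, I just don't know how BentoML is supposed to pick it up at build time):

# model-1/pyproject.toml
[project]
name = "model-1"
version = "0.1.0"
dependencies = ["common"]

[tool.uv.sources]
common = { path = "../common", editable = true }

[tool.bentoml.build]
include = ["*.py", "common/"]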
Mohamed Meftah
05/22/2025, 2:07 PM
push failed: request failed with status code 400: {"error":"model size limit reached, size: 73349Mi, limit: 32Gi"}
This is the service I'm building:
@bentoml.service(image=image, resources={"gpu": 1})
class MultiView:
    DEVICE = "cuda"
    DTYPE = torch.float16
    NUM_VIEWS = 6
    BASE_MODEL = HuggingFaceModel("stabilityai/stable-diffusion-xl-base-1.0")
    VAE_MODEL = HuggingFaceModel("madebyollin/sdxl-vae-fp16-fix")
    ADAPTER_MODEL = HuggingFaceModel("huanngzh/mv-adapter")
    BIREFNET_MODEL = HuggingFaceModel("ZhengPeng7/BiRefNet")
    ....
Is there a way to bypass that?
Joseph Obeid
05/22/2025, 9:54 PM
Setting scale_down_stabilization_window to longer than the task's max duration (~1920s) seems to prevent the issue. However, setting this parameter under the scaling policy in a YAML or JSON file appears to be ignored when deploying via the CLI; it always defaults to 600s. We can change it manually through the UI, but we need this setting to be configured automatically as part of our CI/CD pipeline.
Here's the error that occurs when bento scales down in the middle of a task:
2025-05-22T21:06:52Z [Service: Algorithm][Replica: 92cdv]
[ERROR] [cli] Exception in callback <bound method Arbiter.manage_watchers of <bentoml._internal.utils.circus.Arbiter object at 0x7f0cbdfbcbd0>>
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/tornado/ioloop.py", line 945, in _run
val = self.callback()
File "/usr/local/lib/python3.11/site-packages/circus/util.py", line 1038, in wrapper
raise ConflictError("arbiter is already running arbiter_stop command")
circus.exc.ConflictError: arbiter is already running arbiter_stop command
We’re on BentoML version 1.3.14.
Is this a known issue? Is there a workaround to ensure scale_down_stabilization_window is applied automatically via the CLI?
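For reference, this is roughly the shape of the deployment config we pass to the CLI (a sketch; the service name is ours, and the nesting of the policy block is just our reading of the docs):

# deployment.yaml
bento: .
services:
  Algorithm:
    scaling:
      min_replicas: 0
      max_replicas: 2
      policy:
        scale_down_stabilization_window: 1920  # seconds; longer than the ~1920s task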
Thanks!
Jonathan Markland
05/23/2025, 12:21 PM
Mattia Bradascio
06/03/2025, 2:51 PM
Toke Emil Heldbo Reines
06/05/2025, 5:30 AM
@bentoml.service
class Service:
    @bentoml.api
    def classify(self, input_image: PILImage.Image, uid: str) -> Any:
        print(uid)
Call it in the swagger docs with any number and it fails. Call it with curl with the uid being an explicit string and it still fails.
curl -X 'POST' \
  'http://localhost:3000/classify' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'input_image=@sample_image.png;type=image/png' \
-F 'uid="1231231231232131313212312312321123123213312";type=application/json'
How do I fix that so it sees it as an actual string in all cases, with no type casting cutting off decimals, etc.?
Noah
06/06/2025, 2:05 PM
xiongfeng
06/12/2025, 8:38 AM
Jabali
06/16/2025, 4:48 PM
Phirum Peang
06/16/2025, 5:20 PM
{
"enable_auto_tool_choice": true,
"max_model_len": 3192,
"tensor_parallel_size": 1,
"tool_call_parser": "llama3_json"
}
I want to change max_model_len to a higher number, but I don't know where the configuration file is located.
Jeff Spurlock
06/20/2025, 5:06 PM
I run bentoml cloud login and get prompted for a token. If I create a new one, I get an axios error in the browser, and while the token does get created, my terminal stays in the 'waiting for authentication...' state. Note that it does actually create the token in the admin panel. So if I cancel this login command, run it again, and say I want to paste in an existing token, there doesn't seem to be a way in the admin panel to fetch the token value I just created so I can manually paste it into the terminal.
Remy
06/25/2025, 8:31 AM
BentoMLDeprecationWarning: `bentoml.keras` is deprecated since v1.4 and will be removed in a future version.
I couldn't find any information about Keras deprecation, and as far as I remember, this warning has been showing since Bento v1.3 a few months ago. Is Keras support going to be effectively removed?
For reference, the Keras page in the BentoML docs: https://docs.bentoml.com/en/latest/reference/bentoml/frameworks/keras.html (without anything about deprecation).
Rehan Shah
06/25/2025, 9:13 AM