# ai-ml-everything

    Arihant Hirawat

    10/20/2024, 6:27 AM
    Hi BentoML Community, we’ve developed an HTTP API boilerplate for serving ML models with BentoML. 🚀 Through our experience deploying multiple models as API services, we identified several common tasks that are frequently needed. To streamline the process, we decided to create an open-source boilerplate that simplifies deploying any ML model as an HTTP API server. This boilerplate comes with the following features included:
    • 📂 Project structure
    • 🔐 JWT authentication
    • ☁️ Model download from S3
    • ✅ Unit tests
    • 📝 Structured logging
    • 📊 Monitoring
    • 🔗 DynamoDB integration
    • 🔄 CI/CD workflow for building and deploying
    • 🔒 Essential security measures for public API exposure
    We hope this helps others in the community by reducing setup time and improving efficiency when deploying models. Feel free to check it out and contribute! 🙌 Repo: https://github.com/infraspecdev/bentoml-template
    🎉 6
    👍 4
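Of the features listed above, JWT authentication is the easiest to illustrate with the standard library alone. A minimal HS256 sign/verify sketch, assuming shared-secret auth; this is illustrative and not code from the linked repo:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url_encode(data: bytes) -> str:
    # JWT uses unpadded base64url
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(data: str) -> bytes:
    # Restore the padding stripped during encoding
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))

def sign_jwt_hs256(claims: dict, secret: bytes) -> str:
    """Build a signed header.payload.signature token."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(sig)}"

def verify_jwt_hs256(token: str, secret: bytes) -> dict:
    """Check the signature and expiry, then return the claims."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("invalid signature")
    claims = json.loads(b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```

In a setup like the boilerplate's, a check like `verify_jwt_hs256` would run in middleware before a request reaches the model endpoint.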

    Sherlock Xu

    10/23/2024, 12:39 AM
    Hello everyone! If you want to create an LLM agent app, read our blog post to see how you can build one with LangGraph and BentoML https://www.bentoml.com/blog/deploying-a-langgraph-agent-application-with-an-open-source-model

    Syed Sadath

    10/31/2024, 2:45 PM
    Hi, I just stepped into BentoML. My use case is building RAGs. Is there any blog post that covers all the general features BentoML offers?

    Sherlock Xu

    11/04/2024, 12:42 AM
    Hi everyone! See this article to explore some top open-source embedding models https://www.bentoml.com/blog/a-guide-to-open-source-embedding-models

    Akshat Sharma

    11/16/2024, 7:14 AM
    🚀 Exploring the Impact of AI & Cybersecurity on Digital Payment Systems 💳🤖 In today’s fast-evolving digital landscape, the integration of AI and robust cybersecurity is transforming how we think about digital payments. From enhanced fraud prevention to seamless transactions, these technologies are reshaping the future of financial systems. Check out my latest blog where I dive into the crucial role AI and cybersecurity play in securing digital payment ecosystems and ensuring a safer online transaction experience. 🔐💡 👉 https://medium.com/@akshat111111/the-impact-of-ai-and-cybersecurity-on-digital-payment-systems-3c93f1a2c35a

    Ritabrata Maiti

    11/18/2024, 6:27 AM
    I’ve been working on AnyModal, a framework for integrating different data types (like images and audio) with LLMs. Existing tools felt too limited or task-specific, so I wanted something more flexible. AnyModal makes it easy to combine modalities with minimal setup—whether it’s LaTeX OCR, image captioning, or chest X-ray interpretation. You can plug in models like ViT for image inputs, project them into a token space for your LLM, and handle tasks like visual question answering or audio captioning. It’s still a work in progress, so feedback or contributions would be great. GitHub: https://github.com/ritabratamaiti/AnyModal
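    For intuition, the "project them into a token space" step can be sketched in plain Python. A minimal sketch, assuming a learned linear layer maps each vision-encoder patch feature into the LLM's embedding space; dimensions and names are illustrative, not AnyModal's actual API:

```python
def linear_projection(features, weights, bias):
    """Map one feature vector (length d_in) into the LLM's embedding
    space (length d_out) via a linear layer: out = W @ f + b."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def project_patches(patch_features, weights, bias):
    """One pseudo-token per image patch; the projected vectors are then
    interleaved with the text-token embeddings fed to the LLM."""
    return [linear_projection(f, weights, bias) for f in patch_features]
```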

    Sherlock Xu

    11/19/2024, 12:10 PM
    Hello everyone! Check out our blog post to build a multi-agent app with CrewAI and BentoML: https://www.bentoml.com/blog/building-a-multi-agent-system-with-crewai-and-bentoml

    Sherlock Xu

    11/25/2024, 1:14 PM
    Hello everyone! Read our tutorial to serve AI21’s Jamba 1.5 Mini: https://www.bentoml.com/blog/deploying-ai21-jamba-1-5-mini-with-bentoml. Note that you can also self-host it with OpenLLM!

    Toke Emil Heldbo Reines

    11/27/2024, 12:39 PM
    How do you guys handle quality assurance of your models before shipping to production? We track metrics in MLflow and register promising models in the model registry. We then use those models in the BentoML service when building and containerizing. The build/containerization itself is handled by Jenkins, which builds, containerizes, and pushes to AWS ECR. I'd like to have a QA step either before pushing, or before staging the containers for production. I imagine having some curated data samples with exact expected outputs, plus some larger dataset with overall metrics like AUC and average precision where I expect the model to perform well overall, maybe even some warning mechanism showing where any new model performs worse or better than previous models. What have you built to handle this step or quality-test the final built model/bento?
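    The gate described here can be written generically. A minimal sketch, assuming the candidate model is exposed as a plain `predict` callable that returns a score; function names and thresholds are illustrative, not MLflow, Jenkins, or BentoML APIs:

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank statistic: the fraction
    of (positive, negative) pairs ranked correctly, ties counting 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def run_qa_gate(predict, curated_cases, eval_set, min_auc, champion_auc=None):
    """Return (passed, report). Fails fast on curated samples with exact
    expected outputs, then checks aggregate metrics on a larger eval set."""
    report = {}
    # 1) Curated samples: any mismatch against the exact expected output blocks the release
    report["curated_failures"] = [
        (x, expected, predict(x))
        for x, expected in curated_cases
        if predict(x) != expected
    ]
    if report["curated_failures"]:
        return False, report
    # 2) Aggregate metric threshold on a held-out set (both classes assumed present)
    labels = [y for _, y in eval_set]
    scores = [predict(x) for x, _ in eval_set]
    report["auc"] = auc(labels, scores)
    if report["auc"] < min_auc:
        return False, report
    # 3) Optional regression warning against the current production model
    if champion_auc is not None and report["auc"] < champion_auc:
        report["warning"] = "candidate underperforms current champion"
    return True, report
```

In a Jenkins pipeline, a check like this could run as a stage between building the bento and pushing the container to ECR, failing the build when the gate returns False.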

    Sherlock Xu

    12/03/2024, 4:54 AM
    Hi everyone! See this case study https://www.bentoml.com/blog/neurolabs-faster-time-to-market-and-save-cost-with-bentoml to learn how BentoML helps Neurolabs accelerate its AI journey 🚀

    Sherlock Xu

    12/06/2024, 6:56 AM
    Happy Friday everyone! Read our blog post to see how our new feature, BentoML Codespaces, solves challenges in developing AI applications and speeds up your iteration cycle 🚀 https://www.bentoml.com/blog/accelerate-ai-application-development-with-bentoml-codespaces

    Sherlock Xu

    12/18/2024, 2:17 AM
    Hi everyone! Are you working with ComfyUI workflows? 🚀 Convert them into production-ready APIs with our new project comfy-pack! Check out the full post to see how it works! https://www.bentoml.com/blog/comfy-pack-serving-comfyui-workflows-as-apis
    ❤️ 2

    naga venkata satish kumar seethepalli

    12/18/2024, 7:20 PM
    Hi, is there any timeline for Yatai 2.0?
    👀 3

    Alex

    12/19/2024, 12:47 PM
    Hello, colleagues! I have a question about the vLLM serving example: https://docs.bentoml.com/en/latest/examples/vllm.html. Actually, I do not understand when this setup makes sense. vLLM itself has many features and an advanced design, e.g. batched inference, a request queue, etc. So when is wrapping it in BentoML justified?

    Sherlock Xu

    01/03/2025, 1:59 AM
    👋 Happy new year, everyone! Check out a curated list of popular ComfyUI custom nodes and answers to FAQs https://www.bentoml.com/blog/a-guide-to-comfyui-custom-nodes

    Sherlock Xu

    01/10/2025, 5:36 AM
    Hi everyone! See our blog post with Twilio https://www.twilio.com/en-us/blog/voice-application-conversationrelay-bentoml and learn how you can build a voice AI application with ease 🚀

    Sherlock Xu

    01/16/2025, 8:49 AM
    Hello everyone! See this blog post to learn about structured decoding in vLLM https://www.bentoml.com/blog/structured-decoding-in-vllm-a-gentle-introduction

    Sherlock Xu

    01/21/2025, 4:00 AM
    Hi everyone! Check out our latest blog post to deploy ColPali with BentoML https://www.bentoml.com/blog/deploying-colpali-with-bentoml. Ideal for use cases like large-scale document retrieval.
    👍 2

    Sherlock Xu

    02/08/2025, 1:40 AM
    Happy Friday everyone! 🔍 See our new blog post comparing BentoML and Vertex AI https://www.bentoml.com/blog/comparison-between-vertex-ai-and-bentoml
    💯 1

    Sherlock Xu

    02/15/2025, 12:42 AM
    Hi everyone! 🚀 See our new blog post if you are looking for private DeepSeek deployment https://www.bentoml.com/blog/secure-and-private-deepseek-deployment-with-bentoml

    Sherlock Xu

    02/28/2025, 8:29 AM
    Happy Friday everyone! 🚀 Read our new blog post https://www.bentoml.com/blog/building-ml-pipelines-with-mlflow-and-bentoml to build a seamless ML workflow from experimentation to production with MLflow and BentoML.

    Sherlock Xu

    03/07/2025, 4:46 AM
    Hi everyone! Are you confused about the different versions of DeepSeek? Read our blog post to find the right one for your use case https://www.bentoml.com/blog/the-complete-guide-to-deepseek-models-from-v3-to-r1-and-beyond

    Sherlock Xu

    03/18/2025, 9:15 AM
    Hi everyone! Is your AI infrastructure slowing you down? 👀 We have identified 6 common pitfalls (https://www.bentoml.com/blog/6-infrastructure-pitfalls-slowing-down-your-ai-progress) that keep AI teams stuck and explain how BentoML fixes them.

    Noman Saleem

    04/10/2025, 1:08 PM
    Hi, I am sending an audio file (.wav/.mp3) as form data from Postman with `file` as the key. I am unable to get the audio in the endpoint and process it. Need help. BentoML version: 1.4.8. Here is the code:

```python
@bentoml.api
def transcribe_audio(self, file) -> dict:
    audio, _ = librosa.load(file.file, sr=16000)
    input_values = self.audio_processor(audio, return_tensors="pt", padding="longest").input_values

    # Perform transcription
    with torch.no_grad():
        logits = self.audio_model(input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = self.audio_processor.decode(predicted_ids[0])

    return {"transcription": transcription}
```

    Error:

```
Traceback (most recent call last):
  File "C:\Users\User\miniconda3\envs\consforc_website\Lib\site-packages\_bentoml_impl\server\app.py", line 604, in api_endpoint_wrapper
    resp = await self.api_endpoint(name, request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\miniconda3\envs\consforc_website\Lib\site-packages\_bentoml_impl\server\app.py", line 668, in api_endpoint
    input_data = await method.input_spec.from_http_request(request, serde)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\miniconda3\envs\consforc_website\Lib\site-packages\_bentoml_sdk\io_models.py", line 213, in from_http_request
    return await serde.parse_request(request, t.cast(t.Type[IODescriptor], cls))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\miniconda3\envs\consforc_website\Lib\site-packages\_bentoml_impl\serde.py", line 224, in parse_request
    data[k] = json.loads(v)
              ^^^^^^^^^^^^^
  File "C:\Users\User\miniconda3\envs\consforc_website\Lib\json\__init__.py", line 341, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 4: invalid start byte
```

    Also, `--reload` does not work on Windows.
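    A possible fix, offered as an untested sketch: in the BentoML 1.x SDK, a parameter without a type annotation is parsed as JSON, which is why the raw audio bytes end up in `json.loads` and fail to decode. Annotating the parameter as `pathlib.Path` should make BentoML treat it as a file upload instead. The class name `Transcriber` and the setup of `audio_processor`/`audio_model` are assumed from the snippet above:

```python
from pathlib import Path

import bentoml
import librosa
import torch

@bentoml.service
class Transcriber:
    # audio_processor / audio_model would be set up in __init__,
    # as in the original snippet

    @bentoml.api
    def transcribe_audio(self, file: Path) -> dict:
        # With a Path annotation, BentoML saves the uploaded file to a
        # temporary location and passes its path, so librosa can read it
        # directly (no .file attribute needed).
        audio, _ = librosa.load(file, sr=16000)
        input_values = self.audio_processor(
            audio, return_tensors="pt", padding="longest"
        ).input_values
        with torch.no_grad():
            logits = self.audio_model(input_values).logits
        predicted_ids = torch.argmax(logits, dim=-1)
        return {"transcription": self.audio_processor.decode(predicted_ids[0])}
```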

    Sherlock Xu

    04/21/2025, 1:09 AM
    Hello everyone! Are you working to tackle cold starts for LLMs? Read about our journey in our latest post https://www.bentoml.com/blog/cold-starting-llms-on-kubernetes-in-under-30-seconds

    Sherlock Xu

    04/25/2025, 6:16 AM
    Happy Friday! 🚀 See how Yext slashed time-to-market and compute costs with BentoML: https://www.bentoml.com/blog/accelerating-ai-innovation-at-yext-with-bentoml
    🙌 1

    Sherlock Xu

    04/29/2025, 12:53 PM
    Hello everyone! Read our latest post https://www.bentoml.com/blog/how-to-beat-the-gpu-cap-theorem-in-ai-inference to learn about the GPU CAP Theorem and how BentoML can help enterprises solve it 🚀
    👀 1

    Sherlock Xu

    05/09/2025, 6:23 AM
    Happy Friday! Read our post to learn how to deploy and scale Phi-4-reasoning https://www.bentoml.com/blog/deploying-phi-4-reasoning-with-bentoml

    Sherlock Xu

    06/12/2025, 12:47 AM
    Hi everyone! Read our blog post to learn the latest tech about distributed LLM inference https://www.bentoml.com/blog/the-shift-to-distributed-llm-inference

    Xiuyu Yang

    06/23/2025, 2:21 PM
    Hi, everyone! We're delighted to share our latest work applying an interleaved autoregression scheme in MLLMs to self-driving simulation. Welcome to our project page: https://orangesodahub.github.io/InfGen/, and feel free to star our GitHub repo: https://github.com/OrangeSodahub/infgen. Code and models are coming soon!
    🏁 1
    🍱 1
    👀 2