# ask-for-help
I forgot you can deploy Bentos to SageMaker by using image-based endpoints.
1. If you do that (or even if you deploy to AWS Lambda), can you still do adaptive batching? That'd be cool.
2. I saw that SageMaker has an `enable_data_capture` flag that tracks all of the inferences made and can be used to trigger alerts if incoming data is different from the training set. I have no idea how that works, but if BentoML could be integrated with that, you'd get "free" data monitoring!
Sean
1. The short answer is yes, you can still do adaptive batching in AWS Lambda. The difference is in the deployment architectures. Using Yatai, a Bento is deployed as microservices, with the API and runner servers running in separate containers and pods. With AWS Lambda, a Bento is deployed as a single container, with the API and runner servers running as processes in the same container. The microservice architecture is more advantageous for adaptive batching, because it can potentially aggregate traffic to the same runner pod more effectively, and because the API server and runners can be scheduled on the nodes with the most suitable hardware, using resources more efficiently.
2. BentoML comes with data capturing capabilities as well. The `bentoml.monitor` API can log input and output data of various types and ship it to a destination of your choice, e.g. AWS Redshift or Google BigQuery.
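For reference, here is a minimal sketch of both points (assuming BentoML 1.x; the model name `demo_clf`, the service name, and the feature names are made up for illustration). Marking the model signature as batchable is what enables adaptive batching, and `bentoml.monitor` is the data-capture hook mentioned above:

```python
# --- save_model.py: register a model with a batchable signature -------------
# The batchable signature lets the runner merge concurrent requests
# (adaptive batching), regardless of where the Bento is later deployed.
import bentoml
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=500).fit(X, y)

bentoml.sklearn.save_model(
    "demo_clf",
    clf,
    signatures={"predict": {"batchable": True, "batch_dim": 0}},
)

# --- service.py: serve the model and log monitoring data --------------------
import numpy as np
from bentoml.io import NumpyNdarray

runner = bentoml.sklearn.get("demo_clf:latest").to_runner()
svc = bentoml.Service("demo_clf_service", runners=[runner])


@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def predict(features: np.ndarray) -> np.ndarray:
    # bentoml.monitor records request/response data; where it ends up (local
    # log files, an OTLP endpoint, etc.) is decided by deployment configuration.
    with bentoml.monitor("demo_clf_monitoring") as mon:
        mon.log(float(features[0]), name="feature_0", role="feature", data_type="numerical")
        result = await runner.predict.async_run(features.reshape(1, -1))
        mon.log(int(result[0]), name="prediction", role="prediction", data_type="categorical")
    return result
```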
Let me take a stab at the more open question on the comparison to SageMaker. 🙂 BentoML is an AI application platform. It allows you to develop applications that incorporate multiple machine learning models, custom logic for APIs, and different user interfaces and backends. In contrast to SageMaker and other similar products, which focus on deploying a single model, BentoML allows you to combine multiple models or add custom logic without needing additional services. Next, BentoML comes with best practices for running AI applications baked in, such as observability, model monitoring, and adaptive batching, while staying open and avoiding technical baggage: for example, users can choose any provider for observability visualization and ship monitoring data to any storage. Lastly, BentoML puts a lot of effort into scaling for real production use cases. Using the BentoML SDKs, users can switch between deployment environments (e.g. Kubernetes, Lambda, ECS), hardware (e.g. CPU, GPU), and any cloud or on-prem setup without changing code or rebuilding.
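To make the multi-model point concrete, here is a rough sketch of what such a composed service can look like in BentoML 1.x; the model tags (`churn_model_a`, `churn_model_b`) and the averaging logic are hypothetical, the point is only that several models plus arbitrary Python logic can live in one service:

```python
import bentoml
from bentoml.io import JSON

# Two independently trained and versioned models from the local model store.
model_a = bentoml.sklearn.get("churn_model_a:latest").to_runner()
model_b = bentoml.sklearn.get("churn_model_b:latest").to_runner()

svc = bentoml.Service("churn_ensemble", runners=[model_a, model_b])


@svc.api(input=JSON(), output=JSON())
async def score(payload: dict) -> dict:
    features = [payload["features"]]
    # Custom application logic sits next to the models: here we simply average
    # two model outputs, but business rules, pre/post-processing, or calls to
    # other systems could go here as well.
    pred_a = await model_a.predict.async_run(features)
    pred_b = await model_b.predict.async_run(features)
    return {"score": (float(pred_a[0]) + float(pred_b[0])) / 2}
```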
Ghawady
Thanks @Sean for giving such a detailed review. I would like to get your thoughts about deploying a BentoML service to Vertex AI endpoints. I could use the image created by `bentoml containerize`, create a Vertex AI endpoint, and specify the required hardware.
1. Are there any pitfalls you foresee that we need to watch out for when deploying this way?
2. I would also like more information about the monitoring capabilities with `bentoml.monitor` that you highlighted. The documentation doesn't explain much in detail about shipping the collected data to different providers, or maybe I lack the knowledge to tell whether it follows a known standard 😅. We want to use Evidently AI for drift detection, since Vertex AI monitoring is not comprehensive enough for our need to capture additional features that don't come in the API request. I like that with `bentoml.monitor` we have the flexibility to log the features we need, but I don't have enough clarity on how to integrate this logging into drift detection.
larme
Hi Ghawady,
1. I'm not very familiar with Vertex AI, but from its documentation I'm not sure there's a benefit to using a user-provided Docker image compared with a general Google Compute Engine instance. Do you have any specific reason to use Vertex AI?
2. Is this section of our documentation helpful? https://docs.bentoml.org/en/latest/guides/monitoring.html#shipping-the-collected-data
Ghawady
Hi larme,
1. The reason is mainly that we use Vertex AI Pipelines to orchestrate model training and deployment. We want an automated process in production, and Vertex AI Pipelines integrates easily with other Vertex AI components such as the model registry and prediction endpoints. Deploying to a Compute Engine node would require additional maintenance and wouldn't offer features like A/B testing, for example. With endpoints we can deploy more than one model (i.e. containers holding different model versions for inference) to the same endpoint, which is not something that can be done with Compute Engine.
2. I had a look at this documentation, but it still didn't give me much clarity on how to use BentoML logging / Monitoring Data Collectors in the overall model monitoring / drift detection process; maybe it is my lack of general knowledge in this area. Can I use an OTLP endpoint to log the collected data to a database (PostgreSQL or BigQuery, for example) and then pass that dataset to another tool for drift detection? Are there any resources/references I can check to read more about this process?
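For illustration, one possible shape of that pipeline is sketched below. It assumes the monitoring data was exported with BentoML's default log-file exporter (one JSON record per line) or has already landed in a table you can load into a DataFrame, and it uses Evidently's `Report` API for drift detection; the file paths and column names are hypothetical.

```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Reference set: the data the model was trained/validated on.
reference = pd.read_parquet("training_features.parquet")

# Current set: records collected by bentoml.monitor (here read from NDJSON
# log files; the same idea applies if they were shipped to BigQuery/Postgres).
current = pd.read_json("monitoring/demo_clf_monitoring/data/data.1.log", lines=True)

# Compare only the feature columns present in both frames.
feature_cols = [c for c in reference.columns if c in current.columns]

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference[feature_cols], current_data=current[feature_cols])
report.save_html("drift_report.html")
```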