# general
m
Vendor comparison question: we're redoing our MLOps stack. Everything from training to a model store, serving, etc. The last few months have yielded some good learnings, and we're about ready to commit. One question I've gotten, though, is "Why use [insert tool] when we could use AWS SageMaker's version of that?" SageMaker has a feature called "data capture". My understanding is that it saves all the inferences you make and then alerts if they're outside the distribution the model was trained on. I imagine this could include problems like missing data. You can use those alert events to trigger retraining, Slack messages, and whatnot. Can anyone make the case for WhyLabs over that feature of SageMaker?
e
Hey Eric, great question! For the scenario you brought up about data capture, we can do that by capturing a reference profile during your training cycle and then using it to monitor your model or data pipeline in production, as well as to trigger events based on drift, quality, or other types of anomalies. We can tie into your alerting systems and trigger automatic retraining as well. In general, WhyLabs aims to provide a platform-agnostic approach to monitoring data and models across a lot of different architectures. We tend to see a few common differences between SageMaker Monitor and WhyLabs, such as 1) profiling 100% of your data vs. being forced to sample, 2) accommodating different types of data such as text, images, embeddings, and more, compared to only supporting tabular data, and 3) an ML-optimized user experience for implementing monitoring and observability. With SageMaker Monitor the pain is often felt when trying to create dashboards and identify root causes, since it's built on CloudWatch metrics, whereas WhyLabs focuses on minimal setup and a short time to getting value out of the platform. There are a few other things, like extending whylogs with custom metrics for unique use cases. We can go deeper on a call if you'd like. Hopefully this helps!
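To make the reference-profile idea a bit more concrete, here's a minimal plain-Python sketch. It is not the whylogs or WhyLabs API: `ColumnProfile`, `profile_column`, `find_anomalies`, and both thresholds are hypothetical names and values chosen only for illustration. It summarizes one numeric column at training time, then compares a production batch against that summary to flag missing data or a shifted mean.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Optional, Sequence


@dataclass
class ColumnProfile:
    """Lightweight summary of one numeric column (hypothetical structure)."""
    count: int
    missing_rate: float
    mean: float
    std: float


def profile_column(values: Sequence[Optional[float]]) -> ColumnProfile:
    """Summarize a column; None stands in for a missing value."""
    if not values:
        raise ValueError("cannot profile an empty column")
    present = [v for v in values if v is not None]
    missing_rate = 1 - len(present) / len(values)
    mu = mean(present) if present else float("nan")
    sigma = stdev(present) if len(present) > 1 else 0.0
    return ColumnProfile(len(values), missing_rate, mu, sigma)


def find_anomalies(
    reference: ColumnProfile,
    current: ColumnProfile,
    max_mean_shift: float = 3.0,         # assumed threshold, in reference std units
    max_missing_increase: float = 0.05,  # assumed threshold, absolute rate increase
) -> List[str]:
    """Compare a production profile against the training-time reference."""
    alerts = []
    if current.missing_rate - reference.missing_rate > max_missing_increase:
        alerts.append("missing_data")
    if reference.std > 0:
        shift = abs(current.mean - reference.mean) / reference.std
        if shift > max_mean_shift:
            alerts.append("drift")
    return alerts


# Reference profile captured during training, then a production batch.
reference = profile_column([10.0, 12.0, 11.0, 9.0, 13.0])
production = profile_column([25.0, None, 27.0, 26.0, None])
alerts = find_anomalies(reference, production)
print(alerts)  # ['missing_data', 'drift']
```

The returned alert names are what you would route to Slack or a retraining trigger. A real setup would use richer statistics than a mean and standard deviation, so treat this only as a picture of the workflow.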
m
This is terrific, thanks @elegant-dinner-90885! So for a group to make use of SageMaker Monitor, they'd have to:
1. already be on SageMaker
2. be using tabular data, or be willing to do additional work to calculate and log structured/tabular metrics from non-tabular data
3. be okay with recording only a sample of the inference data points
4. be willing to finagle CloudWatch metrics into a dashboard that is useful for modeling
Super helpful, thanks!