# events
g
Great question! The first main difference is the data. In traditional software you’re mainly dealing with code, while ML engineers have to deal with data + code + potentially stochastic behavior. This makes the process considerably more challenging, and sometimes not even deterministic. What we’ve found helpful is to follow 4 steps:
1. Identify problematic cohorts of data.
2. Diagnose why they’re not working properly (explainability helps a lot here!).
3. Fix it: implement the necessary changes (adding more data, changing the algo, etc.).
4. Assert that it won’t happen again: create data unit tests to make sure cohort performance stays stable!
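To make the "data unit test" idea concrete, here is a minimal sketch in plain Python. It is not Unbox's API: the function names, the `cohort`, `label`, and `prediction` fields, and the 0.8 accuracy floor are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def cohort_accuracy(rows, cohort_key):
    """Return {cohort value: accuracy} for rows carrying 'label' and 'prediction'."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        cohort = row[cohort_key]
        totals[cohort] += 1
        hits[cohort] += int(row["label"] == row["prediction"])
    return {cohort: hits[cohort] / totals[cohort] for cohort in totals}

def cohorts_below_floor(rows, cohort_key, min_accuracy):
    """Return the cohorts whose accuracy is below min_accuracy (empty dict means pass)."""
    scores = cohort_accuracy(rows, cohort_key)
    return {cohort: acc for cohort, acc in scores.items() if acc < min_accuracy}

# Toy evaluation set (hypothetical data): cohort "a" is half wrong, cohort "b" is all right.
rows = [
    {"cohort": "a", "label": 1, "prediction": 0},
    {"cohort": "a", "label": 0, "prediction": 0},
    {"cohort": "b", "label": 1, "prediction": 1},
    {"cohort": "b", "label": 0, "prediction": 0},
]

# The "unit test": any cohort listed here fell below the floor.
failing = cohorts_below_floor(rows, "cohort", min_accuracy=0.8)
```

Running a check like this on every retrain would catch a cohort that regresses even when overall accuracy still looks fine.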
👍 2
s
Very insightful. How does it suggest what data to add? And how do you communicate that to users to begin with? Btw, Unbox is a great name for what you do.
g
Thanks! I think the identify/diagnose part drives the decision about the data. If we notice that we fail whenever country=spain, we can collect more data where country=spain to balance our dataset. We try to communicate that to users visually, using data slices!
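As a rough sketch of the country=spain example, the snippet below slices an evaluation set by one column and reports, per slice, the error rate and how many extra rows would bring it up to the size of the largest slice. The `slice_report` helper, the field names, and the "balance to the largest slice" rule are assumptions made for illustration, not how Unbox computes it.

```python
from collections import Counter

def slice_report(rows, column):
    """Per slice value: row count, error rate, and rows needed to match the largest slice."""
    counts = Counter(row[column] for row in rows)
    errors = Counter(row[column] for row in rows if row["label"] != row["prediction"])
    target = max(counts.values())  # assumption: balance every slice to the largest one
    return {
        value: {
            "n": n,
            "error_rate": errors[value] / n,
            "rows_to_balance": target - n,
        }
        for value, n in counts.items()
    }

# Toy data (hypothetical): the model fails on every spain row, and spain is under-represented.
rows = [
    {"country": "spain", "label": 1, "prediction": 0},
    {"country": "spain", "label": 0, "prediction": 1},
    {"country": "france", "label": 1, "prediction": 1},
    {"country": "france", "label": 0, "prediction": 0},
    {"country": "france", "label": 1, "prediction": 1},
    {"country": "france", "label": 0, "prediction": 1},
]

report = slice_report(rows, "country")
```

A slice with both a high `error_rate` and a large `rows_to_balance` is a natural candidate for collecting more data.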
👍 1