# ask-anything
r
For one, I don't love this architecture. Data scientists need to rewrite their pipelines when they go to deploy, which feels time-consuming, especially having to fit their work into the zflow concept. Personally, I like running notebooks in production because I get a richer runbook when things go wrong. To me it's very important that data scientists own as much of the end-to-end pipeline as possible: it's much easier to debug a failed pipeline in a notebook than in work that's been ported into Step Functions. Notebooks absolutely scale and are reproducible, and Papermill is a great tool if data scientists own their work all the way into production. Sometimes you learn things from what wasn't said: the article didn't talk about containers or managing Python environments, which I find harder to scale across an organization than notebooks in production. Anyway, I feel like this company had bad notebook practices and opted for a very bespoke solution when it probably would have been cheaper to just buy a platform for data scientists. But maybe I'm just being a little pessimistic this morning.
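To make that concrete, here's a minimal sketch of the Papermill pattern I mean; the notebook paths and parameters are hypothetical placeholders, not anything from the article:
```python
# Minimal sketch: running a notebook in production with Papermill.
# The paths and parameter names here are hypothetical examples.
import papermill as pm

try:
    pm.execute_notebook(
        "pipelines/train_model.ipynb",           # source notebook, checked into the repo
        "runs/train_model-2024-01-15.ipynb",     # executed copy, one per run
        parameters={"input_date": "2024-01-15", "sample_frac": 1.0},
    )
except pm.PapermillExecutionError:
    # On failure, Papermill still writes the output notebook up to the
    # failing cell, with the traceback inline. That executed copy is the
    # "runbook" you open to debug the failure.
    raise
```
Each run leaves behind a fully rendered notebook with its outputs, which is what makes debugging a failed run so much nicer than digging through Step Functions logs.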
👍 1
e
thanks for your feedback! I agree with you; forcing data scientists to rewrite their work doesn't feel like an optimal solution on many fronts. Most companies actually solve the "notebook problem" this way: by asking data scientists to refactor their notebooks into something else. When I talk to them, something I've realized is that they want to enforce this refactoring process because notebooks are usually unusable for production as-is (dead code paths, unformatted code, chunks of code that aren't relevant to the final output, etc.), and enforcing the refactoring ensures data scientists clean up the mess. I think we're still far from making "notebooks in production" mainstream, but we'll get there 🙂
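For what it's worth, the mechanical part of that refactor is usually trivial; it's the cleanup that costs the time. A rough sketch of the export step, assuming a hypothetical analysis.ipynb:
```python
# Rough sketch: export a notebook to a plain Python script with nbconvert.
# "analysis.ipynb" is a hypothetical example notebook.
from nbconvert import PythonExporter

exporter = PythonExporter()
source, _resources = exporter.from_filename("analysis.ipynb")

with open("analysis.py", "w") as f:
    f.write(source)

# The exported script still carries all the dead code paths and
# exploration cells -- the expensive part of the refactor is pruning
# those by hand, which is exactly what companies are trying to enforce.
```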
👍 1