# ask-anything
e
great point! we think data pipelines fit well in the procedural paradigm. when you introduce objects, you get a lot of shared state between the tasks in your pipeline, which can cause a lot of issues. of course, we won't stop you if you decide to build your own class-based DSL on top of ploomber, but we think the procedural one works well 🙂 however, I think objects make sense inside the tasks of a pipeline, for example, to represent things that do need state, like models that have learned parameters after they're trained
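a quick sketch of what I mean (all names here are made up for illustration, this isn't ploomber's API): tasks stay plain functions that communicate via return values, while an object lives *inside* a task to hold learned state:

```python
# Toy model: "learning" just stores the mean of the training data.
class MeanModel:
    def __init__(self):
        self.mean_ = None  # learned parameter, set by fit()

    def fit(self, data):
        self.mean_ = sum(data) / len(data)
        return self

    def predict(self, n):
        return [self.mean_] * n


# pipeline tasks are plain functions: no shared state between them
def load():
    return [1.0, 2.0, 3.0]


def train(data):
    # the object is created and used inside the task; tasks still
    # communicate only through return values
    return MeanModel().fit(data)


def score(model):
    return model.predict(2)
```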
w
I totally agree that using mutable state in a class is a bad idea for a data pipeline. But on the other hand, I think using immutable state in a class can make the code cleaner. Data that is immutable and used by different pipeline stages can be accessed from the class. A typical example is the configuration of a specific task. Without a class, I have to pass the context to every function that depends on it, for example, which Slurm cluster to submit the job to, etc.
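to make the idea concrete, here's a minimal sketch, assuming a frozen dataclass as the immutable config (the names `SlurmConfig`, `cluster`, and `partition` are hypothetical):

```python
from dataclasses import dataclass


# frozen=True makes instances immutable: assigning to an attribute
# after construction raises dataclasses.FrozenInstanceError
@dataclass(frozen=True)
class SlurmConfig:
    cluster: str
    partition: str


def submit(cfg: SlurmConfig, script: str) -> str:
    # every stage that needs the context takes the single config object
    # instead of a growing list of individual parameters
    return f"sbatch --clusters={cfg.cluster} --partition={cfg.partition} {script}"


cfg = SlurmConfig(cluster="hpc1", partition="gpu")
```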
And a logger is another example. But I think that's not a big deal compared with the other awesome features provided by ploomber.
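(the logger case can usually be handled without a class at all; a minimal sketch, assuming a module-level logger shared by the task functions:)

```python
import logging

# a module-level logger avoids passing a logger object through every
# task function; each module can grab its own named logger
logger = logging.getLogger("pipeline")


def double(x):
    logger.info("processing %s", x)
    return x * 2
```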
e
I agree with your configuration argument. Under the hood, Ploomber represents your pipeline as an object precisely for that, but it manages the state for you. For example, if you use the S3 client, it'll reuse the same client, but you don't have to touch it because Ploomber will automatically upload artifacts. Still, you have a point: we actually have an open issue about how to share state, but I haven't found the right abstraction for it, so if you have suggestions, let me know!
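one generic way to get that "reuse the same client" behavior in plain Python (this is just a sketch, not how ploomber implements it; `FakeS3Client` stands in for a real client like boto3 so the example is self-contained):

```python
from functools import lru_cache


class FakeS3Client:
    """Stand-in for a real storage client (e.g., boto3's S3 client)."""

    def upload(self, path):
        return f"uploaded {path}"


# a cached factory returns the same instance on every call, so tasks
# share one client without managing that state themselves
@lru_cache(maxsize=1)
def get_client():
    return FakeS3Client()
```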