# ask-anything
e
great point! we think data pipelines fit well in the procedural paradigm. when you introduce objects, you get a lot of shared state between the tasks in your pipeline, which can cause a lot of issues. of course, we won't stop you if you decide to build your own class-based DSL on top of ploomber, but we think the procedural one works well 🙂 however, I think objects make sense inside the tasks of a pipeline, for example, to represent things that do need state, like models that have learned parameters after they're trained
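a quick sketch of what I mean (all names here are made up for illustration, this isn't ploomber's API): tasks stay plain functions that communicate via return values, while an object lives *inside* a task to hold learned state:

```python
# Toy model: "learning" just stores the mean of the training data.
class MeanModel:
    def __init__(self):
        self.mean_ = None  # learned parameter, set by fit()

    def fit(self, data):
        self.mean_ = sum(data) / len(data)
        return self

    def predict(self, n):
        return [self.mean_] * n


# pipeline tasks are plain functions: no shared state between them
def load():
    return [1.0, 2.0, 3.0]


def train(data):
    # the object is created and used inside the task; tasks still
    # communicate only through return values
    return MeanModel().fit(data)


def score(model):
    return model.predict(2)
```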
w
I totally agree that using mutable state in a class is a bad idea for a data pipeline. But on the other hand, I think using immutable state in a class can make the code cleaner. Data that is immutable and used by different pipeline stages can be accessed from the class. A typical example is the configuration of a specific task. Without a class, I have to pass the context to every function that depends on it, for example, which Slurm cluster to submit the job to, etc.
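to make the idea concrete, here's a minimal sketch, assuming a frozen dataclass as the immutable config (the names `SlurmConfig`, `cluster`, and `partition` are hypothetical):

```python
from dataclasses import dataclass


# frozen=True makes instances immutable: assigning to an attribute
# after construction raises dataclasses.FrozenInstanceError
@dataclass(frozen=True)
class SlurmConfig:
    cluster: str
    partition: str


def submit(cfg: SlurmConfig, script: str) -> str:
    # every stage that needs the context takes the single config object
    # instead of a growing list of individual parameters
    return f"sbatch --clusters={cfg.cluster} --partition={cfg.partition} {script}"


cfg = SlurmConfig(cluster="hpc1", partition="gpu")
```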
And a logger is another example. But I think that's not a big deal compared with the other awesome features provided by ploomber.
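(the logger case can usually be handled without a class at all; a minimal sketch, assuming a module-level logger shared by the task functions:)

```python
import logging

# a module-level logger avoids passing a logger object through every
# task function; each module can grab its own named logger
logger = logging.getLogger("pipeline")


def double(x):
    logger.info("processing %s", x)
    return x * 2
```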
e
I agree with your configuration argument. Under the hood, Ploomber represents your pipeline as an object precisely for that, but it manages the state for you. For example, if you use the S3 client, it'll reuse the same client, but you don't have to touch it because Ploomber will automatically upload artifacts. Still, you have a point: we actually have an open issue about how to share state, but I haven't found the right abstraction for it, so if you have suggestions, let me know!
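one generic way to get that "reuse the same client" behavior in plain Python (this is just a sketch, not how ploomber implements it; `FakeS3Client` stands in for a real client like boto3 so the example is self-contained):

```python
from functools import lru_cache


class FakeS3Client:
    """Stand-in for a real storage client (e.g., boto3's S3 client)."""

    def upload(self, path):
        return f"uploaded {path}"


# a cached factory returns the same instance on every call, so tasks
# share one client without managing that state themselves
@lru_cache(maxsize=1)
def get_client():
    return FakeS3Client()
```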