Hey guys, any tips for organizing a pipeline with ...
# ask-anything
j
Hey guys, any tips for organizing a pipeline with a ton of steps? I'd like something like this: instead of separate tasks "do_something_task1", "do_something_different_task2" I'd like to be able to call
ploomber build --partial something.task1
e
interesting. are you looking only for the [thing].[another] format for executing from the CLI? If so, you can override the name in the pipeline.yaml:
tasks:
  - source: scripts/script.py # this can be a function task too
    name: something.task1
ploomber identifies tasks by name, so you can now do:
ploomber build --partial something.task1
does this solve the issue?
j
Well, this is helpful already. But what I was thinking about is to also group steps so pipeline.yaml would have more of a hierarchical structure that would be easier to read.
e
we currently do not support this, but I'd like to know how you think this might work. any thoughts?
j
I mean something like this:
tasks:
  task_group1:
    - source (...)
      name: t1
  task_group2:
    - source (...)
      name: t2
      upstream: task_group1.t1
but come to think of it, if I use what you proposed the added benefit of having this is not that big
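For anyone who still wants the grouped layout, here is a minimal sketch of how it could be flattened into the regular flat task list with dotted names, done outside of Ploomber. The `flatten_groups` helper and the script paths are hypothetical and only for illustration. The sketch assumes `upstream` values are already written in the dotted `group.name` form, and it passes them through untouched.

```python
def flatten_groups(groups):
    """Flatten {group: [task, ...]} into a flat task list with dotted names.

    Hypothetical helper: Ploomber does not provide this, it only sees the
    flat list that comes out.
    """
    tasks = []
    for group, members in groups.items():
        for task in members:
            flat = dict(task)  # copy so the grouped input is not mutated
            flat["name"] = f"{group}.{task['name']}"
            tasks.append(flat)
    return tasks


# Hypothetical grouped layout mirroring the example above.
# The source paths are made up for illustration.
groups = {
    "task_group1": [{"source": "scripts/t1.py", "name": "t1"}],
    "task_group2": [
        {"source": "scripts/t2.py", "name": "t2", "upstream": "task_group1.t1"}
    ],
}

# Same shape as a regular pipeline.yaml: a top-level "tasks" list
spec = {"tasks": flatten_groups(groups)}
```

The resulting `spec` has the same shape as a regular pipeline.yaml, so it could be written out with a YAML library and built as usual.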
e
yeah. I think it offers some value, but it might be confusing since we'd now have two ways of declaring tasks (nested or regular). We have something related (import_tasks_from). Currently it only supports importing tasks from a single file, but we could do something like:
meta:
  import_tasks_from: [tasks.clean.yaml, tasks.features.yaml]
perhaps this can help? other than that, we want to keep the pipeline.yaml spec simple. once pipelines grow in size and complexity, it's probably better to use the Python API, but right now that involves a manual translation from YAML to Python. at some point we want to offer a command to do the translation
j
ooh so basically using a bunch of pipeline.yaml works out of the box?
e
Right now it only supports importing from a single file, but we could extend it to support multiple files
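Until multiple files are supported, one possible workaround is to combine the task lists yourself and point import_tasks_from at the single combined file. Below is a rough sketch of the merge step only. `merge_task_lists` is a hypothetical helper, the two lists stand in for the already-parsed contents of tasks.clean.yaml and tasks.features.yaml, and every task is assumed to have an explicit `name`.

```python
def merge_task_lists(*task_lists):
    """Concatenate task lists in order, rejecting duplicate task names.

    Hypothetical helper, not part of Ploomber. Assumes every task dict
    has an explicit "name" key.
    """
    merged, seen = [], set()
    for tasks in task_lists:
        for task in tasks:
            name = task["name"]
            if name in seen:
                raise ValueError(f"duplicate task name: {name!r}")
            seen.add(name)
            merged.append(task)
    return merged


# Stand-ins for the parsed contents of tasks.clean.yaml and
# tasks.features.yaml (task names and paths are made up)
clean_tasks = [{"source": "scripts/clean.py", "name": "clean.raw"}]
feature_tasks = [{"source": "scripts/features.py", "name": "features.build"}]

all_tasks = merge_task_lists(clean_tasks, feature_tasks)
```

Failing on duplicate names matters here because tasks are identified by name, so two files that define the same name would otherwise clash silently.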