# ask-anything
a
This is my current code, which works but does not run in parallel.
from ploomber.spec import DAGSpec

for f in files:
    env = make_env(f)  # build the env for this input file
    dag = DAGSpec('pipeline.yaml', env=env).to_dag()
    dag.build()
Is task grid the answer?
i
You got it, grid lets you parallelize at the task level. So in this example, a single task generates multiple tasks (by parameterizing the estimators).
There's also a reference to the docs there, is that what you were looking for?
If it's about notebooks, we have an open issue on it, feel free to share your thoughts there.
a
I'm not sure. I'm confused about how to specify the outputs, since they are unique to each input file.
e
what does your dag look like? is it a straight line, like a -> b -> c?
a
Yes super simple. Just a -> b
all of these tasks are independent
e
yeah, the way to go is to build one big DAG. However, we currently do not have a way to merge many DAGs into a single one (so you can run them in parallel). Can you open an issue on GitHub?
the only way to do this at the moment is by using the Python API directly. you may want to go that route https://github.com/ploomber/projects/tree/master/python-api-examples
a
yes I am using the python API directly
e
i meant, not using DAGSpec, but using the DAG object directly, like this: https://github.com/ploomber/projects/blob/master/python-api-examples/examples/basic.py
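something along these lines (an untested sketch based on that example, the function and file names are just placeholders):

from pathlib import Path

from ploomber import DAG
from ploomber.tasks import PythonCallable
from ploomber.products import File

def _make_output(product):
    # ploomber passes the product so the function knows where to write its output
    Path(str(product)).write_text('hello')

dag = DAG()
PythonCallable(_make_output, File('output.txt'), dag=dag, name='make_output')
dag.build()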
a
Making one big DAG is fine by me, I'm just not sure of the proper way to accumulate all the tasks
Oh I see
I guess my question still remains. I will look at what I can do with a DAG object
👍 1
e
yeah, please open an issue - I quickly took a look at the code and I think this change would be trivial to implement
a
ok thank you
e
if you want to write your pipeline with the Python API, check the example I sent. Then you'll need to create a ShellScript task and a File product
for each of your shell scripts:
from pathlib import Path

from ploomber.tasks import ShellScript
from ploomber.products import File

ShellScript(Path('my-script.sh'), product=File('some-output'))  # initialize the source with a Path object
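also, the .sh file itself has to contain the {{product}} placeholder, since ploomber renders it to the product's path when it runs the script. for example (writing the file from Python here just to keep it in one snippet; the actual command is only illustrative):

from pathlib import Path

# hypothetical contents for my-script.sh: the command is just an example,
# but ShellScript requires the {{product}} placeholder somewhere in it
Path('my-script.sh').write_text('touch {{product}}\n')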
a
Don't I need to pass a DAG to ShellScript?
e
yes
you need to
a
so I need to load a "template" dag first
e
from pathlib import Path

from ploomber import DAG
from ploomber.tasks import ShellScript
from ploomber.products import File

dag = DAG()
one = ShellScript(Path('my-script.sh'), product=File('some-output'), dag=dag)  # initialize with a Path object
two = ShellScript(Path('another-script.sh'), product=File('another-output'), dag=dag)
one >> two  # declare the dependency: run `one` before `two`
dag.build()
not sure if that actually runs but that's the idea haha
you can create a loop like:
dag = DAG()

for file in files:
    # add the tasks for every file you want to process here
    ...
to create the big dag
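to spell it out a bit more, something like this (untested sketch, the script/product names are placeholders, and you'd still need to pass each input file to its task, e.g. via params):

from pathlib import Path

from ploomber import DAG
from ploomber.tasks import ShellScript
from ploomber.products import File

dag = DAG()

for f in files:  # `files` is the same list from your original snippet
    stem = Path(f).stem
    # every task in the DAG needs a unique name, so derive it from the input file
    ShellScript(Path('my-script.sh'),
                product=File(f'{stem}.out'),
                dag=dag,
                name=f'process-{stem}')

dag.build()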
oh and to run this in parallel:
from ploomber import DAG
from ploomber.executors import Parallel

dag = DAG(executor=Parallel())
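so in the big-dag loop above, you'd just construct it as dag = DAG(executor=Parallel()) and the independent tasks will run in parallel when you call dag.build()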
a
I get an error saying "ShellScript must include {{product}} in its source" when I try to make a new ShellScript
despite having it
e
are you passing a Path object?
from pathlib import Path; ShellScript(Path('something.sh'))
it has to be a Path object, not a str (iirc, a plain str is treated as the script's source code itself rather than a path to it, which is why the {{product}} check fails)
a
oops
e
no worries, it's bad design on our end, we haven't taken the time to fix it
a
OK great let me try. I do like Path objects
👍 1
It worked!
Great, thanks. Let's see if I can extend this pipeline, maybe add some follow-up tasks
e
nice, feel free to post any other questions
👍 1
a
I have to ensure each ShellScript has a unique name, which is behavior I can agree with. Thanks again, have a good weekend
🙌 1
👍 1