Slackbot
06/02/2022, 2:27 PM
Eduardo
# execute independent tasks in parallel
executor: parallel

tasks:
  - source: clean.py
    name: clean-
    product:
      nb: clean.html
      clean: clean.csv
    grid:
      cities: [a, b, c, d]
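To make the `grid` entry concrete, here is a rough sketch of how a parameter grid expands into one task per combination of values. It is plain Python for illustration only, not Ploomber's implementation: `expand_grid` is a hypothetical helper, and the numeric suffix appended to the `clean-` prefix is an assumption about how the generated tasks get named.

```python
from itertools import product


def expand_grid(name_prefix, grid):
    """Return one (task_name, params) pair per combination of grid values.

    Hypothetical helper for illustration only; this is not Ploomber code.
    """
    keys = list(grid)
    combos = product(*(grid[key] for key in keys))
    # Assumed naming scheme: prefix plus a running index
    return [
        (f"{name_prefix}{i}", dict(zip(keys, combo)))
        for i, combo in enumerate(combos)
    ]


tasks = expand_grid("clean-", {"cities": ["a", "b", "c", "d"]})
for task_name, params in tasks:
    print(task_name, params)
```

With a single key and four values this yields four tasks, each receiving one city as its `cities` parameter. Since they do not depend on each other, they are the kind of tasks a parallel executor can run at the same time.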
Eduardo
Eduardo
Spruha Vashi
06/02/2022, 2:37 PM
Eduardo
you can get the equivalent of `grid` in the Python API with a for loop, check out this example - essentially you create many `NotebookRunner`/`PythonCallable` instances and each one gets a different parameter (the path to the file)
i'm guessing you have a folder where each file corresponds to data from a single city, right? then you could do:
for city in ['a', 'b', ...]:
    NotebookRunner(Path('your-script.py'), File(f'{city}-clean.csv'), name=f'{city}-clean', dag=dag, params=dict(input_path=f'path/to/{city}.csv'))
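If the list of cities should come from the folder contents rather than being typed by hand, something along these lines could work. It is a sketch under assumptions: `city_task_specs` is a hypothetical helper (not part of Ploomber), and it assumes every input file is named `<city>.csv`. It only builds the name, product and params for each task; the final comment shows where those values would go in the loop above.

```python
from pathlib import Path


def city_task_specs(data_dir):
    """Build one task spec per CSV file found in data_dir.

    Hypothetical helper: assumes each file is named <city>.csv.
    """
    specs = []
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        city = csv_path.stem  # "a.csv" -> "a"
        specs.append(
            {
                "name": f"{city}-clean",
                "product": f"{city}-clean.csv",
                "params": {"input_path": str(csv_path)},
            }
        )
    return specs


# Each spec then maps onto one task, e.g.:
# NotebookRunner(Path('your-script.py'), File(spec["product"]),
#                name=spec["name"], dag=dag, params=spec["params"])
```

One thing to watch for: if the cleaned files are written into the same folder as the inputs, a later run would pick them up as new cities, so it is safer to keep products in a separate directory.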
Eduardo
Spruha Vashi
06/07/2022, 3:38 PM
Eduardo
Eduardo
Eduardo
Spruha Vashi
06/07/2022, 6:27 PM
Eduardo
once a task is finished and a product is made (in this case a csv file), is there anywhere to access it?
the simplest way would be to pass the path to the csv file, e.g.
pd.read_csv('/path/to/data.csv')
- now, if you want to avoid hardcoding paths (which is a good practice), then you can load your DAG into a Python session and extract information from it. To enable that, create a factory function, then import that function into a script or notebook like this:
from my_module import my_factory_function
dag = my_factory_function()
dag.render()
# if task_name only generates one product
path = str(dag[task_name].product)
# if task_name generates >1 product
path = str(dag[task_name].product[product_key])
# ... then use path to load the csv file
does this require an env?
It doesn't
right now i have my products set up like this in PythonCallable: File(env.path.cleaned_data / "social.csv"). based on what I've read, does this mean that I'm putting the social csv in a folder called cleaned_data?
No. It depends on the value you passed in the env. For example, if
env = {"path": {"cleaned_data": "some_directory"}}
, then your social.csv will go into some_directory/social.csv
in other words, the output path depends on the value stored in the dictionary, not on the name of the key.
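A small sketch of that lookup, using a plain dictionary and `pathlib`. The dictionary access `env["path"]["cleaned_data"]` stands in for `env.path.cleaned_data` here; that the env object resolves in exactly the same way is an assumption, and the second directory name is made up for illustration.

```python
from pathlib import Path

# Same dictionary as in the message above
env = {"path": {"cleaned_data": "some_directory"}}

# The folder comes from the value stored under the key, not from the key name
output_path = Path(env["path"]["cleaned_data"]) / "social.csv"
print(output_path.as_posix())

# Changing the value (hypothetical directory) changes where the file goes;
# the key is still called cleaned_data
env["path"]["cleaned_data"] = "output/clean"
moved_path = Path(env["path"]["cleaned_data"]) / "social.csv"
print(moved_path.as_posix())
```

So a folder literally named `cleaned_data` only shows up if that is the value stored under the key.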