# ask-anything
s
e
yeah, looks like a grid can be helpful for your use case, it'd be something like this:
```yaml
# execute independent tasks in parallel
executor: parallel

tasks:
  - source: clean.py
    name: clean-
    product:
      nb: clean.html
      clean: clean.csv
    grid:
      cities: [a, b, c, d]
```
to control the products' filenames, you can use placeholders (more on this in the docs).
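To make what `grid` does a bit more concrete, here's a plain-Python sketch (not Ploomber's implementation) that expands a grid into one task spec per combination of values and fills `[[cities]]`-style placeholders in the task name and products. The `expand_grid` and `fill` helpers are hypothetical, and the `[[param]]` placeholder syntax is an assumption here; check the docs for the exact form.

```python
from itertools import product


def fill(template, params):
    """Replace each [[key]] placeholder in template with this task's value (hypothetical helper)."""
    for key, value in params.items():
        template = template.replace(f'[[{key}]]', str(value))
    return template


def expand_grid(name, products, grid):
    """Return one task spec per combination of grid values (hypothetical helper)."""
    keys = list(grid)
    specs = []
    # itertools.product yields every combination of the grid's value lists
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))  # e.g. {'cities': 'a'}
        specs.append({
            'name': fill(name, params),
            'product': {k: fill(v, params) for k, v in products.items()},
            'params': params,
        })
    return specs


specs = expand_grid(
    name='clean-[[cities]]',
    products={'nb': 'clean-[[cities]].html', 'clean': 'clean-[[cities]].csv'},
    grid={'cities': ['a', 'b', 'c', 'd']},
)
for spec in specs:
    print(spec['name'], spec['product'], spec['params'])
```

Each generated task receives a single value (e.g. `cities='a'`), and the placeholders keep the four tasks from writing to the same files. With more than one key in the grid, the sketch produces every combination of values.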
does this solve your issue?
s
Does this work similarly in the Python API? Sorry, I should've specified in the original question that I'm using the Python API.
e
you can replace `grid` with a for loop in the Python API; check out this example. Essentially, you create many `NotebookRunner`/`PythonCallable` instances, and each one gets a different parameter (the path to the file). I'm guessing you have a folder where each file corresponds to data from a single city, right? Then you could do:
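```python
for city in ['a', 'b', ...]:
    # NotebookRunner stores the executed notebook under 'nb'; 'data' is the csv the script writes
    NotebookRunner(Path('your-script.py'), {'nb': File(f'{city}-clean.html'), 'data': File(f'{city}-clean.csv')},
                   name=f'{city}-clean', dag=dag, params=dict(input_path=f'path/to/{city}.csv'))
```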
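(it'd work the same with `PythonCallable`: pass your function instead of the script path, and a single `File` product is enough since there's no output notebook)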
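If all the per-city files live in one folder, you could build the list of cities from the folder itself instead of typing it out. A minimal sketch, assuming one `<city>.csv` file per city; `city_task_specs` is a hypothetical helper, and the demo creates a throwaway folder just so the example runs.

```python
import tempfile
from pathlib import Path


def city_task_specs(input_dir):
    """Build one task spec per <city>.csv file in input_dir (hypothetical helper)."""
    specs = []
    for input_path in sorted(Path(input_dir).glob('*.csv')):
        city = input_path.stem  # e.g. 'path/to/a.csv' -> 'a'
        specs.append({
            'name': f'{city}-clean',
            'product': f'{city}-clean.csv',
            'params': {'input_path': str(input_path)},
        })
    return specs


# demo: a temporary folder with two city files and one unrelated file
with tempfile.TemporaryDirectory() as tmp:
    for filename in ['a.csv', 'b.csv', 'notes.txt']:
        (Path(tmp) / filename).write_text('value\n1\n')
    specs = city_task_specs(tmp)

for spec in specs:
    print(spec['name'], spec['product'], spec['params'])
```

Inside the loop you'd then create one task per spec, e.g. `PythonCallable(your_function, File(spec['product']), dag, name=spec['name'], params=spec['params'])`, where `your_function` stands in for your own cleaning function. The `glob('*.csv')` filter is what keeps unrelated files like `notes.txt` out of the pipeline.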
s
hi, this was very helpful, but I did have another question. I'm fairly new to all of this, so I'd like to ask how env works and why, in the tasks, the file comes from `env.path.data` (line 45, for example). I'm not sure how that works, and a very simple rundown would be helpful. I've been trying to find documentation for this but can't seem to understand it. Thank you!
e
yeah, our Python docs need more work. you can find all the examples here; this one explains the basics of how env works,
and this one is a follow-up showing more advanced things you can do with an env. note that using an env is optional.
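As a very simple rundown of what `env.path.data` means: an env is, roughly, a nested set of keys and values (often loaded from an `env.yaml` file), and the dots walk down those keys. The class below is a hypothetical stand-in to illustrate the idea, not Ploomber's own env class.

```python
class EnvSketch:
    """Hypothetical stand-in for an env: nested dict keys become attributes."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, key):
        # only called for names that aren't regular attributes (like _data)
        try:
            value = self._data[key]
        except KeyError:
            raise AttributeError(key) from None
        # nested sections (like 'path') return another EnvSketch so you can keep dotting
        return EnvSketch(value) if isinstance(value, dict) else value


env = EnvSketch({'path': {'data': 'raw_data', 'cleaned_data': 'clean_data'}})
print(env.path.data)
print(env.path.cleaned_data)
```

So a task that reads `env.path.data` just gets whatever value the env stores under `path` → `data`. Change that value and the task reads from somewhere else, without editing the task itself.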
let me know if you have more questions!
s
once a task is finished and a product is made (in this case a csv file), is there anywhere to access it? can I make a folder to store all the products I've created through the tasks? does this require an env? right now I have my products set up like this in PythonCallable: `File(env.path.cleaned_data / "social.csv")`. based on what I've read, does this mean that I'm putting the social csv in a folder called cleaned_data?
e
> once a task is finished and a product is made (in this case a csv file), is there anywhere to access it?
the simplest way is to pass the path to the csv file, e.g., `pd.read_csv('/path/to/data.csv')`.
now, if you want to avoid hardcoding paths (a good practice), you can load your DAG into a Python session and extract information from it. To enable that, create a factory function (a function that builds and returns your DAG), then import that function into a script or notebook like this:
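```python
from my_module import my_factory_function  # wherever your factory function lives

dag = my_factory_function()
dag.render()  # render the DAG so its tasks and products are fully resolved

task_name = 'your-task-name'  # hypothetical: the task whose product you want
product_key = 'data'  # hypothetical: only needed when the task has several products

# if task_name only generates one product
path = str(dag[task_name].product)

# if task_name generates >1 product
path = str(dag[task_name].product[product_key])

# ... then use path to load the csv file, e.g., pd.read_csv(path)
```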
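Why bother with a factory function: the pipeline definition lives in one place, so both the pipeline and your analysis code call it and agree on where products go. Here's a plain-Python stand-in for that idea; `make_pipeline` is hypothetical and returns dicts instead of a Ploomber `DAG` so the sketch runs without Ploomber, but the two lookups mirror the single-product and multi-product cases above.

```python
from pathlib import Path


def make_pipeline(output_dir):
    """Hypothetical stand-in for a DAG factory: maps task names to their products.

    A real factory would build and return a Ploomber DAG; plain dicts keep this
    sketch runnable without Ploomber installed.
    """
    out = Path(output_dir)
    return {
        # a task with a single product
        'social': out / 'social.csv',
        # a task with several products, looked up by key
        'clean': {'nb': out / 'clean.html', 'data': out / 'clean.csv'},
    }


# analysis script: ask the factory for the paths instead of hardcoding them
pipeline = make_pipeline('output')
social_path = str(pipeline['social'])        # like str(dag[task_name].product)
clean_path = str(pipeline['clean']['data'])  # like str(dag[task_name].product[product_key])
print(social_path)
print(clean_path)
```

If the output folder ever changes, only the factory changes; every script that looks paths up through it follows along.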
> does this require an env?

It doesn't.
> right now I have my products set up like this in PythonCallable: `File(env.path.cleaned_data / "social.csv")`. based on what I've read, does this mean that I'm putting the social csv in a folder called cleaned_data?
Not necessarily. It depends on the value you passed in the env. For example, if `env = {"path": {"cleaned_data": "some_directory"}}`, then your social.csv will go into `some_directory/social.csv`. In other words, the output path depends on the value stored in the dictionary, not on the key name `cleaned_data`.
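The same point in a few lines of code: the key name (`cleaned_data`) is fixed in the task, but the folder comes from whatever value the env stores under it. A minimal sketch with a plain dict standing in for the env; `social_csv_path` is a hypothetical helper, and wrapping the value in `Path(...)` makes the `/` join work even if the value is a plain string.

```python
from pathlib import Path


def social_csv_path(env):
    # the key 'cleaned_data' never changes; the directory is whatever value it maps to
    return Path(env['path']['cleaned_data']) / 'social.csv'


print(social_csv_path({'path': {'cleaned_data': 'some_directory'}}))
print(social_csv_path({'path': {'cleaned_data': 'cleaned_data'}}))
```

The file only lands in a folder called `cleaned_data` when that happens to be the value, as in the second call.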