This message was deleted Ploomber #ask-anything

# ask-anything

Slackbot

06/02/2022, 2:27 PM

This message was deleted.

Eduardo

06/02/2022, 2:31 PM

yeah, looks like a grid can be helpful for your use case, it'd be something like this:

# execute independent tasks in parallel
executor: parallel

tasks:
  - source: clean.py
    name: clean-
    product:
      nb: clean.html
      clean: clean.csv
    grid:
        cities: [a, b, c, d]

Eduardo

06/02/2022, 2:32 PM

to control the product's filenames, you can use placeholders, more on the docs
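
As a rough illustration of what a grid does, the sketch below expands a grid dictionary into one parameter set per combination of values, so `cities: [a, b, c, d]` yields four sets. `expand_grid` is a hypothetical helper written for this example, not part of Ploomber, and the `clean-0`, `clean-1`, ... names are only an assumption about how the `clean-` prefix gets completed.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the values in `grid`."""
    keys = list(grid)
    combinations = product(*(grid[key] for key in keys))
    return [dict(zip(keys, values)) for values in combinations]


grid = {"cities": ["a", "b", "c", "d"]}

# One task per parameter set; the numeric suffix is an assumed naming scheme.
for index, params in enumerate(expand_grid(grid)):
    print(f"clean-{index}", params)
```

With more than one key in the grid, the same helper returns every combination of the values, which is why grids grow quickly as parameters are added.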

Eduardo

06/02/2022, 2:33 PM

does this solve your issue?

Spruha Vashi

06/02/2022, 2:37 PM

Does this work similarly in the Python API? Sorry, I should've specified that I am utilizing the python API in the original question.

Eduardo

06/02/2022, 2:44 PM

you can replace `grid` in the Python API with a for loop, check out this example - essentially you create many `NotebookRunner`/`PythonCallable` instances and each one gets a different parameter (the path to the file). i'm guessing you have a folder where each file corresponds to data from a single city, right? then you could do:

for city in ['a', 'b', ...]:
    NotebookRunner(Path('your-script.py'), File(f'{city}-clean.csv'), name=f'{city}-clean', dag=dag, params=dict(input_path=f'path/to/{city}.csv'))

Eduardo

06/02/2022, 2:44 PM

(it'd work the same if you're using PythonCallable)
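
If the list of cities should come from the folder itself rather than being typed by hand, a sketch along these lines could build the per-city settings first. It uses only the standard library; `task_specs_from_folder` and the dictionary keys are hypothetical names chosen to mirror the arguments in the loop above, and it assumes each city lives in its own `.csv` file.

```python
from pathlib import Path


def task_specs_from_folder(folder):
    """Return one dict of task settings per CSV file found in `folder`."""
    specs = []
    # sorted() keeps the task order stable between runs
    for input_path in sorted(Path(folder).glob("*.csv")):
        city = input_path.stem  # "path/to/a.csv" -> "a"
        specs.append(
            {
                "name": f"{city}-clean",
                "product": f"{city}-clean.csv",
                "params": {"input_path": str(input_path)},
            }
        )
    return specs
```

Each dictionary then maps onto one `NotebookRunner` (or `PythonCallable`) call inside the for loop: `product` goes into `File(...)`, and `name` and `params` are passed as keyword arguments.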

Spruha Vashi

06/07/2022, 3:38 PM

hi, this was very helpful, but i did have another question. I am fairly new to all of this, so I would just like to ask how env works and why, in the tasks, the file comes from env.path.data (line 45 for example). I'm not sure how that works and a very simple rundown would be helpful. I have been trying to find documentation for this but I can't seem to understand it. Thank you!

Eduardo

06/07/2022, 4:24 PM

yeah, our Python docs need more work. you can find all the examples here. this one explains how the basics of env work

Eduardo

06/07/2022, 4:24 PM

and this one is a follow-up, showing more advanced things regarding the env, but note that using an env is optional

Eduardo

06/07/2022, 4:24 PM

let me know if you have more questions!

Spruha Vashi

06/07/2022, 6:27 PM

once a task is finished and a product is made (in this case a csv file), is there anywhere to access it? can i make a folder to store all the products I've created through the tasks? does this require an env? right now i have my products set up like this in PythonCallable: File(env.path.cleaned_data / "social.csv"). based on what I've read, does this mean that I'm putting the social csv in a folder called cleaned_data?

Eduardo

06/07/2022, 6:36 PM

once a task is finished and a product is made ( in this case a csv file), is there anywhere to access it?

the simplest way would be to pass the path to the csv file, e.g. `pd.read_csv('/path/to/data.csv')`

now, if you want to avoid hardcoding paths (which is a good practice), then you can load your DAG into a Python session and extract information from it. To enable that, create a factory function, then import that function into a script or notebook like this:

from my_module import my_factory_function

dag = my_factory_function()
dag.render()

# if task_name only generates one product
path = str(dag[task_name].product)

# if task_name generates >1 product
path = str(dag[task_name].product[product_key])

# ... then use path to load the csv file

does this require an env?

It doesn't

right now i have my products set up like this in PythonCallable: File(env.path.cleaned_data / "social.csv"). based on what I've read, does this mean that I'm putting the social csv in a folder called cleaned_data?

No. It depends on the value you passed in the env. for example, if `env = {"path": {"cleaned_data": "some_directory"}}`, then your social.csv will go into `some_directory/social.csv`. in other words, the output path depends on the value stored in the dictionary.
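
To make the lookup concrete, here is a minimal sketch in which a plain dict stands in for the env (the real env object is accessed with attributes, as in `env.path.cleaned_data`). `product_path` is a hypothetical helper used only for illustration, and the `products/cleaned` directory is made up.

```python
from pathlib import Path

# A plain dict standing in for the env; only the stored value matters.
env = {"path": {"cleaned_data": "some_directory"}}


def product_path(env, key, filename):
    """Join the directory stored under env["path"][key] with a filename."""
    return Path(env["path"][key]) / filename


print(product_path(env, "cleaned_data", "social.csv"))

# Changing the stored value moves the product; the key name stays the same.
other_env = {"path": {"cleaned_data": "products/cleaned"}}
print(product_path(other_env, "cleaned_data", "social.csv"))
```

So to collect every product in a single folder, one option is to point each entry under `path` at that folder, or at a subfolder of it.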
