# ask-anything
m
like I'm struggling to find an example to copy for accessing tags in a .py file
maybe a walkthrough of one of these on the docs page https://github.com/ploomber/projects/tree/master/templates
e
thanks for your feedback. are you looking for an example to edit the cell tags? I'm not sure I'm following
m
```python
# %% tags=["parameters"]
upstream = ["raw"]
product = None

# %%
df = pd.read_csv(upstream['get']['data'])
```
this was confusing me: how does `upstream` reference 'get' if the only upstream is "raw"?
also it seems the advanced ML example totally ditches the .yaml file and builds the pipeline with the Python API. is this because the Python API gives the user lower-level or more flexible control over the pipeline?
e
oh, that example seems wrong. is that somewhere in our docs?
yep, the ml advanced uses the Python API since it gives more flexibility, but it requires you to write more code
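roughly, the Python API version looks something like this sketch (from memory, so names and signatures may be off; check the Python API docs). the extra flexibility is that you build the DAG programmatically, e.g. you could create tasks in a loop:

```python
from ploomber import DAG
from ploomber.tasks import PythonCallable
from ploomber.products import File

def raw(product):
    ...  # dump raw data to the product path

def clean(product, upstream):
    ...  # read the upstream product, write cleaned data

dag = DAG()
t_raw = PythonCallable(raw, File('output/raw.csv'), dag, name='raw')
t_clean = PythonCallable(clean, File('output/clean.csv'), dag, name='clean')
t_raw >> t_clean  # declare the dependency explicitly
dag.build()
```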
👍 1
oh, I spotted the error. I'll fix it, thanks!
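for anyone reading later: the lookup should use the declared upstream name. here's a quick stdlib-only simulation of the dict Ploomber injects at runtime (the file name and the 'data' product key are just for illustration):

```python
import csv
from pathlib import Path

# simulate the mapping Ploomber injects at runtime:
# upstream task name -> its products (names here are illustrative)
Path("raw_data.csv").write_text("a,b\n1,2\n")
upstream = {"raw": {"data": "raw_data.csv"}}

# index by the declared upstream name ("raw"), not an undeclared "get"
with open(upstream["raw"]["data"]) as f:
    rows = list(csv.DictReader(f))

print(rows)  # [{'a': '1', 'b': '2'}]
```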
👍 1
m
thanks!
e
please keep these questions coming. it's pretty hard to write good docs, so all feedback is welcome, especially errors like this
👍 1
m
would it be possible to number the steps on this page https://docs.ploomber.io/en/latest/get-started/basic-concepts.html
the page seems kinda hard to follow for complete noobs (for which the page is intended)
reference the .yaml in this diagram; it just kinda appears later in the page
why is the cell injection explanation in the Tasks: scripts/notebooks section and not in the others?
cell injection is key to all of ploomber no? I'd include it in a specific task section only if it was specific to this task?
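from what I understand, cell injection means Ploomber appends a cell with concrete values right after the parameters cell, roughly like this toy sketch (the tag name and the injected values are my guesses, not Ploomber's actual implementation):

```python
# the script's declared parameters cell...
source_cell = '# %% tags=["parameters"]\nupstream = ["raw"]\nproduct = None'

# ...gets followed at runtime by an injected cell with concrete values
# resolved from pipeline.yaml, so the script can run standalone
injected_cell = (
    '# %% tags=["injected-parameters"]\n'
    'upstream = {"raw": {"data": "output/raw.csv"}}\n'
    "product = 'output/report.ipynb'"
)
injected_script = source_cell + "\n\n" + injected_cell
```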
to make this more clear, maybe show a .yaml of these task functions being used. it's not clear how they get used in the overall orchestration from just this page
further there is a clean.py and a functions.clean
these are separate things but the common name in the docs makes it confusing
right after showing the yaml
```yaml
tasks:
  # this is a sql script task
  - source: raw.sql
    product: [schema, name, table]
    # ...

  # this is a function task
  # "my_functions.clean" is equivalent to: from my_functions import clean
  - source: my_functions.clean
    product: output/clean.csv

  # this is a script task (notebooks work the same)
  - source: plot.py
    product:
      # scripts always generate a notebook (more on this in the next section)
      nb: output/plots.ipynb
```
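side note: I checked and `my_functions.clean` really does just mean `from my_functions import clean`, i.e. something like this toy sketch (the helper name is made up, not Ploomber's actual code):

```python
import importlib

def resolve_dotted_source(dotted):
    """Resolve a dotted source like "my_functions.clean" to the function
    it names: import the module, then grab the attribute."""
    module_name, _, attr = dotted.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

# e.g. "os.path.join" resolves to the os.path.join function
join = resolve_dotted_source("os.path.join")
```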
you explain a clean.py file
but it's actually never used in the just-referenced pipeline, kinda confusing?
I get you want to demonstrate each task primitive, but why not demonstrate the .py primitive with the later plot.py file?
I guess the ideal would be to demonstrate a minimal pipeline showcasing all 3 types of tasks and the required info to use each
not very easy, but hopefully the ramblings of a total noob will help
e
this is great. thanks a lot. do you think it'd be better to have the same example implemented with scripts, functions and SQL?
so basically split this into three parts
what's your github handle? I'll tag you in an issue i'm writing now so you can provide more feedback there
m
rcyost
that would be useful, but if it's a better use of your time, a single pipeline with 3 steps showing each of the 3 task types a bit more clearly would be the baseline to get the basic info needed
🙏 1