# ask-anything
e
great question! we have an open issue that will simplify this use case, but we haven't finished working on it yet! for now, I'd suggest adding one parameter to control which columns you train on (in this example, by naming a column to drop):
```yaml
# pipeline.yaml
tasks:
  - source: train.ipynb
    product:
      nb: 'output/{{experiment_name}}/train.ipynb'
    params:
      column_to_delete: '{{column_to_delete}}'
```
then, create an `env.yaml`:
```yaml
experiment_name: default_experiment
column_to_delete: null
```
then implement the logic that drops a column based on the `column_to_delete` value.
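for example, here's a minimal sketch of what that cell in train.ipynb could look like (the CSV path is just a placeholder, not something Ploomber requires):

```python
# cell in train.ipynb -- `column_to_delete` is injected by Ploomber
# from env.yaml (or from the --env-- override on the CLI)
import pandas as pd

df = pd.read_csv('data/train.csv')  # placeholder path for your training data

if column_to_delete is not None:
    df = df.drop(columns=[column_to_delete])

# ...train on the remaining columns...
```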
then run your pipeline with:

```sh
ploomber build
```

to run an experiment with a column dropped:
```sh
ploomber build --env--experiment_name my-experiment --env--column_to_delete some-column
```
then you'll have `output/default_experiment/train.ipynb` and `output/my-experiment/train.ipynb`, and you can compare them! if you have more than one task in your pipeline (e.g. the tasks that generate the training set; you can share those files across experiments), this will let you cache results, so next time you run an experiment you don't have to run all tasks:
```yaml
# pipeline.yaml
tasks:
  - source: prepare.ipynb
    product:
      # note that we don't use "experiment_name" here!
      nb: output/prepare.ipynb
      data: output/train.csv

  - source: train.ipynb
    product:
      nb: 'output/{{experiment_name}}/train.ipynb'
    params:
      column_to_delete: '{{column_to_delete}}'
```
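one thing to keep in mind: with Ploomber, the dependency between train.ipynb and prepare.ipynb is declared inside train.ipynb's cell tagged "parameters", roughly like this (the task name defaults to the file name, so 'prepare' here):

```python
# cell tagged "parameters" in train.ipynb
# declaring prepare.ipynb as upstream makes its products (e.g. output/train.csv)
# available via the injected `upstream` variable
upstream = ['prepare']
```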
let me know if this helps!
j
Thanks for the detailed explanation! In fact, this is what I've done for now: I created a 'v1' with the current columns and a 'v2' with the additional column by using env.yaml and params. However, I would like to compare the results in a last task that combines all (or selected) versions. This seems to be exactly what the issue you linked is about. The problem is that currently I cannot easily cascade the params to the other notebooks/scripts in my pipeline. And because the selection of the columns comes very early in the pipeline, I have to run everything multiple times. Of course that is unavoidable, but I lack the possibility to easily combine the results later on. Within a given pipeline I'm always stuck with the current parameter. Ideally I want to be able to change (or add) any task in my pipeline (e.g. add a feature) and compare the results to see how this change affects them. If I understand correctly, this will be possible with https://github.com/ploomber/ploomber/issues/602, but currently it is not possible to do this within the pipeline, i.e. you have to manually look at the different pipeline results or create a comparison script outside the pipelines. Is that correct?
e
> However, I would like to compare the results in a last task that combines all (or selected) versions. This seems to be exactly what the issue you linked is about.
ah, I thought you wanted one evaluation task per task generated by the grid. if you want to evaluate all models at once, there's another thing you can do: use a grid and make the next task depend on all of its tasks with a wildcard placeholder:
```yaml
tasks:
  - source: ...
    grid: ...
    name: train-  # since this is a grid, it'll generate train-0, train-1, ...

  # make this task depend on all train-* tasks
  # by declaring a wildcard upstream in evaluate.ipynb:
  # upstream = ["train-*"]
  - source: evaluate.ipynb
    product: ...
```
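for reference, here's roughly what the cell tagged "parameters" in evaluate.ipynb could look like (a sketch; the task name just matches the grid above):

```python
# cell tagged "parameters" in evaluate.ipynb
# the wildcard makes this task depend on every train-* task the grid generates
upstream = ['train-*']
```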
> The problem is that currently I cannot easily cascade the params to the other notebooks/scripts in my pipeline.
do you need to access the grid parameters in later tasks? one way would be to have train.ipynb store its parameters in a `parameter.json`, register it as a product, and then load them in the next stage. Alternatively, you can use our notebook introspector to extract values, charts, and tables from output cells. I think both of these things combined would get you what you want, but let me know if this is not what you want to implement.
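for example, a rough sketch of both halves (the product key, file name, and glob pattern are just assumptions based on the paths above, not anything Ploomber prescribes):

```python
# train.ipynb -- register e.g. 'output/{{experiment_name}}/parameter.json' under
# a product key such as "params" in pipeline.yaml, then write this run's
# parameters to it via the injected `product` variable
import json
from pathlib import Path

Path(product['params']).write_text(json.dumps({'column_to_delete': column_to_delete}))
```

```python
# evaluate.ipynb -- gather the parameter files written by every experiment;
# globbing the output folder avoids depending on the exact structure Ploomber
# injects for wildcard upstreams
import json
from pathlib import Path

all_params = {path.parent.name: json.loads(path.read_text())
              for path in Path('output').glob('*/parameter.json')}
```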
btw, a final alternative is to use the Python API directly, which is a lot more flexible for this kind of dynamic workflow; the notebook introspector docs include a complete example.
j
Thanks for the hint about the notebook introspector! This also goes in the direction I was looking for. The parameter cascading I can probably accomplish via the DAG as well. I'll try it and see if I can make it work 🙂
e
great, feel free to post other questions if you need help! I'd love to see that notebook pipeline up and running!