Slackbot
09/09/2022, 10:53 AM

Eduardo
# pipeline.yaml
tasks:
  - source: train.ipynb
    product:
      nb: 'output/{{experiment_name}}/train.ipynb'
    params:
      column_to_delete: '{{column_to_delete}}'
then, create an `env.yaml`:

# env.yaml
experiment_name: default_experiment
column_to_delete: null
then implement the logic that drops a column based on the `column_to_delete` value (there's a sketch below), and run your pipeline with:

ploomber build
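
here's a minimal sketch of what that logic could look like inside train.ipynb; the input file name and the default values are assumptions for illustration (Ploomber injects the real values into the cell tagged "parameters"):

# cell tagged "parameters": Ploomber overwrites these defaults at runtime
column_to_delete = None
product = None

# -- a later cell --
import pandas as pd

df = pd.read_csv('data.csv')  # hypothetical input; read whatever your task actually uses

# drop the column only when the experiment sets one
if column_to_delete is not None:
    df = df.drop(columns=[column_to_delete])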
to run an experiment with a column dropped:

ploomber build --env--experiment_name my-experiment --env--column_to_delete some-column
then you'll have output/default_experiment/train.ipynb and output/my-experiment/train.ipynb, and you can compare them!
if you have more than one task in your pipeline (e.g. tasks that generate the training set, whose files can be shared across experiments), this allows you to cache results, so the next time you run an experiment you don't have to re-run all tasks:
# pipeline.yaml
tasks:
  - source: prepare.ipynb
    product:
      # note that we don't use "experiment_name" here!
      nb: output/prepare.ipynb
      data: output/train.csv

  - source: train.ipynb
    product:
      nb: 'output/{{experiment_name}}/train.ipynb'
    params:
      column_to_delete: '{{column_to_delete}}'
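
for train.ipynb to read the shared file, declare prepare as its upstream; a minimal sketch (the cell layout is an assumption; Ploomber replaces the declaration with the real paths at runtime):

# cell tagged "parameters" in train.ipynb
upstream = ['prepare']  # Ploomber replaces this with {'prepare': {'nb': ..., 'data': ...}}
product = None
column_to_delete = None

# -- a later cell --
import pandas as pd

# the training set is produced once by prepare.ipynb and shared across experiments
df = pd.read_csv(upstream['prepare']['data'])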
let me know if this helps!

Jan Lennartz
09/09/2022, 2:01 PM
However, I would like to compare the results in a last task that combines all (or given) versions. This seems to be exactly what the issue you linked is about.

Eduardo
ah, i thought you wanted one evaluation task per task generated by grid. if you want to evaluate all models at once, there's another thing you can do: use grid and set the next task's dependency with a placeholder:
tasks:
  - source: ...
    grid: ...
    name: train-  # since this is a grid, it'll generate train-0, train-1, ...

  # make evaluate.ipynb depend on all train-* tasks by declaring
  # a wildcard placeholder as its upstream:
  #   upstream = ["train-*"]
  - source: evaluate.ipynb
    product: ...
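
a sketch of how evaluate.ipynb could consume every grid task at once; assuming the wildcard expands as described, Ploomber injects a dict with one entry per matching task:

# cell tagged "parameters" in evaluate.ipynb
upstream = ['train-*']  # Ploomber expands this to {'train-0': ..., 'train-1': ..., ...}
product = None

# -- a later cell --
# iterate over every model the grid produced
for task_name, task_product in upstream.items():
    print(task_name, task_product)  # e.g. load each executed notebook or metrics file here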
Jan Lennartz
The problem is that currently I can not easily cascade the params to my other notebooks / scripts in the pipeline.

Eduardo
do you need to access the grid parameters in later tasks? one way would be to have train.ipynb store its parameters in a parameter.json, register it as a product, and load it in the next stage (sketch below). Alternatively, you can use our notebook introspector to extract values, charts, and tables from output cells.
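
a minimal sketch of the parameter.json idea; the extra 'params' product key and the variable names are assumptions for illustration:

# in train.ipynb, assuming pipeline.yaml registers an extra product key:
#   product:
#     nb: 'output/{{experiment_name}}/train.ipynb'
#     params: 'output/{{experiment_name}}/parameter.json'
import json

# store the parameters this run received (product and column_to_delete are injected by Ploomber)
with open(product['params'], 'w') as f:
    json.dump({'column_to_delete': column_to_delete}, f)

# in the downstream task, declare train as upstream and load them back:
with open(upstream['train']['params']) as f:
    params = json.load(f)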
I think both of these things combined would get you what you want, but let me know if this is not what you want to implement!

Jan Lennartz
09/12/2022, 2:20 PM