Slackbot
05/13/2022, 7:29 PM
Eduardo
output_i_want_to_link key in your env.yaml and map it to the output location (say clean-data.csv), then reference it in pipeline.preprocess.yaml (as a product in the final task) and pipeline.model.yaml (as a param in the first task) with {{output_i_want_to_link}}. this way, you'll avoid hardcoding the path. to mark the training pipeline as outdated when your data changes you can use the resources_ feature. however, this will cause ploomber to compute the hash of the file, which isn't scalable if the file is too big. what's the file size?
Eduardo
pipeline.yaml?
Jess Mankewitz (they/she)
05/13/2022, 7:37 PM
Eduardo
resources_ will work. It'll probably take a few seconds to hash the file but it will solve your problem. let me know how it goes. we have a long-standing issue about adding a feature to facilitate composing pipelines and maybe it's the right time to tackle it 🙂
Jess Mankewitz (they/she)
05/13/2022, 7:40 PM
Eduardo
Jess Mankewitz (they/she)
05/13/2022, 7:43 PM
Jess Mankewitz (they/she)
05/25/2022, 9:33 PM
env.yml, but I'm getting a warning that it's not defined… how do I check which env ploomber is pointing to?
Eduardo
Jess Mankewitz (they/she)
05/25/2022, 10:56 PM
Error: Error replacing placeholders:
* {{save_path}}: Ensure the placeholder is defined in the env
Loaded env: EnvDict({'cwd': '/Users/jessi...ense_pipeline', 'git': 'main', 'git_hash': 'c5441e3-dirty', 'here': '/Users/jessi...ense_pipeline', ...})
Jess Mankewitz (they/she)
05/25/2022, 10:56 PM
save_path: preprocessing/output/processed_data/preprocessed_semcor_tags.csv
Jess Mankewitz (they/she)
05/25/2022, 10:57 PM
- source: preprocessing/scripts/preprocess_semcor_tags.py
  product:
    nb: preprocessing/output/notebooks/preprocess_semcor_tags.ipynb
    data: {{save_path}}
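
For reference, the approach Eduardo describes earlier in the thread could be laid out roughly as follows. This is a minimal sketch under assumptions, not a verified configuration: it assumes the env file is named env.yaml and sits in the same folder as the pipeline files (whether a file named env.yml is also picked up is not confirmed in this thread); the script, param name, and notebook path in pipeline.model.yaml are hypothetical; and the placeholders are quoted as a precaution, since plain YAML parsers do not read an unquoted leading {{ as a string. The three files are shown together, separated by ---.

```yaml
# env.yaml (assumed location: next to the pipeline files)
save_path: preprocessing/output/processed_data/preprocessed_semcor_tags.csv
---
# pipeline.preprocess.yaml: the final task writes to the shared path
tasks:
  - source: preprocessing/scripts/preprocess_semcor_tags.py
    product:
      nb: preprocessing/output/notebooks/preprocess_semcor_tags.ipynb
      data: '{{save_path}}'
---
# pipeline.model.yaml: the first task reads the same path as a param
# (script, param name, and notebook path below are hypothetical)
tasks:
  - source: model/scripts/train.py
    product:
      nb: model/output/notebooks/train.ipynb
    params:
      input_path: '{{save_path}}'
      # resources_ asks Ploomber to hash the listed file, so the task is
      # marked outdated when the file's contents change
      resources_:
        data: '{{save_path}}'
```

With this layout the path is written once in env.yaml, so changing the output location means editing a single line. The resources_ entry trades some hashing time on each build for detecting changes in the data file.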
Jess Mankewitz (they/she)
05/25/2022, 10:58 PM
Eduardo
Eduardo
Jess Mankewitz (they/she)
05/25/2022, 11:04 PM
Eduardo