Yuri Aleksandrov
04/06/2022, 7:29 PM
When exporting for Kubeflow, soopervisor won't generate the ploomber_pipeline.py and ploomber_pipeline.yaml files to upload to Kubeflow; it only generates the Dockerfile and builds/tags the image.

soopervisor export training --mode force --ignore-git
====================================================================================== Loading DAG ======================================================================================
Found /home/technologic/projects/dev-pipeline/iris-train/pipeline.training.yaml. Loading...
====================================================================================== Loading DAG ======================================================================================
Found /home/technologic/projects/dev-pipeline/iris-train/pipeline.training.yaml. Loading...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 11529.15it/s]
====================================================================================== Loading DAG ======================================================================================
Found /home/technologic/projects/dev-pipeline/iris-train/pipeline.training.yaml. Loading...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 13851.73it/s]
==================================================================================== Packaging code =====================================================================================
Copying tasks/raw.py -> dist/iris-train/tasks/raw.py
Copying tasks/features.py -> dist/iris-train/tasks/features.py
Copying tasks/__init__.py -> dist/iris-train/tasks/__init__.py
Copying exploratory/example.ipynb -> dist/iris-train/exploratory/example.ipynb
Copying soopervisor.yaml -> dist/iris-train/soopervisor.yaml
Copying README.md -> dist/iris-train/README.md
Copying scripts/fit.py -> dist/iris-train/scripts/fit.py
Copying pipeline.training.yaml -> dist/iris-train/pipeline.training.yaml
Copying training/requirements.lock.txt -> dist/iris-train/training/requirements.lock.txt
Copying training/Dockerfile -> dist/iris-train/training/Dockerfile
Copying requirements.dev.txt -> dist/iris-train/requirements.dev.txt
Copying pipeline.yaml -> dist/iris-train/pipeline.yaml
Copying requirements.lock.txt -> dist/iris-train/requirements.lock.txt
Copying requirements.dev.lock.txt -> dist/iris-train/requirements.dev.lock.txt
Copying requirements.txt -> dist/iris-train/requirements.txt
================================================================ Building image: docker build . --tag iris-train:latest =================================================================
Sending build context to Docker daemon 9.728kB
Step 1/6 : FROM condaforge/mambaforge:4.10.1-0
---> 05e3542d3437
Step 2/6 : COPY requirements.lock.txt project/requirements.lock.txt
---> Using cache
---> d1119b3038e2
Step 3/6 : RUN pip install --requirement project/requirements.lock.txt && rm -rf /root/.cache/pip/
---> Using cache
---> 99433c4c9b8d
Step 4/6 : COPY dist/* /tmp
---> 814d16353e5d
Step 5/6 : WORKDIR /tmp
---> Running in 98e1d11add9e
Removing intermediate container 98e1d11add9e
---> 7078172e9f76
Step 6/6 : RUN tar --strip-components=1 -zxvf *.tar.gz
---> Running in 54f4138cab3d
iris-train/README.md
iris-train/exploratory/
iris-train/exploratory/example.ipynb
iris-train/pipeline.training.yaml
iris-train/pipeline.yaml
iris-train/requirements.dev.lock.txt
iris-train/requirements.dev.txt
iris-train/requirements.lock.txt
iris-train/requirements.txt
iris-train/scripts/
iris-train/scripts/fit.py
iris-train/soopervisor.yaml
iris-train/tasks/
iris-train/tasks/__init__.py
iris-train/tasks/features.py
iris-train/tasks/raw.py
iris-train/training/
iris-train/training/Dockerfile
iris-train/training/requirements.lock.txt
Removing intermediate container 54f4138cab3d
---> 7160c5925f17
Successfully built 7160c5925f17
Successfully tagged iris-train:latest
=========================================== Testing image: docker run iris-train:latest ploomber status --entry-point pipeline.training.yaml ============================================
Loading pipeline...
100%|██████████| 5/5 [00:00<00:00, 8605.47it/s]
Black is not installed, parameters wont be formatted
name      Last run          Outdated?               Product                                                                                    Doc (short)                            Location
--------  ----------------  ----------------------  -----------------------------------------------------------------------------------------  -------------------------------------  ------------------------
get       Has not been run  Source code             File('outputs/raw/get.csv')                                                                None                                   /tmp/tasks/raw.py:5
sepal     Has not been run  Source code & Upstream  File('outputs/features/sepal.csv')                                                         Compute sepal area                     /tmp/tasks/features.py:4
petal     Has not been run  Source code & Upstream  File('outputs/features/petal.csv')                                                         Compute petal area                     /tmp/tasks/features.py:13
features  Has not been run  Source code & Upstream  File('outputs/features/features.csv')                                                      Join raw data with generated features  /tmp/tasks/features.py:22
fit       Has not been run  Source code & Upstream  MetaProduct({'model': File('outputs/model.pickle'), 'nb': File('outputs/report.ipynb')})   Notebook to train a model              /tmp/scripts/fit.py
================================================================================== Testing File client ==================================================================================
======================================================================================= Warnings ========================================================================================
Your git repository contains uncommitted files, which will be ignored when building the Docker image. Commit them if needed.
=========================================================================================================================================================================================
Error: Missing File client
Hint: Run "docker run -it iris-train:latest /bin/bash" to debug your image. Ensure a File client is configured.
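The error above says the image lacks a File client: when a pipeline is exported, tasks run in separate containers, so Soopervisor expects the spec to declare a client that uploads and downloads products. A minimal sketch of what that declaration might look like (the function name clients.get_file_client and the backing storage are illustrative assumptions, not from this thread):

```yaml
# pipeline.training.yaml (sketch): register a File client so tasks
# running in separate containers can exchange products
clients:
  File: clients.get_file_client
```

Here clients.get_file_client would be a function in a clients.py module that returns a storage client, e.g. ploomber.clients.LocalStorageClient('backup') for local testing, or a cloud storage client pointing at a bucket for a Kubeflow deployment.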
Yuri Aleksandrov
04/06/2022, 7:31 PM
pipeline.training.yaml:
tasks:
- source: tasks.raw.get
product: outputs/raw/get.csv
- source: tasks.features.sepal
product: outputs/features/sepal.csv
- source: tasks.features.petal
product: outputs/features/petal.csv
- source: tasks.features.features
product: outputs/features/features.csv
- source: scripts/fit.py
product:
nb: outputs/report.ipynb
model: outputs/model.pickle
Willie Wheeler
04/07/2022, 10:14 PM

Hassan Gamaleldin
04/12/2022, 2:16 PM
load_data, and the other is find_features. In load_data, I am doing some normalization and will need to use pd.to_datetime() to perform these normalizations. Then, later, I will have to use the datetime properties of the timestamp column to perform some feature extraction. Having to save the output/product as a CSV file doesn't let me work with those dataframes in the subsequent task unless I redo some of the definitions from the load_data task. Is there a way around this? Or is there some confusion in how I am constructing my pipeline?
Your help is greatly appreciated!
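One way to keep pandas dtypes (including parsed datetimes) across tasks without re-parsing is to store the intermediate product as a pickle instead of a CSV. A minimal standalone sketch with plain pandas (the file name and columns are illustrative, and this is not Ploomber-specific):

```python
import pandas as pd

# load_data-style step: parse timestamps, then persist with dtypes intact
df = pd.DataFrame({"ts": ["2022-01-01", "2022-01-02"], "value": [1, 2]})
df["ts"] = pd.to_datetime(df["ts"])
df.to_pickle("load_data.pkl")  # unlike to_csv, this preserves datetime64

# find_features-style step: reload and use .dt directly, no re-parsing
df2 = pd.read_pickle("load_data.pkl")
df2["day"] = df2["ts"].dt.day_name()
print(df2["day"].tolist())
```

The same idea applies with parquet if you prefer a non-Python-specific format; either way the downstream task gets the timestamp column back as a real datetime.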
Willie Wheeler
04/14/2022, 10:53 PM

MrFiat124Spider
04/20/2022, 2:54 PM

Matej Uhrín
04/21/2022, 12:16 PM

Matej Uhrín
04/22/2022, 1:45 PM
$ ploomber plot
TypeError: __init__() got an unexpected keyword argument 'package_name'
full:
Traceback (most recent call last):
File "/home/m/anaconda3/envs/mma/bin/ploomber", line 5, in <module>
from ploomber_cli.cli import cmd_router
File "/home/m/anaconda3/envs/mma/lib/python3.8/site-packages/ploomber_cli/cli.py", line 36, in <module>
def cli():
File "/home/m/.local/lib/python3.8/site-packages/click/decorators.py", line 304, in decorator
return option(*(param_decls or ("--version",)), **attrs)(f)
File "/home/m/.local/lib/python3.8/site-packages/click/decorators.py", line 192, in decorator
_param_memo(f, OptionClass(param_decls, **option_attrs))
File "/home/m/.local/lib/python3.8/site-packages/click/core.py", line 1714, in __init__
Parameter.__init__(self, param_decls, type=type, **attrs)
TypeError: __init__() got an unexpected keyword argument 'package_name'
What might be the issue?
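Worth noting: the traceback imports ploomber_cli from the conda env (anaconda3/envs/mma) but click from ~/.local/lib, so a stale click installed with pip install --user may be shadowing the env's copy; upgrading or removing that ~/.local click usually resolves this kind of TypeError. A small stdlib sketch for checking where a module actually resolves from (json is just a stand-in module name here; in practice you would check click):

```python
import importlib.util

# Find the file a module will actually be imported from; a path under
# ~/.local/lib rather than the active environment indicates shadowing.
spec = importlib.util.find_spec("json")  # stand-in; use "click" in practice
print(spec.origin)
```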
pela kith
04/22/2022, 4:03 PM

Jess Mankewitz (they/she)
04/22/2022, 9:24 PM

Jess Mankewitz (they/she)
04/22/2022, 10:13 PM

Jess Mankewitz (they/she)
04/25/2022, 8:41 PM
- source: scripts/process_group.R
product:
nb: output/process_group.html
data: output/processed_data/group_a.csv
params:
target_group: "group_a"
- source: scripts/process_group.R
product:
nb: output/process_group.html
data: output/processed_data/group_b.csv
params:
target_group: "group_b"
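One thing that stands out in the snippet above: both tasks declare the same nb product (output/process_group.html), so the second task would overwrite the first report, and pipelines generally require each product path to be unique. A sketch giving each group its own notebook output (file names assumed, not from the thread):

```yaml
- source: scripts/process_group.R
  product:
    nb: output/process_group_a.html
    data: output/processed_data/group_a.csv
  params:
    target_group: "group_a"
- source: scripts/process_group.R
  product:
    nb: output/process_group_b.html
    data: output/processed_data/group_b.csv
  params:
    target_group: "group_b"
```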
Raffaele Olmeda
04/26/2022, 3:54 PM

Str009
05/01/2022, 11:27 AM

Matej Uhrín
05/03/2022, 8:01 AM
ploomber nb --inject
I'd like to inject just the parameters, upstream, downstream, etc. How can I avoid injecting the following part:
# ---
# jupyter:
# jupytext:
# cell_metadata_filter: all
# notebook_metadata_filter: ploomber
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.13.6
# kernelspec:
# display_name: Python 3 (ipykernel)
# language: python
# name: python3
# ploomber:
# injected_manually: true
# ---
I don't usually use Jupyter; I do my coding in PyCharm, which is why I am asking.
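If the goal is simply to get rid of that block after the fact (rather than via a built-in ploomber nb option, which may or may not exist for this), the header is plain jupytext metadata between two `# ---` markers and can be stripped with a regex. A standalone sketch (the sample text is illustrative), though note that the ploomber: entry lives in that header, so removing it may affect how the tool tracks the file:

```python
import re

# A script with a jupytext metadata header between two "# ---" markers
text = "# ---\n# jupyter:\n#   jupytext:\n#     format_name: percent\n# ---\nprint('hi')\n"

# Drop the leading header block: opening marker, comment lines, closing marker
stripped = re.sub(r"\A# ---\n(?:#.*\n)*?# ---\n", "", text)
print(stripped)
```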
Matej Uhrín
05/03/2022, 8:48 AM
NotJSONError("Notebook does not appear to be JSON: '# ---\\n# jupyter:\\n#
I did
ploomber nb --inject
Previously this used to work, so I am not sure if I did something differently.

Jess Mankewitz (they/she)
05/05/2022, 5:59 PM

Robson Glasscock
05/05/2022, 7:30 PM

Atul Yadav
05/06/2022, 9:07 AM

Jose Ramirez
05/08/2022, 5:34 PM

feregrino
05/10/2022, 2:08 PM
Is it possible for a PythonCallable
to have multiple products?
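For what it's worth, a function task with several products typically receives its product argument as a mapping of names to file locations. A standalone sketch (the function name, keys, and paths are illustrative assumptions, not from the docs):

```python
from pathlib import Path

# Sketch of a callable task that writes two products; with multiple
# products, `product` arrives as a mapping of names to file locations
def make_outputs(product):
    Path(str(product["data"])).write_text("col\n1\n2\n")
    Path(str(product["summary"])).write_text("2 rows")

# standalone usage, with plain string paths standing in for File products
make_outputs({"data": "data.csv", "summary": "summary.txt"})
```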
https://docs.ploomber.io/en/latest/api/_modules/tasks/ploomber.tasks.PythonCallable.html#ploomber.tasks.PythonCallable

Brandon Williams
05/10/2022, 8:04 PM
What keys are valid in pipeline.yaml? I.e.,
Error: Error validating dag spec, the following keys aren't valid: 'params'. Valid keys are: 'clients', 'config', 'executor', 'meta', 'on_failure', 'on_finish', 'on_render', 'serializer', 'tasks', and 'unserializer'
I can't find most of these terms in the docs, even when I dig into the Python API. E.g., what all can go into the meta key?

Brandon Williams
05/10/2022, 8:05 PM
Is there a way to define params (and other keys, like kernelspec_name) that are shared across every task? I have a use case where my pipeline has dozens of tasks, and every single one takes the same param, leading to lots of duplicate config.
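One workaround that needs nothing tool-specific: YAML anchors let you define the shared params once and reuse them, even if the spec itself has no shared-params key. A sketch (task names and the param are made up):

```yaml
tasks:
  - source: tasks.load
    product: output/load.csv
    params: &shared          # define the shared params once
      target_date: '2022-05-10'
  - source: tasks.clean
    product: output/clean.csv
    params: *shared          # every other task reuses them
  - source: tasks.features
    product: output/features.csv
    params: *shared
```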
Jess Mankewitz (they/she)
05/13/2022, 7:29 PM
A question about the --entry-point flag. We have a use case where we have one lengthy pipeline for preprocessing our data and one pipeline for running models on the preprocessed data. Is there a way to "link" these two pipelines together, such that if I change something in the preprocessing pipeline, the model pipeline becomes out of date? Can I set upstream sources in the modeling pipeline that are generated in the preprocessing pipeline (instead of hardcoding paths to the generated data)?
Julien Roy
05/13/2022, 8:53 PM
ploomber build --log info --log-file my.log
Amardeep Singh
05/16/2022, 1:09 PM
[IPKernelApp] WARNING | Error in loading extension: sql
Check your config files in /home/jupyter/.ipython/profile_default
Traceback (most recent call last):
File "/opt/conda/envs/analysis/lib/python3.10/site-packages/IPython/core/shellapp.py", line 301, in init_extensions
self.shell.extension_manager.load_extension(ext)
File "/opt/conda/envs/analysis/lib/python3.10/site-packages/IPython/core/extensions.py", line 80, in load_extension
mod = import_module(module_str)
File "/opt/conda/envs/analysis/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'sql'
Any suggestions on how to deal with this? Ideally, for a ploomber run to be reproducible, it shouldn't pick up any extensions from the default config?
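The warning comes from IPython itself: the default profile's config asks to load an sql extension (usually provided by ipython-sql or jupysql) that isn't installed in the environment the kernel runs in, so either installing that package in the env or removing the extension from the profile config should silence it. A stdlib sketch to locate where it is configured:

```python
from pathlib import Path

# List lines mentioning extensions in the default IPython profile config;
# if the profile directory does not exist, nothing is printed
profile = Path.home() / ".ipython" / "profile_default"
for cfg in profile.glob("ipython*config.py"):
    for line in cfg.read_text().splitlines():
        if "extensions" in line:
            print(f"{cfg.name}: {line.strip()}")
```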
Gaurav
05/16/2022, 5:01 PM

Gaurav
05/16/2022, 5:20 PM

Dario Pascual Morales
05/17/2022, 6:53 AM

Atul Yadav
05/18/2022, 9:25 AM