Slackbot
02/18/2022, 1:36 PM
Eduardo
Eduardo
Ido (Ploomber)
Ondřej Hubáček
02/18/2022, 2:31 PM
Ondřej Hubáček
02/18/2022, 2:31 PM
Ondřej Hubáček
02/18/2022, 2:33 PM
Eduardo
# %% format and open it as notebooks in Jupyter, so the same file can be developed in Jupyter/VS Code without the complexities of the ipynb format, and to check that nothing is broken you can just call ploomber build and it'll orchestrate execution
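To illustrate the point about the # %% format: such a file is plain Python in which comment lines starting with # %% mark cell boundaries. The sketch below is only a toy splitter written for this note, not how Jupytext or Ploomber parse these files; the function name split_cells and the sample script are hypothetical.

```python
def split_cells(source):
    """Split a percent-format script into cell bodies (marker lines dropped)."""
    cells, current = [], []
    for line in source.splitlines():
        if line.startswith("# %%"):
            # A marker line starts a new cell; flush what came before it.
            if current:
                cells.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    if current:
        cells.append("\n".join(current).strip())
    # Drop empty cells, e.g. blank text before the first marker.
    return [cell for cell in cells if cell]


# Hypothetical three-cell script; it is only split here, never executed.
script = """\
# %%
import math

# %%
radius = 2
area = math.pi * radius ** 2

# %%
print(round(area, 2))
"""

cells = split_cells(script)
```

Because the markers are ordinary comments, the same file still runs as a normal script, which is what makes it easy to diff and review compared with ipynb.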
• you mention that kedro only supports DAGs, can you expand on that? What features are missing?
• yeah, using the # %% format is very convenient for data scientists because they can still develop interactively, but since we simplify the modular pipeline building part, they can create much more maintainable code (10 tasks with 20 cells each instead of one big script with 200 cells)
• we support templated SQL, check this tutorial - same concept as generating SQL from Python but a lot simpler. while it is not as powerful as using Python, it covers 90% of use cases and simplifies the code a lot
please share your experience when you give ploomber a try, and don't hesitate to post any questions, we are happy to help!
Ondřej Hubáček
02/18/2022, 2:52 PM
Eduardo
Ondřej Hubáček
02/18/2022, 2:59 PM
Eduardo
Matej Uhrín
02/18/2022, 3:12 PM
Eduardo
Ondřej Hubáček
02/19/2022, 10:06 AM
def create_pipeline():
    node1 = node(func=node1_func, inputs="a", outputs="b")
    node2 = node(func=node2_func, inputs="c", outputs="d")
    node3 = node(func=add, inputs=["b", "d"], outputs="sum")
    return Pipeline([node1, node2, node3])
Just by looking at the pipeline, I can see how the outputs are passed through it. I can, for example, split a dataset in a node into training and testing sets and, just by setting the inputs and outputs, ensure that the node/task for model training receives only the training set.
From what I understand, in ploomber you specify only the order in which the tasks should be evaluated (using upstream)?
Matej Uhrín
02/19/2022, 1:50 PM
- source: add.py
  name: add
  upstream: [b, d]
or perhaps
- source: node3
  name: add
  upstream: [node1, node2]
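To make the comparison concrete, here is a minimal sketch of how task-level dependencies can be derived from dataset-level inputs and outputs like those in the kedro snippet. It uses plain dictionaries rather than the kedro or ploomber APIs; the nodes mapping and the derive_upstream helper are hypothetical names introduced only for illustration.

```python
# Hypothetical declarations mirroring the three nodes in the kedro example.
nodes = {
    "node1": {"inputs": ["a"], "outputs": ["b"]},
    "node2": {"inputs": ["c"], "outputs": ["d"]},
    "node3": {"inputs": ["b", "d"], "outputs": ["sum"]},
}


def derive_upstream(nodes):
    """Map each node to the nodes that produce the datasets it consumes."""
    # First pass: record which node produces each dataset name.
    producer = {}
    for name, spec in nodes.items():
        for dataset in spec["outputs"]:
            producer[dataset] = name
    # Second pass: a node depends on the producers of its inputs.
    # Inputs nobody produces ("a", "c") are treated as external data.
    upstream = {}
    for name, spec in nodes.items():
        upstream[name] = sorted(
            {producer[d] for d in spec["inputs"] if d in producer}
        )
    return upstream


upstream = derive_upstream(nodes)
```

Under these assumptions, node3 ends up depending on node1 and node2, which matches the task-level listing in the second YAML variant; the dataset names b and d only identify what flows along those edges.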
Eduardo
Eduardo
Ondřej Hubáček
02/19/2022, 2:37 PM
Eduardo