# ask-anything
e
you can use those .py files as notebooks πŸ™‚ right click on them and then click "open with notebook", see here. does this help?
i
Oh I thought you were talking more about the pipeline yaml no? @MrFiat124Spider
m
ok, I run my pipeline, and now I want to interact with the executed .ipynb's, but it seems the .ipynb's don't retain the session from the pipeline run? and I can't load in the data because the pd.read_csv(str(upstream[...])) call doesn't run outside of ploomber build. unless I am missing something, this effectively kills the interactivity of notebooks?
@Ido (Ploomber) this is a different thought than the message I sent
πŸ‘ 1
i
Got it πŸ™‚
e
ah. so the ipynb files in the product section are meant to be read-only; if you wish to edit the source code, it is better to edit the .py file in the source: section. you can open it as a notebook by right-clicking.

https://ploomber.io/images/doc/lab-open-with-notebook.png

also, the jupyter plugin should automatically inject the upstream parameters, so the upstream thing should work. if the plugin isn't working, you can run ploomber nb -i to inject it manually in the source files
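if it helps, the injected cell in the .py source ends up looking roughly like this - just a sketch, the keys and paths come from your own pipeline.yaml, so the ones below are illustrative:

# %% tags=["injected-parameters"]
# cell written by `ploomber nb -i` / the Jupyter plugin (paths here are illustrative)
upstream = {
    "createTimeSeries": {
        "nb": "00-data/createTimeSeries.ipynb",
        "dotsTimeSeries": "00-data/dotsTimeSeries.csv",
    }
}
product = {"nb": "01-timeSeries/xgboost.ipynb"}

# %%
# with upstream defined above, the usual load works interactively as well
import pandas as pd

timeSeries = pd.read_csv(upstream["createTimeSeries"]["dotsTimeSeries"])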
i
You can also always save the output notebooks into a specific location in the product section, like nb: output/get.ipynb, so you have the context of your execution.
m
I use VSCode and understand how to use .py files as .ipynb's, just to be clear. But if I reopen a .py as an .ipynb I cannot run it outside of ploomber build, because the cell that loads data no longer works there. So I effectively need two cells to load data: one for the ploomber build, and one for individual runs to actually write the code.
timeSeries = pd.read_csv(upstream['createTimeSeries']['dotsTimeSeries'])
needs to become
timeSeries = pd.read_csv('actual/path/data.csv')
in order to run the notebook outside of the ploomber pipeline and actually write the code. then once I'm done writing, I uncomment the top line and comment out the bottom line?
e
ah, so if you use VSCode, execute ploomber nb -i and ploomber will inject the upstream variable πŸ™‚
then you'll be able to run stuff interactively using the .py files
m
ok, do I do this after the pipeline has been executed or should I type that in and then rerun the pipeline?
e
when you run ploomber build, it'll override the upstream if it exists, so it doesn't matter. you can run ploomber nb -i and whenever you want to run the full thing, run ploomber build - but if you change anything in your pipeline.yaml, then you'll need to run ploomber nb -i again
m
ok, and so after the entire thing runs I'll be able to open up a .py file and start working normally?
e
yeah, I'm guessing you're using vscode with .py files in percent format using the interactive mode, right?
m
because right now it all just runs and creates the desired outputs, but the files are clean, so I can't go in and work on them because they don't remember anything from the ploomber build run
yes #%% in vscode
thank you a lot!
e
yes, ploomber nb -i should do the trick, but feel free to post another question if it doesn't solve your issue
m
and I have to reenter that every time pipeline.yaml changes, would be nice if it just stayed?
(python) C:\Users\yosty\Desktop\Desktop_Folder\14 - git\timeSeriesDOTS\ploomber\dots>ploomber nb -i
Black is not installed, parameters wont be formatted
Black is not installed, parameters wont be formatted
Black is not installed, parameters wont be formatted
Black is not installed, parameters wont be formatted
Injected cell:
C:\Users\yosty\Desktop\Desktop_Folder\14 - git\timeSeriesDOTS\ploomber\dots\00-data\downLoadMetaData.R
C:\Users\yosty\Desktop\Desktop_Folder\14 - git\timeSeriesDOTS\ploomber\dots\00-data\dataCollect.py
C:\Users\yosty\Desktop\Desktop_Folder\14 - git\timeSeriesDOTS\ploomber\dots\00-data\createTimeSeries.py
C:\Users\yosty\Desktop\Desktop_Folder\14 - git\timeSeriesDOTS\ploomber\dots\00-data\calculateNetworkStats.py
C:\Users\yosty\Desktop\Desktop_Folder\14 - git\timeSeriesDOTS\ploomber\dots\01-timeSeries\xgboostWindow.py
Finished cell injection. Re-run this command if your pipeline.yaml changes.
"Black is not installed, parameters wont be formatted" - is this of concern?
e
no, it's just a warning from a third-party library
you should be good to go
the injected cell stays, but it doesn't update automatically; that's why you need to run it again if you change the product paths in pipeline.yaml
m
ah ok, makes sense. and so now that I do ploomber build, it is rerunning everything, and because it will keep the variables and such, when I open the files I can execute the cells and it will be interactive
e
yes, you got it!
m
can this command be run selectively? Now that I am thinking of it, there could be potential memory issues with large projects, or ones with a lot of data. I could also go into notebooks now and
del(largeDF)
but it would also be nice to make individual files interactive like:
ploomber nb -i -specificFile.py
e
there aren't performance issues with the ploomber nb -i command. all it's doing is creating an extra cell with the upstream variable, but it isn't loading anything. so you're good
if you mean performance issues when running your whole pipeline, then you can selectively run tasks with ploomber task {task-name} or do a partial build with ploomber build -p {some-task}
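and if you prefer doing that from Python instead of the CLI (e.g. from a notebook), something along these lines should work - just a sketch, assuming your pipeline.yaml is discoverable from the working directory:

# sketch: load the pipeline and build a single task from Python
from ploomber.spec import DAGSpec

dag = DAGSpec.find().to_dag()   # locate and load pipeline.yaml
dag.render()                    # resolve product/upstream placeholders
dag["xgboostWindow"].build()    # build one task (its upstream products must already exist)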
m
ah this is also useful, thank you
what is the proper way to stop a pipeline when it is running? I just reran everything by accident and just need to rerun one quick file.
e
ctrl + c will stop execution
m
What would be the task name here?
- source: 01-timeSeries/xgboostWindow.py
  product:
    nb: 01-timeSeries/xgboost.ipynb
    # resultsDict: 00-data/model_output/xgboostResults.pickle
    # pcaResultsDict: 00-data/model_output/xgboostPCAResults.pickle
i
xgboostWindow
You can also give it a custom name, in case you want something shorter, with this key:
name: task_name
Also, for VSCode, if you wanna run this command automatically, you can configure a file watcher. Click here for a VSCode extension. And here's the docs link: https://docs.ploomber.io/en/latest/user-guide/editors.html
πŸ‘ 2
m
thank you!
e
re task name: xgboostWindow - it's always the filename without the extension
m
I am getting this error: KeyError: "DAG does not have a task with name '{xgboostWindow.py}'"
I've tried: 01-timeSeries/xgboostWindow.py, 01-timeSeries/xgboostWindow, xgboostWindow, xgboostWindow.py
with ploomber interact and list(dag) I see this:
list(dag)
Out[1]: 
['downLoadMetaData',
 'dataCollect',
 'createTimeSeries',
 'calculateNetworkStats',
 'xgboostWindow']
e
weird, so xgboostWindow doesn't work?
m
ok, at least now I get:
ploomber.exceptions.TaskBuildError: Cannot build task 'xgboostWindow' because the following upstream dependencies are missing: ['calculateNetworkStats', 'createTimeSeries']. Execute upstream tasks first. If upstream tasks generate File(s) and you configured a File.client, you may also upload up-to-date copies to remote storage and they will be automatically downloaded
e
oh yes. so you need to build the dag first, then you'll be able to run tasks individually
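e.g. from the ploomber interact session you already have open (where dag is the same object you listed), roughly - just a sketch:

# inside `ploomber interact`, the `dag` object is already loaded
dag.build()                   # run the whole pipeline once so upstream products exist
dag["xgboostWindow"].build()  # after that, rebuild just this task whenever you change it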
πŸ‘ 1
m
ok, so I checked this afternoon and one of my tasks failed with partial execution, but when I go to open the .ipynb to troubleshoot, it's not interactive, so it's hard to debug. I thought it would be interactive now?
Or is it just letting me inject the locations so I can now rerun the notebook until the error? Ok yes, sorry, I get it now.
πŸ‘ 1