# ask-anything
e
you can use those .py files as notebooks πŸ™‚ right click on them and then click "open with notebook", see here. does this help?
i
Oh I thought you were talking more about the pipeline yaml no? @MrFiat124Spider
m
ok, I run my pipeline, and now I want to interact with the executed .ipynb's, but it seems the .ipynb's don't retain the session from the pipeline run? and I can't load in the data because the pd.read_csv(str(upstream[...])) call doesn't run outside of ploomber build. unless I am missing something, this effectively kills the interactivity of notebooks?
@Ido (Ploomber) this is a different thought than the message I sent
πŸ‘ 1
i
Got it πŸ™‚
e
ah. so the ipynb files in the product section are meant to be read-only; if you wish to edit the source code, it is better to edit the .py file in the source: section. you can open it as a notebook by right-clicking.

https://ploomber.io/images/doc/lab-open-with-notebook.png

also, the jupyter plugin should automatically inject the upstream parameters, so the upstream thing should work. if the plugin isn't working, you can run ploomber nb -i to inject it manually in the source files
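if it helps, the injected cell in the .py source ends up looking roughly like this - just a sketch, the keys and paths come from your own pipeline.yaml, so the ones below are illustrative:

# %% tags=["injected-parameters"]
# cell written by `ploomber nb -i` / the Jupyter plugin (paths here are illustrative)
upstream = {
    "createTimeSeries": {
        "nb": "00-data/createTimeSeries.ipynb",
        "dotsTimeSeries": "00-data/dotsTimeSeries.csv",
    }
}
product = {"nb": "01-timeSeries/xgboost.ipynb"}

# %%
# with upstream defined above, the usual load works interactively as well
import pandas as pd

timeSeries = pd.read_csv(upstream["createTimeSeries"]["dotsTimeSeries"])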
i
You can also always save the output notebooks into a specific location in the product section, like nb: output/get.ipynb, so you have the context of your execution.
m
I use VSCode and understand how to use .py files as .ipynb's, just to be clear. But if I reopen a .py as an .ipynb I cannot run it outside of ploomber build, because the cell that loads data no longer works there. So I effectively need two cells to load data: one for the ploomber build, and one for individual runs to actually write the code.
timeSeries = pd.read_csv(upstream['createTimeSeries']['dotsTimeSeries'])
needs to become
timeSeries = pd.read_csv('actual/path/data.csv')
in order to run the notebook outside of the ploomber pipeline and actually write the code. then once I'm done writing, I uncomment the top line and comment out the bottom line?
e
ah, so if you use VSCode, execute ploomber nb -i and ploomber will inject the upstream variable πŸ™‚
then you'll be able to run stuff interactively using the .py files
m
ok, do I do this after the pipeline has been executed or should I type that in and then rerun the pipeline?
e
when you run ploomber build, it'll override the upstream if it exists, so it doesn't matter. you can run ploomber nb -i and whenever you want to run the full thing, run ploomber build - but if you change anything in your pipeline.yaml, then you'll need to run ploomber nb -i again
m
ok, and so after the entire thing runs I'll be able to open up a .py file and start working normally?
e
yeah, I'm guessing you're using vscode with .py files in percent format using the interactive mode, right?
m
because right now it all just runs and creates the desired outputs, but the files are clean, so I can't go in and work on them because they don't remember anything from the ploomber build run
yes #%% in vscode
thank you a lot!
e
yes, ploomber nb -i should do the trick, but feel free to post another question if it doesn't solve your issue
m
and I have to reenter that every time pipeline.yaml changes, would be nice if it just stayed?
(python) C:\Users\yosty\Desktop\Desktop_Folder\14 - git\timeSeriesDOTS\ploomber\dots>ploomber nb -i
Black is not installed, parameters wont be formatted
Black is not installed, parameters wont be formatted
Black is not installed, parameters wont be formatted
Black is not installed, parameters wont be formatted
Injected cell:
C:\Users\yosty\Desktop\Desktop_Folder\14 - git\timeSeriesDOTS\ploomber\dots\00-data\downLoadMetaData.R
C:\Users\yosty\Desktop\Desktop_Folder\14 - git\timeSeriesDOTS\ploomber\dots\00-data\dataCollect.py
C:\Users\yosty\Desktop\Desktop_Folder\14 - git\timeSeriesDOTS\ploomber\dots\00-data\createTimeSeries.py
C:\Users\yosty\Desktop\Desktop_Folder\14 - git\timeSeriesDOTS\ploomber\dots\00-data\calculateNetworkStats.py
C:\Users\yosty\Desktop\Desktop_Folder\14 - git\timeSeriesDOTS\ploomber\dots\01-timeSeries\xgboostWindow.py
Finished cell injection. Re-run this command if your pipeline.yaml changes.
"Black is not installed, parameters wont be formatted" - is this of concern?
e
no, it's just a warning from a third-party library
you should be good to go
the injected cell stays, but it doesn't update automatically; that's why you need to run it again if you change the product paths in pipeline.yaml
m
ah ok, makes sense. and so now that I do ploomber build, it is rerunning everything, and because it will keep the variables and such, when I open the files I can execute the cells and it will be interactive
e
yes, you got it!
m
can this command be run selectively? Now that I am thinking of it, there could be potential memory issues with large projects, or ones with a lot of data. I could also go into notebooks now and
del(largeDF)
but it would also be nice to make individual files interactive like:
ploomber nb -i -specificFile.py
e
there aren't performance issues with the ploomber nb -i command. all it's doing is creating an extra cell with the upstream variable, but it isn't loading anything. so you're good
if you mean performance issues when running your whole pipeline, then you can selectively run tasks with ploomber task {task-name} or do a partial build with ploomber build -p {some-task}
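and if you prefer doing that from Python instead of the CLI (e.g. from a notebook), something along these lines should work - just a sketch, assuming your pipeline.yaml is discoverable from the working directory:

# sketch: load the pipeline and build a single task from Python
from ploomber.spec import DAGSpec

dag = DAGSpec.find().to_dag()   # locate and load pipeline.yaml
dag.render()                    # resolve product/upstream placeholders
dag["xgboostWindow"].build()    # build one task (its upstream products must already exist)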
m
ah this is also useful, thank you
what is the proper way to stop a pipeline when it is running? I just reran everything by accident and just need to rerun one quick file.
e
ctrl + c will stop execution
m
What would be the task name here?
- source: 01-timeSeries/xgboostWindow.py
  product:
    nb: 01-timeSeries/xgboost.ipynb
    # resultsDict: 00-data/model_output/xgboostResults.pickle
    # pcaResultsDict: 00-data/model_output/xgboostPCAResults.pickle
i
xgboostWindow
You can also give it a custom name, in case you want something shorter, with this key:
name: task_name
Also, for VSCode, if you wanna run this command automatically, you can configure a file watcher. Click here for a VSCode extension. And here's the docs link: https://docs.ploomber.io/en/latest/user-guide/editors.html
πŸ‘ 2
m
thank you!
e
re task name: xgboostWindow - it's always the filename without the extension
m
I am getting this error: KeyError: "DAG does not have a task with name '{xgboostWindow.py}'"
I've tried: 01-timeSeries/xgboostWindow.py, 01-timeSeries/xgboostWindow, xgboostWindow, xgboostWindow.py
with ploomber interact and list(dag) I see this:
list(dag)
Out[1]: 
['downLoadMetaData',
 'dataCollect',
 'createTimeSeries',
 'calculateNetworkStats',
 'xgboostWindow']
e
weird, so xgboostWindow doesn't work?
m
ok, at least now I get:
ploomber.exceptions.TaskBuildError: Cannot build task 'xgboostWindow' because the following upstream dependencies are missing: ['calculateNetworkStats', 'createTimeSeries']. Execute upstream tasks first. If upstream tasks generate File(s) and you configured a File.client, you may also upload up-to-date copies to remote storage and they will be automatically downloaded
e
oh yes. so you need to build the dag first, then you'll be able to run tasks individually
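e.g. from the ploomber interact session you already have open (where dag is the same object you listed), roughly - just a sketch:

# inside `ploomber interact`, the `dag` object is already loaded
dag.build()                   # run the whole pipeline once so upstream products exist
dag["xgboostWindow"].build()  # after that, rebuild just this task whenever you change it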
πŸ‘ 1
m
ok, so I checked this afternoon and one of my tasks failed with partial execution, but when I go to open the .ipynb to troubleshoot, it's not interactive, so it's hard to debug. I thought it would be interactive now?
Or is it just letting me inject the locations so I can now rerun the notebook until the error? Ok yes, sorry, I get it now.
πŸ‘ 1