# ask-anything
Eduardo:
does this happen while running `ploomber build`? it has happened to me before, often when the notebook uses a lot of memory
you may wanna check our profiling tool to see if that's an issue: https://github.com/ploomber/ploomber-engine#notebook-profiling-cpu-gpu-ram
l:
I am thinking it is a memory issue as well. The ploomber build command works perfectly when using 0-80% of the dataset, but I get this error when using 80-100% of it.
Eduardo:
if you're using the parallel executor, you may wanna switch to the serial one to reduce memory usage. if you're already using the serial executor, probably getting a larger machine is the easiest option. Or split the notebook into two. are you using pandas? memory usage tends to blow up with data frames
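to illustrate the pandas point: dtypes are often the culprit. a minimal sketch (with made-up data, not your pipeline) of shrinking a DataFrame by downcasting numeric columns and storing low-cardinality strings as categoricals:

```python
import numpy as np
import pandas as pd

# hypothetical data, just to illustrate the effect
n = 100_000
df = pd.DataFrame({
    "user_id": np.arange(n, dtype=np.int64),               # int64 by default
    "score": np.random.rand(n),                            # float64 by default
    "segment": np.random.choice(["a", "b", "c"], size=n),  # python objects
})

before = df.memory_usage(deep=True).sum()

# downcast numeric columns to the smallest type that fits the values,
# and store the low-cardinality string column as a categorical
df["user_id"] = pd.to_numeric(df["user_id"], downcast="unsigned")
df["score"] = pd.to_numeric(df["score"], downcast="float")
df["segment"] = df["segment"].astype("category")

after = df.memory_usage(deep=True).sum()
print(f"memory: {before / 1e6:.1f} MB -> {after / 1e6:.1f} MB")
```

the string column usually shows the biggest drop, since each python string object carries a lot of overhead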
l:
Thanks! I will take a look at this. That is correct, this is happening when running the ploomber build command. The data scientist is using pandas. This is the pipeline.yaml file:
```yaml
tasks:
  - source: scripts/get_feat_eng.py
    product:
      nb: 'products/{{out}}/sample={{sample}}/get_feat_eng.ipynb'
    params:
      sample: '{{sample}}'

  - source: scripts/train_test_split.py
    product:
      nb: 'products/{{out}}/sample={{sample}}/train_test_split.ipynb'

  - source: scripts/fit_predict.py
    product:
      nb: 'products/{{out}}/sample={{sample}}/fit_predict.ipynb'
```
Eduardo:
ok so this is using the serial executor. my advice is to see if you can optimize the script that's failing; there are a few tricks to optimize memory: https://stackoverflow.com/q/39100971/709975 one quick thing would be to split the failing notebook into two, which may fix the issue: once "task-part-1" finishes executing, ploomber kills the kernel, freeing up the memory, then you can proceed to execute "task-part-2". maybe run some profiling, too. if you have more questions, feel free to ask
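for reference, a sketch of what splitting the first task could look like in pipeline.yaml (the part-1/part-2 script names and the parquet product are hypothetical; any intermediate file the second script can read back works):

```yaml
tasks:
  # part 1: load the data and do the heavy feature engineering, then
  # write an intermediate file; when this task finishes, ploomber kills
  # the kernel and the memory is freed
  - source: scripts/get_feat_eng_part1.py
    product:
      nb: 'products/{{out}}/sample={{sample}}/get_feat_eng_part1.ipynb'
      data: 'products/{{out}}/sample={{sample}}/features_part1.parquet'
    params:
      sample: '{{sample}}'

  # part 2: read the intermediate file and finish feature engineering
  # in a fresh kernel
  - source: scripts/get_feat_eng_part2.py
    product:
      nb: 'products/{{out}}/sample={{sample}}/get_feat_eng_part2.ipynb'
```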
l:
Thanks a ton @Eduardo! Knowing that the kernel is killed and memory is freed after each task in the pipeline.yaml file helps a lot. The data scientist was doing the bulk of the workload in the initial script and was planning on splitting that task into different components later in her workflow. I will suggest she do that now instead of later.
Eduardo:
great! hope that fixes the issue!
l:
@Eduardo just as an update: splitting the failing task into two parts did the trick. Thanks again!
Eduardo:
great! 🎉