# ask-anything
e
how are you running your pipeline? are you using soopervisor?
j
Hi Eduardo, I'm just using ploomber build. For the specific case of a single task, I'm using ploomber task SOMETASK -some-envs--bla BLA
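(For context, a minimal sketch of the two commands referenced here; the task name is a placeholder, not the actual one from this pipeline:)

```sh
# build the entire pipeline
ploomber build

# run a single task by name (SOMETASK is a placeholder)
ploomber task SOMETASK
```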
e
if by "requirements" you mean "virtual envs", then it's possible if your tasks are scripts or notebooks. You can pass papermill_params to your task and pass a different kernel_name. of course, this involves some setup, since you need to ensure you have multiple kernels registered and that they are discoverable by your current environment. is this what you need? what I'm not following clearly is how you're planning to distribute across different computers
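(A rough sketch of what this could look like in pipeline.yaml, assuming the task is a script/notebook and that a second kernel, here called heavy-env, a made-up name, has already been registered with ipykernel:)

```yaml
tasks:
  # regular tasks run with the default kernel
  - source: clean.py
    product: output/clean.ipynb

  # this task runs under a different, already-registered kernel
  - source: heavy_task.py
    product: output/heavy_task.ipynb
    papermill_params:
      kernel_name: heavy-env
```

(Registering that second kernel from the other environment is typically done with something like python -m ipykernel install --user --name heavy-env, where the name is again a placeholder.)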
j
Thank you for your answer. What I mean is the following: assume a pipeline with certain tasks. Some tasks process (transform) data, specifically data cleansing and geospatial operations. After a task executes, its output is written in a standard format that is parsed by the next task. Now, there is a step that requires intensive computational power and, therefore, I want to run this task on a high-performance computer. As this computer has limited resources, I don't want to install the entire environment that I use for the whole pipeline; I only want to install the packages that are necessary and sufficient for that specific task. I have seen that Ploomber has a mechanism for installing software, through virtual environments, using a requirements.yml file (I suppose), and then doing something like conda env create -f requirements.yml (for example, in the case of conda). My question is: is this feature exclusive to the entire pipeline, or is it possible to make Ploomber compile a requirements.yml file for a specific task? Or, more likely, am I misunderstanding the ploomber install feature... Thank you for your time and your patience.
e
thanks for the explanation! the ploomber install command installs packages for the whole pipeline. to solve your problem, you'd have to manually list the dependencies that the task you want to execute requires, create a requirements.txt from them, and then execute the task on a different machine. In Ploomber Cloud we actually have this functionality: we're able to infer a requirements.txt based on the contents of a Jupyter notebook. most of the logic is open source, so if you wanna take a look, it's here. but for your use case, it's probably simpler to just create the requirements.txt file manually
j
ok! thanks, I will take a look. I thought that functionality might be available because of the general dependency inference that Ploomber does for the whole pipeline, but I see where you're going with Ploomber Cloud. Thanks for the explanation. It is a really neat project! Thank you for your time and work.
e
sure! feel free to ask any other questions!