This message was deleted Hamilton Open Source #hamilton-help

# hamilton-help

Slackbot

11/15/2023, 10:39 AM

This message was deleted.

👍 1

Stefan Krawczyk

11/15/2023, 4:32 PM

Hi Roy. You can do anything in the body of the python function with Hamilton. So the question is really what kind of semantics/guarantees do you want? Since the other angle is that we could add an extension/adapter to Hamilton that makes this easier to do/converts python code to a slurm job (sorry I’m not that familiar with slurm so guessing here) and submits it…

Roy Kid

11/15/2023, 4:48 PM

Thanks for your reply! Suppose we have some jobs that should be computed on a remote HPC cluster and are submitted through slurm. Slurm is a workload management system, and we need to write a bash script to tell it how many cores and how much memory the job needs. Once you type

sbatch submit.in

in the terminal, slurm will take care of everything, including copying the related data to the compute node and allocating resources (which we do not need to care about), and you can use some commands to query the job's state. That means I cannot just write a submit call in a function body, because I can only move on to the next node once the job submitted to slurm is done. Other workflow tools use polling to detect whether the task is completed. If I want to implement something like this, which kind of component should I extend? A lot of workflow tools have this kind of support, such as Parsl - Parallel Scripting Library, Nextflow and Pegasus.
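To make the blocking concern above concrete, here is a minimal sketch of a function body that submits a script and only returns once the job has terminated. It relies on `sbatch --wait` and `sbatch --parsable`, and assumes the installed Slurm version supports both. The names `submit_and_wait` and `_run` are hypothetical and not part of Hamilton or Slurm.

```python
import subprocess
from typing import Callable, Sequence


def _run(args: Sequence[str]) -> str:
    """Run a command, raise if it exits non-zero, and return its stdout."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout


def submit_and_wait(script: str, run: Callable[[Sequence[str]], str] = _run) -> str:
    """Submit `script` with sbatch and block until the job has terminated.

    Hypothetical helper for illustration; returns the Slurm job id as a string.
    """
    # --wait keeps sbatch in the foreground until the job terminates.
    # --parsable makes it print "jobid" or "jobid;cluster" instead of a sentence.
    out = run(["sbatch", "--wait", "--parsable", script])
    return out.strip().split(";")[0]
```

Because `sbatch --wait` exits with the job's exit code, a failed job surfaces as a `subprocess.CalledProcessError` from `_run`. The trade-off is that the calling process stays blocked for the whole job, which is why polling or a future-like object may still be preferable for long jobs.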

Stefan Krawczyk

11/15/2023, 5:15 PM

I see. I have more questions to try to give myself a better mental model of what code is being run where and for what purpose :) Would you be mixing regular python code, e.g. creating a dataframe, and then submitting that dataframe as part of slurm job? and/or do you want slurm to run some Hamilton code? or is the entire idea to use Hamilton to orchestrate the submission of slurm jobs — where the code for the job is defined somewhere else? or?

Stefan Krawczyk

11/15/2023, 5:20 PM

(perhaps a diagram or picture would help me — we can also schedule time next week to chat and look at code/draw a picture if that’s helpful)

Stefan Krawczyk

11/15/2023, 8:44 PM

Oh and if you come up with some example code we can figure out what the best way to make things work with Hamilton can look like — want to create a discussion? or issue on this? 🙂

Roy Kid

11/16/2023, 9:52 AM

Thanks for those questions! I'll try to explain it in plain words first and make a diagram next week, because I just found out I have two deadlines. I want a slurm job to be represented by a python function, like a regular node. Two slurm jobs usually have a dependency, so one should wait for the other to finish. So I think we can create a special result builder which is a future or promise. This result can poll slurm and update its state, and once the slurm job is done, it can retrieve the required data so that the next node can start and use that data. With this scheme, I think it will not affect the DAG or the programming style.

Roy Kid

11/16/2023, 9:55 AM

We can definitely have a discussion next week if you think this feature is necessary! My opinion is that it is very useful; even MS's NNI framework supports remote submission.

Roy Kid

11/16/2023, 12:11 PM

Maybe we can implement a type called Future, and derive a SlurmFuture from it which can query the progress of the remote task? I can give it a try first and submit a draft on GitHub.
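As a rough illustration of the Future / SlurmFuture idea, the sketch below polls `squeue` for the job state and only hands back data once the job has left the queue. Everything here is an assumption made for illustration: the class layout, the `load_result` callback, the `query_state` helper, and the simplified set of "still active" states are hypothetical and not an existing Hamilton or Slurm API.

```python
import subprocess
import time
from typing import Any, Callable, Optional

# Simplified set of Slurm job states meaning "not finished yet"; Slurm defines more.
ACTIVE_STATES = {"PENDING", "CONFIGURING", "RUNNING", "COMPLETING", "SUSPENDED"}


def query_state(job_id: str) -> str:
    """Ask squeue for the job state; return "" if squeue no longer knows the job."""
    proc = subprocess.run(
        ["squeue", "-h", "-j", job_id, "-o", "%T"],  # -h: no header, %T: job state
        capture_output=True,
        text=True,
    )
    return proc.stdout.strip() if proc.returncode == 0 else ""


class Future:
    """Hypothetical base type: a handle to a result that may not exist yet."""

    def done(self) -> bool:
        raise NotImplementedError

    def result(self, poll_interval: float = 30.0, timeout: Optional[float] = None) -> Any:
        raise NotImplementedError


class SlurmFuture(Future):
    """Hypothetical future bound to one Slurm job id."""

    def __init__(
        self,
        job_id: str,
        load_result: Callable[[], Any],
        get_state: Callable[[str], str] = query_state,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.job_id = job_id
        self._load_result = load_result
        self._get_state = get_state
        self._sleep = sleep

    def done(self) -> bool:
        # "Done" only means the job is no longer active; it may have failed.
        return self._get_state(self.job_id) not in ACTIVE_STATES

    def result(self, poll_interval: float = 30.0, timeout: Optional[float] = None) -> Any:
        waited = 0.0
        while not self.done():
            if timeout is not None and waited >= timeout:
                raise TimeoutError(f"slurm job {self.job_id} still active after {waited}s")
            self._sleep(poll_interval)
            waited += poll_interval
        # Fetch whatever the job wrote, e.g. by reading an output file.
        return self._load_result()
```

A downstream node would call `result()` on the future it receives, so the dependency between two slurm jobs stays an ordinary function dependency in the DAG. Checking whether the job actually succeeded (rather than just finished) would need an extra step, for example through Slurm accounting.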

Stefan Krawczyk

11/16/2023, 6:20 PM

Okay that’s helpful. @Roy Kid yep so responding here in addition to your other question that @Elijah Ben Izzy responded to. Hamilton has a lot of hooks/ways to accomplish what you want. Thinking out loud, you could write a function to submit to slurm and then there’s several things we can do:
• we can wrap the function with a decorator, which could house the logic to check completion.
• we could write a graphadapter that has this logic instead. It could also “checkpoint”, i.e. house similar functionality to the cachingadapter.
• you could write a library that does the job submission and polling, and all Hamilton is doing is orchestration + delegating to it + checkpointing.
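The first option above (a decorator that houses the completion check) could look roughly like the following plain-Python sketch. It is not a Hamilton decorator: `wait_for_slurm`, its parameters, and the simplified `ACTIVE_STATES` set are hypothetical, and the caller is assumed to supply a `get_state` function that maps a job id to a Slurm state string such as `"COMPLETED"`.

```python
import functools
import time
from typing import Callable

# Simplified set of Slurm job states meaning "not finished yet"; Slurm defines more.
ACTIVE_STATES = {"PENDING", "CONFIGURING", "RUNNING", "COMPLETING", "SUSPENDED"}


def wait_for_slurm(
    get_state: Callable[[str], str],
    poll_interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Hypothetical decorator: the wrapped function submits a job and returns
    its id; the wrapper returns that id only after the job has completed."""

    def decorator(submit_fn: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(submit_fn)  # keep the name, docstring and annotations
        def wrapper(*args, **kwargs) -> str:
            job_id = submit_fn(*args, **kwargs)
            state = get_state(job_id)
            while state in ACTIVE_STATES:
                sleep(poll_interval)
                state = get_state(job_id)
            if state != "COMPLETED":
                raise RuntimeError(f"slurm job {job_id} ended in state {state!r}")
            return job_id

        return wrapper

    return decorator
```

The function that submits the job stays a normal function, and the waiting logic lives in one place. The same loop could be moved into an adapter or a separate library, as the other two options suggest, without changing the functions themselves.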

Stefan Krawczyk

11/16/2023, 6:23 PM

Otherwise a call might just save some back and forth 🙂

Roy Kid

11/16/2023, 6:26 PM

Thanks for your suggestions, I will try them one by one to find the least invasive way to achieve what I want! Once I think this part can be abstracted from the actual business logic and turned into a public component, I will submit a PR to contribute the code. Thanks again for your help!
