# hamilton-help
s
Hi Roy. You can do anything in the body of a Python function with Hamilton, so the real question is what kind of semantics/guarantees you want. The other angle is that we could add an extension/adapter to Hamilton that makes this easier, e.g. one that converts Python code to a Slurm job and submits it (sorry, I’m not that familiar with Slurm, so I’m guessing here)…
r
Thanks for your reply! Suppose we have some jobs that should be computed on a remote HPC cluster, submitted through Slurm. Slurm is a workload manager, and we need to write a bash script that tells it how many cores and how much memory the job needs. Once you type
`sbatch submit.in`
in the terminal, Slurm takes care of everything, including copying the related data to the compute node and allocating resources (which we do not need to care about), and you can use some commands to query the job's state. That means I can't simply put a submit call in a function body, because I can only move on to the next node once the job submitted to Slurm has finished. Other workflow tools poll to detect whether the task has completed. If I want to implement this, which kind of component should I extend? A lot of workflow tools have this kind of support, such as Parsl (Parallel Scripting Library), Nextflow and Pegasus.
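A minimal sketch of the submit-then-poll pattern described above, assuming `sbatch` and `sacct` are on the PATH of the machine running the pipeline. The helper names (`render_sbatch_script`, `submit`, `slurm_state`, `wait_for_job`) and the resource values are hypothetical; `get_state` and `sleep` are injectable so the polling loop can be exercised without a cluster.

```python
import subprocess
import time
from typing import Callable

# States treated as terminal in this sketch (not an exhaustive list of Slurm states).
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT", "OUT_OF_MEMORY", "NODE_FAIL"}


def render_sbatch_script(command: str, cores: int, mem: str, job_name: str = "hamilton-job") -> str:
    """Build a minimal batch script that requests cores and memory via #SBATCH directives."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cores}",
        f"#SBATCH --mem={mem}",
        command,
        "",
    ])


def submit(script_path: str) -> str:
    """Submit with sbatch; --parsable prints 'jobid' or 'jobid;cluster'."""
    out = subprocess.run(["sbatch", "--parsable", script_path],
                         check=True, capture_output=True, text=True).stdout
    return out.strip().split(";")[0]


def slurm_state(job_id: str) -> str:
    """Ask sacct for the job state (-X: allocation only, -n: no header, -P: parsable)."""
    out = subprocess.run(["sacct", "-j", job_id, "-X", "-n", "-P", "-o", "State"],
                         check=True, capture_output=True, text=True).stdout
    lines = out.strip().splitlines()
    # Right after submission sacct may not list the job yet; treat that as pending.
    return lines[0].split()[0] if lines else "PENDING"


def wait_for_job(job_id: str,
                 get_state: Callable[[str], str] = slurm_state,
                 poll_seconds: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Block until the job reaches a terminal state; raise unless it COMPLETED."""
    while True:
        state = get_state(job_id)
        if state in TERMINAL_STATES:
            if state != "COMPLETED":
                raise RuntimeError(f"Slurm job {job_id} ended in state {state}")
            return state
        sleep(poll_seconds)
```

In a pipeline, a node would call `wait_for_job(submit(path))` and only return once the remote job is done, which gives exactly the "move on to the next node only when the job has finished" behaviour, at the cost of blocking that node while it polls.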
s
I see. I have more questions to help me build a better mental model of what code is being run where and for what purpose :) Would you be mixing regular Python code, e.g. creating a dataframe, and then submitting that dataframe as part of a Slurm job? And/or do you want Slurm to run some Hamilton code? Or is the entire idea to use Hamilton to orchestrate the submission of Slurm jobs, where the code for each job is defined somewhere else? Or something else?
(perhaps a diagram or picture would help me — we can also schedule time next week to chat and look at code/draw a picture if that’s helpful)
Oh, and if you come up with some example code, we can figure out the best way to make this work with Hamilton. Want to create a discussion or an issue for this? 🙂
r
Thanks for those questions! I'll try to explain it in plain words first and make a diagram next week, because I have two deadlines. I want a Slurm job to be represented by a Python function, like a regular node. Slurm jobs usually depend on each other, so one has to wait for another to finish. So I think we could create a special result builder that returns a future or promise. This result would poll Slurm and update its state, and once the Slurm job is done, it would retrieve the required data so the next node can start and use that data. With this scheme, I think it would not affect the DAG or the programming style.
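One way to read the "future or promise" idea, sketched with the standard library's `concurrent.futures.Future` rather than a custom type: the upstream node hands back a Future immediately, a background thread does the polling, and the downstream node simply calls `.result()`. `watch_job`, `model_metrics`, and the injected `get_state`/`fetch_result` callables are hypothetical names for illustration.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable

# Background threads do the polling; node functions stay ordinary Python functions.
_executor = ThreadPoolExecutor(max_workers=4)
_FAILED = {"FAILED", "CANCELLED", "TIMEOUT", "OUT_OF_MEMORY", "NODE_FAIL"}


def _poll(job_id: str, get_state: Callable[[str], str],
          fetch_result: Callable[[str], Any], poll_seconds: float) -> Any:
    """Poll the scheduler until the job finishes, then retrieve its output."""
    while True:
        state = get_state(job_id)
        if state == "COMPLETED":
            return fetch_result(job_id)
        if state in _FAILED:
            # Raised in the worker thread, re-raised wherever .result() is called.
            raise RuntimeError(f"job {job_id} ended in state {state}")
        time.sleep(poll_seconds)


def watch_job(job_id: str, get_state: Callable[[str], str],
              fetch_result: Callable[[str], Any], poll_seconds: float = 30.0) -> Future:
    """Return immediately with a Future that resolves to the job's output."""
    return _executor.submit(_poll, job_id, get_state, fetch_result, poll_seconds)


def model_metrics(training_job: Future) -> dict:
    """Hamilton-style downstream node: the parameter name matches an upstream
    node `training_job`, and this blocks until that job's data is available."""
    return training_job.result()
```

The DAG shape is unchanged: the only difference from a normal node is that the upstream value is a Future, and unwrapping it is where the waiting happens.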
We can certainly have a discussion next week if you think this feature is necessary! In my opinion it is very useful; even Microsoft's NNI framework supports remote submission.
Maybe we could implement a type called Future, and derive a SlurmFuture from it that can query the progress of the remote task? I can give it a try first and submit a draft on GitHub.
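The `Future`/`SlurmFuture` proposal might look roughly like this. Everything here is hypothetical: the class names follow the proposal above, `fetch` assumes the job writes its output to a file readable from the submitting machine, and the optional `query` argument lets you swap the `sacct` call for a stub.

```python
import abc
import subprocess
import time
from typing import Any, Callable, Optional


class Future(abc.ABC):
    """Hypothetical base type for a task that finishes later on some backend."""

    @abc.abstractmethod
    def done(self) -> bool:
        """Return True once the task has finished successfully."""

    @abc.abstractmethod
    def fetch(self) -> Any:
        """Retrieve the task's output; only called after done() is True."""

    def result(self, poll_seconds: float = 30.0,
               sleep: Callable[[float], None] = time.sleep) -> Any:
        """Block by polling done(), then return the fetched output."""
        while not self.done():
            sleep(poll_seconds)
        return self.fetch()


class SlurmFuture(Future):
    """Tracks one Slurm job by id; states come from sacct unless `query` is given."""

    FAILED_STATES = {"FAILED", "CANCELLED", "TIMEOUT", "OUT_OF_MEMORY", "NODE_FAIL"}

    def __init__(self, job_id: str, output_path: str,
                 query: Optional[Callable[[str], str]] = None):
        self.job_id = job_id
        self.output_path = output_path
        self._query = query or self._sacct_state

    @staticmethod
    def _sacct_state(job_id: str) -> str:
        # -X: allocation only (no steps), -n: no header, -P: parsable, -o: fields.
        out = subprocess.run(
            ["sacct", "-j", job_id, "-X", "-n", "-P", "-o", "State"],
            check=True, capture_output=True, text=True,
        ).stdout
        lines = out.strip().splitlines()
        # A freshly submitted job may not be in the accounting database yet.
        return lines[0].split()[0] if lines else "PENDING"

    def state(self) -> str:
        """Current scheduler state, e.g. 'PENDING', 'RUNNING', 'COMPLETED'."""
        return self._query(self.job_id)

    def done(self) -> bool:
        state = self.state()
        if state in self.FAILED_STATES:
            raise RuntimeError(f"Slurm job {self.job_id} ended in state {state}")
        return state == "COMPLETED"

    def fetch(self) -> str:
        # Assumes the job wrote its result to a file visible from this machine.
        with open(self.output_path) as f:
            return f.read()
```

Keeping `result()` in the base class means other backends (another scheduler, a cloud batch service) would only need to implement `done()` and `fetch()`.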
s
Okay, that’s helpful. @Roy Kid yep, so I’m responding here in addition to your other question that @Elijah Ben Izzy responded to. Hamilton has a lot of hooks/ways to accomplish what you want. Thinking out loud, you could write a function that submits to Slurm, and then there are several things we can do:
• we can wrap the function with a decorator, which could house the logic to check for completion.
• we could write a graph adapter that has this logic instead. It could also “checkpoint”, i.e. house functionality similar to the caching adapter.
• you could write a library that does the job submission and polling, and all Hamilton does is orchestration + delegating to it + checkpointing.
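The decorator option could look roughly like this. It is not a Hamilton API: `wait_for_slurm`, its parameters, and the example node are hypothetical, and `get_state` stands for whatever status query you use (for example one built on `sacct`).

```python
import functools
import time
from typing import Callable

FAILED_STATES = {"FAILED", "CANCELLED", "TIMEOUT", "OUT_OF_MEMORY", "NODE_FAIL"}


def wait_for_slurm(get_state: Callable[[str], str], poll_seconds: float = 30.0,
                   sleep: Callable[[float], None] = time.sleep):
    """Hypothetical decorator: the wrapped function submits a job and returns its id;
    the wrapper blocks until that job completes, then returns the id."""
    def decorator(submit_fn):
        @functools.wraps(submit_fn)  # keep name, docstring and annotations
        def wrapper(*args, **kwargs):
            job_id = submit_fn(*args, **kwargs)
            while True:
                state = get_state(job_id)
                if state == "COMPLETED":
                    return job_id
                if state in FAILED_STATES:
                    raise RuntimeError(f"Slurm job {job_id} ended in state {state}")
                sleep(poll_seconds)
        return wrapper
    return decorator
```

`functools.wraps` copies the wrapped function's name and annotations, which matters for a framework like Hamilton that derives node names and dependencies from function signatures. Checkpointing is left out on purpose; per the other two options it would more naturally live in a graph adapter or in the separate submission library.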
Otherwise a call might just save some back and forth 🙂
r
Thanks for your suggestions, I will try them one by one to find the least invasive way to achieve what I want! Once I think this part can be abstracted from the actual business logic and turned into a public component, I will submit a PR to contribute the code. Thanks again for your help!