# general
v
Hi guys, the Hamilton Ray adapter is using Ray Workflows. The Ray docs mention that Ray Workflows is deprecated and will be removed: https://docs.ray.io/en/latest/workflows/index.html Are there any plans to update the Hamilton Ray adapter? Thanks!
s
So there's two adapters.
So we just need to delete the workflow one -- if you wanted to submit a PR? 😉
v
Thanks. What is the difference between them? Did the Ray Workflows-based one convert every node into a Ray remote function, while the new one creates one remote function/module for the whole dataflow?
s
Both do the same in terms of turning functions into ray.remote ones. Workflows just auto did checkpointing on top of that IIRC.
v
Ok. Thanks
Just found that, besides the adapters, there is also the RayTaskExecutor (https://github.com/DAGWorks-Inc/hamilton/blob/main/hamilton/plugins/h_ray.py#L271). How is this different from the adapters? What is the recommended way to run Hamilton dataflows on Ray?
s
The task executor is used when you are using the parallel and collect constructs. It means that a whole Hamilton subgraph is run as a single Ray task.
This is useful if you're doing the same thing over and over and want to parallelize that over Ray. E.g. file processing
The ray graph adapter instead makes each function a ray remote task. Right now you can only do one or the other. No inherent limitations, just that's what was implemented, e.g. we could make the adapter smarter and work with the parallel and collect constructs...
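To make the two granularities concrete, here is a rough sketch in plain Python. It does not use Hamilton or Ray: a `ThreadPoolExecutor` stands in for Ray, and the node functions (`load`, `clean`, `word_count`) and helper names are hypothetical, made up for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical three-node dataflow; each "node" is a plain function.
def load(path: str) -> str:
    return path.upper()

def clean(loaded: str) -> str:
    return loaded.strip("/")

def word_count(cleaned: str) -> int:
    return len(cleaned.split("_"))

def run_subgraph(path: str) -> int:
    """Run the whole load -> clean -> word_count chain inside one call."""
    return word_count(clean(load(path)))

def per_node_tasks(path: str, pool: ThreadPoolExecutor) -> int:
    """Graph-adapter style: every node is submitted as its own task."""
    loaded = pool.submit(load, path).result()
    cleaned = pool.submit(clean, loaded).result()
    return pool.submit(word_count, cleaned).result()

def per_item_tasks(paths: List[str], pool: ThreadPoolExecutor) -> List[int]:
    """Task-executor style: one task per item, each running the full subgraph."""
    return list(pool.map(run_subgraph, paths))
```

Both styles compute the same values. The difference is how many task submissions happen and what crosses a task boundary: three small tasks per input in the first case, one larger task per input in the second.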
v
Thanks Stefan! So in general, remote executors are used for parallel and collect, and adapters like RayGraphAdapter or FutureAdapter are used to parallelize all nodes?
s
yep that's a reasonable summary.
We could get smarter about grouping parts of the DAG into tasks for Ray to get around SERDE costs - but that's a bit of a research topic.
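A toy illustration of why grouping could help, assuming the simplified model that every task boundary costs one serialize/deserialize round trip. This is not Hamilton or Ray code: `ship`, `run_grouped`, and the three node functions are hypothetical, and `pickle` stands in for whatever serialization Ray actually does.

```python
import pickle

def ship(value):
    """Stand-in for crossing a task boundary: serialize, then deserialize."""
    return pickle.loads(pickle.dumps(value))

# Hypothetical linear chain of nodes.
def double(xs):
    return [x * 2 for x in xs]

def add_one(xs):
    return [x + 1 for x in xs]

def total(xs):
    return sum(xs)

NODES = [double, add_one, total]

def run_grouped(data, groups):
    """Run NODES in order; `groups` says how many consecutive nodes share a task.

    Returns (result, round_trips), counting one round trip for each task's
    input plus one for the final result coming back to the caller.
    """
    trips = 0
    value = data
    start = 0
    for size in groups:
        value = ship(value)  # input crosses into the task
        trips += 1
        for node in NODES[start:start + size]:
            value = node(value)  # nodes in the same task share memory
        start += size
    value = ship(value)  # final result returns to the caller
    trips += 1
    return value, trips

print(run_grouped([1, 2, 3], [1, 1, 1]))  # one node per task
print(run_grouped([1, 2, 3], [3]))        # all nodes in one task
```

Under this model, one node per task costs four round trips for the chain, while a single grouped task costs two. The trade-off is that grouped nodes can no longer run in parallel with each other.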