Hi guys, the hamilton ray adapter is using Ray Workflows | The Hamilton Open Source #general

Volker Lorrmann

03/26/2025, 10:06 AM

Hi guys, the Hamilton Ray adapter is using Ray Workflows. The Ray docs mention that Ray Workflows is deprecated and will be removed: https://docs.ray.io/en/latest/workflows/index.html Are there any plans to update the Hamilton Ray adapter? Thanks!

Stefan Krawczyk

03/26/2025, 6:11 PM

So there's two adapters.

Stefan Krawczyk

03/26/2025, 6:12 PM

A regular Ray one and an old Ray Workflows-based one.

Stefan Krawczyk

03/26/2025, 6:13 PM

So we just need to delete the workflow one -- if you wanted to submit a PR? 😉

Volker Lorrmann

03/26/2025, 6:15 PM

Thanks. What is the difference between these? Did the Ray Workflows-based one convert every node into a Ray remote function, while the new one creates one remote function/module for the whole dataflow?

Stefan Krawczyk

03/26/2025, 7:14 PM

Both do the same thing in terms of turning functions into ray.remote ones. Workflows just did automatic checkpointing on top of that, IIRC.

Volker Lorrmann

03/27/2025, 8:44 AM

Ok. Thanks

Volker Lorrmann

04/02/2025, 7:23 PM

Just found that, besides the adapters, there is also the RayTaskExecutor (https://github.com/DAGWorks-Inc/hamilton/blob/main/hamilton/plugins/h_ray.py#L271). How is this different from the adapters? What is the recommended way to run Hamilton dataflows on Ray?

Stefan Krawczyk

04/02/2025, 7:32 PM

The task executor is used when you are using the parallel and collect constructs. It means that a whole Hamilton subgraph is run as a single Ray task.

Stefan Krawczyk

04/02/2025, 7:34 PM

This is useful if you're doing the same thing over and over and want to parallelize that over Ray. E.g. file processing
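To make the "whole subgraph as a single task" idea concrete, here is a rough plain-Python sketch of the pattern. It does not use Hamilton or Ray at all: a standard-library thread pool stands in for Ray, and the functions `load`, `clean`, `word_count`, `process_one_file`, and `run` are hypothetical names invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical three-step "subgraph" applied to one file (not Hamilton API).
def load(path: str) -> str:
    return f"contents of {path}"


def clean(raw: str) -> str:
    return raw.upper()


def word_count(cleaned: str) -> int:
    return len(cleaned.split())


def process_one_file(path: str) -> int:
    # The whole chain runs inside one task, so the intermediate values
    # never leave the worker that computes them.
    return word_count(clean(load(path)))


def run(paths: list[str]) -> int:
    # "Parallel" step: one task per item. "Collect" step: gather and reduce.
    with ThreadPoolExecutor(max_workers=4) as pool:
        counts = list(pool.map(process_one_file, paths))
    return sum(counts)
```

Calling `run(["a.txt", "b.txt", "c.txt"])` fans out one task per path and sums the per-file counts, which is the shape of the file-processing case mentioned above.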

Stefan Krawczyk

04/02/2025, 7:35 PM

The ray graph adapter instead makes each function a ray remote task. Right now you can only do one or the other. No inherent limitations, just that's what was implemented, e.g. we could make the adapter smarter and work with the parallel and collect constructs...
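For contrast, the "each function becomes its own task" style can be sketched roughly as follows. Again this is plain Python with a standard-library thread pool standing in for Ray, not the actual adapter code; the node functions, the `DAG` dict, and `run_per_node` are assumptions made up for the example.

```python
from concurrent.futures import Future, ThreadPoolExecutor


# Hypothetical node functions (not Hamilton API).
def a() -> int:
    return 2


def b() -> int:
    return 3


def c(a: int, b: int) -> int:
    return a * b


# node name -> (function, upstream node names), listed in topological order
DAG = {"a": (a, []), "b": (b, []), "c": (c, ["a", "b"])}


def run_per_node(dag: dict, max_workers: int = 2) -> dict:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures: dict[str, Future] = {}
        for name, (fn, deps) in dag.items():
            # Wait for upstream results, then submit this node as its own task.
            args = [futures[d].result() for d in deps]
            futures[name] = pool.submit(fn, *args)
        return {name: f.result() for name, f in futures.items()}
```

Here `a` and `b` are submitted without blocking and can run concurrently, while `c` waits for both. Every edge in the graph is a hand-off between tasks, which is where the per-node style pays its overhead.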

Volker Lorrmann

04/02/2025, 8:06 PM

Thanks Stefan! So in general, remote executors are used for parallel and collect, and adapters like RayGraphAdapter or FutureAdapter are used to parallelize all nodes?

Stefan Krawczyk

04/02/2025, 8:10 PM

yep that's a reasonable summary.

Stefan Krawczyk

04/02/2025, 8:10 PM

We could get smarter about grouping parts of the DAG into tasks for Ray to get around SERDE costs - but that's a bit of a research topic.
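A toy way to see the SERDE point: count how many bytes get pickled at task boundaries when every node is its own task versus when a chain is grouped into one task. This is only an illustration under assumptions: `pickle` stands in for whatever serialization Ray actually does, and all function names here are hypothetical.

```python
import pickle


def make_rows(n: int) -> list[int]:
    return list(range(n))


def square(rows: list[int]) -> list[int]:
    return [r * r for r in rows]


def total(rows: list[int]) -> int:
    return sum(rows)


def roundtrip(value, log: list[int]):
    # Simulate shipping a value between processes and record its size.
    blob = pickle.dumps(value)
    log.append(len(blob))
    return pickle.loads(blob)


def per_node(n: int) -> tuple[int, int]:
    # Every node output crosses a task boundary.
    log: list[int] = []
    rows = roundtrip(make_rows(n), log)
    squared = roundtrip(square(rows), log)
    result = roundtrip(total(squared), log)
    return result, sum(log)


def grouped(n: int) -> tuple[int, int]:
    # Only the final output crosses a task boundary.
    log: list[int] = []
    result = roundtrip(total(square(make_rows(n))), log)
    return result, sum(log)
```

Both variants return the same result, but `grouped` only serializes the final integer, while `per_node` also serializes the two intermediate lists. Deciding automatically which nodes to group is the open question mentioned above.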
