This message was deleted Hamilton Open Source #hamilton-help

# hamilton-help

Slackbot

11/20/2023, 3:01 PM

This message was deleted.

Elijah Ben Izzy

11/20/2023, 3:05 PM

So the problem is that a parallelizable statement needs a collect to follow. This is something we need to have a good error message for, but basically you currently can’t ask for something if it isn’t going to be collected.

Elijah Ben Izzy

11/20/2023, 3:07 PM

See this (same problem) https://hamilton-opensource.slack.com/archives/C03M33QB4M8/p1700361590201069?thread_ts=1700344694.844199&channel=C03M33QB4M8&message_ts=1700361590.201069

🙌 1

Roy Kid

11/20/2023, 3:14 PM

Ahh, I get it! That means I need a pair of Parallelizable & Collect, right? Just at the beginning and end of the graph?

Elijah Ben Izzy

11/20/2023, 3:22 PM

Yep! Can have multiple sets in a graph as well — just has to be 1:1

Roy Kid

11/20/2023, 3:25 PM

OK! Thanks a lot! Can you please explain further about the executors (i.e. with_local_executor), and what local and remote mean? According to the docs, local and remote seem to be exclusive, but the code in the example specifies both...?

Elijah Ben Izzy

11/20/2023, 3:30 PM

Yep! So remote will execute anything between parallel/collect. Local will execute anything else. It’s a little nuanced as to how it works. AFK now, but I can type out an explanation/point you to the docs where it goes through it in a bit!
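To make the local/remote split concrete, here is a minimal plain-Python sketch of the routing rule described above. It does not use Hamilton at all: Task, SynchronousExecutor, ThreadedExecutor, and ToyExecutionManager are hypothetical names made up for illustration, and the thread pool only stands in for whatever a real remote executor would do.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    """Hypothetical stand-in for a group of nodes that runs as one unit."""
    name: str
    in_parallel_block: bool  # True if the task sits between parallel and collect
    fn: Callable[[], int]


class SynchronousExecutor:
    """Runs tasks one by one, in process (the 'local' role in this sketch)."""

    def run(self, tasks: List[Task]) -> List[int]:
        return [task.fn() for task in tasks]


class ThreadedExecutor:
    """Runs tasks on a thread pool (the 'remote' role in this sketch)."""

    def __init__(self, max_workers: int = 4) -> None:
        self.max_workers = max_workers

    def run(self, tasks: List[Task]) -> List[int]:
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # pool.map returns results in the same order as the inputs
            return list(pool.map(lambda task: task.fn(), tasks))


class ToyExecutionManager:
    """Assumed routing rule: tasks inside a parallel block go to the remote
    executor, everything else goes to the local executor."""

    def __init__(self, local, remote) -> None:
        self.local = local
        self.remote = remote

    def executor_for(self, task: Task):
        return self.remote if task.in_parallel_block else self.local


manager = ToyExecutionManager(local=SynchronousExecutor(), remote=ThreadedExecutor())
load = Task("load_inputs", in_parallel_block=False, fn=lambda: 10)
squares = [
    Task(f"square_{i}", in_parallel_block=True, fn=lambda i=i: i * i)
    for i in range(4)
]

local_result = manager.executor_for(load).run([load])
remote_results = manager.executor_for(squares[0]).run(squares)
```

In this toy, both executors are always configured, which is why specifying both is not a contradiction: each one simply handles a different part of the graph.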

Roy Kid

11/20/2023, 3:40 PM

Thanks! I have gone through the docs several times, but I still get confused by some concepts. If it’s convenient for you, maybe it would be better to update the docs and emphasize this mechanism? I will check the source code later. Anyway, thank you for your help!

Stefan Krawczyk

11/20/2023, 3:52 PM

@Roy Kid feel free to file an issue to update the docs! 🙂

🤓 1

Elijah Ben Izzy

11/20/2023, 4:06 PM

Yep! Quite possible there are pieces that need updating :) happy to do that + share the general approach/design here.

Elijah Ben Izzy

11/20/2023, 11:28 PM

OK, so some more conceptual models to go in the doc, but parking it here so it’s immediately useful (@miek, who might find it interesting as well). There are two main capabilities of the enable_dynamic_execution mode in the driver:

1. To break the DAG into “tasks” that consist of multiple Hamilton nodes. These are the individual units of execution.
2. To launch a dynamic number of these tasks (on Parallelizable) and collect the results (on Collect inputs).

Thus, the first thing it does is break things into tasks. This is pluggable (and meant to be extended in the future), but the current approach is pretty basic:

1. Every normal part of the DAG is grouped into a task (contiguous graph components)
2. “expand” nodes (Parallelizable) form their own task
3. Anything between Parallelizable and Collect forms its own task
4. Collect nodes form their own task

This is going to be optimized over time, but then the question is “How do I execute them?” We have a tool called an ExecutionManager, which assigns executors to tasks. The DefaultExecutionManager takes in two executors: local and remote. The “local” executor runs everything on the driver side, in process (it uses the SynchronousLocalTaskExecutor to run the tasks). The “remote” executor runs wherever. You can also use the same SynchronousLocalTaskExecutor for it, and you just won’t get parallelism. This is where Ray, dask, etc. fit in.
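The four grouping rules can be sketched in plain Python. This is a simplified illustration, not Hamilton's actual grouping code: it assumes the graph is a single linear chain given in execution order, and group_into_tasks plus the kind labels ("normal", "expand", "collect") are hypothetical names. It also rejects an expand node that is never collected, which is the situation that started this thread.

```python
from typing import List, Tuple

Node = Tuple[str, str]        # (node name, kind)
Group = Tuple[str, List[str]]  # (task purpose, node names in the task)


def group_into_tasks(nodes: List[Node]) -> List[Group]:
    """Group a linear chain of nodes into tasks.

    kind is one of "normal", "expand" (Parallelizable), or "collect" (Collect).
    """
    tasks: List[Group] = []
    inside_block = False
    for name, kind in nodes:
        if kind == "expand":
            if inside_block:
                raise ValueError(f"{name}: this toy does not model nested blocks")
            tasks.append(("expand", [name]))  # rule 2: expand is its own task
            inside_block = True
        elif kind == "collect":
            if not inside_block:
                raise ValueError(f"{name}: collect without a matching expand")
            tasks.append(("collect", [name]))  # rule 4: collect is its own task
            inside_block = False
        elif kind == "normal":
            # rule 3 inside a block, rule 1 outside of one
            purpose = "block" if inside_block else "normal"
            if tasks and tasks[-1][0] == purpose:
                tasks[-1][1].append(name)  # extend the contiguous group
            else:
                tasks.append((purpose, [name]))
        else:
            raise ValueError(f"{name}: unknown kind {kind!r}")
    if inside_block:
        raise ValueError("expand node is never collected")
    return tasks


chain = [
    ("urls", "normal"),
    ("url", "expand"),
    ("fetched", "normal"),
    ("parsed", "normal"),
    ("all_parsed", "collect"),
    ("report", "normal"),
]
groups = group_into_tasks(chain)
```

For this chain, the sketch produces five tasks: one normal task, the expand task, one block task holding both nodes between expand and collect, the collect task, and a final normal task. In the terms above, only the block task would be handed to the remote executor.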

🙏 1

Elijah Ben Izzy

11/20/2023, 11:28 PM

Does this help clarify things?
