# hamilton-help
e
So the problem is that a `Parallelizable` node needs a `Collect` to follow it. This is something we need a good error message for, but basically you currently can't ask for a node that sits after a `Parallelizable` unless it is going to be collected.
r
Ahh, I get it! That means I need a pair of `Parallelizable` & `Collect`, right? Just at the beginning and end of the graph?
e
Yep! Can have multiple sets in a graph as well — just has to be 1:1
r
OK! Thanks a lot! Can you please explain a bit more about executors (e.g. `with_local_executor`), and what local and remote mean? According to the docs, local and remote seem to be mutually exclusive, but the code in the example specifies both...?
e
Yep! So remote will execute anything between parallel/collect. Local will execute anything else. It's a little nuanced as to how it works. AFK now, but I can type out an explanation/point you to the docs where it goes through it in a bit!
r
Thanks! I have gone through the docs several times, but I still get confused by some of the concepts. If it's convenient, maybe it would be better to update the docs and emphasize this mechanism? I will check the source code later. Anyway, thank you for your help!
s
@Roy Kid feel free to file an issue to update the docs! 🙂
e
Yep! Quite possible there are pieces that need updating :) happy to do that + share the general approach/design here.
OK, so some more conceptual models to go in the docs, but parking it here so it's immediately useful (@miek, who might find it interesting as well). There are two main capabilities of the `enable_dynamic_execution` mode in the driver:
1. To break the DAG into “tasks” that consist of multiple Hamilton nodes. These are the individual units of execution.
2. To launch a dynamic number of these tasks (on `Parallelizable`) and collect the results (on `Collect` inputs).

Thus, the first thing it does is break things into tasks. This is pluggable (and meant to be extended in the future), but the idea is pretty basic:
1. Every normal part of the DAG is grouped into a task (contiguous graph components)
2. “Expand” nodes (`Parallelizable`) form their own task
3. Anything between `Parallelizable` and `Collect` forms its own task
4. `Collect` nodes form their own task

This is going to be optimized over time, but then the question is “How do I execute them?” We have a tool called an `ExecutionManager`, which assigns an executor to each task. The `DefaultExecutionManager` takes in two executors: local and remote. The “local” executor runs everything on the driver side, in process (it uses the `SynchronousLocalTaskExecutor` to run the tasks). The “remote” executor runs wherever. You can also use the same `SynchronousLocalTaskExecutor` for it, and you just won't get parallelism. This is where Ray, Dask, etc. fit in.
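To make the grouping and assignment rules concrete, here is a toy model in plain Python. It is not Hamilton's code: `Node`, `Task`, `group_into_tasks`, and `assign_executor` are hypothetical names, and it only handles a linear chain of nodes rather than a real DAG.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    kind: str  # "normal", "expand" (Parallelizable) or "collect" (Collect)


@dataclass
class Task:
    nodes: List[str]
    purpose: str  # "normal", "expand", "inside_block" or "collect"


def group_into_tasks(chain: List[Node]) -> List[Task]:
    """Group a linear chain of nodes using the four rules above."""
    tasks: List[Task] = []
    inside_block = False
    for node in chain:
        if node.kind == "expand":
            tasks.append(Task([node.name], "expand"))
            inside_block = True
        elif node.kind == "collect":
            tasks.append(Task([node.name], "collect"))
            inside_block = False
        else:
            purpose = "inside_block" if inside_block else "normal"
            if tasks and tasks[-1].purpose == purpose:
                # Contiguous nodes of the same kind share one task.
                tasks[-1].nodes.append(node.name)
            else:
                tasks.append(Task([node.name], purpose))
    return tasks


def assign_executor(task: Task) -> str:
    """Remote runs what sits between expand and collect; local runs the rest."""
    return "remote" if task.purpose == "inside_block" else "local"


chain = [
    Node("load", "normal"),
    Node("clean", "normal"),
    Node("item", "expand"),
    Node("process", "normal"),
    Node("score", "normal"),
    Node("combined", "collect"),
    Node("report", "normal"),
]
tasks = group_into_tasks(chain)
plan = [(task.nodes, assign_executor(task)) for task in tasks]
```

In this model `process` and `score` end up in the one task that goes to the remote executor, and everything else stays local.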
Does this help clarify things?