Slackbot
11/20/2023, 3:01 PM
Elijah Ben Izzy
11/20/2023, 3:05 PM
Elijah Ben Izzy
11/20/2023, 3:07 PM
Roy Kid
11/20/2023, 3:14 PM
Parallelizable & Collect, right? Just at the beginning and end of the graph
Elijah Ben Izzy
11/20/2023, 3:22 PM
Roy Kid
11/20/2023, 3:25 PM
Elijah Ben Izzy
11/20/2023, 3:30 PM
Roy Kid
11/20/2023, 3:40 PM
Stefan Krawczyk
11/20/2023, 3:52 PM
Elijah Ben Izzy
11/20/2023, 4:06 PM
Elijah Ben Izzy
11/20/2023, 11:28 PM
enable_dynamic_execution mode in the driver:
1. To break it into “tasks” that consist of multiple Hamilton nodes; these are individual units of execution
2. To launch a dynamic number of these tasks (on Parallelizable) and collect the results (on Collect inputs)
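To make the fan-out and fan-in idea concrete, here is a rough sketch in plain Python. It is not Hamilton's API: the function names (numbers, squared, total, run) are made up for illustration, and a standard-library thread pool stands in for whatever actually launches the tasks.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List


def numbers(limit: int) -> Iterator[int]:
    """Plays the role of a Parallelizable node: yields one item per task."""
    yield from range(limit)


def squared(number: int) -> int:
    """Runs once per yielded item; this is the body of each dynamic task."""
    return number * number


def total(squares: List[int]) -> int:
    """Plays the role of a Collect input: receives every task's result."""
    return sum(squares)


def run(limit: int) -> int:
    # The number of tasks is only known once `numbers` has produced its items.
    with ThreadPoolExecutor(max_workers=4) as pool:
        squares = list(pool.map(squared, numbers(limit)))
    return total(squares)


result = run(5)
```

The point of the sketch is only that the task count depends on runtime data, which is why the work has to be split into tasks first.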
Thus, the first thing it does is break things into tasks. This is pluggable (and meant to be extended in the future), but the idea is pretty basic:
1. Every normal part of the DAG is grouped into a task (contiguous graph components)
2. “expand” nodes (Parallelizable) form their own task
3. Nodes between Parallelizable and Collect form their own tasks
4. Collect nodes form their own task
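The four rules above can be sketched roughly as follows. This is a simplified illustration, not Hamilton's grouping code: it assumes a linear, topologically ordered chain of nodes, and the Node class, the kind labels, and group_into_tasks are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    kind: str = "normal"  # "normal", "expand" (Parallelizable), or "collect"


def group_into_tasks(nodes: List[Node]) -> List[Tuple[str, List[str]]]:
    """Group a linear chain of nodes into (purpose, node names) tasks."""
    tasks: List[Tuple[str, List[str]]] = []
    current: List[str] = []
    inside_block = False

    def flush() -> None:
        # Close the run of contiguous nodes gathered so far, if any.
        nonlocal current
        if current:
            purpose = "block" if inside_block else "normal"
            tasks.append((purpose, current))
            current = []

    for node in nodes:
        if node.kind == "expand":
            flush()  # rule 1: nodes before the block form one task
            tasks.append(("expand", [node.name]))  # rule 2
            inside_block = True
        elif node.kind == "collect":
            flush()  # rule 3: nodes inside the block form their own task
            tasks.append(("collect", [node.name]))  # rule 4
            inside_block = False
        else:
            current.append(node.name)
    flush()
    return tasks


chain = [
    Node("load"), Node("clean"),
    Node("urls", "expand"),
    Node("fetch"), Node("parse"),
    Node("merged", "collect"),
    Node("report"),
]
tasks = group_into_tasks(chain)
```

A real DAG is not a straight line, so actual grouping has to find contiguous components rather than walk a list, but the task boundaries fall in the same places.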
This is going to be optimized over time, but then the question is “How do I execute them?” We have a tool called an ExecutionManager, which assigns executors to tasks. The DefaultExecutionManager takes in two executors: local and remote. The “local” executor runs everything on the driver side, in process (it uses the SynchronousLocalTaskExecutor to run the tasks). The “remote” executor runs wherever you want; you can also use the same SynchronousLocalTaskExecutor for it, and you just won’t get parallelism. This is where Ray, dask, etc. fit in.
Elijah Ben Izzy
11/20/2023, 11:28 PM
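A rough sketch of the local/remote split described in the previous message, in plain Python. None of these classes are Hamilton's: SimpleExecutionManager, SynchronousExecutor, and ThreadedExecutor are hypothetical stand-ins, and the rule that tasks inside a parallel block go to the remote executor is an assumption made for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List

Task = Callable[[], Any]


class SynchronousExecutor:
    """Runs each task immediately, one after another, in the calling process."""

    def run_all(self, tasks: List[Task]) -> List[Any]:
        return [task() for task in tasks]


class ThreadedExecutor:
    """Stand-in for a "remote" executor: runs tasks on a thread pool."""

    def __init__(self, max_workers: int = 4) -> None:
        self.max_workers = max_workers

    def run_all(self, tasks: List[Task]) -> List[Any]:
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            futures = [pool.submit(task) for task in tasks]
            # Results come back in submission order.
            return [future.result() for future in futures]


class SimpleExecutionManager:
    """Picks an executor per task (assumed rule: parallel block -> remote)."""

    def __init__(self, local: Any, remote: Any) -> None:
        self.local = local
        self.remote = remote

    def executor_for(self, in_parallel_block: bool) -> Any:
        return self.remote if in_parallel_block else self.local


block_tasks: List[Task] = [lambda n=n: n * 2 for n in range(3)]

parallel = SimpleExecutionManager(SynchronousExecutor(), ThreadedExecutor())
results = parallel.executor_for(in_parallel_block=True).run_all(block_tasks)

# Using the synchronous executor as "remote" too gives the same results,
# just without any parallelism.
serial = SimpleExecutionManager(SynchronousExecutor(), SynchronousExecutor())
serial_results = serial.executor_for(in_parallel_block=True).run_all(block_tasks)
```

Swapping the remote executor is the only change between the two managers, which is the slot where something like Ray or dask would plug in.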