Hamilton Open Source #hamilton-help



# hamilton-help

Slackbot

10/06/2023, 8:00 PM

This message was deleted.

Elijah Ben Izzy

10/06/2023, 8:21 PM

Yes, this is very much what Hamilton is meant to do! And it's something we’re actively building out, both in Hamilton and in the DAGWorks product. The high-level design (there are a few variants of it) is:

1. Have a single DAG that specifies everything end to end (e.g. instead of loading data to/from intermediate locations, just pass it through).
2. Break that out into the “tasks” you mentioned above, using additional data loaders/materializers to save/load intermediate results.
3. Run those tasks on top of the macro-orchestration system: run Hamilton within Dagster, dbt Python models, Prefect, Airflow, etc.
4. Get full column-level/micro-lineage, inspect intermediate results, plug whatever metadata you need into the systems you mentioned above (or others), etc.

The advantages of this design are pretty unique to Hamilton:

1. You can test the whole thing end to end, locally, using small data.
2. You can view fine-grained/logical lineage on a per-task basis.
3. You can be flexible about how you break it up. Say you actually want to combine tasks (1) and (2), or run it all in one: that’s all super simple with a bit of configuration. Or say you want to break task (1) into two different pieces: that’s fairly easy as well.

So it’s a slight paradigm shift: don’t think about the tasks, think about the logic. Then you can create the tasks from the logic itself, knowing what data you want to be available and where. Does this make sense? IMO this is one of the great reasons to use Hamilton: start small, then grow out.
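To make “think about the logic, not the tasks” concrete, here is a toy, pure-Python sketch of the pattern. It is not Hamilton’s implementation or API: `build_graph`, `execute`, and the `store` dict are hypothetical stand-ins. Like Hamilton, it wires functions together by matching parameter names to function names. The two “tasks” are just different slices of the same logic, with the intermediate result saved to and loaded from a dict that plays the role of a data loader/materializer.

```python
import inspect
from typing import Any, Callable, Dict, Iterable


# Hamilton-style node functions: each parameter name refers to an upstream node or an input.
def raw_orders(order_values: list) -> list:
    return [v for v in order_values if v is not None]


def cleaned_orders(raw_orders: list) -> list:
    return [v for v in raw_orders if v > 0]


def total_revenue(cleaned_orders: list) -> float:
    return float(sum(cleaned_orders))


def order_count(cleaned_orders: list) -> int:
    return len(cleaned_orders)


def build_graph(funcs: Iterable[Callable]) -> Dict[str, Callable]:
    """Map node name -> function (hypothetical helper, not Hamilton's API)."""
    return {f.__name__: f for f in funcs}


def execute(graph: Dict[str, Callable], outputs: Iterable[str], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Compute only the requested outputs, resolving dependencies recursively.

    Values passed in `inputs` are never recomputed, which is how a task "loads"
    results that an upstream task saved.
    """
    cache: Dict[str, Any] = dict(inputs)

    def compute(name: str) -> Any:
        if name not in cache:
            func = graph[name]  # KeyError if the name is neither a node nor an input
            params = inspect.signature(func).parameters
            cache[name] = func(**{p: compute(p) for p in params})
        return cache[name]

    return {name: compute(name) for name in outputs}


graph = build_graph([raw_orders, cleaned_orders, total_revenue, order_count])
sample = {"order_values": [10, None, -5, 20, 30]}

# Run the whole thing end to end, locally, on small data.
end_to_end = execute(graph, ["total_revenue", "order_count"], sample)

# Carve the same logic into two "tasks".
store: Dict[str, Any] = {}  # stand-in for a materializer/data loader target (files, a warehouse, ...)
store.update(execute(graph, ["cleaned_orders"], sample))  # task 1: compute and save the intermediate
task_2 = execute(graph, ["total_revenue", "order_count"], {"cleaned_orders": store["cleaned_orders"]})  # task 2: load it

print(end_to_end)            # {'total_revenue': 60.0, 'order_count': 3}
print(end_to_end == task_2)  # True
```

Merging the two tasks back into one is a single `execute` call with nothing saved in between; splitting differently only changes which outputs get saved and which get passed in.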

Luke

10/06/2023, 8:27 PM

Makes sense at this stage! I’ll follow up with implementation questions as they come up.

`dr.visualize_execution()` and `dr.display_all_functions()` are the correct way to render (not execute) the entire end-to-end DAG, right?
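Conceptually, rendering without executing works because the graph can be derived from function signatures alone, without calling anything. The toy sketch below is not Hamilton’s code (`to_dot` is a hypothetical helper): it emits Graphviz DOT text for a couple of Hamilton-style functions that would raise if they were ever called. As we understand Hamilton’s driver, `display_all_functions` draws every node in the DAG, while `visualize_execution` draws only the nodes needed for the outputs you ask for.

```python
import inspect
from typing import Callable, Iterable, List


# These bodies raise on purpose: rendering must never call them.
def raw_orders(order_values: list) -> list:
    raise RuntimeError("should never run while rendering")


def total_revenue(raw_orders: list) -> float:
    raise RuntimeError("should never run while rendering")


def to_dot(funcs: Iterable[Callable]) -> str:
    """Build a DOT digraph from parameter names only; no function is called."""
    lines: List[str] = ["digraph dag {"]
    for func in funcs:
        for param in inspect.signature(func).parameters:
            # One edge from each upstream node (or external input) to this function's node.
            lines.append(f'  "{param}" -> "{func.__name__}";')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot([raw_orders, total_revenue])
print(dot)
```

The resulting text can be fed to the Graphviz `dot` tool to produce an image.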

Elijah Ben Izzy

10/06/2023, 8:29 PM

Great! Since this is a use-case we’re particularly interested in, we’d be happy to hop on a call at some point soon and walk you through how we’d approach it! No worries if not though. Valuable for us to see user stories as well.

🤙 1

❤️ 1

Elijah Ben Izzy

10/06/2023, 8:36 PM

Also, yes, although I’d highly recommend you look into materializers with `visualize_materialization` as well: https://hamilton.dagworks.io/en/latest/reference/drivers/Driver/#hamilton.driver.Driver.visualize_materialization. The idea is that you can materialize from a driver, and it’ll dynamically add those nodes to the DAG. This is really useful when you want to break out tasks and save intermediate results.
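A toy, plain-Python sketch of the “materializing adds nodes to the DAG” idea (this is not Hamilton’s materializer API; `with_materializer`, `execute`, and the dict `store` are hypothetical). Requesting a save adds one extra node whose only dependency is the value to persist, so it runs alongside the normal outputs without touching any of the logic functions.

```python
import inspect
from typing import Any, Callable, Dict, Iterable


def cleaned_orders(order_values: list) -> list:
    return [v for v in order_values if v is not None and v > 0]


def total_revenue(cleaned_orders: list) -> float:
    return float(sum(cleaned_orders))


def execute(graph: Dict[str, Callable], outputs: Iterable[str], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Compute the requested outputs by recursively resolving parameter names."""
    cache: Dict[str, Any] = dict(inputs)

    def compute(name: str) -> Any:
        if name not in cache:
            func = graph[name]
            cache[name] = func(**{p: compute(p) for p in inspect.signature(func).parameters})
        return cache[name]

    return {name: compute(name) for name in outputs}


def with_materializer(graph: Dict[str, Callable], target: str, store: Dict[str, Any]) -> Dict[str, Callable]:
    """Return a copy of `graph` plus a `save_<target>` node that writes `target` into `store`."""

    def save(**kwargs: Any) -> str:
        store[target] = kwargs[target]
        return f"saved {target}"

    # Give the new node a signature naming `target`, so the graph sees it as a dependency.
    save.__signature__ = inspect.Signature(
        [inspect.Parameter(target, inspect.Parameter.KEYWORD_ONLY)]
    )
    return {**graph, f"save_{target}": save}


store: Dict[str, Any] = {}  # stand-in for files, object storage, a warehouse, ...
graph = with_materializer(
    {"cleaned_orders": cleaned_orders, "total_revenue": total_revenue}, "cleaned_orders", store
)
result = execute(graph, ["total_revenue", "save_cleaned_orders"], {"order_values": [10, None, -5, 20]})
print(result)  # {'total_revenue': 30.0, 'save_cleaned_orders': 'saved cleaned_orders'}
print(store)   # {'cleaned_orders': [10, 20]}
```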

Stefan Krawczyk

10/06/2023, 8:37 PM

+1 to what @Elijah Ben Izzy said. And:

1. @Luke, when we say “one large DAG”, note that it can come from Hamilton functions declared in multiple Python modules.
2. Yep, there are also other visualization functions that let you view the path between two nodes.

Otherwise, some posts that could be helpful:

• Hamilton + Airflow
• Lineage with Hamilton
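A toy, plain-Python sketch of both points. It is not Hamilton’s code: `graph_from_modules` and `path_between` are hypothetical helpers, and the two modules are built in memory with `types.ModuleType` only so the example is self-contained. Functions from several modules merge into one graph keyed by function name, and the “path between two nodes” view keeps only the nodes on some dependency chain from the upstream node to the downstream one. In Hamilton itself, the corresponding built-in is, as far as we know, `Driver.visualize_path_between`.

```python
import inspect
import types
from typing import Callable, Dict, List, Set


def raw_orders(order_values: list) -> list:
    return [v for v in order_values if v is not None]

def cleaned_orders(raw_orders: list) -> list:
    return [v for v in raw_orders if v > 0]

def total_revenue(cleaned_orders: list) -> float:
    return float(sum(cleaned_orders))

def order_count(raw_orders: list) -> int:
    return len(raw_orders)

def revenue_per_order(total_revenue: float, order_count: int) -> float:
    return total_revenue / order_count


# Two in-memory "modules"; in practice these would be separate .py files.
loading = types.ModuleType("loading")
loading.raw_orders = raw_orders
features = types.ModuleType("features")
for fn in (cleaned_orders, total_revenue, order_count, revenue_per_order):
    setattr(features, fn.__name__, fn)


def graph_from_modules(*modules: types.ModuleType) -> Dict[str, Callable]:
    """Merge the public functions of several modules into one name -> function graph."""
    graph: Dict[str, Callable] = {}
    for module in modules:
        for name, obj in vars(module).items():
            if inspect.isfunction(obj) and not name.startswith("_"):
                graph[name] = obj
    return graph


def path_between(graph: Dict[str, Callable], upstream: str, downstream: str) -> Set[str]:
    """Return the nodes on at least one dependency path from `upstream` to `downstream`."""
    memo: Dict[str, bool] = {}

    def reaches(node: str) -> bool:
        if node not in memo:
            if node == upstream:
                memo[node] = True
            else:
                # External inputs (not in the graph) have no dependencies of their own.
                parents: List[str] = list(inspect.signature(graph[node]).parameters) if node in graph else []
                # Build a list (not a generator) so every parent is visited, not just the first hit.
                hits = [reaches(p) for p in parents]
                memo[node] = any(hits)
        return memo[node]

    reaches(downstream)
    return {node for node, ok in memo.items() if ok}


graph = graph_from_modules(loading, features)
print(sorted(graph))
# ['cleaned_orders', 'order_count', 'raw_orders', 'revenue_per_order', 'total_revenue']
print(sorted(path_between(graph, "cleaned_orders", "revenue_per_order")))
# ['cleaned_orders', 'revenue_per_order', 'total_revenue']
```

Here `order_count` is left out of the second result: it feeds `revenue_per_order`, but it does not depend on `cleaned_orders`.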
