Hamilton Open Source #general


Cooper Snyder

09/30/2024, 12:31 PM

Hey, I'm just getting started looking into Hamilton. I think there might be some examples out there, but I was wondering if anyone had examples or a blog post on testing approaches and repo organization when using Hamilton to structure all of the pipelines. I was imagining a really powerful setup would be to have the tox testing environment spin up a Hamilton server, with all of the unit tests registered in there, so anyone could clone the repo and inspect the pipeline flows. I know this is a bit of a wide question and I can see many different ways of doing it, but I was wondering what everyone's approach is. Thank you!

Elijah Ben Izzy

09/30/2024, 3:33 PM

Hey! We had been working on a blog post but did not get it out. @Thierry Jean want to share some thoughts? Otherwise, yes, we’ve thought about similar things. At a high level, some thoughts:
1. The Hamilton UI + CI is perfect for this kind of thing, e.g. view the test run, then compare between runs. We’ve thought about building, say, GitHub or GitLab actions that run on each PR and print out links to the runs. Would that be useful?
2. Multiple types of test: CI (data + bigger pipeline tests) and unit tests (testing individual functions/small DAGs). The Hamilton UI can help with the after-the-fact “what happened” and give you a comparison between two runs.
3. Would love to see the way it would work with tox. I think there are some really interesting examples here.
Would be curious how others in the community are going about it!

👍 1

Stefan Krawczyk

09/30/2024, 4:39 PM

@Cooper Snyder here are my thoughts:
1. For unit tests, e.g. testing a single function, people have generally relied on something like pytest.
2. For integration tests (still usually run via something like pytest), this is where you create a Hamilton driver node to run some portion of your pipeline. People then have a central Hamilton UI server and add the HamiltonTracker so that those executions get logged with the appropriate project & tags.
3. There’s also a Hamilton CLI that you can use to programmatically do a few things that in your CI script you could post to the pull request…
For (2) people usually use overrides= or @config.when or changing modules to get the right data for the CI into the pipeline and out of it. Do you have more requirements on what you’d like to achieve/prevent with your testing strategy?

Cooper Snyder

10/02/2024, 12:14 AM

Yeah, those are good thoughts, thank you very much. Essentially I'm trying to leverage Hamilton to reduce/mitigate the tech debt of pipeline sprawl, covering all of the points in the tech debt paper, https://proceedings.neurips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf, except maybe the model one for now. The big one is enabling people to get up to speed quicker: with a standard visualization of all of the code flows, it's easier to wrap your head around the "What is the transitive closure of all data dependencies?" piece and what the example inputs and outputs are.

👍 1

Stefan Krawczyk

10/02/2024, 12:23 AM

Cool. If you need help thinking through things stop by office hours on Tuesdays at 9:30am. Else ask questions and we’ll answer when we can.

👍 1
