Hamilton Open Source

# hamilton-help

Slackbot

12/12/2023, 9:25 AM

This message was deleted.

Elijah Ben Izzy

12/12/2023, 3:29 PM

Hey @Roel Bertens! As of yet, Hamilton isn't aware of classes: functions that form part of a DAG are assumed to be top-level functions in a module (or you can pass them in through `ad_hoc_utils.create_temporary_module`). So, while it's fine for classes to coexist alongside the functions that get turned into nodes, the Hamilton parser/converter will not be aware of them and thus won't include them in the visualization. We should probably be a bit more explicit about this, but this is close(ish):
• https://hamilton.dagworks.io/en/latest/concepts/hamilton-function-structure/#modules-and-helper-functions

What are you trying to do?
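To illustrate why classes don't show up, here is a minimal stdlib sketch that approximates, as we understand it, how a function-based DAG builder like Hamilton scans a module: it keeps public functions defined in that module and skips classes (and therefore their methods). `discover_nodes`, `Scaler`, and the module contents are hypothetical, and Hamilton's real discovery logic may apply additional rules.

```python
import inspect
import types


def discover_nodes(module: types.ModuleType) -> list[str]:
    """Return names of public functions defined directly in `module`.

    Classes (and their methods), private helpers with a leading underscore,
    and functions imported from other modules are all skipped.
    """
    def is_node(obj) -> bool:
        return (
            inspect.isfunction(obj)
            and not obj.__name__.startswith("_")
            and obj.__module__ == module.__name__
        )

    return sorted(name for name, _ in inspect.getmembers(module, is_node))


# Build a throwaway module so the example is self-contained (hypothetical names).
source = '''
class Scaler:
    def transform(self, x):
        return x * 2

def _helper(x):
    return x + 1

def raw_value() -> int:
    return 3

def scaled_value(raw_value: int) -> int:
    return Scaler().transform(raw_value)
'''
my_dataflow = types.ModuleType("my_dataflow")
exec(source, my_dataflow.__dict__)

print(discover_nodes(my_dataflow))  # ['raw_value', 'scaled_value']
```

`Scaler` still does real work inside `scaled_value`; it just never becomes a node of its own. With Hamilton itself you would hand such a module to the driver, or bundle loose functions with `ad_hoc_utils.create_temporary_module`.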

Roel Bertens

12/14/2023, 10:59 AM

Okay, thanks!

Roel Bertens

12/14/2023, 11:00 AM

Well, I have an existing class-based structure that takes care of the computation flow. One option is to keep that and use Hamilton only for lineage; the other is to rewrite the whole thing and use Hamilton for the computation flow as well.

Elijah Ben Izzy

12/14/2023, 7:32 PM

Got it, yeah, that makes sense. There are probably ways to combine the two; I'd have to see. But yeah, options range from rewriting everything in Hamilton, to living in both worlds (using Hamilton functions to call out to your class components), to having larger, coarse-grained Hamilton functions that instantiate your classes and call their flows. Up to you! This is a fairly common question ("I'm already organizing my flows using classes"), so lmk if you come up with a solution that makes you happy!
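A minimal sketch of the last two options, assuming an existing pair of classes (`Cleaner` and `Summarizer` are hypothetical). The `execute` helper is a toy stand-in for Hamilton's driver, not Hamilton's implementation: like Hamilton, it wires functions together by matching parameter names to function names or provided inputs.

```python
import inspect
from typing import Any, Callable


class Cleaner:
    """Existing class-based component (hypothetical)."""

    def __init__(self, fill_value: float) -> None:
        self.fill_value = fill_value

    def run(self, values: list) -> list:
        return [self.fill_value if v is None else v for v in values]


class Summarizer:
    """Another existing component (hypothetical)."""

    def run(self, values: list) -> float:
        return sum(values) / len(values)


# "Both worlds": one fine-grained node per class component.
def cleaned(raw_values: list, fill_value: float) -> list:
    return Cleaner(fill_value).run(raw_values)


def mean_value(cleaned: list) -> float:
    return Summarizer().run(cleaned)


# Coarse-grained: a single node that instantiates and calls the whole flow.
def mean_value_coarse(raw_values: list, fill_value: float) -> float:
    values = Cleaner(fill_value).run(raw_values)
    return Summarizer().run(values)


def execute(target: str, funcs: dict[str, Callable], inputs: dict[str, Any]) -> Any:
    """Toy resolver: computes `target` by recursively matching parameter names
    to function names (or to provided inputs), caching each result."""
    cache = dict(inputs)

    def compute(name: str) -> Any:
        if name not in cache:
            fn = funcs[name]
            kwargs = {p: compute(p) for p in inspect.signature(fn).parameters}
            cache[name] = fn(**kwargs)
        return cache[name]

    return compute(target)


funcs = {f.__name__: f for f in (cleaned, mean_value, mean_value_coarse)}
inputs = {"raw_values": [1.0, None, 5.0], "fill_value": 0.0}
print(execute("mean_value", funcs, inputs))         # 2.0
print(execute("mean_value_coarse", funcs, inputs))  # 2.0
```

Both paths compute the same result; the difference is lineage. The fine-grained version exposes `cleaned` as its own node, while the coarse-grained version is less work but shows the whole class flow as one opaque node.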

Roel Bertens

12/15/2023, 11:20 AM

@Elijah Ben Izzy so I ended up refactoring the whole thing, but I still have some questions:
1. When a function returns a Spark DataFrame, can I use a decorator to signal which columns it will produce, so that those are included in the DAG?
2. Why are inputs visualized multiple times if they are used by multiple functions? Is there an option to draw them only once, with multiple arrows?

Roel Bertens

12/15/2023, 12:19 PM

One more question: some of the columns that get created will be regarded as 'features'. I want to keep a text definition of these columns in the code and extract it to automatically generate some docs (which would also include the nice lineage graph). Any best practices/tips on how to do that with Hamilton functionality (that I might not be aware of)?

Elijah Ben Izzy

12/15/2023, 8:42 PM

Makes sense, nice work! So, answers:
1. We don't have it yet, but it's on the roadmap; it's just a matter of finding the time. With `@with_columns` it works, but that's slightly overkill for df -> df functions. That said, I think we can probably find a way to add it. One option would be to wrap the `@tag` decorator to signify the column names. Then we could customize the viz slightly (it's not that complicated) to display the column names as part of the node for a Spark DataFrame; I think that would be valuable. The columns could be their own nodes (you could do this with some basic graph manipulation), or they could just show up in the viz…
2. There's a parameter to turn this off!
3. We don't have documentation generation/cataloging (except through the DAGWorks product, which is free to try/use in a limited quantity; happy to walk you through it), but from an OS perspective, you have the pieces. You should use `@tag` to indicate which nodes you want, then `Driver.list_available_variables(…)`. @miek recently looked into this and might have some insights!

If you want, I'd be happy to sit down and hack on this for an hour/walk you through some pieces next week. One thing I'm thinking about is a custom visualizer for the Spark plugin with all the bells and whistles; if you're interested in contributing, I think we could make quick work of this.
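A minimal sketch of the tag-based approach from answers 1 and 3. So that it runs without Hamilton installed, `metadata` is a stand-in for a tag decorator; `produces_columns`, `feature`, and `feature_docs` are hypothetical helpers. With Hamilton you would instead wrap `hamilton.function_modifiers.tag` (whose values are strings) and, as far as we know, read the tags back from the `tags` of the variables returned by `Driver.list_available_variables()`. Keeping each feature's definition in its docstring keeps the text next to the code.

```python
import inspect
from typing import Callable


def metadata(**tags: str) -> Callable:
    """Stand-in for a tag decorator: attaches string key/value pairs to a function."""
    def decorator(fn: Callable) -> Callable:
        existing = getattr(fn, "__tags__", {})
        fn.__tags__ = {**existing, **tags}
        return fn
    return decorator


def produces_columns(*columns: str) -> Callable:
    """Hypothetical wrapper recording which columns a df -> df function adds."""
    return metadata(columns=",".join(columns))


def feature(**extra: str) -> Callable:
    """Hypothetical marker for nodes that should appear in the feature docs."""
    return metadata(feature="true", **extra)


@feature(owner="risk-team")
@produces_columns("spend_7d", "spend_30d")
def spend_features(transactions_df):
    """Rolling customer spend over the last 7 and 30 days."""
    ...


@produces_columns("is_weekend")
def calendar_columns(transactions_df):
    """Calendar helper columns (not a feature)."""
    ...


def feature_docs(funcs: list[Callable]) -> str:
    """Render a small markdown table for every function tagged feature=true."""
    lines = ["| node | columns | definition |", "| --- | --- | --- |"]
    for fn in funcs:
        tags = getattr(fn, "__tags__", {})
        if tags.get("feature") != "true":
            continue
        columns = tags.get("columns", "").replace(",", ", ")
        definition = inspect.getdoc(fn) or ""
        lines.append(f"| {fn.__name__} | {columns} | {definition} |")
    return "\n".join(lines)


print(feature_docs([spend_features, calendar_columns]))
```

The same `columns` tag that feeds the docs could also drive a custom visualizer that prints column names inside a DataFrame node.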

Roel Bertens

12/18/2023, 12:28 PM

Hi @Elijah Ben Izzy, I have time this Thursday, roughly 9-17 CET, and would be interested in contributing. Would that work for you?

miek

12/18/2023, 1:03 PM

@Roel Bertens, to create my own (sort of) data catalog: Hamilton already gives you all the upstream/downstream nodes of a particular node (the driver has functions for this). In my case, I converted all those nodes into clickable links in a Streamlit app, where each link points to the node's definition in my git repo. In Streamlit, I display each node, its upstream nodes, its downstream nodes, and some metadata as one row in a four-column DataFrame. With a bit of tweaking, this worked quite nicely. I haven't published it in a public repo since I did this at my workplace, but rolling your own doesn't take long.
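A minimal sketch of the catalog rows described above. In Hamilton, the transitive sets would come from `Driver.what_is_upstream_of(...)` and `Driver.what_is_downstream_of(...)`; here a plain, hypothetical dependency dict plus a breadth-first search stands in so the example runs anywhere. `REPO_URL` and `SOURCE_LINES` are placeholders for your own repo and source locations.

```python
from collections import deque

# Hypothetical dataflow: node -> the nodes it directly depends on.
DEPENDENCIES: dict[str, list[str]] = {
    "raw_df": [],
    "cleaned_df": ["raw_df"],
    "spend_features": ["cleaned_df"],
    "calendar_columns": ["cleaned_df"],
    "model_input": ["spend_features", "calendar_columns"],
}
# Placeholder repo URL and line numbers; replace with your own source locations.
REPO_URL = "https://github.com/your-org/your-repo/blob/main/features.py"
SOURCE_LINES = {"raw_df": 5, "cleaned_df": 12, "spend_features": 20,
                "calendar_columns": 31, "model_input": 40}

# Reverse the edges once so we can also walk downstream.
DEPENDENTS: dict[str, list[str]] = {name: [] for name in DEPENDENCIES}
for node, deps in DEPENDENCIES.items():
    for dep in deps:
        DEPENDENTS[dep].append(node)


def reachable(start: str, edges: dict[str, list[str]]) -> set[str]:
    """Breadth-first search: every node reachable from `start` (excluding it)."""
    seen: set[str] = set()
    queue = deque(edges[start])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(edges[node])
    return seen


def link(name: str) -> str:
    """Markdown link to the node's definition in the (placeholder) repo."""
    return f"[{name}]({REPO_URL}#L{SOURCE_LINES[name]})"


def catalog_rows() -> list[dict[str, str]]:
    """One row per node: the node, its upstream, its downstream, some metadata."""
    rows = []
    for name in DEPENDENCIES:
        rows.append({
            "node": link(name),
            "upstream": ", ".join(link(n) for n in sorted(reachable(name, DEPENDENCIES))),
            "downstream": ", ".join(link(n) for n in sorted(reachable(name, DEPENDENTS))),
            "metadata": f"{len(DEPENDENCIES[name])} direct dependencies",
        })
    return rows


for row in catalog_rows():
    print(row["node"], "|", row["upstream"] or "-", "|", row["downstream"] or "-")
```

The resulting list of dicts maps directly onto the four-column DataFrame described above, ready to display in a Streamlit app.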

Elijah Ben Izzy

12/18/2023, 5:57 PM

Thanks @miek! @Roel Bertens will DM you to coordinate a time.

Roel Bertens

12/18/2023, 7:19 PM

Sounds interesting, @miek! I was already thinking of giving different types of nodes different colors or something like that. Linking to the source also sounds nice. Any pointers would be appreciated 🙂
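One way to prototype colored, clickable nodes outside Hamilton's own visualizer is to emit Graphviz DOT text directly. In this minimal sketch, the node kinds, colors, node names, and the `REPO_URL` link pattern are all hypothetical. `fillcolor` and `URL` are standard Graphviz attributes, and `URL` links are clickable in SVG output. Each input is declared exactly once and gets one arrow per consumer, which is the layout asked about in question 2.

```python
# Hypothetical node kinds mapped to Graphviz fill colors.
KIND_COLORS = {"input": "lightgrey", "feature": "palegreen", "dataframe": "lightblue"}

# name -> (kind, direct dependencies); purely illustrative.
NODES = {
    "transactions_path": ("input", []),
    "window_days": ("input", []),
    "raw_df": ("dataframe", ["transactions_path"]),
    "spend_features": ("feature", ["raw_df", "window_days"]),
    "churn_score": ("feature", ["spend_features", "window_days"]),
}
REPO_URL = "https://github.com/your-org/your-repo/search?q="  # placeholder link pattern


def to_dot(nodes: dict[str, tuple[str, list[str]]]) -> str:
    """Emit a DOT graph: one styled node per name (inputs included exactly once),
    one edge per dependency, and a source link on every non-input node."""
    lines = ["digraph dataflow {", '  node [shape=box, style="rounded,filled"];']
    for name, (kind, _) in nodes.items():
        attrs = f'fillcolor="{KIND_COLORS[kind]}"'
        if kind != "input":
            attrs += f', URL="{REPO_URL}{name}"'
        lines.append(f'  "{name}" [{attrs}];')
    for name, (_, deps) in nodes.items():
        for dep in deps:
            lines.append(f'  "{dep}" -> "{name}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(NODES))
```

Saving the output as `dataflow.dot` and running `dot -Tsvg dataflow.dot -o dataflow.svg` renders it; `window_days` appears once with two outgoing arrows.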
