This message was deleted Hamilton Open Source #hamilton-help



# hamilton-help

Slackbot

11/04/2023, 9:18 PM

This message was deleted.

miek

11/04/2023, 9:23 PM

I’m coming from a SQL context where you have a table and then use GROUP BY <dimension X, dimension Y> to compute various aggregates. Mostly trying to evaluate how Hamilton could replicate this type of workflow

Elijah Ben Izzy

11/04/2023, 10:35 PM

Yes! So assuming:
1. You want to hardcode the set of computations
2. You want the parameterizations to be flexible/and contain multiple

The tool here is @subdag. It’s very powerful, and was envisioned initially to handle exactly this: computations across different granularities. Here are the docs: https://hamilton.dagworks.io/en/latest/reference/decorators/subdag/. We actually have an example that does something very similar to what you’re interested in. In this case, we’re utilizing the ability to run a common set of computations across two dimensions (region/granularity, granularity being exactly what you described): https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/reusing_functions The way to think about this is that it’s a nice way to reuse a group of operations while keeping it fine-grained. If you want to get really fancy we have @parameterized_subdag (in case you don’t like copy/pasting), and @resolve (to make the parameters configuration-driven), but I’d see how far subdag can get you.

That said, there is another approach that you may or may not like, depending on how much you want individual functions to map to columns/datapoints. You could always:
1. Create a helper function that does these aggregations on a dataframe
2. Call it for each granularity, having the function input a dataframe/join a set of columns.

So yeah, if you want the same fine-grained approach that Hamilton likes, I’d suggest subdag. If you want to just run a bunch of aggregations, dataframe manipulations aren’t crazy.
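The helper-function approach described above can be sketched in plain Python. This is only an illustration, not Hamilton code: the `INTERACTIONS` records, the `bucket` and `unique_users` helpers, and the field names are all hypothetical, and a list of dicts stands in for the dataframe. It mirrors a SQL `COUNT(DISTINCT user_id) ... GROUP BY` over a time grain, filtered to one region.

```python
from collections import defaultdict
from datetime import date

# Hypothetical interaction records; a real pipeline would hold these in a dataframe.
INTERACTIONS = [
    {"day": date(2023, 11, 1), "region": "US", "user_id": 1},
    {"day": date(2023, 11, 1), "region": "US", "user_id": 2},
    {"day": date(2023, 11, 2), "region": "US", "user_id": 1},
    {"day": date(2023, 11, 2), "region": "CA", "user_id": 3},
    {"day": date(2023, 11, 8), "region": "CA", "user_id": 3},
    {"day": date(2023, 11, 8), "region": "CA", "user_id": 4},
]


def bucket(day, grain):
    """Map a date to a string key identifying its period at the given grain."""
    if grain == "daily":
        return day.isoformat()
    if grain == "weekly":
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if grain == "monthly":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unknown grain: {grain}")


def unique_users(rows, grain, region):
    """Count distinct users per period for one region (the GROUP BY step)."""
    seen = defaultdict(set)
    for row in rows:
        if row["region"] == region:
            seen[bucket(row["day"], grain)].add(row["user_id"])
    return {period: len(users) for period, users in sorted(seen.items())}


# Call the same helper once per (grain, region) combination.
results = {
    f"{grain}_unique_users_{region}": unique_users(INTERACTIONS, grain, region)
    for grain in ("daily", "weekly", "monthly")
    for region in ("US", "CA")
}
```

The trade-off is the one mentioned in the message: the helper is easy to call for any grain, but each aggregate is no longer its own named node in the DAG.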

🙌 1

miek

11/04/2023, 11:32 PM

I have a lot to learn, let me explore this one too!

🫡 1

Elijah Ben Izzy

11/04/2023, 11:44 PM

Heh, yeah, a few approaches. You’ll often find that it’s usually two buckets, in the case of dataframes/series:
1. Expresses exactly what you want, using fancy framework constructs (everything is a series, lots of nodes, etc…)
2. Less expressive but you can do whatever you want, and very natural for data scientists (often around passing dataframes around)

The path to choose is up to you entirely; it depends on needs around development.

miek

11/05/2023, 12:57 AM

@Elijah Ben Izzy your subdag example almost gets me what I was looking for, thanks for pointing me to that example! That said, it requires defining each combination as a separate node (well, subdag), i.e. I have to do

result = dr.execute(
    [
        "daily_unique_users_US",
        "daily_unique_users_CA",
        "weekly_unique_users_US",
        "weekly_unique_users_CA",
        "monthly_unique_users_US",
        "monthly_unique_users_CA",
    ]
)

That’s 6 function definitions for 3 grains and 2 regions… this will explode exponentially once I have 100’s of metrics and many dimensions… sounds like @parameterized_subdag might be better… let me study those next
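As a rough sketch of what "defining the combinations externally" could look like, the grain/region product can be generated rather than written out by hand. This is plain Python, not Hamilton's API: `build_parameterization`, `GRAINS`, and `REGIONS` are hypothetical names, and the exact argument shape a parameterizing decorator expects should be checked against the Hamilton docs before adapting this.

```python
from itertools import product

# Hypothetical dimension definitions: output-name label -> grain value.
GRAINS = {"daily": "day", "weekly": "week", "monthly": "month"}
REGIONS = ["US", "CA"]


def build_parameterization(metric, grains, regions):
    """Return one entry per (grain, region) pair, keyed by the output node name."""
    return {
        f"{label}_{metric}_{region}": {"grain": grain, "region": region}
        for (label, grain), region in product(grains.items(), regions)
    }


params = build_parameterization("unique_users", GRAINS, REGIONS)

# The generated names can be requested without retyping them.
requested_outputs = list(params)
```

The number of entries is the product of the dimension sizes (here 3 × 2 = 6), so adding a grain or a region means editing one collection instead of adding a function per combination.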

Elijah Ben Izzy

11/05/2023, 1:12 AM

Yep! Exactly. That said, you’ll have to define them somewhere, and there’s something nice about having it explicit. Verbosity, IMO, isn’t always bad. Worth considering the trade-offs there. That said, parameterized_subdag allows you to do that externally, rather than through copy/paste. Shouldn’t be too hard to grok!

miek

11/05/2023, 1:15 AM

Agreed, explicit can be useful. That said, I’d rather define a node that gives my daily values for a metric, and then use the decorators to compute that metric by different grains and/or dimensions such as region, etc. That way, if I wanna add another node, I only have to add 1 function to my module, and everything else “just works”… well, that’s the goal. Let me see how far I can take it

Elijah Ben Izzy

11/05/2023, 1:16 AM

Yep, makes sense. It’s all use case dependent, and you’re the best one to determine that. You can get quite far with the decorators — feel free to come back with Qs!

miek

11/05/2023, 1:16 AM

Will certainly keep you posted :)

👍 1

miek

11/09/2023, 12:01 AM

@Elijah Ben Izzy do you guys have an end-to-end example snippet on how to use @resolve() ?

Elijah Ben Izzy

11/09/2023, 12:04 AM

Only “end-to-end” is in unit tests, actually:
• function: https://github.com/DAGWorks-Inc/hamilton/blob/90bf57dd2ba3968358df538ac143db3c24a9290b/tests/resources/dynamic_config.py#L20
• Driver: https://github.com/DAGWorks-Inc/hamilton/blob/90bf57dd2ba3968358df538ac143db3c24a9290b/tests/test_end_to_end.py#L265

Note it’s a little spread out though and abstracted. It’s doing something a bit complex for the sake of testing (should probably be simpler TBH…).

Elijah Ben Izzy

11/09/2023, 12:04 AM

Happy to draft up an example if you have something specific you’re trying to do

Elijah Ben Izzy

11/09/2023, 12:06 AM

The things to know:
1. @resolve takes in two parameters: the when parameter will be fixed, and the decorate_with parameter is a function that gives the decorator, given certain config parameters
2. It refers to the “Config” in the driver; these config parameters in the function will be pulled from that. Note it’s different than the “inputs”: “config” is done at runtime
3. You need to set power-user mode to true (have to prove that you’re serious 😆 )
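To make the three points above concrete, here is a toy analogue in plain Python. It is not Hamilton's implementation and does not use Hamilton's API: `deferred`, `build`, `scale_by`, and the `"power_user_mode"` config key are hypothetical names chosen for illustration. The idea it shows is that the decorator is not chosen when the function is defined, but produced later by calling `decorate_with` with values looked up by name in a config dict.

```python
import inspect


def deferred(decorate_with):
    """Toy stand-in: remember how to build the decorator, but do not apply it yet."""
    def mark(fn):
        fn._decorate_with = decorate_with
        return fn
    return mark


def build(fn, config):
    """Resolve the real decorator once config is available, then apply it."""
    if not config.get("power_user_mode"):
        raise ValueError("power_user_mode must be enabled to resolve decorators")
    # Pull only the config keys that decorate_with names as parameters.
    names = inspect.signature(fn._decorate_with).parameters
    decorator = fn._decorate_with(**{name: config[name] for name in names})
    return decorator(fn)


def scale_by(factor):
    """An ordinary decorator factory whose argument will come from config."""
    def decorator(fn):
        def wrapped(x):
            return fn(x) * factor
        return wrapped
    return decorator


@deferred(decorate_with=lambda factor: scale_by(factor))
def metric(x):
    return x + 1


# "factor" is read from config at build time, not passed as a call-time input.
built = build(metric, {"power_user_mode": True, "factor": 10})
value = built(2)  # (2 + 1) * 10
```

The unit tests linked earlier in the thread remain the reference for how the real decorator and driver config fit together.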

miek

11/09/2023, 12:09 AM

Let me explore that unit test. Was mostly looking for an example I can run out of the box. This should do

miek

11/09/2023, 12:09 AM

Thanks!

Elijah Ben Izzy

11/09/2023, 12:10 AM

Yep! Just sorry it’s spread out. Should be easy to adapt.

👍 1
