# hamilton-help
m
I’m coming from a SQL context where you have a table and then use GROUP BY <dimension X, dimension Y> to compute various aggregates. Mostly trying to evaluate how Hamilton could replicate this type of workflow
e
Yes! So assuming:
1. You want to hardcode the set of computations
2. You want the parameterizations to be flexible and/or contain multiple values
The tool here is @subdag. It’s very powerful, and was envisioned initially to handle exactly this: computations across different granularities. Here are the docs: https://hamilton.dagworks.io/en/latest/reference/decorators/subdag/
We actually have an example that does something very similar to what you’re interested in. In this case, we’re utilizing the ability to run a common set of computations across two dimensions (region and granularity, granularity being exactly what you described): https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/reusing_functions
The way to think about it: this is a nice way to reuse a group of operations while keeping things fine-grained. If you want to get really fancy we have @parameterized_subdag (in case you don’t like copy/pasting) and @resolve (to make the parameters configuration-driven), but I’d see how far subdag can get you.
That said, there is another approach that you may or may not like, depending on how much you want individual functions to map to columns/datapoints. You could always:
1. Create a helper function that does these aggregations on a dataframe
2. Call it for each granularity, having the function input a dataframe/join a set of columns.
So yeah, if you want the same fine-grained approach that Hamilton likes, I’d suggest subdag. If you want to just run a bunch of aggregations, dataframe manipulations aren’t crazy.
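For the second approach (one helper, called once per granularity), a rough dependency-free sketch might look like the following. It uses plain tuples in place of a dataframe so it runs anywhere; the event rows, `to_grain`, and `unique_users` are all made up for illustration and are not part of Hamilton.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event rows standing in for a dataframe: (day, region, user_id).
EVENTS = [
    (date(2024, 1, 1), "US", "a"),
    (date(2024, 1, 1), "US", "b"),
    (date(2024, 1, 2), "US", "a"),
    (date(2024, 1, 2), "CA", "c"),
    (date(2024, 2, 5), "CA", "c"),
]


def to_grain(day, grain):
    """Map a date to the bucket it falls into for the given grain."""
    if grain == "daily":
        return day.isoformat()
    if grain == "weekly":
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if grain == "monthly":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unknown grain: {grain}")


def unique_users(events, grain):
    """Rough equivalent of GROUP BY (grain bucket, region) with COUNT(DISTINCT user_id)."""
    groups = defaultdict(set)
    for day, region, user in events:
        groups[(to_grain(day, grain), region)].add(user)
    return {key: len(users) for key, users in groups.items()}


# Call the same helper once per granularity.
results = {g: unique_users(EVENTS, g) for g in ("daily", "weekly", "monthly")}
```

With a real dataframe the body of `unique_users` would presumably collapse into a single group-by call; the point is only that the granularity becomes an argument rather than a separate node.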
🙌 1
m
I have a lot to learn, let me explore this one too!
🫡 1
e
Heh, yeah, a few approaches. You’ll often find that it’s usually two buckets in the case of dataframes/series:
1. Expresses exactly what you want, using fancy framework constructs (everything is a series, lots of nodes, etc…)
2. Less expressive, but you can do whatever you want, and very natural for data scientists (often around passing dataframes around)
The path to choose is up to you entirely; it depends on needs around development
m
@Elijah Ben Izzy your subdag example almost gets me what I was looking for, thanks for pointing me to that example! That said, it requires defining each combination as a separate node (well, subdag), i.e. I have to do result = dr.execute(["daily_unique_users_US", "daily_unique_users_CA", "weekly_unique_users_US", "weekly_unique_users_CA", "monthly_unique_users_US", "monthly_unique_users_CA"]). That’s 6 function definitions for 3 grains and 2 regions… this will explode exponentially once I have 100’s of metrics and many dimensions… sounds like @parameterized_subdag might be better… let me study those next
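To make the growth concrete, here is a small sketch that enumerates the grain/region combinations instead of writing them by hand. The `combos` mapping is a hypothetical structure, not a Hamilton object; whether and how something like it can be fed to @parameterized_subdag is an assumption to check against the docs.

```python
from itertools import product

GRAINS = ["daily", "weekly", "monthly"]
REGIONS = ["US", "CA"]
METRIC = "unique_users"

# One entry per (grain, region) pair. The values are illustrative
# parameter dicts, not a Hamilton data structure.
combos = {
    f"{grain}_{METRIC}_{region}": {"grain": grain, "region": region}
    for grain, region in product(GRAINS, REGIONS)
}

# The names that would otherwise be typed out one by one.
node_names = list(combos)
```

The number of entries is the product of the dimension sizes (3 × 2 = 6 here), so each extra dimension multiplies the count rather than adding to it.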
e
Yep! Exactly. That said, you’ll have to define them somewhere, and there’s something nice about having it explicit. Verbosity, IMO, isn’t always bad. Worth considering the trade-offs there. That said, parameterized_subdag allows you to do that externally, rather than through copy/paste. Shouldn’t be too hard to grok!
m
Agreed, explicit can be useful. That said, I’d rather define a node that gives my daily values for a metric, and then use the decorators to compute that metric by different grains and/or dimensions such as region, etc. That way, if I wanna add another node, I only have to add 1 function to my module, and everything else “just works”… well, that’s the goal. Let me see how far I can take it
e
Yep, makes sense. It’s all use case dependent, and you’re the best one to determine that. You can get quite far with the decorators — feel free to come back with Qs!
m
Will certainly keep you posted :)
👍 1
@Elijah Ben Izzy do you guys have an end-to-end example snippet on how to use @resolve() ?
e
Only “end-to-end” is in unit tests, actually:
• function: https://github.com/DAGWorks-Inc/hamilton/blob/90bf57dd2ba3968358df538ac143db3c24a9290b/tests/resources/dynamic_config.py#L20
• Driver: https://github.com/DAGWorks-Inc/hamilton/blob/90bf57dd2ba3968358df538ac143db3c24a9290b/tests/test_end_to_end.py#L265
Note it’s a little spread out though and abstracted; it’s doing something a bit complex for the sake of testing (should probably be simpler TBH…).
Happy to draft up an example if you have something specific you’re trying to do
The things to know:
1. @resolve takes in two parameters: when, whose value is fixed, and decorate_with, which is a function that gives the decorator, given certain config parameters
2. It refers to the “Config” in the driver: the config parameters in the function will be pulled from that. Note it’s different from the “inputs”; “config” is resolved at runtime
3. You need to set power-user mode to true (have to prove that you’re serious 😆)
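As a plain-Python analogy for points 1 and 2 (not Hamilton code, and not how Hamilton implements it), the idea of "a function that returns the decorator once config is known" can be sketched like this. Every name here (`resolve_later`, `suffix_with`, `build`) is hypothetical.

```python
import inspect


def resolve_later(decorate_with):
    """Attach a decorator factory to a function without applying it yet."""
    def wrapper(fn):
        fn._deferred = decorate_with
        return fn
    return wrapper


def suffix_with(text):
    """An ordinary decorator factory: appends a suffix to the wrapped result."""
    def decorator(fn):
        def inner(*args):
            return f"{fn(*args)}_{text}"
        return inner
    return decorator


# The lambda's parameter names say which config keys it wants.
@resolve_later(decorate_with=lambda region: suffix_with(region))
def metric_name(grain):
    return f"{grain}_unique_users"


def build(fn, config):
    """Once config exists, call the factory with only the keys it asks for."""
    wanted = inspect.signature(fn._deferred).parameters
    decorator = fn._deferred(**{key: config[key] for key in wanted})
    return decorator(fn)


# Config arrives later; extra keys are ignored.
resolved = build(metric_name, {"region": "US", "unused": 1})
name = resolved("daily")
```

The real decorator is used differently (see the linked unit test); this only shows why the decorator's arguments can come from config rather than being hardcoded when the function is defined.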
m
Let me explore that unit test. Was mostly looking for an example I can run out of the box. This should do
Thanks!
e
Yep! Just sorry it’s spread out; it should be easy to adapt.
👍 1