# hamilton-help
s
This message was deleted.
👀 1
e
Hey! So, yes, I’m pretty sure it can be done. To clarify — is the custom function something you want set on a case-by-case basis? A few different configurations? Or hardcoded?
d
It is hardcoded, may take parameters in addition to the data
e
Awesome! So yeah, I think this is pretty straightforward.
import pandas as pd

def raw_data() -> pd.DataFrame:
    ...  # load your data or pass it in

def processed_data(raw_data: pd.DataFrame, groupby_apply_param_1: ...) -> pd.DataFrame:
    # use your params in the groupby/apply call
    return raw_data.groupby(...).apply(...)
Note that this just uses dataframes. If you want to do processing on a per-column basis prior to grouping, it should be pretty easy:
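import pandas as pd
from hamilton.function_modifiers import extract_columns

@extract_columns('col_1', 'col_2', ...)
def raw_data() -> pd.DataFrame:
    ...

def col_1_processed(col_1: pd.Series) -> pd.Series:
    return do_something_with(col_1)

# col_2_processed etc. are defined the same way
def processed_data(col_1_processed: pd.Series, col_2_processed: pd.Series, ...) -> pd.DataFrame:
    # rebuild the dataframe from the processed columns before grouping
    return pd.DataFrame({'col_1': col_1_processed, 'col_2': col_2_processed, ...}).groupby(...).apply(...)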
Does this get at what you’re trying to do? I think the trick here is that Hamilton can happily process any type of object (pandas series, dataframes, primitives, parameters, etc…)
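For concreteness, here is a minimal runnable sketch of the first (dataframe-only) version. The toy data, the `demean` function, and the `scale` parameter are all made up for illustration, and the functions are called by hand here; in Hamilton the driver would do that wiring from the parameter names.

```python
import pandas as pd


def raw_data() -> pd.DataFrame:
    # Toy data standing in for whatever you actually load.
    return pd.DataFrame({"group": ["a", "a", "b"], "value": [1.0, 3.0, 10.0]})


def demean(group: pd.DataFrame, scale: float) -> pd.DataFrame:
    # Hypothetical hardcoded per-group computation that also takes a parameter.
    out = group.copy()
    out["value"] = (out["value"] - out["value"].mean()) * scale
    return out


def processed_data(raw_data: pd.DataFrame, scale: float) -> pd.DataFrame:
    # Selecting the value column explicitly sidesteps version-dependent
    # handling of the grouping column inside apply.
    grouped = raw_data.groupby("group", group_keys=False)[["value"]]
    return grouped.apply(demean, scale=scale)


# Plain function calls; each function is also easy to test on its own.
result = processed_data(raw_data(), scale=2.0)
print(result)
```

Since `processed_data` only depends on its arguments, a test can pass in any small dataframe instead of calling `raw_data()`.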
d
I was thinking that the apply function is explicitly treated as a node. I will be passing around dataframes rather than series in this case.
e
Ahh yep, definitely doable then:
from typing import Callable
import pandas as pd

def raw_data() -> pd.DataFrame:
    ...  # load your data or pass it in

def apply_function(params: ...) -> Callable:
    def apply(group):
        ...  # apply function to one group, using params
    return apply

def processed_data(raw_data: pd.DataFrame, apply_function: Callable, groupby_apply_param_1: ...) -> pd.DataFrame:
    # use your params in the groupby; the function node is passed to apply
    return raw_data.groupby(...).apply(apply_function)
In this case the node is returning a function. You can pass it in as an override to the driver, or leave it as an input (but the above hardcodes it, which seems like what you want). That said, I’d be curious why it would be its own node as opposed to mixed with the processed_data function?
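A small runnable sketch of the function-as-a-node idea, in case it helps. The data, the `offset` parameter, and the `total` column are invented for the example, and the functions are called directly rather than through the driver, so the "override" is simply a different callable handed to `processed_data`.

```python
from typing import Callable

import pandas as pd


def raw_data() -> pd.DataFrame:
    # Toy data; replace with your own loading logic.
    return pd.DataFrame({"group": ["a", "a", "b"], "value": [1, 2, 4]})


def apply_function(offset: int) -> Callable[[pd.DataFrame], pd.Series]:
    # The value of this node is a closure that remembers its parameter.
    def apply(group: pd.DataFrame) -> pd.Series:
        return pd.Series({"total": group["value"].sum() + offset})

    return apply


def processed_data(raw_data: pd.DataFrame, apply_function: Callable) -> pd.DataFrame:
    # The callable is used exactly like any other upstream value.
    return raw_data.groupby("group")[["value"]].apply(apply_function)


# Default wiring: the hardcoded function node feeds processed_data.
default = processed_data(raw_data(), apply_function(offset=10))

# "Override": swap in another callable without touching processed_data.
overridden = processed_data(raw_data(), lambda g: pd.Series({"total": len(g)}))
print(default, overridden, sep="\n")
```

The trade-off is the one mentioned above: the callable is not serializable data, but swapping it out is trivial.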
d
I often use this pattern where the computation for one member is called by a function which is passed the collection. The computation is involved.
It's not that important. I am new to using DAGs. Thought it might be helpful to decompose things in this way. But it seems like it's not a good fit in the framework.
e
I think it’s fine either way. I wouldn’t say it’s non-Hamiltonian. I’m (personally) hesitant to send non-serializable data across function boundaries, but we do that in quite a few places and it’s not a problem. It’s more a question of what you + your team find readable/easy to write.
d
No problem. I am very thankful for this package. My primary use case is unit testing data transformation stages. I was sick of passing parameters across many functions to parameterise each test. Hamilton solves this problem very nicely. It's all explicit and I can inject data at any stage cleanly.
👍 1
e
Awesome! Glad to hear 🙂 We’ll be here to answer any more questions you have.
d
great work
🙏 2
s
to throw in one more idea — you could make the apply an explicit node that takes in a grouped data frame — and then have “config” determine which one to call.
import pandas as pd
from pandas.core.groupby import DataFrameGroupBy
from hamilton.function_modifiers import config

def raw_data() -> pd.DataFrame:
    ...  # load your data or pass it in

# pandas has no pd.GroupedDataFrame; groupby() returns a DataFrameGroupBy
def grouped_data(raw_data: pd.DataFrame, groupby_apply_param_1: ...) -> DataFrameGroupBy:
    return raw_data.groupby(...)  # use your params

@config.when(apply_type="mean")
def processed_data__mean(grouped_data: DataFrameGroupBy) -> pd.DataFrame:
    return grouped_data.mean()

@config.when(apply_type="foo-bar-apply")
def processed_data__foo_bar_apply(grouped_data: DataFrameGroupBy) -> pd.DataFrame:
    return grouped_data.apply(lambda x: foo(x) + bar(x))
But as @Elijah Ben Izzy says, there are a few ways to do this, and what matters more is which one is more ergonomic and whether it’s going to be updated regularly or not…
d
I will keep this trick in mind. Thank you.
t
Interesting convo! I also encountered similar groupby scenarios a few times before
e
@Thierry Jean nice! If you want to contribute a post or content/docs about it I’ll happily edit 🙂
s
One more thought after re-reading your initial post: a fan-out/fan-in pattern is possible. To do a fan-in, you’d either manually list out what is being fanned in via function parameters, or, if you need something more dynamic, using @resolve + @inject (docs here and here) could also work.
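A rough sketch of the manual version of that fan-in, with made-up data and node names (`eu_total`, `us_total`, `totals`). The functions are called by hand here; in Hamilton the driver would connect them through the parameter names. The dynamic @resolve + @inject route is not shown.

```python
import pandas as pd


def raw_data() -> pd.DataFrame:
    # Toy data standing in for the real source.
    return pd.DataFrame({"region": ["eu", "eu", "us"], "sales": [1, 2, 5]})


# Fan-out: one node per branch, each depending on the same upstream data.
def eu_total(raw_data: pd.DataFrame) -> int:
    return int(raw_data.loc[raw_data["region"] == "eu", "sales"].sum())


def us_total(raw_data: pd.DataFrame) -> int:
    return int(raw_data.loc[raw_data["region"] == "us", "sales"].sum())


# Fan-in: every branch is listed explicitly as a parameter.
def totals(eu_total: int, us_total: int) -> pd.Series:
    return pd.Series({"eu": eu_total, "us": us_total})


data = raw_data()
combined = totals(eu_total(data), us_total(data))
print(combined)
```

Listing the branches by hand keeps the dependencies explicit, at the cost of editing `totals` whenever a branch is added.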