Slackbot
04/20/2023, 1:04 AM
Elijah Ben Izzy
04/20/2023, 1:07 AM
David Wesolowski
04/20/2023, 1:29 AM
Elijah Ben Izzy
04/20/2023, 1:33 AM
import pandas as pd

def raw_data() -> pd.DataFrame:
    ...  # load your data or pass it in

def processed_data(raw_data: pd.DataFrame, groupby_apply_param_1: ...) -> pd.DataFrame:
    return raw_data.groupby(...).apply(...)  # use your params
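A minimal runnable sketch of the pattern above, on toy data. The column names `group` and `value` and the parameters `group_col` and `scale` are hypothetical stand-ins for the elided groupby/apply arguments, not anything from the thread. Hamilton node functions are plain Python functions, so the sketch simply calls them directly instead of going through a driver.

```python
import pandas as pd


def raw_data() -> pd.DataFrame:
    # Hypothetical toy data; in practice you would load or pass in your own.
    return pd.DataFrame(
        {"group": ["a", "a", "b", "b"], "value": [1.0, 3.0, 10.0, 30.0]}
    )


def processed_data(raw_data: pd.DataFrame, group_col: str, scale: float) -> pd.DataFrame:
    # group_col and scale play the role of the groupby/apply parameters.
    # Within each group, center "value" on the group mean and scale it.
    centered = raw_data.groupby(group_col, group_keys=False)["value"].apply(
        lambda s: (s - s.mean()) * scale
    )
    # The result keeps the original row index, so assign() aligns it row by row.
    return raw_data.assign(value_centered=centered)


if __name__ == "__main__":
    print(processed_data(raw_data(), group_col="group", scale=2.0))
```

With Hamilton, `group_col` and `scale` would typically be supplied as inputs when the driver is asked for `processed_data`, rather than passed by hand as here.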
Elijah Ben Izzy
04/20/2023, 1:36 AM
from hamilton.function_modifiers import extract_columns

@extract_columns('col_1', 'col_2', ...)
def raw_data() -> pd.DataFrame:
    ...

def col_1_processed(col_1: pd.Series) -> pd.Series:
    return do_something_with(col_1)

def processed_data(col_1_processed: pd.Series, col_2_processed: pd.Series, ...) -> pd.DataFrame:
    # build the frame from the processed columns, then group and apply
    return pd.DataFrame({'col_1': col_1_processed, 'col_2': col_2_processed, ...}).groupby(...).apply(...)
Does this get at what you’re trying to do? I think the trick here is that Hamilton can happily process any type of object (pandas series, dataframes, primitives, parameters, etc.).
David Wesolowski
04/20/2023, 1:46 AM
Elijah Ben Izzy
04/20/2023, 1:50 AM
from typing import Callable

def raw_data() -> pd.DataFrame:
    ...  # load your data or pass it in

def apply_function(params: ...) -> Callable:
    def apply(...):
        ...  # apply function
    return apply

def processed_data(raw_data: pd.DataFrame, apply_function: Callable, groupby_apply_param_1: ...) -> pd.DataFrame:
    return raw_data.groupby(...).apply(apply_function)  # use your params
In this case the node is returning a function. You can pass it in as an override to the driver, or leave it as an input (but the above hardcodes it, which seems like what you want). That said, I’d be curious why it would be its own node as opposed to being folded into the processed_data function?
David Wesolowski
04/20/2023, 1:53 AM
David Wesolowski
04/20/2023, 1:55 AM
Elijah Ben Izzy
04/20/2023, 1:57 AM
David Wesolowski
04/20/2023, 2:01 AM
Elijah Ben Izzy
04/20/2023, 2:02 AM
David Wesolowski
04/20/2023, 2:03 AM
Stefan Krawczyk
04/20/2023, 2:17 AM
from hamilton.function_modifiers import config
from pandas.core.groupby import DataFrameGroupBy

def raw_data() -> pd.DataFrame:
    ...  # load your data or pass it in

def grouped_data(raw_data: pd.DataFrame, groupby_apply_param_1: ...) -> DataFrameGroupBy:
    return raw_data.groupby(...)  # use your params

@config.when(apply_type="mean")
def processed_data__mean(grouped_data: DataFrameGroupBy) -> pd.DataFrame:
    return grouped_data.mean()

@config.when(apply_type="foo-bar-apply")
def processed_data__foo_bar_apply(grouped_data: DataFrameGroupBy) -> pd.DataFrame:
    return grouped_data.apply(lambda x: foo(x) + bar(x))
But as @Elijah Ben Izzy says, there are a few ways to do this, and what matters more is which one is more ergonomic and whether it is going to be updated regularly.
David Wesolowski
04/20/2023, 5:53 AM
Thierry Jean
04/20/2023, 12:07 PM
Elijah Ben Izzy
04/20/2023, 2:51 PM
Stefan Krawczyk
04/21/2023, 5:16 AM