Slackbot
07/04/2022, 8:48 AM
James Marvin
07/04/2022, 8:50 AM
Elijah Ben Izzy
07/04/2022, 3:46 PM
extract_columns decorator on top of it. Let me know if this makes sense:
@extract_columns(…)
def df_with_columns_renamed(df_original: pd.DataFrame) -> pd.DataFrame:
    # rename columns and return original df
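A minimal sketch of what the body of such a rename function might look like, in plain pandas. The column names and the RENAMES mapping are made up for illustration; in a Hamilton module the function would also carry the @extract_columns(…) decorator from the snippet above, which is left out here so the example runs with pandas alone.

```python
import pandas as pd

# Hypothetical mapping from the raw column names to the names
# that downstream functions expect.
RENAMES = {"Signups": "signups", "Spend ($)": "spend"}


def df_with_columns_renamed(df_original: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df_original with its columns renamed."""
    # DataFrame.rename returns a new frame; df_original is left untouched.
    return df_original.rename(columns=RENAMES)


raw = pd.DataFrame({"Signups": [1, 10], "Spend ($)": [10.0, 20.0]})
renamed = df_with_columns_renamed(raw)
```

The renamed column names are the ones you would then list in the decorator so each becomes its own node.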
Elijah Ben Izzy
07/04/2022, 3:48 PM
Stefan Krawczyk
07/05/2022, 4:35 AM
James Marvin
07/05/2022, 12:13 PM
James Marvin
07/05/2022, 3:25 PM
Stefan Krawczyk
07/05/2022, 3:34 PM
Stefan Krawczyk
07/05/2022, 4:24 PM
James Marvin
07/11/2022, 7:33 AM
James Marvin
07/11/2022, 7:35 AM
Stefan Krawczyk
07/11/2022, 2:05 PM
James Marvin
07/11/2022, 2:05 PM
Stefan Krawczyk
07/11/2022, 2:06 PM
Stefan Krawczyk
07/11/2022, 5:10 PM
Stefan Krawczyk
07/11/2022, 7:05 PM
def grouped_df(col1: pd.Series, ..., colN: pd.Series) -> pd.DataFrame:
    # your group logic
    # new_df = ...
    return new_df
in your driver:
dr = driver.Driver(config, logic_module, adapter=base.SimplePythonGraphAdapter(base.DictResult()))
result = dr.execute(['grouped_df'])
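To make the grouping function concrete, here is a rough sketch with two hypothetical columns, region and spend, standing in for col1 … colN. It is called directly with plain pandas Series; in Hamilton the driver would supply those inputs by name. The grouped frame has one row per group rather than one per input row, which is presumably why the driver above asks for a dict result instead of assembling a dataframe.

```python
import pandas as pd


def grouped_df(region: pd.Series, spend: pd.Series) -> pd.DataFrame:
    """Combine two column inputs and total the spend per region."""
    df = pd.DataFrame({"region": region, "spend": spend})
    # as_index=False keeps the group key as a regular column.
    return df.groupby("region", as_index=False)["spend"].sum()


# Direct call for illustration only; the column names and values are made up.
result = grouped_df(
    region=pd.Series(["east", "west", "east"]),
    spend=pd.Series([10.0, 20.0, 5.0]),
)
```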
Option 2: Do it as a post step after running execute() in your driver
dr = driver.Driver(config, logic_module)
df = dr.execute(['col1', ..., 'colN'])
grouped_df = ... # your logic here.
Option 3: Run two Hamilton DAGs
This is a merger of Option 1 and Option 2.
dr1 = driver.Driver(config, logic_module)
pre_grouped_df = dr1.execute(['col1', ..., 'colN'])
dr2 = driver.Driver(other_config, grouping_logic_module, adapter=base.SimplePythonGraphAdapter(base.DictResult()))
# you can write the function to operate on a dataframe, or on columns
result = dr2.execute(['grouped_df'], inputs={"raw_df": pre_grouped_df})
Option 4: Add a custom Result Builder to do this
class GroupedByResult(base.ResultMixin):
    """This is a class and it has to have a static method."""
    @staticmethod
    def build_result(*, group_by_names: typing.List[str], **outputs: typing.Dict[str, pd.Series]) -> pd.DataFrame:
        """This function builds the result given the computed values."""
        df = pd.DataFrame(outputs)
        grouped_df = df.groupby(
            by=group_by_names,
        )
        # more logic here.
        return grouped_df
# driver
dr = driver.Driver({ ... "group_by_names": ["COLUMN", "NAMES"] ... }, modulez,
                   adapter=base.SimplePythonGraphAdapter(result_builder=GroupedByResult()))
# To wire configuration through to the build_result function, you need to request it as an output.
output = ['USUAL', 'COLUMNS'] + ['group_by_names']
df = dr.execute(output)
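A rough, pandas-only sketch of how that build_result call plays out, assuming the adapter passes every requested output to it as a keyword argument. The class below is a stand-in that does not subclass base.ResultMixin, and the region/spend columns and the sum aggregation are hypothetical placeholders for the "more logic here" step.

```python
import typing

import pandas as pd


class GroupedByResult:
    """Stand-in result builder; in Hamilton it would subclass base.ResultMixin."""

    @staticmethod
    def build_result(*, group_by_names: typing.List[str], **outputs: pd.Series) -> pd.DataFrame:
        """Build a dataframe from the outputs, then group and aggregate it."""
        df = pd.DataFrame(outputs)
        # Hypothetical aggregation: sum every remaining column per group.
        return df.groupby(by=group_by_names).sum().reset_index()


# Simulates the computed outputs, including the config value that was
# requested as an output so it reaches build_result.
computed = {
    "region": pd.Series(["east", "west", "east"]),
    "spend": pd.Series([10.0, 20.0, 5.0]),
    "group_by_names": ["region"],
}
grouped = GroupedByResult.build_result(**computed)
```

Because group_by_names is a keyword-only parameter, it is pulled out of the outputs and never becomes a column of the dataframe.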
Option 5: Do the filter/group on data load
Not sure how applicable this would be, but if it’s based on an index or something, do this as a step when you load the data. That way downstream functions only operate on the already filtered/grouped data.
Stefan Krawczyk
07/11/2022, 7:08 PM
Elijah Ben Izzy
07/12/2022, 1:25 PM
James Marvin
07/14/2022, 6:39 PM
Stefan Krawczyk
07/14/2022, 8:09 PM