Slackbot
09/21/2023, 1:53 PM
Thierry Jean
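09/21/2023, 2:08 PM
import pandas as pd
from hamilton.function_modifiers import extract_columns

# extract_columns takes column names as separate arguments, not a list
@extract_columns("housing_type", "feature_b", "feature_c")
def filled_df(raw_df: pd.DataFrame) -> pd.DataFrame:
    filled_df = raw_df.copy()
    filled_df = ...  # do your fill on all columns
    return filled_df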
• To preserve the lineage of columnwise operations, you can give your transforms distinct names and add a rename step at the end. It's a bit hacky and probably harder to maintain.
def housing_type_filled(housing_type: pd.Series) -> pd.Series:
    return housing_type.fillna("unknown")

... # more transforms

def joined_dataset(
    housing_type_filled: pd.Series,
    feature_b: pd.Series,
    feature_c: pd.Series,
) -> pd.DataFrame:
    # you will need to pass the name for each series
    return pd.concat([
        pd.Series(housing_type_filled, name="housing_type"),
        pd.Series(feature_b, name="feature_b"),
        pd.Series(feature_c, name="feature_c"),
    ], axis=1)
Tobias
09/21/2023, 2:17 PM
_raw in the query directly, since we're creating the initial DF directly by querying our DWH. However, I thought there might be a simple and straightforward way to handle those cases, since it feels like there should be.
Thierry Jean
09/21/2023, 2:29 PM
_raw suffix to your initial inputs and have the final name of your column be the function name (i.e., the desired output should be the function named housing_type()). That should lead to the easiest code to read and maintain.
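A minimal sketch of that naming convention, under stated assumptions: plain lists stand in for pd.Series so it runs without pandas, and resolve() is a hypothetical toy resolver that wires nodes by parameter name. It is not Hamilton's implementation or API; housing_type_is_known is likewise an invented downstream node for illustration.

```python
import inspect


def housing_type(housing_type_raw: list) -> list:
    """Cleaned column keeps the final name; the input carries the _raw suffix."""
    return ["unknown" if value is None else value for value in housing_type_raw]


def housing_type_is_known(housing_type: list) -> list:
    """Hypothetical downstream node: depends on the cleaned column by name."""
    return [value != "unknown" for value in housing_type]


def resolve(name, functions, inputs, cache=None):
    """Toy resolver (an assumption, not Hamilton): match parameters to nodes by name."""
    cache = {} if cache is None else cache
    if name in inputs:
        return inputs[name]
    if name not in cache:
        fn = functions[name]
        kwargs = {
            param: resolve(param, functions, inputs, cache)
            for param in inspect.signature(fn).parameters
        }
        cache[name] = fn(**kwargs)
    return cache[name]


functions = {fn.__name__: fn for fn in (housing_type, housing_type_is_known)}
inputs = {"housing_type_raw": ["house", None, "flat"]}
result = resolve("housing_type_is_known", functions, inputs)
```

Because the raw input and the cleaned output have different names, no rename step is needed and every downstream function simply asks for housing_type.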
Otherwise, we currently have a GitHub issue to add a pattern for redefining nodes. If you have specific requirements or features that would be useful, let me and @Elijah Ben Izzy know in this thread 🙂
Elijah Ben Izzy
09/21/2023, 2:43 PM
pipe decorator: this allows you to effectively rename. Would love your feedback!
https://github.com/DAGWorks-Inc/hamilton/issues/372