# hamilton-help
s
or would it just be a mapping before the pipeline/`@extract_columns`?
def input_field_mapping() -> dict:
    """Field-mapping step so downstream nodes don't break if the input field names change."""
    mapping = {...}
    return mapping
import pandas as pd

def raw_df(data_path: str) -> pd.DataFrame:
    return pd.read_csv(data_path)
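from hamilton.function_modifiers import extract_columns

@extract_columns(
    "YearBuilt",
    "LotFrontage",
    "GarageArea",
    "OverallQual",
    "OverallCond",
    "MSZoning",
    "TotalBsmtSF",
)
def raw_data_w_standard_field_names(raw_df: pd.DataFrame, input_field_mapping: dict) -> pd.DataFrame:
    # The parameter is named after the upstream function so Hamilton wires in its output.
    # some work
    return raw_df.rename(columns=input_field_mapping)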
s
Yep, that seems like a reasonable approach. Maintaining a mapping helps keep things standard for everyone else downstream.
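A minimal sketch of the "don't break downstream" idea, in plain Python with no Hamilton or pandas involved. The raw field names and both helper functions are hypothetical, made up for illustration. The check fails loudly when an expected input field has been renamed or dropped, instead of letting the rename pass through silently:

```python
# Hypothetical raw field names; the real ones are not given in this thread.
FIELD_MAPPING = {
    "year_built": "YearBuilt",
    "lot_frontage": "LotFrontage",
}


def missing_raw_fields(raw_columns: list[str], mapping: dict[str, str]) -> list[str]:
    """Return the raw field names the mapping expects but the input lacks."""
    present = set(raw_columns)
    return [raw for raw in mapping if raw not in present]


def standardize_field_names(raw_columns: list[str], mapping: dict[str, str]) -> list[str]:
    """Rename mapped fields to their standard names; raise if the input drifted."""
    missing = missing_raw_fields(raw_columns, mapping)
    if missing:
        raise KeyError(f"input is missing expected fields: {missing}")
    # Unmapped columns pass through under their original names.
    return [mapping.get(col, col) for col in raw_columns]
```

The same check could sit inside the renaming function, before the call to `rename`, so a changed input schema surfaces at the mapping step rather than in some node further downstream.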
t
Maybe I'm not fully covering your use case, but I like having the columns as a module-level constant. This lets you reuse it throughout the module for consistency. It also depends on how dynamic the mapping is. For example:
RAW_COLUMN_MAPPING = {
    ...: "YearBuilt", 
    ...: "LotFrontage", 
    ...: "GarageArea", 
    ...: "OverallQual", 
    ...: "OverallCond", 
    ...: "MSZoning", 
    ...: "TotalBsmtSF",
}

# allows you to do
@extract_columns(*RAW_COLUMN_MAPPING.values())  # unpack dictionary values
def raw_data_w_standard_field_names(raw_df: pd.DataFrame, mapping: dict = RAW_COLUMN_MAPPING) -> pd.DataFrame:
    return raw_df.rename(columns=mapping)
If your dataflow spans multiple modules, you can still access the mapping / column names via `my_module.RAW_COLUMN_MAPPING`.
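Here is a rough sketch of how the constant can drive both the rename and the column selection, in plain pandas without the Hamilton decorator. The raw column names, `STANDARD_COLUMNS`, and `standardize` are hypothetical, since the real keys are elided above:

```python
import pandas as pd

# Hypothetical raw column names; the thread elides the real keys.
RAW_COLUMN_MAPPING = {
    "yr_built": "YearBuilt",
    "garage_area": "GarageArea",
}

# Derived once from the mapping, so every consumer sees the same standard names.
STANDARD_COLUMNS = list(RAW_COLUMN_MAPPING.values())


def standardize(raw_df: pd.DataFrame) -> pd.DataFrame:
    """Rename raw columns to standard names and keep only those columns."""
    return raw_df.rename(columns=RAW_COLUMN_MAPPING)[STANDARD_COLUMNS]


raw = pd.DataFrame({"yr_built": [1999], "garage_area": [480], "extra": ["x"]})
df = standardize(raw)
```

Another module would import the constant (for example `from my_module import RAW_COLUMN_MAPPING`) rather than retyping the names, so a change to the mapping only has to happen in one place.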