# hamilton-help
s
Oh yep, you haven’t told Hamilton about the contents of the dataframes. Two options: Option (1): Request the merged dataframe. You can only request “outputs” that you have told Hamilton about, either as functions or via a decorator on a function.
dr.execute(['merge_prod_w_state_map'])
Option (2): Add the @extract_columns decorator to merge_prod_w_state_map:
from hamilton.function_modifiers import extract_columns
...

@extract_columns(*['State_Area', 'month_a', 'month_b', 'pct_chg', 'State', 'Standard','Postal'])
def merge_prod_w_state_map(load_production: pd.DataFrame,
                           load_mapping: pd.DataFrame
                           ) -> pd.DataFrame:
  ...
and then dr.execute(["State"]) should work.
👍 1
s
I see. Is @extract_columns() kind of like using .loc[:, [list_of_columns]]?
s
it’s simpler than that. It just does:
def State(merge_prod_w_state_map: pd.DataFrame) -> pd.Series:
    return merge_prod_w_state_map["State"]
...
👍 1
it’s shorthand for defining such functions, one per extracted column
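To make the difference from .loc concrete, here is a minimal pandas-only sketch. The `merged` dataframe and its values are made up for illustration and only stand in for the output of merge_prod_w_state_map. A per-column function like State hands back a single pd.Series, while .loc with a list of columns hands back a smaller pd.DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the merged dataframe discussed in the thread.
merged = pd.DataFrame(
    {"State": ["Texas", "Ohio"], "Postal": ["TX", "OH"], "pct_chg": [0.1, -0.2]}
)

# What a per-column function such as State(...) returns: one pd.Series.
state_series = merged["State"]

# What .loc with a list of columns returns: a pd.DataFrame with one column.
state_frame = merged.loc[:, ["State"]]

print(type(state_series).__name__)
print(type(state_frame).__name__)
```

Because each extracted column is its own Series, downstream functions can depend on just the columns they need by naming them as parameters.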
s
so from there, I would define transforms on the specific series or groups of series?
s
yep
@Seth Stokes note, as long as the index is set correctly things should just “work”.
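A rough pandas-only illustration of why the index matters: pandas lines series up by index label rather than by position, both in arithmetic and when series are joined back into a dataframe. The series names and values below are invented for the example.

```python
import pandas as pd

# Hypothetical series with the same labels in a different order.
month_a = pd.Series([1.0, 2.0, 3.0], index=["TX", "OH", "CA"], name="month_a")
month_b = pd.Series([30.0, 10.0, 20.0], index=["CA", "TX", "OH"], name="month_b")

# Arithmetic aligns on index labels, not on row position.
total = month_a + month_b

# Joining the series into one dataframe aligns on the index as well.
combined = pd.concat([month_a, month_b], axis=1)

print(total["TX"])
print(combined.loc["CA"])
```

If the labels did not match at all, the same operations would produce NaN for the unmatched rows, which is the usual symptom of an index that was not set consistently.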
s
Because of this limitation with the scope of lambda functions, where every lambda created in the loop only returns the result for the last column: is this where @parameterize could come into play? That way I would only need to define the function once, and it would then apply the transform to a list of columns?
# explicit
df.assign(C1_PCT_CHG=lambda x: x.C1.pct_change(),
           C2_PCT_CHG=lambda x: x.C2.pct_change(),
           C3_PCT_CHG=lambda x: x.C3.pct_change(),
           ... 
           CN_PCT_CHG=lambda x: x.CN.pct_change()
           )
# loop that fails due to scope (this cannot be extracted into a function that references df directly, since .unstack() was called just before)
df.assign(**{f'{col}_PCT_CHG': lambda x: x[col].pct_change() for col in ('c1', 'c2', 'c3'...cN)})
s
if I understand correctly, it could look something like:
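import pandas as pd
from hamilton.function_modifiers import parameterize_sources

COLUMN_LIST = ('c1', 'c2', 'c3'...'cN')

# creates one output per column, e.g. c1_PCT_CHG, each fed by the matching source column
@parameterize_sources(**{f"{c}_PCT_CHG": {"col": c} for c in COLUMN_LIST})
def pct_change(col: pd.Series) -> pd.Series:
    return col.pct_change()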
docs for parameterize_sources.
👍 1
otherwise to fix your non-hamilton code:
# explicit 
df.assign(C1_PCT_CHG=df.C1.pct_change(),
           C2_PCT_CHG=df.C2.pct_change(),
           C3_PCT_CHG=df.C3.pct_change(),
           ... 
           CN_PCT_CHG=df.CN.pct_change()
           )

# in a comprehension
df.assign(**{f'{col}_PCT_CHG': df[col].pct_change() for col in ('c1', 'c2', 'c3'...cN)})
should work. (no need for the lambda)
s
Thank you, I’m finally seeing where to utilize Hamilton. Referencing df directly doesn’t work in my case, however, since I unstacked in the step before, so I would have to end the chain and then implement your fix.
👍 1
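For the pandas-only version, one possible way to keep the chain intact is to bind the loop variable as a default argument of the lambda, so each lambda remembers its own column. This is a minimal sketch with made-up data; the column names c1 and c2 only stand in for the real ones.

```python
import pandas as pd

# Hypothetical data standing in for the unstacked frame.
df = pd.DataFrame({"c1": [1.0, 2.0, 4.0], "c2": [10.0, 5.0, 5.0]})
cols = ("c1", "c2")

# Late binding: each lambda looks up `col` only when assign() calls it,
# after the comprehension has finished, so both use the last column, "c2".
broken = df.assign(
    **{f"{col}_PCT_CHG": lambda x: x[col].pct_change() for col in cols}
)

# Default argument: `col=col` captures the current value at definition time,
# and the lambda still receives the in-chain dataframe as `x`.
fixed = df.assign(
    **{f"{col}_PCT_CHG": lambda x, col=col: x[col].pct_change() for col in cols}
)

print(broken)
print(fixed)
```

Since the lambda still operates on the dataframe that assign passes in, this should also work directly after a step like .unstack() without ending the chain.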