Hamilton Open Source #hamilton-help

# hamilton-help

Slackbot

01/30/2023, 11:30 PM

This message was deleted.

👀 1

Stefan Krawczyk

01/30/2023, 11:38 PM

Oh yep, you haven’t told Hamilton about the contents of the dataframes. Two options: Option (1): Request the merged dataframe. You can only request “outputs” you have told Hamilton about, either as functions or via a decorator on a function.

dr.execute(['merge_prod_w_state_map'])

Option (2): Add the @extract_columns decorator to merge_prod_w_state_map:

from hamilton.function_modifiers import extract_columns
...

@extract_columns(*['State_Area', 'month_a', 'month_b', 'pct_chg', 'State', 'Standard','Postal'])
def merge_prod_w_state_map(load_production: pd.DataFrame,
                           load_mapping: pd.DataFrame
                           ) -> pd.DataFrame:
  ...

and then dr.execute(["State"]) should work.

👍 1

Seth Stokes

01/30/2023, 11:43 PM

I see. Is @extract_columns() kind of like using .loc[:, [list_of_columns]]?

Stefan Krawczyk

01/30/2023, 11:44 PM

it’s simpler than that. It just does:

def State(merge_prod_w_state_map: pd.DataFrame) -> pd.Series:
    return merge_prod_w_state_map["State"]
...

👍 1

Stefan Krawczyk

01/30/2023, 11:46 PM

it’s shorthand for defining such functions, one per extracted column
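To make the shorthand concrete, here is a minimal sketch in plain pandas (no Hamilton import needed) of the per-column functions that would otherwise be written by hand. The sample rows and the zero-argument stand-in for merge_prod_w_state_map are made up for illustration; the real function takes load_production and load_mapping.

```python
import pandas as pd


# Hypothetical stand-in for the merged dataframe; the rows are invented.
def merge_prod_w_state_map() -> pd.DataFrame:
    return pd.DataFrame(
        {
            "State": ["Texas", "Ohio"],
            "Postal": ["TX", "OH"],
            "pct_chg": [0.10, -0.05],
        }
    )


# Hand-written equivalents of two extracted columns: each function takes
# the dataframe (by parameter name) and returns one column as a Series.
def State(merge_prod_w_state_map: pd.DataFrame) -> pd.Series:
    return merge_prod_w_state_map["State"]


def Postal(merge_prod_w_state_map: pd.DataFrame) -> pd.Series:
    return merge_prod_w_state_map["Postal"]


# Wiring the functions together by hand, which a dataflow framework would
# otherwise do by matching parameter names to function names.
merged = merge_prod_w_state_map()
state = State(merged)
postal = Postal(merged)
```

Each extracted column then becomes something that can be requested on its own or used as an input to a downstream transform.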

Seth Stokes

01/30/2023, 11:49 PM

so from there, I would define transforms on the specific series or groups of series?

Stefan Krawczyk

01/30/2023, 11:58 PM

yep

Stefan Krawczyk

01/31/2023, 12:16 AM

@Seth Stokes note, as long as the index is set correctly things should just “work”.
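A small pandas sketch of why the index matters: when series are combined, values are matched by index label rather than by position. The column names month_a and month_b come from the thread; the values and the postal-code index are invented for illustration.

```python
import pandas as pd

# Invented sample data, indexed by postal code.
df = pd.DataFrame(
    {"month_a": [100.0, 200.0, 400.0], "month_b": [110.0, 180.0, 400.0]},
    index=["TX", "OH", "CA"],
)

month_a = df["month_a"]
# Same labels, different row order.
month_b = df["month_b"].sort_index()

# Arithmetic aligns on index labels, so the differing row order is harmless.
pct_chg = (month_b - month_a) / month_a
```

If the series carried unrelated indexes (for example, a default RangeIndex on one and postal codes on the other), the same arithmetic would produce NaN for every label that has no match.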

Seth Stokes

02/01/2023, 11:34 PM

Due to this scoping limitation of lambda functions, where every lambda in the loop ends up using only the last column before the result is unpacked into a series, is this where @parameterize could come into play? So that I would only need to define the function once and it would then do the transform on a list of columns?

# explicit
df.assign(C1_PCT_CHG=lambda x: x.C1.pct_change(),
           C2_PCT_CHG=lambda x: x.C2.pct_change(),
           C3_PCT_CHG=lambda x: x.C3.pct_change(),
           ... 
           CN_PCT_CHG=lambda x: x.CN.pct_change()
           )
# loop that fails due to scope. (this cannot be extracted to a function and directly call df since .unstack() was called just before.)
df.assign(**{f'{col}_PCT_CHG': lambda x: x[col].pct_change() for col in ('c1', 'c2', 'c3'...cN)})

Stefan Krawczyk

02/01/2023, 11:58 PM

if I understand correctly, it could look something like:

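import pandas as pd
from hamilton.function_modifiers import parameterize_sources

COLUMN_LIST = ('c1', 'c2', 'c3'...'cN')

# One output per column: "<c>_PCT_CHG" is computed from the upstream node named <c>.
@parameterize_sources(**{f"{c}_PCT_CHG": {"col": c} for c in COLUMN_LIST})
def pct_change(col: pd.Series) -> pd.Series:
    return col.pct_change()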

See the docs for parameterize_sources.

👍 1

Stefan Krawczyk

02/02/2023, 12:03 AM

otherwise to fix your non-hamilton code:

# explicit 
df.assign(C1_PCT_CHG=df.C1.pct_change(),
           C2_PCT_CHG=df.C2.pct_change(),
           C3_PCT_CHG=df.C3.pct_change(),
           ... 
           CN_PCT_CHG=df.CN.pct_change()
           )

# in a comprehension
df.assign(**{f'{col}_PCT_CHG': df[col].pct_change() for col in ('c1', 'c2', 'c3'...cN)})

should work. (no need for the lambda)

Seth Stokes

02/02/2023, 12:09 AM

Thank you, I’m finally seeing where to utilize Hamilton. Referencing df directly doesn’t work in my case, however, since I unstacked in the step before, so I would have to end the chain and then implement your fix.

👍 1
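For the plain pandas case, one option that was not raised in the thread is to keep the lambda but bind the loop variable as a default argument. The sketch below, on made-up data with hypothetical columns c1 and c2, shows both the late-binding failure and that workaround. Because the lambda receives the intermediate frame, this form should also work in the middle of a method chain, for example right after .unstack().

```python
import pandas as pd

# Invented sample data with two hypothetical columns.
df = pd.DataFrame({"c1": [1.0, 2.0, 4.0], "c2": [10.0, 10.0, 5.0]})
cols = ("c1", "c2")

# Late binding: each lambda looks up `col` only when it is called, which is
# after the comprehension has finished, so every lambda sees col == "c2".
broken = df.assign(
    **{f"{col}_PCT_CHG": lambda x: x[col].pct_change() for col in cols}
)

# Workaround: a default argument is evaluated when the lambda is created,
# so each lambda keeps its own column name.
fixed = df.assign(
    **{f"{col}_PCT_CHG": lambda x, col=col: x[col].pct_change() for col in cols}
)
```

In `broken`, the c1_PCT_CHG column silently holds the percent change of c2; in `fixed`, each new column is computed from its own source column.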
