This message was deleted Hamilton Open Source #hamilton-help



# hamilton-help

Slackbot

10/19/2023, 9:28 PM

This message was deleted.

Thierry Jean

10/19/2023, 10:51 PM

I don't know if this will meet all of your requirements, but I would pass the constant in as a default argument.

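```python
import pandas as pd
from hamilton.function_modifiers import extract_columns

# Maps each new column name to its original name in raw_df.
COLUMN_MAPPING = {
    'col1_new': 'col1_old',
    'col2_new': 'col2_old',
}

# One output node is created per key: col1_new, col2_new.
@extract_columns(*COLUMN_MAPPING.keys())
def df_w_mapped_cols(
    raw_df: pd.DataFrame,
    column_mapping: dict = COLUMN_MAPPING,
) -> pd.DataFrame:
    # Should return a dataframe whose columns are the keys of column_mapping.
    return ...
```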

One foreseeable issue, though: overriding `column_mapping` when calling `driver.execute()` won't pass the override to the `@extract_columns` decorator, so if your output mapping changes, the extracted column names won't change with it.
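To illustrate why, here is a toy decorator in plain Python. It is a hypothetical stand-in, not Hamilton's `extract_columns`, and it only mimics one property: the column names are captured once, when the module is imported. Overriding the mapping at call time changes what the function returns, but not the names the decorator already recorded.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a column-extracting decorator (NOT Hamilton's
# implementation); it only shows when the column names get fixed.
def extract_columns_sketch(*columns: str) -> Callable:
    def decorator(fn: Callable) -> Callable:
        # The names are captured here, once, at decoration (import) time.
        fn.declared_outputs = tuple(columns)
        return fn
    return decorator

COLUMN_MAPPING = {"col1_new": "col1_old", "col2_new": "col2_old"}

@extract_columns_sketch(*COLUMN_MAPPING.keys())
def df_w_mapped_cols(
    raw_columns: Dict[str, List[int]],
    column_mapping: dict = COLUMN_MAPPING,
) -> Dict[str, List[int]]:
    # Renames using whatever mapping is passed in at call time.
    return {new: raw_columns[old] for new, old in column_mapping.items()}

raw = {"col1_old": [1, 2], "col2_old": [3, 4], "col3_old": [5, 6]}
result = df_w_mapped_cols(raw, column_mapping={"colA": "col3_old"})
print(sorted(result))                     # ['colA']
print(df_w_mapped_cols.declared_outputs)  # ('col1_new', 'col2_new')
```

The function now produces `colA`, yet the declared outputs are still `col1_new` and `col2_new`, which is the mismatch described above.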

Seth Stokes

10/20/2023, 1:09 AM

Yeah that should do it. Thank you.

Seth Stokes

10/20/2023, 4:47 AM

Thinking about this more: if I were to load this mapping from an external table, how could it also be used inside `@extract_columns`?

Elijah Ben Izzy

10/20/2023, 2:37 PM

So, doing this truly dynamically is tough. You can pass the mapping in at compile time (through the driver config) and use `@resolve`, but I'd instead recommend just returning the dataframe rather than extracting individual columns. A good reason to extract individual columns is so you can refer to them later, but if the mapping is dynamic, you won't know the names ahead of time. So, you can just return the dataframe, with the right column names, from a single function. That said, if the column names have semantic meaning (`first_column`, `second_column`), then it's a nice approach to define them as having that meaning and then apply the mapping/rename as late as possible. It'll be easier to read.

👍 1
