Hamilton Open Source

# hamilton-help

Slackbot

09/08/2022, 2:40 PM

This message was deleted.

Elijah Ben Izzy

09/08/2022, 2:56 PM

~~Hmm, example of what you want to do?~~ Ahh, I see the edit. I think `parameterize` comes close, but it's not quite the same (e.g. you have to define the parameters up front). In most cases we've found that there are only a few specific configurations, so a combination of `config.when`, `parameterize`, and optional params with defaults tends to work best and keeps pipelines self-documenting and clear to read. There's a tool we built to completely replace the functionality of a function, called `dynamic_node`, but it hasn't proven generally useful yet and tends to build really confusing pipelines. It's one of those things that was built for a workflow at Stitch Fix that we probably should have bypassed entirely.
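To make the "optional params with defaults" piece concrete, here is a minimal sketch in plain pandas, using vintage-style columns like the ones that come up later in the thread. `latest_vintages` and `n_vintages` are hypothetical names, and the sketch assumes vintage column names sort chronologically (true for a `v` + YYYYMM scheme). The intent, written as a function in a Hamilton module, is that most runs can rely on the default while a run that needs something different passes its own value.

```python
import pandas as pd


def latest_vintages(df: pd.DataFrame, n_vintages: int = 2) -> pd.DataFrame:
    """Keep only the `n_vintages` most recent vintage columns (hypothetical helper).

    Assumes vintage column names sort chronologically, e.g. 'v' + YYYYMM.
    """
    if n_vintages < 1:
        # Guard: a slice of [-0:] would silently return every column.
        raise ValueError("n_vintages must be at least 1")
    vintage_cols = sorted(df.columns)
    # Newest columns come last after sorting, so take the tail.
    return df.loc[:, vintage_cols[-n_vintages:]]
```

If fewer than `n_vintages` columns exist, the slice simply returns all of them.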

Ben

09/08/2022, 3:01 PM

My issue is that the data I'm pulling comes from columns with dynamic names (representing vintages). I can know/assume those names in advance based on the current date, but they'll change as time passes.
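Since the names can be derived from the date, one option is to compute them for a given date and hand the resulting list to the pipeline as an input. A minimal stdlib sketch, assuming (hypothetically) one vintage per month named `v` + YYYYMM, like the toy columns used in the thread; `vintage_columns` is an illustrative name, not part of Hamilton.

```python
from datetime import date
from typing import List


def vintage_columns(as_of: date, n: int) -> List[str]:
    """Return the names of the n most recent monthly vintages up to `as_of`.

    Hypothetical naming scheme: 'v' + YYYYMM, one vintage per calendar month.
    """
    names = []
    year, month = as_of.year, as_of.month
    for _ in range(n):
        names.append(f"v{year:04d}{month:02d}")
        # Step back one calendar month, wrapping into the previous year.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    # Oldest first, matching the left-to-right column order.
    return list(reversed(names))
```

Passing `date.today()` as `as_of` gives the current window; passing a fixed date reproduces the column list an older run would have used.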

Elijah Ben Izzy

09/08/2022, 3:02 PM

OK, so a few ways to do this:

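```python
from typing import List
import numpy as np
import pandas as pd

def df() -> pd.DataFrame:  # real loader inputs omitted; toy data for illustration
    return pd.DataFrame([[1, 1, 3, np.nan], [np.nan, 2, 3, 4], [np.nan, 0, 5, 6]],
                        index=['2022-01', '2022-02', '2022-03'], columns=['v202009', 'v202010', 'v202011', 'v202012'])

def last_value_series(df: pd.DataFrame, parameterized_cols: List[str]) -> pd.Series:
    return df.loc[:, parameterized_cols].iloc[-1]  # last row of the requested vintage columns
```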

Then you pass `parameterized_cols` into the driver as part of `config` or as a runtime input.

Elijah Ben Izzy

09/08/2022, 3:03 PM

(Although I'd recommend naming them `version` or something. If I understand what you're doing, at Stitch Fix this is typically done with partitions over a dataset: e.g. we save a new dataset each time we regenerate and then run using that `as of` date.)
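The partition approach boils down to choosing which saved snapshot to read for a given as-of date. A minimal stdlib sketch, assuming each regeneration is saved as a partition identified by its generation date; `pick_partition` is a hypothetical helper, and how partitions are listed or stored is left out.

```python
from datetime import date
from typing import Iterable, Optional


def pick_partition(partition_dates: Iterable[date], as_of: date) -> Optional[date]:
    """Return the most recent partition generated on or before `as_of`.

    Hypothetical helper: assumes one partition per regeneration, keyed by date.
    """
    eligible = [d for d in partition_dates if d <= as_of]
    # None signals that no snapshot existed yet as of that date.
    return max(eligible) if eligible else None
```

Pinning a run to one snapshot means a rerun for the same as-of date reads the same data, so the column names it sees don't drift.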

Ben

09/08/2022, 3:29 PM

OK, that makes sense. And I can just treat the dataframe as another element of the config dict when I create the Driver as well (instead of as individual columns).

Elijah Ben Izzy

09/08/2022, 3:36 PM

Yep, you can pass in the dataframe however you want, but it'll be a single unit; you don't have to pass individual columns instead. I'd recommend making it part of a module called `data_loaders` and then passing in overrides, but it entirely depends on your preferred approach 🙂
