Slackbot
11/23/2022, 5:28 PM
Elijah Ben Izzy
11/23/2022, 5:36 PM
@parameterize(**lag_parameterization)
def lag_series(feature: pd.Series, lags: List[int]) -> pd.DataFrame:
    # pretty sure there's a better vectorized way to do this, but...
    return pd.DataFrame({f"lag_{lag}": feature.shift(lag) for lag in lags})
Then you can use that upstream.
Elijah Ben Izzy
11/23/2022, 5:40 PM
Stefan Krawczyk
11/23/2022, 5:50 PM
Gregory Jeffrey
11/23/2022, 5:54 PM
lags in initial_data or config
Gregory Jeffrey
11/23/2022, 5:55 PM
Elijah Ben Izzy
11/23/2022, 5:59 PM
extract_columns to support multiple dataframes -- if you're interested in making an OS contribution at some point...
Elijah Ben Izzy
11/23/2022, 6:01 PM
source, value, and config could be a third, but it's unclear whether it's the value or the source... One could even imagine {'lag': source(config('lag'))}, meaning that it's a source, the value of which comes from config. Will need to noodle.
Gregory Jeffrey
11/23/2022, 6:10 PM
Elijah Ben Izzy
11/23/2022, 6:18 PM
def lagged_feature(feature: pd.Series, lags: List[int]) -> pd.DataFrame:
    # pretty sure there's a better vectorized way to do this, but...
    return pd.DataFrame({f"lag_{lag}": feature.shift(lag) for lag in lags})
Then you use that downstream. Exactly how depends on how you intend to model your problem, but the core idea is to label the lags in a way that is meaningful to your workflow. E.g. you could have lag_a, lag_b, and lag_c (if you're looking for features based on three different lags 🤷) and extract those from your dataframe/pass them in via the lags parameter. Makes sense?
Gregory Jeffrey
11/23/2022, 6:25 PM
Elijah Ben Izzy
11/23/2022, 6:26 PM
Stefan Krawczyk
11/23/2022, 7:01 PM
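
For reference, a minimal sketch of the lag-column idea from the thread, written in plain pandas with no Hamilton decorators so it runs on its own. The function name lag_columns and the sample values are made up for illustration; only Series.shift and the lag_<n> naming come from the snippets above.

```python
from typing import List

import pandas as pd


def lag_columns(feature: pd.Series, lags: List[int]) -> pd.DataFrame:
    """Build one shifted copy of `feature` per lag, in a column named lag_<n>."""
    # Hypothetical helper: mirrors the dict-comprehension approach from the thread.
    return pd.DataFrame({f"lag_{lag}": feature.shift(lag) for lag in lags})


# Illustrative data only; any numeric Series works the same way.
feature = pd.Series([10.0, 20.0, 30.0, 40.0], name="feature")
lagged = lag_columns(feature, lags=[1, 2])

# Pull out a single lag by its label to use it in a later step.
lag_1 = lagged["lag_1"]
```

Each column is the original series shifted down by that many rows, so the first `lag` rows of a column are NaN. Labeling the columns by lag is what makes it possible to pick out individual lags later, whichever mechanism ends up doing the extraction.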