Slackbot
10/04/2023, 11:45 AM
Elijah Ben Izzy
10/04/2023, 2:21 PM
@resolve: this is a power-user feature that lets you pass a value in as config (e.g. as the first argument in the driver) and then create the decorator at compile time, when the DAG is being constructed. https://hamilton.dagworks.io/en/latest/reference/decorators/resolve/
The thing I’m curious about is whether having a config file is the cleanest way. Hamilton pushes users to effectively merge the config and the code, providing an easy way to build a readable dataflow. Configs allow you to change things quickly, but we’ve found that in most cases they’re not strictly necessary: structure in code provides an easy way to track what’s happening and understand everything you need from the code itself.
What I would do is first try (1). In fact, extract_columns might be the nicest. You have a single dataframe, apply the transforms to all the columns in your list, then produce one series per categorical feature. Then you can use the list of columns to extract and, if you really need to, pass it in from config using @resolve. Happy to provide some outlines a little later when I’m back at my desk!
Stefan Krawczyk
10/04/2023, 11:41 PM
Khalil Mlayhi
10/05/2023, 8:50 AM
Elijah Ben Izzy
10/05/2023, 1:59 PM
Elijah Ben Izzy
10/05/2023, 2:08 PM
@parameterize(
    **{f"{feature}_new": source(feature) for feature in FEATURE_LIST}
)
....
Elijah Ben Izzy
10/05/2023, 2:14 PM
@resolve(
    when=ResolveAt.CONFIG_AVAILABLE,
    decorate_with=lambda feature_list: parameterize(
        **{f"{feature}_new": source(feature) for feature in feature_list}
    ),
)
dr = driver.Builder().with_modules(my_module).with_config({"feature_list": load_from_config(…)}).build()
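For readers unfamiliar with the pattern, the snippet above defers building the decorator until the config is known. The following is a toy, standard-library-only model of that idea, not Hamilton's implementation. `toy_parameterize`, `toy_resolve`, and `expand` are hypothetical names, and matching the lambda's parameter names against config keys is an assumption based on how `feature_list` is used above.

```python
import inspect
from typing import Any, Callable, Dict, List


def toy_parameterize(**variants: Dict[str, Any]) -> Callable:
    """Toy decorator: records the named variants a function should expand into."""
    def decorator(fn: Callable) -> Callable:
        fn.variants = variants
        return fn
    return decorator


def toy_resolve(decorate_with: Callable[..., Callable]) -> Callable:
    """Toy deferral: stores the decorator factory instead of applying it now."""
    def decorator(fn: Callable) -> Callable:
        fn.pending = decorate_with
        return fn
    return decorator


def expand(fn: Callable, config: Dict[str, Any]) -> List[str]:
    """Once config exists, call the factory with the config keys it asks for."""
    wanted = inspect.signature(fn.pending).parameters
    real_decorator = fn.pending(**{name: config[name] for name in wanted})
    return sorted(real_decorator(fn).variants)


@toy_resolve(
    decorate_with=lambda feature_list: toy_parameterize(
        **{f"{feature}_new": {"feature": feature} for feature in feature_list}
    )
)
def cast_to_string_type(feature):
    return str(feature)


# The variant names only exist after the config is supplied.
node_names = expand(cast_to_string_type, {"feature_list": ["color", "size"]})
print(node_names)
```

The point of the sketch is only the ordering: the function is defined first, and the list of generated names is computed later from whatever the config contains.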
Elijah Ben Izzy
10/05/2023, 2:15 PM
Khalil Mlayhi
10/05/2023, 3:22 PM
dr = driver.Driver(df, modules, adapter=None).with_config({"feature_list": cat_cols})
df = dr.execute(output_columns)
cat_cols is a variable that I create by loading a JSON config file as a dictionary in my code, so I am not using the config file directly with Hamilton. It is a general config for different modules.
Elijah Ben Izzy
10/05/2023, 3:26 PM
[If you add] hamilton.enable_power_user_mode as the key and the Boolean True as the value to the config (within with_config), you should be good.
Also, your code won’t work: there are two APIs, and you’re passing in the dataframe as the config. Fixed code coming shortly…
Elijah Ben Izzy
10/05/2023, 3:28 PM
dr = driver.Builder().with_modules(modules).with_config({"feature_list": cat_cols, "hamilton.enable_power_user_mode": True})
df = dr.execute(output_columns, inputs=df.to_dict())
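A minimal sketch of how the general JSON config described earlier could feed the config dict used above. The JSON layout and the `categorical_columns` key are assumptions made for illustration. Only `feature_list` and `hamilton.enable_power_user_mode` come from the thread.

```python
import json

# Stand-in for the contents of a general-purpose JSON config file (hypothetical layout).
raw_config = '{"categorical_columns": ["color", "size"], "other_module": {"threshold": 0.5}}'

general_config = json.loads(raw_config)
cat_cols = general_config["categorical_columns"]

# Only the keys Hamilton needs go into its config; the rest stays with other modules.
hamilton_config = {
    "feature_list": cat_cols,
    "hamilton.enable_power_user_mode": True,  # flag the thread says to add
}
print(hamilton_config)
```

Keeping the Hamilton config as a small dict derived from the larger file means the dataflow only sees the values it actually depends on.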
Elijah Ben Izzy
10/05/2023, 3:29 PM
Elijah Ben Izzy
10/05/2023, 3:30 PM
Khalil Mlayhi
10/05/2023, 3:38 PM
AttributeError: 'Builder' object has no attribute 'execute'
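The error above is what a builder-style API produces when the final build step is skipped: the chained `with_...` calls return the builder, not the object that runs things. The toy stand-in below shows the shape. `ToyBuilder` and `ToyDriver` are hypothetical classes, not Hamilton's. In the 2:14 PM message in this thread, the chain ends with `.build()`.

```python
from typing import Any, Dict, List


class ToyDriver:
    def __init__(self, modules: List[str], config: Dict[str, Any]) -> None:
        self.modules = modules
        self.config = config

    def execute(self, outputs: List[str]) -> Dict[str, str]:
        # Placeholder "execution": just echo which outputs were requested.
        return {name: f"computed {name}" for name in outputs}


class ToyBuilder:
    def __init__(self) -> None:
        self._modules: List[str] = []
        self._config: Dict[str, Any] = {}

    def with_modules(self, *modules: str) -> "ToyBuilder":
        self._modules.extend(modules)
        return self  # still a builder

    def with_config(self, config: Dict[str, Any]) -> "ToyBuilder":
        self._config.update(config)
        return self  # still a builder

    def build(self) -> ToyDriver:
        # Only this call hands back something with an execute() method.
        return ToyDriver(self._modules, self._config)


builder = ToyBuilder().with_modules("my_module").with_config({"feature_list": ["color"]})
builder_can_execute = hasattr(builder, "execute")

dr = builder.build()
result = dr.execute(["color_new"])
print(builder_can_execute, result)
```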
Elijah Ben Izzy
10/05/2023, 3:38 PM
Khalil Mlayhi
10/05/2023, 3:41 PM
AttributeError: 'UpstreamDependency' object has no attribute 'items'
Elijah Ben Izzy
10/05/2023, 3:41 PM
Khalil Mlayhi
10/05/2023, 3:42 PM
@resolve(
    when=ResolveAt.CONFIG_AVAILABLE,
    decorate_with=lambda feature_list: parameterize(
        **{f"{feature}_new": source(feature) for feature in feature_list}
    ),
)
def cast_to_string_type(feature_list):
    return feature_list.astype(str)
Elijah Ben Izzy
10/05/2023, 3:43 PM
@resolve(
    when=ResolveAt.CONFIG_AVAILABLE,
    decorate_with=lambda feature_list: parameterize(
        **{f"{feature}_new": {"feature": source(feature)} for feature in feature_list}
    ),
)
Elijah Ben Izzy
10/05/2023, 3:43 PM
feature of type series
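The difference between the failing and the fixed decorator in this thread is the shape of each value passed to `parameterize`: the fix maps every output name to a dict of parameter name to source. The standard-library sketch below contrasts the two shapes. `Source` is a hypothetical stand-in for whatever `source(...)` returns, and reading the `'items'` error as "each value is expected to be a dict" is an inference from the traceback and the fix, not from Hamilton's code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Source:
    """Hypothetical stand-in for the object returned by source(...)."""
    name: str


feature_list: List[str] = ["color", "size"]

# Shape from the failing snippet: each value is a bare Source.
flat = {f"{feature}_new": Source(feature) for feature in feature_list}

# Shape from the fix: each value maps a function parameter name to a Source.
nested = {f"{feature}_new": {"feature": Source(feature)} for feature in feature_list}


def parameter_names(mapping: Dict[str, object]) -> Dict[str, List[str]]:
    """Collect parameter names per output, as code expecting dict values would."""
    return {
        output: [param for param, _ in value.items()]
        for output, value in mapping.items()
    }


nested_params = parameter_names(nested)
print(nested_params)

try:
    parameter_names(flat)
    flat_error = ""
except AttributeError as err:
    # A bare Source has no .items(), which mirrors the error seen in the thread.
    flat_error = str(err)
print(flat_error)
```

With the nested shape, the key `"feature"` has to match a parameter of the decorated function, which is why the function would take a parameter named `feature` rather than `feature_list`.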