# hamilton-help
e
Hey! Not at my computer now, but I can give you some sample code snippets later. So it depends if you want them each to be their own node. Options:
1. Wrap it in a single dataframe: have that take in the list, and pass it in as inputs at runtime. This is good if they vary on a per-run basis, or if you want them to be specified in an upstream function.
2. Use @resolve. This is a power-user feature that lets you pass the list in as config (e.g. as the first argument in the driver), and then create the decorator at compile time, when the DAG is being constructed. https://hamilton.dagworks.io/en/latest/reference/decorators/resolve/
The thing I'm curious about is whether having a config file is the cleanest way. Hamilton pushes users to effectively merge the config + the code, providing an easy way to build a readable dataflow. Configs allow you to change things quickly, but we've found that in most cases it's not strictly necessary; structure in code provides an easy way to track what's happening and understand everything you need from the code itself.
What I would do is first try (1). In fact, extract_columns might be the nicest. You have a single dataframe, apply the transforms to all in your list, then produce one series per categorical feature. Then, you can use the list of columns to extract, and if you really need to, pass it in from config using @resolve. Happy to provide some outlines a little later when I'm back at my desk!
s
@Khalil Mlayhi let us know if this helped get you unstuck — happy to share some code if you need it. But otherwise the design decision is what do you want to have in code — and what should be configuration.
k
@Elijah Ben Izzy Thank you for your suggestions. I think it can work but it does not fit my design. I just want to avoid some code redundancy by using a list of columns as input instead of the @parameterize format. Also, I want to use a config file so if I want to add a column to the list I add it there instead of searching where to add it in the code. Nevertheless, your reply was helpful. 😄 @Stefan Krawczyk Thanks for offering to help more. I appreciate it 😊
e
So yep, if you’re dead set on config, then resolve can absolutely do it. A common approach is to make the list a constant, then you can modify that! Then parameterize would have ** and a dictionary comprehension
@parameterize(
    **{f"{feature}_new": source(feature) for feature in FEATURE_LIST}
)
....
Then with resolve, you can pass that in from config:
@resolve(
    when=ResolveAt.CONFIG_AVAILABLE,
    decorate_with=lambda feature_list: parameterize(
        **{f"{feature}_new": source(feature) for feature in feature_list}
    )
)

dr = driver.Builder().with_modules(my_module).with_config({"feature_list": load_from_config(…)}).build()
Then the shape of the dag is built off of config! Note I left out something — the config has to enable power user mode — see the docs for resolve, but the error will be self-explanatory :)
k
Thanks for the help. How can I enable power_user_mode, please? This is my code:
dr = driver.Driver(df, modules, adapter=None).with_config({"feature_list" : cat_cols})
df = dr.execute(output_columns)
The cat_cols is a variable that I create from loading a JSON config file as a dictionary in my code. So I am not using the config file directly with Hamilton. It is a general config for different modules.
e
Yep, so if you pass in hamilton.enable_power_user_mode as the key and the Boolean True as the value to the config (within with_config), you should be good. Also, your code won't work: there are two APIs, and you're passing in the dataframe as the config. Fixed code coming shortly…
dr = driver.Builder().with_modules(modules).with_config({"feature_list": cat_cols, "hamilton.enable_power_user_mode": True})
df = dr.execute(output_columns, inputs=df.to_dict())
Note I've made two changes:
1. Moved your dataframe to be inputs on execute. It's an input, not a config. You were passing it in as a configuration variable (using the first parameter to the driver).
2. Used the new builder API, which is a lot easier to work with and more readable! Your code was using both, which don't work together.
Hope this helps!
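For reference, here is a minimal sketch of assembling the dictionary that goes into with_config when the column list lives in a general-purpose JSON config. The JSON layout, the column names, and the helper build_hamilton_config are all hypothetical, made up for illustration; only the two keys feature_list and hamilton.enable_power_user_mode come from the thread.

```python
import json

# Hypothetical general config shared by several modules; the layout is an assumption.
GENERAL_CONFIG_JSON = """
{
  "preprocessing": {"categorical_columns": ["color", "size"]},
  "training": {"seed": 42}
}
"""


def build_hamilton_config(raw_json: str) -> dict:
    """Pick out only what the dataflow needs from the larger config."""
    general_config = json.loads(raw_json)
    cat_cols = general_config["preprocessing"]["categorical_columns"]
    return {
        # Read at DAG-construction time by the @resolve lambda.
        "feature_list": cat_cols,
        # Key/value described in the thread as required for @resolve.
        "hamilton.enable_power_user_mode": True,
    }


hamilton_config = build_hamilton_config(GENERAL_CONFIG_JSON)
print(hamilton_config)
```

The resulting dictionary would be what gets passed to with_config(...) on the builder, while the dataframe itself stays out of it and goes to execute as inputs.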
k
Sorry I tried your solution but I had this error
AttributeError: 'Builder' object has no attribute 'execute'
e
Ahh sorry, forgot to say to add .build() at the end of the builder call!
k
Now I have this error
AttributeError: 'UpstreamDependency' object has no attribute 'items'
e
Ahh, I did the parameterize call wrong
k
@resolve(
    when=ResolveAt.CONFIG_AVAILABLE,
    decorate_with=lambda feature_list: parameterize(
    **{f"{feature}_new": source(feature) for feature in feature_list})
)
def cast_to_string_type(feature_list):
    return feature_list.astype(str)
e
Sorry I've been on my phone. Then with resolve, you can pass that in from config:
@resolve(
    when=ResolveAt.CONFIG_AVAILABLE,
    decorate_with=lambda feature_list: parameterize(
        **{f"{feature}_new": {"feature": source(feature)} for feature in feature_list}
    )
)
And the function needs to take a parameter feature of type series.
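To make the difference between the two parameterize calls concrete, here is a small sketch that runs without Hamilton. FakeSource, the local source function, and bind_parameters are hypothetical stand-ins, not Hamilton's real classes or internals. The assumption, inferred from the error message in the thread, is that each value in the mapping gets treated as a {parameter name: dependency} dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeSource:
    """Stand-in for whatever Hamilton's source() returns; not the real class."""
    name: str


def source(name: str) -> FakeSource:
    # Hypothetical stand-in so the sketch runs without Hamilton installed.
    return FakeSource(name)


feature_list = ["color", "size"]  # illustrative column names

# First attempt: each generated node name maps straight to a source object.
flat = {f"{feature}_new": source(feature) for feature in feature_list}

# Corrected shape: each generated node name maps to {parameter name: source}.
nested = {f"{feature}_new": {"feature": source(feature)} for feature in feature_list}


def bind_parameters(spec: dict) -> dict:
    """Mimic a consumer that iterates over each node's parameter mapping."""
    return {
        node_name: {param: dep.name for param, dep in params.items()}
        for node_name, params in spec.items()
    }


bound = bind_parameters(nested)

try:
    bind_parameters(flat)
    flat_error = None
except AttributeError as exc:
    # Same kind of failure as in the thread: the value is not a dict.
    flat_error = str(exc)

print(bound)
print(flat_error)
```

With the nested shape, the inner key ("feature" here) has to match the name of the decorated function's parameter, which is why the function needs a parameter called feature.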