Hamilton Open Source

# hamilton-help

Slackbot

10/04/2023, 11:45 AM

This message was deleted.

Elijah Ben Izzy

10/04/2023, 2:21 PM

Hey! Not at my computer now, but I can give you some sample code snippets later. It depends on whether you want each of them to be its own node. Options:

1. Wrap it in a single dataframe: have that take in the list, and pass it in as inputs at runtime. This is good if they vary on a per-run basis, or if you want them to be specified in an upstream function.
2. Use `@resolve`: this is a power user feature that enables you to pass it in as config (e.g. as the first argument in the driver), and then create the decorator at compile time, when the DAG is being constructed. https://hamilton.dagworks.io/en/latest/reference/decorators/resolve/

The thing I’m curious about is whether having a config file is the cleanest way. Hamilton pushes users to effectively merge the config and the code, providing an easy way to build a readable dataflow. Configs allow you to change things quickly, but we’ve found that in most cases it’s not strictly necessary: structure in code provides an easy way to track what’s happening and understand everything you need from the code itself.

What I would do is first try (1). In fact, `extract_columns` might be the nicest: you have a single dataframe, apply the transforms to all the columns in your list, then produce one series per categorical feature. Then you can use the list of columns to extract and, if you really need to, pass it in from config using `@resolve`. Happy to provide some outlines a little later when I’m back at my desk!

Stefan Krawczyk

10/04/2023, 11:41 PM

@Khalil Mlayhi let us know if this helped get you unstuck — happy to share some code if you need it. But otherwise the design decision is what do you want to have in code — and what should be configuration.

Khalil Mlayhi

10/05/2023, 8:50 AM

@Elijah Ben Izzy Thank you for your suggestions. I think it can work but it does not fit my design. I just want to avoid some code redundancy by using a list of columns as input instead of the @parameterize format. Also, I want to use a config file so if I want to add a column to the list I add it there instead of searching where to add it in the code. Nevertheless, your reply was helpful. 😄 @Stefan Krawczyk Thanks for offering to help more. I appreciate it 😊

Elijah Ben Izzy

10/05/2023, 1:59 PM

So yep, if you’re dead set on config, then `resolve` can absolutely do it. A common approach is to make the list a constant in code, then you can modify that! Then `parameterize` would take `**` and a dictionary comprehension:

Elijah Ben Izzy

10/05/2023, 2:08 PM

@parameterize(
    **{f"{feature}_new": source(feature) for feature in FEATURE_LIST}
)
....

👍 1

Elijah Ben Izzy

10/05/2023, 2:14 PM

Then with resolve, you can pass that in from config:

@resolve(
    when=ResolveAt.CONFIG_AVAILABLE,
    decorate_with=lambda feature_list: parameterize(
        **{f"{feature}_new": source(feature) for feature in feature_list})
)

dr = driver.Builder().with_modules(my_module).with_config({"feature_list": load_from_config(…)}).build()

Elijah Ben Izzy

10/05/2023, 2:15 PM

Then the shape of the dag is built off of config! Note I left out something — the config has to enable power user mode — see the docs for resolve, but the error will be self-explanatory :)

Khalil Mlayhi

10/05/2023, 3:22 PM

Thanks for the help. How can I enable power_user_mode, please? This is my code:

dr = driver.Driver(df, modules, adapter=None).with_config({"feature_list" : cat_cols})
df = dr.execute(output_columns)

The cat_cols is a variable that I create from loading a JSON config file as a dictionary in my code. So I am not using the config file directly with Hamilton. It is a general config for different modules.

Elijah Ben Izzy

10/05/2023, 3:26 PM

Yep, so if you pass in `hamilton.enable_power_user_mode` as the key and the Boolean `True` as the value to the config (within `with_config`), you should be good. Also, your code won’t work: there are two APIs, and you’re passing in the dataframe as the config. Fixed code coming shortly…

Elijah Ben Izzy

10/05/2023, 3:28 PM

dr = driver.Builder().with_modules(modules).with_config({"feature_list": cat_cols, "hamilton.enable_power_user_mode": True})
df = dr.execute(output_columns, inputs=df.to_dict())

Elijah Ben Izzy

10/05/2023, 3:29 PM

Note I’ve made two changes:

1. Moved your dataframe to be inputs on execute. It’s an input, not a config; you were passing it in as a configuration variable (using the first parameter to the driver).
2. Used the new builder API, which is a lot easier to work with and more readable! Your code was using both, which don’t work together.
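
For readers following along, here is a minimal sketch of how the dictionary passed to `with_config` might be assembled from a general-purpose JSON config, as Khalil describes. The JSON layout, the `preprocessing` and `categorical_columns` keys, and the helper name `build_hamilton_config` are hypothetical; only the `feature_list` and `hamilton.enable_power_user_mode` keys come from the thread.

```python
import json

# Hypothetical general-purpose config; in practice this would be read from a file.
RAW_CONFIG = '{"preprocessing": {"categorical_columns": ["color", "size"]}}'


def build_hamilton_config(raw_json: str) -> dict:
    """Pick out the column list and add the power-user flag that @resolve needs."""
    general_config = json.loads(raw_json)
    cat_cols = general_config["preprocessing"]["categorical_columns"]
    return {
        "feature_list": cat_cols,                 # consumed by the @resolve lambda
        "hamilton.enable_power_user_mode": True,  # key named in the thread
    }


hamilton_config = build_hamilton_config(RAW_CONFIG)
print(hamilton_config)
```

The resulting dictionary is what would go into `with_config(...)`; the dataframe itself stays out of it and is passed as inputs at execution time.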

Elijah Ben Izzy

10/05/2023, 3:30 PM

Hope this helps!

Khalil Mlayhi

10/05/2023, 3:38 PM

Sorry I tried your solution but I had this error

AttributeError: 'Builder' object has no attribute 'execute'

Elijah Ben Izzy

10/05/2023, 3:38 PM

Ahh sorry, forgot to say to add .build() at the end of the builder call!

👍 1

Khalil Mlayhi

10/05/2023, 3:41 PM

Now I have this error

AttributeError: 'UpstreamDependency' object has no attribute 'items'

Elijah Ben Izzy

10/05/2023, 3:41 PM

Ahh, I did the parameterize call wrong

Khalil Mlayhi

10/05/2023, 3:42 PM

@resolve(
    when=ResolveAt.CONFIG_AVAILABLE,
    decorate_with=lambda feature_list: parameterize(
    **{f"{feature}_new": source(feature) for feature in feature_list})
)
def cast_to_string_type(feature_list):
    return feature_list.astype(str)

Elijah Ben Izzy

10/05/2023, 3:43 PM

Sorry, I’ve been on my phone. Then with resolve, you can pass that in from config:

@resolve(
    when=ResolveAt.CONFIG_AVAILABLE,
    decorate_with=lambda feature_list: parameterize(
        **{f"{feature}_new": {"feature": source(feature)} for feature in feature_list})
)

Elijah Ben Izzy

10/05/2023, 3:43 PM

And the function needs to take a parameter `feature` of type series
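
To make the difference between the two `parameterize` attempts concrete, here is a plain-Python sketch of the two mapping shapes. It does not import Hamilton: `UpstreamRef` and the local `source` are hypothetical stand-ins for what Hamilton's `source(...)` returns, and `build_parameterization` is an illustrative helper name. The likely reading of the `'UpstreamDependency' object has no attribute 'items'` error is that each value is expected to be a dict keyed by the function's parameter name, which is an assumption based on the error text and the corrected snippet above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpstreamRef:
    """Hypothetical stand-in for the object returned by Hamilton's source()."""
    name: str


def source(name: str) -> UpstreamRef:
    # Stand-in only: the real function lives in hamilton.function_modifiers.
    return UpstreamRef(name)


def build_parameterization(feature_list):
    """Map each generated node name to {function parameter name: upstream column}."""
    return {
        f"{feature}_new": {"feature": source(feature)}
        for feature in feature_list
    }


features = ["color", "size"]

# First attempt from the thread: each value is a bare source(...) object.
first_attempt = {f"{feature}_new": source(feature) for feature in features}

# Corrected form: each value is a dict keyed by the parameter name "feature".
corrected = build_parameterization(features)

print(hasattr(first_attempt["color_new"], "items"))  # bare object has no .items()
print(corrected["color_new"])
```

With the corrected shape, the decorated function's parameter name (`feature`) has to match the inner dictionary key, which is why the function signature changes as well.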
