This message was deleted Hamilton Open Source #hamilton-help

# hamilton-help

Slackbot

03/11/2024, 9:06 PM

This message was deleted.

Elijah Ben Izzy

03/11/2024, 9:53 PM

So, this is not currently feasible in the exact way you want it, but there are a few ways to do this. I think this is reasonable. A few things to think through: 1. How would you refer to them downstream? Hamilton has a concept of namespaces, but they’re largely used internally. One option is to use a

in the node name so you can refer to them as variables, rather than a true namespace. 2. `@subdag`, `@pipe`, etc. all use namespacing (an internal Hamilton concept that maps to what you have), but the idea is that it creates “internal” nodes that go into a subdag that get referred to by other nodes in the subdag/set of generated nodes. So, given that you want to be able to share column names and differentiate ergonomically, one option is to define your own `extract_columns` decorator that does exactly what you want. Some pseudocode:

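import functools
from hamilton.function_modifiers import extract_columns

def extract_columns_with_prefix(prefix, *cols):
    new_columns = [prefix + "_" + col for col in cols]
    def wrapper(fn):
        @functools.wraps(fn)
        def new_fn(*args, **kwargs):
            df_out = fn(*args, **kwargs)
            # Assumes fn returns exactly the columns in cols, in this order.
            df_out.columns = new_columns
            return df_out
        # Decorate new_fn (the renaming version), not the original fn.
        return extract_columns(*new_columns)(new_fn)
    return wrapper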

Basically this: 1. Defines a function 2. That returns another decorator (the `wrapper` fn) 3. That returns a function (`new_fn`) 4. That renames the columns 5. Which the wrapper in (2) hands to `extract_columns`. This way we rename the columns and extract them with the new names.
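The layering described above can be tried without Hamilton installed. The following is a minimal sketch, assuming plain dicts as a stand-in for DataFrames; `rename_with_prefix` and `load_data` are hypothetical names, and the Hamilton `extract_columns` step is deliberately left out so only the renaming mechanics are shown.

```python
import functools


def rename_with_prefix(prefix, *cols):
    """Hypothetical decorator factory; dicts stand in for DataFrames."""
    new_columns = [f"{prefix}_{col}" for col in cols]

    def wrapper(fn):
        @functools.wraps(fn)
        def new_fn(*args, **kwargs):
            out = fn(*args, **kwargs)
            # Map each requested column to its prefixed name.
            return {new: out[old] for old, new in zip(cols, new_columns)}

        # The Hamilton version would hand new_fn to extract_columns here.
        return new_fn

    return wrapper


@rename_with_prefix("train", "x", "y")
def load_data():
    return {"x": [1, 2], "y": [3, 4]}


result = load_data()
```

Because of `functools.wraps`, `load_data` keeps its original name and signature, which matters for frameworks that inspect functions to build a graph.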

Elijah Ben Izzy

03/11/2024, 9:54 PM

To be clear, this is a quick fix for you to test out the API you like, then we can open up an issue and probably add a `prefix` to `extract_columns`. I think we can make namespacing a little cleaner, especially with large numbers of functions.

Elijah Ben Izzy

03/11/2024, 9:57 PM

One other thing: `extract_all` won’t work, because we have to know the set of columns at compile time. When people don’t know the set of columns, they tend to move to dataframe-level computations (which could be reasonable for you to do, as well), which bypasses these issues.

Luke

03/11/2024, 10:02 PM

Thanks for the notes. I’ve built a few* custom implementations, one similar to what you recommended. I’ll take another shot at cleaning up the API, especially since `extract_all` isn’t feasible. Can you expand more on what is feasible with respect to dynamic DAGs? Can the DAGs be dynamic as long as all possible nodes are defined in advance?

Elijah Ben Izzy

03/11/2024, 10:15 PM

Great. So yeah, re: dynamic DAGs, there are a few layers of dynamism: 1. Compile/construction time: at this point we can shape the DAG. This happens prior to execution, and has access to the `config` (which it uses to determine the DAG shape). 2. Runtime: at this point the DAG is fixed, except we have a few constructs (e.g. parallelizable) that allow flexible execution. So, we can’t generate nodes at runtime (yet), except in the specific case of parallelizing. If I understand correctly, if you have everything possible defined, you can have a subset and just walk those (although it’s a little more complex if you want to join/refer to them later). If you know what you want in the config, you can use `@resolve` to build something dynamic: https://hamilton.dagworks.io/en/latest/reference/decorators/resolve/. That said, the more dynamic it is, the more we suggest moving up a level (e.g. if you have a bunch of runtime/parameterized columns, you want to think in dataframes to make it easier). Hope this helps! I think we could do a better job clarifying our approach towards dynamism and providing justification: we’ve found that the more fixed it is, the easier it is to think through everything and reason about the DAG.
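The split between construction time and runtime can be pictured with a toy model. This is only a sketch under stated assumptions: `build_nodes` and `execute` are hypothetical helpers written for illustration, not Hamilton's API or internals, and plain dicts stand in for dataframes.

```python
def build_nodes(config):
    """Construction time: the config decides which nodes exist."""
    nodes = {}
    for col in config["columns"]:
        # Bind col as a default argument so each node keeps its own column.
        nodes[f"{config['prefix']}_{col}"] = lambda data, col=col: data[col]
    return nodes


def execute(nodes, requested, data):
    """Runtime: the node set is fixed; only a subset of it is walked."""
    return {name: nodes[name](data) for name in requested}


config = {"prefix": "train", "columns": ["x", "y"]}
nodes = build_nodes(config)  # the shape is decided here, before execution
outputs = execute(nodes, ["train_x"], {"x": [1, 2], "y": [3, 4]})
```

In this model, requesting a name that `build_nodes` never created raises a `KeyError`, which mirrors the point that nodes cannot be generated once execution has started.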

Luke

03/11/2024, 10:19 PM

> If you know what you want in the config, you can use `@resolve`

Ok, I’ll probably end up with something like this. Thanks!

Elijah Ben Izzy

03/11/2024, 10:24 PM

Awesome! Yeah, let us know if you find an interesting pattern. FWIW I generally recommend burying `resolve` in a custom decorator as well; it’s powerful but a little hard to follow.

Stefan Krawczyk

03/11/2024, 10:35 PM

@Luke once you have something, would love to review it (if possible) please [could be at a high level]. We are trying to think of documentation and example use cases to help people think through things like this. So things grounded in real world situations are valuable here.
