# hamilton-help
e
So, this is not currently feasible in the exact way you want it, but there are a few ways to do this. I think this is reasonable. A few things to think through:

1. How would you refer to them downstream? Hamilton has a concept of namespaces, but they're largely used internally. One option is to use a `_` in the node name so you can refer to them as variables, rather than a true namespace.
2. `@subdag`, `@pipe`, etc. all use namespacing (an internal Hamilton concept that maps to what you have), but the idea is that they create "internal" nodes that go into a subdag and get referred to by other nodes in that subdag/set of generated nodes.

So, given that you want to be able to share column names and still differentiate them ergonomically, one option is to define your own `extract_columns` decorator that does exactly what you want. Some pseudocode:
```python
import functools

from hamilton.function_modifiers import extract_columns


def extract_columns_with_prefix(prefix, *cols):
    new_columns = [prefix + "_" + col for col in cols]

    def wrapper(fn):
        @functools.wraps(fn)
        def new_fn(*args, **kwargs):
            df_out = fn(*args, **kwargs)
            # rename to the prefixed names (assumes fn returns columns in the same order as cols)
            df_out.columns = new_columns
            return df_out

        # extract the renamed columns from the renaming wrapper, not the original fn
        return extract_columns(*new_columns)(new_fn)

    return wrapper
```
Basically this:

1. Defines a function,
2. that returns another decorator (the `wrapper` fn),
3. that returns a function,
4. that renames the columns,
5. which the wrapper in (2) passes to `extract_columns`.

This way we rename the columns and extract them with the new names.
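For example, usage would look roughly like this (a sketch; `sales_data`, `raw_df`, and the column names are made up):

```python
import pandas as pd


@extract_columns_with_prefix("sales", "total", "count")
def sales_data(raw_df: pd.DataFrame) -> pd.DataFrame:
    # return the columns in the same order they were listed in the decorator
    return raw_df[["total", "count"]]


def total_per_sale(sales_total: pd.Series, sales_count: pd.Series) -> pd.Series:
    # downstream functions refer to the prefixed nodes directly
    return sales_total / sales_count
```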
To be clear, this is a quick fix for you to test out the API you like; then we can open up an issue and probably add a `prefix` parameter to `extract_columns`. I think we can make namespacing a little cleaner, especially with large numbers of functions.
One other thing: the `extract_all` approach won't work, because we have to know the columns at compile time. When people don't know the set of columns ahead of time, they tend to move to dataframe-level computations (which could be reasonable for you to do as well), which bypasses these issues.
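By "dataframe-level" I just mean nodes that pass the whole frame along rather than extracting columns, e.g. (illustrative only):

```python
import pandas as pd


def features(raw_df: pd.DataFrame) -> pd.DataFrame:
    # the column set can vary at runtime because downstream nodes depend on
    # the whole frame rather than on individually extracted columns
    return raw_df.select_dtypes("number").fillna(0)
```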
l
Thanks for the notes. I've built a few custom implementations, one similar to what you recommended. I'll take another shot at cleaning up the API, especially since `extract_all` isn't feasible. Can you expand on what is feasible with respect to dynamic DAGs? Can the DAGs be dynamic as long as all possible nodes are defined in advance?
e
Great. So yeah, re: dynamic DAGs there are a few layers of dynamism:

1. Compile/construction time: at this point we can shape the DAG. This happens prior to execution, and has access to the `config` (which it uses to determine the DAG shape).
2. Runtime: at this point the DAG is fixed, except we have a few constructs (e.g. `Parallelizable`) that allow flexible execution.

So, we can't generate nodes at runtime (yet), except in the specific case of parallelizing. If I understand correctly, if you have everything possible defined, you can execute just a subset and walk those (although it's a little more complex if you want to join/refer to them later). If you know what you want in the config, you can use `@resolve` to build something dynamic: https://hamilton.dagworks.io/en/latest/reference/decorators/resolve/. That said, the more dynamic it is, the more we suggest moving up a level (e.g. if you have a bunch of runtime/parameterized columns, you may want to think in dataframes to make it easier). Hope this helps! I think we could do a better job clarifying our approach towards dynamism and providing justification: we've found that the more fixed the DAG is, the easier it is to think through everything and reason about it.
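For a flavor of `@resolve`, something like the following (a rough sketch: the `prefixes` config key and `prefixed_data` function are made up, and the linked docs cover the exact driver/config setup it needs):

```python
import pandas as pd

from hamilton.function_modifiers import ResolveAt, parameterize, resolve, value


# The lambda runs once the config is available, so a config key (here a
# hypothetical "prefixes" list) can shape the DAG at construction time.
@resolve(
    when=ResolveAt.CONFIG_AVAILABLE,
    decorate_with=lambda prefixes: parameterize(
        **{f"{p}_data": {"prefix": value(p)} for p in prefixes}
    ),
)
def prefixed_data(raw_df: pd.DataFrame, prefix: str) -> pd.DataFrame:
    # one node per configured prefix, all sharing this implementation
    return raw_df.add_prefix(prefix + "_")
```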
l
> If you know what you want in the config, you can use `@resolve`

Ok, I'll probably end up with something like this. Thanks!
e
Awesome! Yeah, let us know if you find an interesting pattern. FWIW I generally recommend burying `resolve` in a custom decorator as well; it's powerful but a little hard to follow.
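Something along these lines, for example (same hypothetical `prefixes` config key as the sketch above; it just hides the `@resolve` boilerplate behind one name):

```python
from hamilton.function_modifiers import ResolveAt, parameterize, resolve, value


def parameterized_by_prefixes(fn):
    # hypothetical wrapper so module authors write @parameterized_by_prefixes
    # instead of spelling out the @resolve machinery every time
    return resolve(
        when=ResolveAt.CONFIG_AVAILABLE,
        decorate_with=lambda prefixes: parameterize(
            **{f"{p}_data": {"prefix": value(p)} for p in prefixes}
        ),
    )(fn)
```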
s
@Luke once you have something, we'd love to review it if possible (even at a high level). We're trying to think through documentation and example use cases that help people reason about things like this, so things grounded in real-world situations are valuable here.