Slackbot
03/11/2024, 9:06 PM
Elijah Ben Izzy
03/11/2024, 9:53 PM
… _ in the node name so you can refer to them as variables, rather than a true namespace.
2. @subdag, @pipe, etc. all use namespacing (an internal Hamilton concept that maps to what you have), but the idea is that it creates “internal” nodes that go into a subdag and get referred to by other nodes in the subdag/set of generated nodes.
So, given that you want to be able to share column names and differentiate ergonomically, one option is to define your own extract_columns decorator that does exactly what you want. Some pseudocode:
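import functools
from hamilton.function_modifiers import extract_columns

def extract_columns_with_prefix(prefix, *cols):
    new_columns = [prefix + "_" + col for col in cols]

    def wrapper(fn):
        @functools.wraps(fn)
        def new_fn(*args, **kwargs):
            df_out = fn(*args, **kwargs)
            # Rename positionally: assumes fn returns exactly these columns, in this order
            df_out.columns = new_columns
            return df_out
        # Decorate the renaming function (not the original fn) so the prefixed names are extracted
        return extract_columns(*new_columns)(new_fn)

    return wrapper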
Basically this:
1. defines a function,
2. That returns… another decorator (the wrapper fn)
3. That returns a function
4. That renames the columns
5. This is called by the wrapper in (2)
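To make the layering in the steps above concrete, here is a minimal sketch that runs without Hamilton or pandas installed. It is an illustration under assumptions, not Hamilton's behavior: tag_columns is a hypothetical stand-in for extract_columns that only records the column names, and a plain dict of column name to values stands in for a dataframe.

```python
import functools


def tag_columns(*columns):
    """Hypothetical stand-in for extract_columns: it only records the names."""
    def decorator(fn):
        fn.extracted_columns = columns
        return fn
    return decorator


def extract_columns_with_prefix(prefix, *cols):
    # Layer 1: the outer function computes the prefixed names once.
    new_columns = [f"{prefix}_{col}" for col in cols]

    def wrapper(fn):
        # Layer 2: the decorator that receives the user's function.
        @functools.wraps(fn)
        def new_fn(*args, **kwargs):
            # Layer 3: call the original, then rename its columns positionally.
            table = fn(*args, **kwargs)
            return dict(zip(new_columns, table.values()))

        # The renaming function (not the original fn) gets the column tags.
        return tag_columns(*new_columns)(new_fn)

    return wrapper


@extract_columns_with_prefix("orders", "id", "amount")
def orders_table():
    # A plain dict of column name -> values stands in for a dataframe.
    return {"id": [1, 2], "amount": [9.5, 3.0]}


result = orders_table()
```

The point to notice is that the inner decorator is applied to new_fn, so whatever consumes the column names sees the prefixed ones, and the function that actually runs is the one that does the renaming.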
This way we rename the columns and extract them with the new names.
Elijah Ben Izzy
03/11/2024, 9:54 PM
… prefix to extract_columns. I think we can make namespacing a little cleaner, especially with large #s of functions.
Elijah Ben Izzy
03/11/2024, 9:57 PM
… extract_all won’t work: we have to know the columns at compile time. When people don’t know the set of columns, they tend to move to dataframe-level computations (which could be reasonable for you to do, as well), which bypasses these issues.
Luke
03/11/2024, 10:02 PM
… extract_all isn’t feasible. Can you expand more on what is feasible with respect to dynamic DAGs? Can the DAGs be dynamic as long as all possible nodes are defined in advance?
Elijah Ben Izzy
03/11/2024, 10:15 PM
… config (which it uses to determine the DAG shape)
2. Runtime: at this point the DAG is fixed, except we have a few constructs (e.g. Parallelizable) that allow flexible execution.
So, we can’t generate nodes at runtime (yet), except in the specific case of parallelizing.
If I understand correctly, if you have everything possible defined, you can have a subset and just walk those (although it's a little more complex if you want to join/refer to them later). If you know what you want in the config, you can use @resolve to build something dynamic: https://hamilton.dagworks.io/en/latest/reference/decorators/resolve/.
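As a rough illustration of "everything possible defined, then walk a subset", here is a toy sketch of the two phases. It does not use Hamilton or @resolve and is not how Hamilton is implemented; ALL_NODES, build_dag, execute, and the node names are hypothetical names made up for this example.

```python
from typing import Callable, Dict, List

# Every node that could ever exist is defined up front (hypothetical names).
ALL_NODES: Dict[str, Callable[[Dict[str, float]], float]] = {
    "spend_total": lambda inputs: inputs["spend_a"] + inputs["spend_b"],
    "spend_mean": lambda inputs: (inputs["spend_a"] + inputs["spend_b"]) / 2,
    "spend_max": lambda inputs: max(inputs["spend_a"], inputs["spend_b"]),
}


def build_dag(config: Dict[str, List[str]]) -> Dict[str, Callable]:
    """Build phase: the config alone decides which nodes exist."""
    wanted = config["outputs"]
    unknown = [name for name in wanted if name not in ALL_NODES]
    if unknown:
        raise KeyError(f"nodes not defined in advance: {unknown}")
    return {name: ALL_NODES[name] for name in wanted}


def execute(dag: Dict[str, Callable], inputs: Dict[str, float]) -> Dict[str, float]:
    """Run phase: the node set is fixed; only the input values vary."""
    return {name: fn(inputs) for name, fn in dag.items()}


dag = build_dag({"outputs": ["spend_total", "spend_max"]})
results = execute(dag, {"spend_a": 2.0, "spend_b": 5.0})
```

The shape of the graph changes only when the config changes; asking for a node that was never defined fails at build time rather than partway through a run.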
That said, the more dynamic it is, the more we suggest moving up a level (e.g. if you have a bunch of runtime/parameterized columns, you want to think in dataframes to make it easier).
Hope this helps! I think we could do a better job clarifying our approach towards dynamism + providing justification; we’ve found that the more fixed it is, the easier it is to think through everything/reason about the DAG.
Luke
03/11/2024, 10:19 PM
“If you know what you want in the config, you can use @resolve”
Ok, I’ll probably end up with something like this. Thanks!
Elijah Ben Izzy
03/11/2024, 10:24 PM
… resolve in a custom decorator as well; it's powerful but a little hard to follow.
Stefan Krawczyk
03/11/2024, 10:35 PM