# hamilton-help
e
Hey @Dries Hugaerts! So yes, you can list all nodes with
driver.list_available_variables()
(https://hamilton.dagworks.io/en/latest/reference/drivers/Driver/#hamilton.driver.Driver.list_available_variables). Combined with tags, this lets you list your variables, then query for specific ones or organize them however you want. That said, I’m not entirely sure what you mean by nodes that “aren’t all executable”. Do you have some pseudocode that draws out what you’re doing?
d
Hey @Elijah Ben Izzy, thanks for the quick response. Maybe we have been a bit “cheeky”, but we have created a lot of nodes that we know will not execute because their dependencies do not exist. We did this because the module is much more readable that way. I can give you a pseudo-code example. Given the following module
functions.py
import pandas as pd
from hamilton.function_modifiers import parameterize_sources

foos = ['a', 'b', 'c']
bars = ['d', 'e', 'f']

@parameterize_sources(
    **{
        f'{foo}_{bar}_sum': dict(
            column_1=f'{foo}_{bar}_column',
            column_2=f'{bar}_{foo}_column'
        )
        for foo in foos for bar in bars if foo != bar
    }
)
def sum_columns(column_1: pd.Series, column_2: pd.Series) -> pd.Series:
    """
    Computes the sum of two columns
    """
    return column_1 + column_2
And say we have the following code:
import importlib
import pandas as pd
from hamilton.driver import Driver

df = pd.DataFrame([[1, 2, 3, 4], [3, 4, 5, 6]], columns=['a_d_column', 'd_a_column', 'b_e_column', 'e_b_column'])

module = importlib.import_module('functions')
driver = Driver(df, module)
Then the graph would have 9 nodes, but only 2 of them would be executable:
a_d_sum
and
b_e_sum
When I execute
driver.list_available_variables()
I get
['a_d_sum',
 'a_e_sum',
 'a_f_sum',
 'b_d_sum',
 'b_e_sum',
 'b_f_sum',
 'c_d_sum',
 'c_e_sum',
 'c_f_sum']
What we are looking for:
driver.list_executable_variables()
>['a_d_sum',
  'b_e_sum']
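The expected result above can be reproduced without Hamilton, which may help when reasoning about it. This is a minimal sketch in plain Python, assuming a node counts as executable when both of its source columns are present in the DataFrame. `executable_names` is a hypothetical helper, not a Hamilton function, and the column list stands in for `df.columns`:

```python
foos = ['a', 'b', 'c']
bars = ['d', 'e', 'f']

# Same mapping as the one passed to @parameterize_sources in functions.py:
# node name -> the two source columns that node needs.
sources = {
    f'{foo}_{bar}_sum': (f'{foo}_{bar}_column', f'{bar}_{foo}_column')
    for foo in foos for bar in bars if foo != bar
}

def executable_names(sources, available):
    """Hypothetical helper: names whose source columns are all available."""
    available = set(available)
    return [name for name, cols in sources.items() if available.issuperset(cols)]

# Stand-in for the column names of the DataFrame in the example.
columns = ['a_d_column', 'd_a_column', 'b_e_column', 'e_b_column']

print(len(sources))                        # 9
print(executable_names(sources, columns))  # ['a_d_sum', 'b_e_sum']
```

This only looks one level up (node to column), so it would not cover nodes that depend on other nodes.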
e
Ahh, I see, so the question is whether the inputs for your node exist in the inputs you pass. If they do, then it’s “executable”. Makes sense. I think you can do this with some basic pre-processing, but let me write a bit of code to prove it out.
OK, great, thanks for your patience! This is all doable with a few of the new features @Stefan Krawczyk added for lineage. See this gist, which has the function you want using standard driver functions. We can add it to the standard Hamilton library or leave it as a recipe in the docs (TBD): https://gist.github.com/elijahbenizzy/7ddd44af73d1cbff5eb65d5e01f71bb8
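The gist is the reference for doing this with the driver. Purely to illustrate the idea (a node is executable when everything upstream of it bottoms out in an input you supply), here is a library-free sketch. The `deps` mapping, the `double_*` nodes, and `is_executable` are hypothetical, and the mapping is written by hand rather than read from a Hamilton driver:

```python
from typing import Dict, List, Set

# Hypothetical dependency mapping: node name -> names it depends on.
deps: Dict[str, List[str]] = {
    'a_d_sum': ['a_d_column', 'd_a_column'],
    'a_e_sum': ['a_e_column', 'e_a_column'],
    'double_a_d_sum': ['a_d_sum'],  # depends on another node, not a column
    'double_a_e_sum': ['a_e_sum'],
}

def is_executable(name: str, deps: Dict[str, List[str]], inputs: Set[str]) -> bool:
    """True if `name` is a supplied input or all of its dependencies are executable.

    Assumes the mapping is acyclic; a cycle would recurse without end.
    """
    if name in inputs:
        return True
    if name not in deps:
        return False  # neither supplied as input nor defined as a node
    return all(is_executable(dep, deps, inputs) for dep in deps[name])

inputs = {'a_d_column', 'd_a_column'}
print([n for n in deps if is_executable(n, deps, inputs)])
# ['a_d_sum', 'double_a_d_sum']
```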
d
Hello Elijah, wonderful! It is working fine for most of the nodes. However, we encountered an issue with some of our especially cheeky definitions. Say we also defined:
@parameterize_sources(
    **{
        f'{foo}_{bar}_sum': dict(
            column=f'{bar}_{foo}_sum',
        )
        for foo in foos for bar in foos if foo != bar
    }
)
def minus(column: pd.Series) -> pd.Series:
    """
    Computes the negation (additive inverse)
    """
    return -column
Then these will also turn up in the list (e.g.
a_b_sum, b_a_sum, ...
). However, if you now try to execute these, you will hit the recursion limit, since
"foo_bar_sum" -> "bar_foo_sum" -> "foo_bar_sum" -> …
We are able to work around this by temporarily disabling these definitions, which works as desired. Many thanks!
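To make the loop concrete, here is a small sketch that rebuilds the dependencies implied by the parameterized `minus` definition and looks for a cycle with a depth-first search. `find_cycle` is a hypothetical helper written for this illustration and is not part of Hamilton:

```python
foos = ['a', 'b', 'c']

# Dependencies implied by the parameterized `minus` definition:
# each {foo}_{bar}_sum depends on {bar}_{foo}_sum.
deps = {
    f'{foo}_{bar}_sum': [f'{bar}_{foo}_sum']
    for foo in foos for bar in foos if foo != bar
}

def find_cycle(deps):
    """Hypothetical helper: return one dependency cycle as a list, or None."""
    in_progress, done = set(), set()

    def visit(node, path):
        in_progress.add(node)
        path.append(node)
        for dep in deps.get(node, []):
            if dep in in_progress:
                # `dep` is already on the current path, so we found a loop.
                return path[path.index(dep):] + [dep]
            if dep not in done:
                found = visit(dep, path)
                if found:
                    return found
        path.pop()
        in_progress.discard(node)
        done.add(node)
        return None

    for node in deps:
        if node not in done:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None

print(find_cycle(deps))  # ['a_b_sum', 'b_a_sum', 'a_b_sum']
```

Running a check like this before walking upstream dependencies would turn the recursion error into an explicit report of which names refer to each other.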
e
Will look in just a minute!
Ok, back at the keyboard. Slightly confused: how do you run it? I think the DAG will register a cycle if you try to run it with both included, no? Otherwise, I think you can actually do this in a clever way: you want to break it into two disjoint sets 🙂 Haven’t verified, but this might work…
1. List every variable and split each name by underscore
2. Sort the variables by the first item in the split
3. Split that list into two halves, (a) and (b)
4. Calculate upstream for (a) using the code in the gist, as well as upstream for (b)
5. Combine
You may have to add “overrides” for (4), although I’d have to mess with it.
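The bookkeeping in these steps could look roughly like the sketch below. It is untested against Hamilton itself: `split_into_halves` and `combine` are hypothetical names, and `upstream_for` is a stand-in for whatever per-half check the gist provides. The sketch does not check that each half is free of mutual dependencies, so that part would still need verifying.

```python
def split_into_halves(names):
    """Steps 1-3: sort names by their first underscore-separated token, then halve."""
    ordered = sorted(names, key=lambda name: name.split('_')[0])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]

def combine(half_a, half_b, upstream_for):
    """Steps 4-5: run the per-half check on each half and merge the results."""
    return sorted(set(upstream_for(half_a)) | set(upstream_for(half_b)))

names = ['c_a_sum', 'a_b_sum', 'b_a_sum', 'a_c_sum', 'c_b_sum', 'b_c_sum']
half_a, half_b = split_into_halves(names)
print(half_a)  # ['a_b_sum', 'a_c_sum', 'b_a_sum']
print(half_b)  # ['b_c_sum', 'c_a_sum', 'c_b_sum']

# Stand-in for step 4: pretend every name in a half passes the check.
result = combine(half_a, half_b, upstream_for=lambda half: list(half))
print(result)
```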