This message was deleted Hamilton Open Source #hamilton-help


# hamilton-help

Slackbot

07/13/2023, 1:22 PM

This message was deleted.


Elijah Ben Izzy

07/13/2023, 1:34 PM

Hey @Dries Hugaerts! So yes, you can list all nodes with driver.list_available_variables() (https://hamilton.dagworks.io/en/latest/reference/drivers/Driver/#hamilton.driver.Driver.list_available_variables). This + tags enables you to list your variables, then query specifically for them/organize them in the way you want. That said, I’m not entirely sure what you mean by nodes that “aren’t all executable”. Do you have some pseudocode that draws out what you’re doing?

Dries Hugaerts

07/13/2023, 1:54 PM

Hey @Elijah Ben Izzy, thanks for the quick response. Maybe we have been a bit “cheeky”, but we have created a lot of nodes that we know will not execute because their dependencies do not exist. We did this because the module is much more readable that way. I can give you a pseudo-code example. Given the following module:

functions.py


import pandas as pd
from hamilton.function_modifiers import parameterize_sources

foos = ['a', 'b', 'c']
bars = ['d', 'e', 'f']

@parameterize_sources(
    **{
        f'{foo}_{bar}_sum': dict(
            column_1=f'{foo}_{bar}_column',
            column_2=f'{bar}_{foo}_column'
        )
        for foo in foos for bar in bars if foo != bar
    }
)
def sum_columns(column_1: pd.Series, column_2: pd.Series) -> pd.Series:
    """
    Computes the sum of two columns
    """
    return column_1 + column_2
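
To see which nodes that decorator argument produces, the same comprehension can be evaluated on its own. This is only an illustrative sketch in plain Python; the name `generated` is hypothetical and nothing here calls Hamilton.

```python
foos = ['a', 'b', 'c']
bars = ['d', 'e', 'f']

# The same comprehension that is passed to the decorator, evaluated standalone.
generated = {
    f'{foo}_{bar}_sum': dict(
        column_1=f'{foo}_{bar}_column',
        column_2=f'{bar}_{foo}_column',
    )
    for foo in foos for bar in bars if foo != bar
}

# One entry per (foo, bar) pair: 3 x 3 = 9 node names.
print(len(generated))
print(generated['a_d_sum'])
```

Each key becomes a node name, and each value names the two source columns that node needs.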

And say we have the following code:


import importlib
import pandas as pd
from hamilton.driver import Driver

df = pd.DataFrame([[1, 2, 3, 4], [3, 4, 5, 6]], columns=['a_d_column', 'd_a_column', 'b_e_column', 'e_b_column'])

module = importlib.import_module('functions')
driver = Driver(df, module)

Then the graph would have 9 nodes, but only 2 executable: a_d_sum and b_e_sum. When I execute driver.list_available_variables() I get


['a_d_sum',
 'a_e_sum',
 'a_f_sum',
 'b_d_sum',
 'b_e_sum',
 'b_f_sum',
 'c_d_sum',
 'c_e_sum',
 'c_f_sum']

What we are looking for:


driver.list_executable_variables()
>['a_d_sum',
  'b_e_sum']

Elijah Ben Izzy

07/13/2023, 1:57 PM

Ahh, I see, so the question is whether the inputs for your node exist in the inputs. If they do, then it’s “executable”. Makes sense. I think you can do this with some basic pre-processing, but let me write a bit of code to prove it out.

Elijah Ben Izzy

07/13/2023, 2:22 PM

OK, great, thanks for your patience! This is all doable with a few of the new features @Stefan Krawczyk added for lineage. See this gist, which has the function you want using standard driver functions. We can add it to the standard Hamilton library or leave it as a recipe in the docs (TBD): https://gist.github.com/elijahbenizzy/7ddd44af73d1cbff5eb65d5e01f71bb8
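
The gist itself is not reproduced here. As a rough, library-free illustration of the underlying idea (a node is "executable" when everything it depends on eventually resolves to an available input), a sketch in plain Python might look like the following. The names `deps`, `available`, and `executable_nodes` are hypothetical; they are not part of Hamilton or of the gist.

```python
from typing import Dict, FrozenSet, List, Set


def executable_nodes(deps: Dict[str, List[str]], available: Set[str]) -> List[str]:
    """Return the nodes whose dependencies all resolve to available inputs."""

    def can_run(name: str, seen: FrozenSet[str]) -> bool:
        if name in available:
            return True  # provided directly as an input
        if name not in deps or name in seen:
            return False  # unknown input, or a dependency cycle
        return all(can_run(dep, seen | {name}) for dep in deps[name])

    return sorted(name for name in deps if can_run(name, frozenset()))


# Hypothetical dependency map mirroring the sum_columns example.
foos, bars = ['a', 'b', 'c'], ['d', 'e', 'f']
deps = {
    f'{foo}_{bar}_sum': [f'{foo}_{bar}_column', f'{bar}_{foo}_column']
    for foo in foos for bar in bars
}
available = {'a_d_column', 'd_a_column', 'b_e_column', 'e_b_column'}

print(executable_nodes(deps, available))  # ['a_d_sum', 'b_e_sum']
```

With Hamilton, the dependency information would come from the driver rather than from a hand-written dictionary.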

Dries Hugaerts

07/13/2023, 2:59 PM

Hello Elijah, wonderful! It is working fine for most of the nodes. However, we encountered an issue with some of our especially cheeky definitions. Say we also defined:


@parameterize_sources(
    **{
        f'{foo}_{bar}_sum': dict(
            column=f'{bar}_{foo}_sum',
        )
        for foo in foos for bar in foos if foo != bar
    }
)
def minus(column: pd.Series) -> pd.Series:
    """
    Computes the inverse
    """
    return -column

Then these will also turn up in the list (e.g. a_b_sum, b_a_sum, ...). However, if you now try to execute them you will hit the recursion limit, as "foo_bar_sum" -> "bar_foo_sum" -> "foo_bar_sum" -> …

We are able to work around this by temporarily disabling these definitions, which works as desired. Many thanks!
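
The mutual reference can be made visible without running anything. The sketch below is plain Python, not Hamilton's own cycle handling; `deps` and `find_cycle` are hypothetical names. It follows dependencies from a starting node and reports the first loop it finds.

```python
from typing import Dict, List, Optional

foos = ['a', 'b', 'c']

# Hypothetical dependency map mirroring the minus example:
# each foo_bar_sum depends on bar_foo_sum.
deps: Dict[str, List[str]] = {
    f'{foo}_{bar}_sum': [f'{bar}_{foo}_sum']
    for foo in foos for bar in foos if foo != bar
}


def find_cycle(deps: Dict[str, List[str]], start: str) -> Optional[List[str]]:
    """Depth-first walk from start; return a cyclic path if one is reachable."""
    path: List[str] = []

    def visit(name: str) -> Optional[List[str]]:
        if name in path:
            # Revisiting a node on the current path closes a loop.
            return path[path.index(name):] + [name]
        path.append(name)
        for dep in deps.get(name, []):
            found = visit(dep)
            if found:
                return found
        path.pop()
        return None

    return visit(start)


print(find_cycle(deps, 'a_b_sum'))  # ['a_b_sum', 'b_a_sum', 'a_b_sum']
```

A check like this could be used to filter such nodes out of the list before attempting to execute them.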

Elijah Ben Izzy

07/13/2023, 3:00 PM

Will look in just a minute!

Elijah Ben Izzy

07/13/2023, 3:46 PM

Ok, back at the keyboard, slightly confused. How do you run it? I think the DAG will register a cycle if you try to run and include both, no? Otherwise, I think you can actually do this in a clever way: you want to break it into two disjoint sets 🙂 Haven’t verified, but this might work…
1. List every variable and split each name by underscore
2. Sort the variables by the first item in the split
3. Split that list into two halves, (a) and (b)
4. Calculate upstream for (a) using the code in the gist, as well as upstream for (b)
5. Combine
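
Taken literally, those steps could be sketched as below. This is unverified, like the suggestion itself. `split_into_halves` and `combined_upstream` are hypothetical helpers, and `upstream_of` is a placeholder for whatever upstream lookup the gist provides.

```python
from typing import Callable, Iterable, List, Set, Tuple


def split_into_halves(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    # Steps 1-2: split each name on "_" and sort by the first piece.
    ordered = sorted(names, key=lambda name: name.split('_')[0])
    # Step 3: cut the sorted list into two halves, (a) and (b).
    middle = len(ordered) // 2
    return ordered[:middle], ordered[middle:]


def combined_upstream(
    names: Iterable[str],
    upstream_of: Callable[[List[str]], Iterable[str]],
) -> Set[str]:
    half_a, half_b = split_into_halves(names)
    # Steps 4-5: look up upstream nodes per half, then merge the results.
    return set(upstream_of(half_a)) | set(upstream_of(half_b))


names = ['a_b_sum', 'a_c_sum', 'b_a_sum', 'b_c_sum', 'c_a_sum', 'c_b_sum']
half_a, half_b = split_into_halves(names)
print(half_a)  # ['a_b_sum', 'a_c_sum', 'b_a_sum']
print(half_b)  # ['b_c_sum', 'c_a_sum', 'c_b_sum']
```

With these six names, a_b_sum and b_a_sum both land in the first half, so the split may need adjusting before it truly separates each mirrored pair.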

Elijah Ben Izzy

07/13/2023, 3:53 PM

You may have to add “overrides” for (4), although I’d have to mess with it
