What order does `what_is_upstream_of` return the n...
# hamilton-help
a
What order does
what_is_upstream_of
return the nodes in? Is there any way to get the list of nodes required to compute a target node in topological order?
👀 1
e
So
what_is_upstream_of
does not currently guarentee order. Re: topological sort, this is a very reasonable request (and adding a guarentee or a parameter makes a lot of sense). There are a few approaches you could consider — one should be pretty simple, writing an example.
This assumes you have a variable called
dr
that is your driver.
Copy code
>>> graph = {n.name: list(n.optional_dependencies) + list(n.required_dependencies) for n in dr.what_is_upstream_of("confusion_matrix")}
>>> from graphlib import TopologicalSorter
>>> top_sorted = list(TopologicalSorter(graph).static_order())
>>> top_sorted
['iris_data', 'test_size_fraction', 'gamma', 'shuffle_train_test_split', 'feature_matrix', 'target', 'target_names', 'prefit_clf', 'train_test_split_func', 'y_train', 'y_test', 'X_test', 'X_train', 'y_test_with_labels', 'fit_clf', 'predicted_output', 'predicted_output_with_labels', 'confusion_matrix']
🙌 1
t
(graphlib is Python >=3.9)
e
Uses stdlib’s graphlib so no other dependencies
t
otherwise, this should work
Copy code
def topological_sort(nodes):
    """Sort the nodes for nice output display"""

    def dfs(node, visited, stack):
        visited.add(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                dfs(neighbor, visited, stack)
        stack.append(node)

    graph = {n.name: set([*n.required_dependencies, *n.optional_dependencies]) for n in nodes}
    visited = set()
    stack = []

    for node in graph:
        if node not in visited:
            dfs(node, visited, stack)

    return stack
a
Awesome, thanks for the quick reply!
e
Yeah! Fun one. If you’re interested, open up an issue and we can stick this in the hamilton library? I think there’s room for execution plan, etc… (We have a json dump feature as well but it’s not ordered)
a
Sure! I saw this code as well, but the reason it didn't work is cuz it exposed the internal
node.Node
instead of
HamiltonNode
. https://github.com/DAGWorks-Inc/hamilton/blob/b2aeedc2999fad7ee260c6e67520b8ba1348ca8a/hamilton/execution/graph_functions.py#L15 Would the intended behaviour be to always return a sorted graph from the lineage operations?
t
Out of curiosity, what are you building with the sorted nodes?
a
Trying to integrate hamilton into some existing code 😅