# hamilton-help
j
Hey, I am trying to write a test that asserts what is upstream of a node in my pipeline:
import extractors.pdf as pdf
from hamilton import driver


def test_pdf_extractor_pipeline():
    dr = driver.Builder().with_modules(pdf).build()
    upstream = dr.what_is_upstream_of('read_pdf')

    assert [node.name for node in upstream] == [ 
        'filepath',
        'read_pdf',
        'read_pdf_from_disk'
    ]
but the order of the list coming from what_is_upstream_of keeps changing. For example, one time it would say
['read_pdf', 'read_pdf_from_disk', 'filepath']
and another
['read_pdf', 'filepath', 'read_pdf_from_disk']
I can get the test to pass by forcing an order myself, but I was expecting what_is_upstream_of to be consistent in what it provides.
👀 1
s
Hi @Jay, it's a Python thing in how we're constructing and then walking the graph. I would put that into a set() and evaluate on that, so something like:
def test_pdf_extractor_pipeline():
    dr = driver.Builder().with_modules(pdf).build()
    upstream = dr.what_is_upstream_of('read_pdf')

    # Compare as sets so the traversal order does not matter.
    assert {node.name for node in upstream} == {
        'filepath',
        'read_pdf',
        'read_pdf_from_disk',
    }
Alternatively, you can sort both lists and compare them that way.
👍 1
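A minimal sketch of both suggestions, written so it runs without Hamilton installed. The SimpleNamespace objects are hypothetical stand-ins for whatever what_is_upstream_of returns; the only thing assumed about the real nodes is that they expose a .name attribute, as in the test above.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the nodes returned by what_is_upstream_of.
# Only the .name attribute is assumed.
upstream = [
    SimpleNamespace(name="read_pdf"),
    SimpleNamespace(name="filepath"),
    SimpleNamespace(name="read_pdf_from_disk"),
]

expected = ["filepath", "read_pdf", "read_pdf_from_disk"]
names = [node.name for node in upstream]

# Option 1: compare as sets. Ignores order, and also ignores duplicates.
assert set(names) == set(expected)

# Option 2: compare sorted lists. Ignores order, but still fails if a
# name shows up more often than expected.
assert sorted(names) == sorted(expected)
```

The sorted comparison is slightly stricter, which may or may not matter for a given test.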
j
Yes, that can be done, but I was more concerned about the nodes not coming back in a consistent order. Since it's constructing the information from a directed graph, it should ideally always be in the same order. It could just be a Python thing.
e
Chiming in: the order comes from a traversal order, which I would argue isn't specific enough to imply order in the contract. That said, it's a fairly complicated API design question whether functions should always return ordered results or allow a more unordered approach.
FWIW in this case it's due to the type set, so it is a Python thing. Sets have never made guarantees about stability (and thus order), unlike dicts, which preserve insertion order as of Python 3.7.
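To illustrate the set vs. dict point, here is a small sketch using plain strings rather than Hamilton nodes (the names are just the ones from this thread). A set makes no promise about iteration order, while dict.fromkeys keeps first-seen order, so it is one way to deduplicate without losing ordering.

```python
names = ["read_pdf", "filepath", "read_pdf", "read_pdf_from_disk"]

# A set removes duplicates, but its iteration order is not part of its
# contract and should not be relied on in assertions.
unique_unordered = set(names)

# dict keys keep insertion order, so dict.fromkeys deduplicates while
# preserving the order in which names were first seen.
unique_ordered = list(dict.fromkeys(names))

# Set equality never depends on order.
assert unique_unordered == {"filepath", "read_pdf_from_disk", "read_pdf"}

# The dict-based version has a stable, predictable order.
assert unique_ordered == ["read_pdf", "filepath", "read_pdf_from_disk"]
```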
j
Just exploring the code: from what I can understand, the function directional_dfs_traverse (https://github.com/DAGWorks-Inc/hamilton/blob/2c1eb9a0c3c7b26162893ee016503bd817b1776a/hamilton/graph.py#L1005) uses a set to accumulate nodes, which is probably why that happens. @Elijah Ben Izzy, that is fair enough. It was just an expectation in my head, since it's some sort of execution graph somewhere. I will use set for now and move ahead. Thanks for getting back so soon 🙂
👍 1
e
Yep! Definitely a good question 🙂
gratitude thank you 1
j
I am guessing the set is used mostly to make sure the node is mentioned only once, for example in scenarios like
a -> b -> c
a -> c
where a is connected to both b and c and probably other similar scenarios
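A rough sketch of that scenario, assuming a simple dict-based dependency map. This is an illustration of the general idea only, not Hamilton's directional_dfs_traverse; the dependencies structure and the upstream_of helper are made up for this example.

```python
# Hypothetical dependency map: node -> the nodes it depends on.
# Encodes a -> b -> c and a -> c from the message above.
dependencies = {
    "a": [],
    "b": ["a"],
    "c": ["a", "b"],
}


def upstream_of(node, deps):
    """Collect a node and everything upstream of it with a depth-first walk."""
    seen = set()

    def visit(current):
        if current in seen:
            # Already reached through another path (a is reachable from c
            # both directly and via b), so do not record it twice.
            return
        seen.add(current)
        for parent in deps[current]:
            visit(parent)

    visit(node)
    return seen


# a appears once even though two paths lead to it.
assert upstream_of("c", dependencies) == {"a", "b", "c"}
```

Because the result is a set, callers get uniqueness for free but no ordering guarantee, which matches the behaviour seen in the test.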
e
Yeah, set has two purposes:
1. Exactly as you said. That said, it could always be converted after uniquifying; that's a bit of an implementation detail.
2. It indicates the type of operations. The order wouldn't imply much TBH, but what you'll really want is (1) enumerate (just give all of them), and (2) query (see if something is in it).
I think set is a clean way to do both of those?
👍 2
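For what it's worth, the two operations mentioned (enumerate and query) look like this on a plain set of names. The names are the ones from this thread, and write_pdf is a made-up name used only to show a negative check.

```python
upstream_names = {"filepath", "read_pdf", "read_pdf_from_disk"}

# Enumerate: iterate over everything. Wrap in sorted() when a
# deterministic listing is needed, e.g. for printing or snapshot tests.
listed = sorted(upstream_names)

# Query: membership checks do not care about order at all.
assert "filepath" in upstream_names
assert "write_pdf" not in upstream_names  # hypothetical node name

# Subset checks let a test assert "at least these nodes are upstream".
assert {"filepath", "read_pdf_from_disk"} <= upstream_names
```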
j
Yes, set does solve both of the use cases.
I was just trying to use the function for a different use case than what it was meant for 🙂
e
Meaning testing? I think the set or ordering solution is good there 🙂
👍 1