Jay
05/14/2024, 6:47 PM
    import extractors.pdf as pdf
    from hamilton import driver

    def test_pdf_extractor_pipeline():
        dr = driver.Builder().with_modules(pdf).build()
        upstream = dr.what_is_upstream_of('read_pdf')
        assert [node.name for node in upstream] == [
            'filepath',
            'read_pdf',
            'read_pdf_from_disk'
        ]
But the list coming from what_is_upstream_of keeps changing in order. For example, one time it would say ['read_pdf', 'read_pdf_from_disk', 'filepath'] and another time ['read_pdf', 'filepath', 'read_pdf_from_disk'].
I can get the test to pass by forcing an order myself, but I was expecting what_is_upstream_of to be consistent in what it provides.
Stefan Krawczyk
05/14/2024, 6:49 PM
Use set() and evaluate on that, so something like:
    def test_pdf_extractor_pipeline():
        dr = driver.Builder().with_modules(pdf).build()
        upstream = dr.what_is_upstream_of('read_pdf')
        # Compare as sets so the order of the returned nodes does not matter.
        assert {node.name for node in upstream} == {
            'filepath',
            'read_pdf',
            'read_pdf_from_disk'
        }
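As a minimal sketch of why the set comparison is stable, the snippet below swaps Hamilton's nodes for a hypothetical FakeNode stand-in (not part of Hamilton; only the `name` attribute is assumed) and checks every possible ordering of the three names:

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class FakeNode:
    """Hypothetical stand-in for a Hamilton node; only `name` matters here."""
    name: str


EXPECTED = {"filepath", "read_pdf", "read_pdf_from_disk"}


def names_match(upstream) -> bool:
    """Order-independent check: compare the node names as a set."""
    return {node.name for node in upstream} == EXPECTED


# Build every ordering of the same three nodes.
all_orderings = list(permutations(FakeNode(n) for n in sorted(EXPECTED)))

# Each ordering passes the set-based check.
assert all(names_match(order) for order in all_orderings)
```

A set comparison also ignores duplicates, which should be harmless here as long as each node name appears only once in the result.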
Or alternatively, you sort the lists and compare them that way.
Jay
05/14/2024, 6:54 PM
Elijah Ben Izzy
05/14/2024, 6:56 PM
Elijah Ben Izzy
05/14/2024, 6:58 PM
It uses a set, so it is a Python thing. Sets have never made guarantees about stability (and thus order), unlike dicts (I think).
Jay
05/14/2024, 6:59 PM
directional_dfs_traverse (https://github.com/DAGWorks-Inc/hamilton/blob/2c1eb9a0c3c7b26162893ee016503bd817b1776a/hamilton/graph.py#L1005) uses a set to accumulate nodes
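To illustrate the point about accumulating nodes in a set, here is a minimal sketch. It is not Hamilton's actual implementation: the toy graph, its edges, and the `upstream_of` helper are all hypothetical. It only shows that a traversal which collects names into a set returns the right members with no promised order.

```python
from typing import Dict, List, Set

# Hypothetical toy graph: each node maps to the nodes it depends on.
# The edges are assumed for illustration only.
DEPENDENCIES: Dict[str, List[str]] = {
    "read_pdf": ["read_pdf_from_disk"],
    "read_pdf_from_disk": ["filepath"],
    "filepath": [],
}


def upstream_of(start: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Collect `start` and everything upstream of it into a set."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(deps[node])
    return seen


result = upstream_of("read_pdf", DEPENDENCIES)

# Iterating a set of strings has no guaranteed order, and with Python's
# default string hash randomization it can differ between interpreter runs.
print(list(result))

# Sorting gives a deterministic view for assertions.
print(sorted(result))
```

Setting the PYTHONHASHSEED environment variable to a fixed value would pin string hashing for a given run, but asserting on a set or a sorted list is the more robust choice.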
That is probably why that happens. @Elijah Ben Izzy, that is fair enough; it was just an expectation in my head, since it's some sort of execution graph somewhere. I will use set for now and move ahead.
Thanks for getting back so soon 🙂
Elijah Ben Izzy
05/14/2024, 6:59 PM
Jay
05/14/2024, 7:05 PM
    a -> b -> c
    a -> c
where a is connected to both b and c, and probably other similar scenarios
Elijah Ben Izzy
05/14/2024, 7:07 PM
Jay
05/14/2024, 7:08 PM
set does solve both of the use cases
Jay
05/14/2024, 7:23 PM
Elijah Ben Izzy
05/14/2024, 7:31 PM