Hamilton Open Source #hamilton-help

Jay

05/14/2024, 6:47 PM

Hey, I am trying to write a test that asserts what is upstream of a node in the pipeline:

import extractors.pdf as pdf
from hamilton import driver


def test_pdf_extractor_pipeline():
    dr = driver.Builder().with_modules(pdf).build()
    upstream = dr.what_is_upstream_of('read_pdf')

    assert [node.name for node in upstream] == [ 
        'filepath',
        'read_pdf',
        'read_pdf_from_disk'
    ]

but the order of the list coming from what_is_upstream_of keeps changing. For example, one time it would say

['read_pdf', 'read_pdf_from_disk', 'filepath']

and another

['read_pdf', 'filepath', 'read_pdf_from_disk']

While I can get the test to pass by forcing an order myself, I was expecting the function what_is_upstream_of to be consistent in what it provides.

👀 1

Stefan Krawczyk

05/14/2024, 6:49 PM

Hi @Jay, it’s a Python thing in how we’re constructing and then walking the graph. I would put that into a set() and evaluate on that, so something like:

import extractors.pdf as pdf
from hamilton import driver


def test_pdf_extractor_pipeline():
    dr = driver.Builder().with_modules(pdf).build()
    upstream = dr.what_is_upstream_of('read_pdf')

    # Compare as sets so the traversal order does not matter.
    assert {node.name for node in upstream} == {
        'filepath',
        'read_pdf',
        'read_pdf_from_disk',
    }

Alternatively, sort both lists and compare them that way.
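The sorting alternative mentioned above could look roughly like the sketch below. To keep it runnable without the extractors.pdf module, it uses a hypothetical FakeNode stand-in for the objects returned by what_is_upstream_of; only a .name attribute is assumed, and the helper name upstream_names_sorted is made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the node objects returned by what_is_upstream_of;
# only the `.name` attribute is assumed here.
FakeNode = namedtuple("FakeNode", ["name"])


def upstream_names_sorted(nodes):
    """Return node names in a deterministic (alphabetical) order."""
    return sorted(node.name for node in nodes)


# Two arrival orders like the ones reported earlier in the thread.
run_a = [FakeNode("read_pdf"), FakeNode("read_pdf_from_disk"), FakeNode("filepath")]
run_b = [FakeNode("read_pdf"), FakeNode("filepath"), FakeNode("read_pdf_from_disk")]

expected = ["filepath", "read_pdf", "read_pdf_from_disk"]
assert upstream_names_sorted(run_a) == expected
assert upstream_names_sorted(run_b) == expected
```

Sorting keeps the assertion as a list comparison, which gives a more readable diff on failure than a set comparison in some test runners.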

👍 1

Jay

05/14/2024, 6:54 PM

Yes, that can be done, but I was more concerned about the nodes not being in a stable order. Since the information is constructed from a directed graph, it should ideally always come back in the same order. It could just be a Python thing.

Elijah Ben Izzy

05/14/2024, 6:56 PM

Chiming in: the order comes from a traversal order, which I would argue isn’t specific enough to imply order in the contract. That said, it’s a fairly complicated API design question whether functions should always return ordered results or allow a more unordered approach.

Elijah Ben Izzy

05/14/2024, 6:58 PM

FWIW in this case it’s due to the type set, so it is a Python thing. Sets have never made guarantees about stability (and thus order), unlike dicts (I think).
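The set-versus-dict point can be checked with plain Python, independent of Hamilton. This is a small sketch, not anything from the library: dicts preserve insertion order as a language guarantee since Python 3.7, while sets make no ordering promise, and because string hashing is randomized per process (unless PYTHONHASHSEED is fixed), a set of strings may iterate in a different order from one run to the next.

```python
names = ["read_pdf", "read_pdf_from_disk", "filepath"]

# dicts preserve insertion order (guaranteed since Python 3.7),
# so the keys come back exactly as they were inserted.
as_dict_keys = list(dict.fromkeys(names))
assert as_dict_keys == names

# sets make no ordering promise; iteration order may differ between runs.
as_set = set(names)

# Set equality ignores order, so this holds however the set iterates.
assert as_set == {"filepath", "read_pdf", "read_pdf_from_disk"}

# Sorting gives a stable order when one is needed.
assert sorted(as_set) == sorted(names)
```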

Jay

05/14/2024, 6:59 PM

Just exploring the code: at least from what I can understand, the function directional_dfs_traverse (https://github.com/DAGWorks-Inc/hamilton/blob/2c1eb9a0c3c7b26162893ee016503bd817b1776a/hamilton/graph.py#L1005) uses a set to accumulate nodes, which is probably why that happens. @Elijah Ben Izzy that is fair enough; it was just an expectation in my head since it's some sort of execution graph. I will use set for now and move ahead. Thanks for getting back so soon 🙂

👍 1

Elijah Ben Izzy

05/14/2024, 6:59 PM

Yep! Definitely a good question 🙂

gratitude thank you 1

Jay

05/14/2024, 7:05 PM

I am guessing the set is used mostly to make sure the node is mentioned only once, for example in scenarios like

a -> b -> c
a -> c

where a is connected to both b and c and probably other similar scenarios
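To illustrate the guess above, here is a toy depth-first traversal over that a/b/c graph. It is only a sketch under assumptions: the DEPENDENCIES mapping and the upstream_of helper are hypothetical and are not Hamilton's actual implementation. It shows how a set accumulator keeps a node from being recorded twice when it is reachable by more than one path.

```python
# Hypothetical toy graph: each key maps to the nodes it depends on (its parents).
# Mirrors a -> b -> c and a -> c from the message above.
DEPENDENCIES = {"a": [], "b": ["a"], "c": ["a", "b"]}


def upstream_of(node, dependencies):
    """Collect `node` and everything upstream of it via depth-first search."""
    seen = set()

    def visit(current):
        if current in seen:  # already reached via another path
            return
        seen.add(current)
        for parent in dependencies[current]:
            visit(parent)

    visit(node)
    return seen


result = upstream_of("c", DEPENDENCIES)
# `a` is reachable from `c` both directly and through `b`, yet appears once.
assert result == {"a", "b", "c"}
```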

Elijah Ben Izzy

05/14/2024, 7:07 PM

Yeah, set has two purposes:
1. Exactly as you said. That said, it could always be converted after uniquifying; that’s a bit of an implementation detail.
2. It indicates the type of operations. The order wouldn’t imply much TBH, but what you’ll really want is (1) enumerate (just give all of them), and (2) query (see if something is in it). I think set is a clean way to do both of those?
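The two operations described above, plus the "convert after uniquifying" idea, can be sketched with a plain Python set. The names below are hypothetical sample data shaped like the node names in this thread, not output from Hamilton.

```python
# Hypothetical result, shaped like the upstream node names from this thread.
upstream = {"filepath", "read_pdf", "read_pdf_from_disk"}

# (1) Enumerate: visit every member, in whatever order the set yields.
visited = [name for name in upstream]

# (2) Query: membership checks, constant time on average.
has_filepath = "filepath" in upstream
has_other = "write_pdf" in upstream

# Converting after uniquifying: impose an order only where a caller needs one.
stable = sorted(upstream)
```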

👍 2

Jay

05/14/2024, 7:08 PM

Yes, set does solve both of the use cases.

Jay

05/14/2024, 7:23 PM

I was just trying to use the function for a different use case than what it was meant for 🙂

Elijah Ben Izzy

05/14/2024, 7:31 PM

Meaning testing? I think the set or ordering solution is good there 🙂

👍 1
