# hamilton-help
e
Welcome @Shayenne Moura! Taking a look — going to do the bare-minimum repro — agreed on first glance that something looks wonky. What's `cf`, btw? Is it another module?
Just for some more info (also for lurkers) -- I think your understanding of this is spot on: (1) config is available both at “compile” (DAG-building) time and “run” (DAG-execution) time, so that should be available; (2) inputs are available only at “run” (DAG-execution) time. So I’m going to recreate with your functions, see if I can reproduce it, then see if it behaves differently if the data is passed in as an input.
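A minimal sketch of that distinction, assuming the DAG functions live in a module called `cf` (the config key and values here are purely illustrative):

```python
from hamilton import driver

import cf  # hypothetical module containing the DAG functions

config = {"s3_path": "my-bucket/data"}  # consumed at "compile" (DAG-building) time
dr = driver.Driver(config, cf)  # the DAG is built here

# inputs, by contrast, are only consumed at "run" (DAG-execution) time:
df = dr.execute(["load_sat_data"], inputs={"satellite_product": "goes16"})
```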
OK, so I was not able to reproduce it using a bare-minimum attempt. Any chance you could send a little more code? Ideally it would be a fully-contained script that reproduces it; a few more hints might help me to figure it out, though… Here’s my attempt, with two more unit tests (not exactly the same, but close enough…): https://github.com/stitchfix/hamilton/pull/282. It’s possible it’s a spelling mistake, although I can’t see anything…
s
Hi, Elijah! Thank you for the attempt.
> What’s `cf`, btw? Is it another module?
Yes, it is another module. I saw your test code. Try using the `satellite_product` variable inside the function and see if it raises the error. Mine is like this:
```python
import pandas as pd

# `s3` (an s3fs-style filesystem object) and `s3_path` are defined elsewhere in the module
def load_sat_data(satellite_product: str) -> pd.DataFrame:
    data_path = s3.glob(f"{s3_path}/{satellite_product}*.parq")
    return pd.read_parquet(f"s3://{data_path[0]}")
```
e
Used it inside the function and it doesn’t fix it — doesn’t surprise me; the function isn’t run until after the DAG is compiled. OK, options from here: (1) mind posting the full stack trace? I want to make sure it’s pointing to the right place in the code. (2) We can hop on a call and you can walk me through your code — we could debug together. (3) If you can reproduce it in some code you can send me, I’d happily debug it.
s
```
Traceback (most recent call last):
  File "stfeast/features/build_teff.py", line 46, in <module>
    df_sat = dr.execute(output_columns)
  File "/home/sm/envtest/lib/python3.8/site-packages/hamilton/driver.py", line 228, in execute
    raise e
  File "/home/sm/envtest/lib/python3.8/site-packages/hamilton/driver.py", line 221, in execute
    outputs = self.raw_execute(final_vars, overrides, display_graph, inputs=inputs)
  File "/home/sm/envtest/lib/python3.8/site-packages/hamilton/driver.py", line 293, in raw_execute
    self.validate_inputs(
  File "/home/sm/envtest/lib/python3.8/site-packages/hamilton/driver.py", line 193, in validate_inputs
    raise ValueError(error_str)
ValueError: 1 errors encountered:
  Error: Required input satellite_product not provided for nodes: ['load_sat_data'].
```
We can hop on a call now, if you want.
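For anyone hitting the same traceback: `validate_inputs` fails when a node with no upstream function (here `satellite_product`) isn’t supplied via config at driver construction or via the `inputs` dict at execution. A minimal sketch of a call that would satisfy the validator, with a purely illustrative value:

```python
df_sat = dr.execute(output_columns, inputs={"satellite_product": "goes16"})
```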
e
Sure
Nice work debugging @Shayenne Moura! For future reference — the driver was getting called twice, the second time with the wrong param (so it wasn’t getting the required results). Then we chatted about the pattern of two driver calls for two outputs that have different spines. This is a great way of doing it, but if you’re looking for another approach, you can:
1. Use a `DictResult()` builder and then join it yourself, e.g. what I’m doing here: https://github.com/stitchfix/hamilton/pull/282/files#diff-a4a614c9bc88758962ff69a4056481f9f9ae8930d92e4ce1ad8daab54a452b30R75
2. Make a function in the DAG that joins all the columns you want into a dataframe.
IMO it depends on how much you want these to be reusable later on. If you want it all to be reusable and accessible in a DAG later, (2) could be nice (although slightly verbose) — you get the joined dataframe and can use it for training data, etc., in a future node. Otherwise, what you’re doing makes perfect sense to get/save the data somewhere.
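A minimal sketch of both options, assuming two Series outputs with different spines named `temp_sat` and `temp_station` (all names here are illustrative):

```python
import pandas as pd
from hamilton import base, driver

import cf  # hypothetical module containing the DAG functions

# Option 1: ask the driver for a dict of outputs, then join them yourself.
adapter = base.SimplePythonGraphAdapter(base.DictResult())
dr = driver.Driver({}, cf, adapter=adapter)
results = dr.execute(["temp_sat", "temp_station"])  # dict: output name -> value
df = pd.concat([results["temp_sat"], results["temp_station"]], axis=1)

# Option 2: make the join itself a node in the DAG, reusable by later nodes.
def joined_temps(temp_sat: pd.Series, temp_station: pd.Series) -> pd.DataFrame:
    return pd.concat([temp_sat, temp_station], axis=1)
```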