This message was deleted Hamilton Open Source #hamilton-help

# hamilton-help

Slackbot

03/16/2023, 4:53 PM

This message was deleted.

Elijah Ben Izzy

03/16/2023, 4:59 PM

Hey! So which nodes does it say you’re missing? Based on that DAG, you should be passing in the following so it can run:

• `env` (in `config`)
• `some_config_param` (in `config` or `input`)

James Marvin

03/16/2023, 4:59 PM

It's saying that `get_auth_token` is missing

Elijah Ben Izzy

03/16/2023, 5:00 PM

OK, so my guess is that you’re not passing `env` into the driver as a config.

🫣 1

Elijah Ben Izzy

03/16/2023, 5:00 PM

If `env` doesn’t exist in the config, the `@config.when` functions won’t resolve to anything (this is a weird implementation detail of how we do it).

James Marvin

03/16/2023, 5:01 PM

Ah, so I have checked: I have `"env"` in my config (like below):

```python
config = {
    "env": "local"
}
```

Elijah Ben Izzy

03/16/2023, 5:01 PM

Now that I’m thinking about it, it might be nice to have something like `@config.when(env="deployed", required=True)` to make it break if it doesn’t exist. The way we do this internally is to call out to `config.get('env')`, which returns `None` if it doesn’t exist (which doesn’t equal anything…).
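To make that failure mode concrete, here is a minimal pure-Python sketch of the lookup described above. It is not Hamilton’s implementation: `resolve_config_when`, `select_variants` and the `required` flag are hypothetical names that model the `config.get(...)` lookup and the proposed `required=True` option.

```python
from typing import Any, Dict, List


def resolve_config_when(
    conditions: Dict[str, Any],
    config: Dict[str, Any],
    required: bool = False,
) -> bool:
    """Return True if a function with these (hypothetical) conditions is kept."""
    for key, expected in conditions.items():
        if required and key not in config:
            # Models the proposed `required=True`: fail loudly instead of skipping.
            raise ValueError(f"config key {key!r} is required but was not provided")
        # Models the described lookup: a missing key gives None, None never
        # equals the expected value, so the function is silently dropped.
        if config.get(key) != expected:
            return False
    return True


def select_variants(
    variants: Dict[str, Dict[str, Any]], config: Dict[str, Any]
) -> List[str]:
    """Names of the functions whose conditions match the config."""
    return [name for name, cond in variants.items() if resolve_config_when(cond, config)]


variants = {
    "sentiment_auth_token__local": {"env": "local"},
    "sentiment_auth_token__deployed": {"env": "deployed"},
}

print(select_variants(variants, {"env": "local"}))  # ['sentiment_auth_token__local']
print(select_variants(variants, {}))  # [] -> nothing provides sentiment_auth_token
```

With `env` missing, no variant survives and there is no error at this stage; the problem only surfaces later as a missing dependency.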

Elijah Ben Izzy

03/16/2023, 5:02 PM

Hmm… Going to run it locally and see if I can repro

Elijah Ben Izzy

03/16/2023, 5:02 PM

Just to check, you’re passing in the `config` to the driver (as the first parameter), right?

James Marvin

03/16/2023, 5:02 PM

Indeed. Let me provide some specific/exact code snippets

🙌 1

James Marvin

03/16/2023, 5:05 PM

Calling the DAG:

```python
from hamilton import driver

import transforms

config = {
    "env": "local",
    "sa_to_impersonate": "something",
}

dr = driver.Driver(input_df, transforms)
output_columns = [field for field in FeedbackContainer.__fields__]
output_data = dr.execute(inputs=config, final_vars=output_columns)
```

Transforms using `config.when`:

```python
from hamilton.function_modifiers import config

import helpers  # my own helper module


@config.when(env="local")
def sentiment_auth_token__local(sentiment_api_url: str, sa_to_impersonate: str) -> str:
    return helpers._impersonated_id_token(
        endpoint=sentiment_api_url,
        sa_to_impersonate=sa_to_impersonate,
    )


@config.when(env="deployed")
def sentiment_auth_token__deployed(sentiment_api_url: str) -> str:
    return helpers._default_creds_id_token(endpoint=sentiment_api_url)


def sentiment_request_headers(sentiment_auth_token: str) -> dict:
    return helpers._create_request_headers(
        token=sentiment_auth_token
    )
```

The error:

_Error: Required input sentiment_auth_token not provided for nodes: ['sentiment_request_headers']._
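A rough way to see where this message comes from: once no `sentiment_auth_token__*` variant is selected, nothing in the DAG produces `sentiment_auth_token`, so it turns into an external input that has to be supplied. The sketch below is a simplified, hypothetical model (not Hamilton’s code); `find_missing_inputs` is an invented helper.

```python
from typing import Dict, List, Set


def find_missing_inputs(
    dependencies: Dict[str, List[str]],
    provided: Set[str],
) -> Dict[str, List[str]]:
    """Map each unsatisfied dependency to the nodes that need it.

    `dependencies` maps node name -> parameter names (its upstream dependencies).
    A dependency is satisfied if another node produces it or it was provided.
    """
    produced = set(dependencies)
    missing: Dict[str, List[str]] = {}
    for node, params in dependencies.items():
        for param in params:
            if param not in produced and param not in provided:
                missing.setdefault(param, []).append(node)
    return missing


# With no "env" in the config, neither auth-token variant became a node,
# so only the headers function is left depending on sentiment_auth_token.
dag = {"sentiment_request_headers": ["sentiment_auth_token"]}
for name, nodes in find_missing_inputs(dag, provided=set()).items():
    print(f"Required input {name} not provided for nodes: {nodes}.")
```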

James Marvin

03/16/2023, 5:05 PM

The return vals aren't series, if that's relevant

Elijah Ben Izzy

03/16/2023, 5:06 PM

Oh, I see what’s going on

Elijah Ben Izzy

03/16/2023, 5:07 PM

```python
config = {
    "env": "local",
    "sa_to_impersonate": "something",
}

dr = driver.Driver(input_df, transforms)
output_columns = [field for field in FeedbackContainer.__fields__]
output_data = dr.execute(inputs=config, final_vars=output_columns)
```

The problem is that you pass `config` as `inputs` (and `input_df` as the Driver’s config). The difference is a little subtle. I’m not sure how you’re using `input_df`, but the right way to call it is:

```python
from hamilton import driver

config = {
    "env": "local",
    "sa_to_impersonate": "something",
}

# config goes to the Driver: it shapes the DAG (e.g. which config.when functions are used)
dr = driver.Driver(config, transforms)
output_columns = [field for field in FeedbackContainer.__fields__]
# Option 1: if your functions use the columns of `input_df` as individual inputs (dict-like, per column):
output_data = dr.execute(inputs=input_df, final_vars=output_columns)
# Option 2: if your functions take `input_df` itself as a parameter, you might want:
output_data = dr.execute(inputs={'input_df': input_df}, final_vars=output_columns)
```

James Marvin

03/16/2023, 5:09 PM

Ah!

Elijah Ben Izzy

03/16/2023, 5:11 PM

So this is a pretty subtle/confusing aspect of Hamilton, but we have two (actually three, but the third is not relevant for now) different concepts of data coming in:

1. `config`
2. `inputs`

`config` is used to shape the DAG (e.g. `env`). It is used when we determine how to map functions to the nodes you can visualize. `inputs` are like runtime parameters for the DAG. The confusing piece is that anything in `config` can be referred to the same way an input can, but nothing in `inputs` can be used in `config`. E.g. `config.when(param=value)` needs you to put `param` in the config dict, whereas:

```python
def foo(param: int) -> int:
    ...
```

allows `param` to be passed in either at config time (`dr = Driver({'param': 1}, …)`) or at runtime (`dr.execute(vars, inputs={'param': 1})`).

🤯 1
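To make the split concrete, here is a toy, pure-Python model of the two phases described above. It is not Hamilton’s implementation: `ToyDriver`, the `conditions` dict standing in for `@config.when`, and the `greeting__en`/`greeting__fr` functions are all hypothetical. It shows `param` arriving from either `config` or `inputs`, while a `config.when`-style condition is only ever checked against `config`.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional


class ToyDriver:
    """Hypothetical two-phase model: build from config, then execute with inputs."""

    def __init__(
        self,
        config: Dict[str, Any],
        functions: List[Callable[..., Any]],
        conditions: Dict[str, Dict[str, Any]],
    ) -> None:
        self.config = config
        # Phase 1 (DAG construction): only `config` is available here, so only
        # config values can decide which conditional functions become nodes.
        self.nodes: Dict[str, Callable[..., Any]] = {}
        for fn in functions:
            wanted = conditions.get(fn.__name__, {})
            if all(config.get(key) == value for key, value in wanted.items()):
                # Drop a "__suffix" so both variants map to the same node name.
                self.nodes[fn.__name__.split("__")[0]] = fn

    def execute(self, final_var: str, inputs: Optional[Dict[str, Any]] = None) -> Any:
        # Phase 2 (execution): config values are usable like inputs too.
        values = {**self.config, **(inputs or {})}

        def compute(name: str) -> Any:
            if name not in values:
                fn = self.nodes[name]  # KeyError if nothing provides `name`
                params = inspect.signature(fn).parameters
                values[name] = fn(**{p: compute(p) for p in params})
            return values[name]

        return compute(final_var)


def foo(param: int) -> int:
    return param + 1


def greeting__en(name: str) -> str:
    return f"hello {name}"


def greeting__fr(name: str) -> str:
    return f"bonjour {name}"


functions = [foo, greeting__en, greeting__fr]
conditions = {"greeting__en": {"lang": "en"}, "greeting__fr": {"lang": "fr"}}

# `param` can come from config or from runtime inputs.
print(ToyDriver({"param": 1}, functions, conditions).execute("foo"))  # 2
print(ToyDriver({}, functions, conditions).execute("foo", inputs={"param": 1}))  # 2

# A condition key must be in config at build time; without it, no "greeting" node exists.
print("greeting" in ToyDriver({}, functions, conditions).nodes)  # False
print(ToyDriver({"lang": "fr"}, functions, conditions).execute("greeting", inputs={"name": "Ada"}))  # bonjour Ada
```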

Elijah Ben Izzy

03/16/2023, 5:12 PM

Some docs here https://hamilton.readthedocs.io/en/latest/concepts/driver-capabilities.html#parameterizing-the-dag, but if you have thoughts about how to make this clearer/explain it better I’d love that, cause I think this confuses everyone who uses hamilton at some point 😆

James Marvin

03/16/2023, 5:21 PM

So with the inverse (wrong?) orientation of config/inputs I had previously, the columns in `input_df` were being automatically interpreted as nodes in the DAG that I could refer to

Elijah Ben Izzy

03/16/2023, 5:24 PM

Yep, I think you should be able to do that still (if it’s the only item in the input), but I think you’d want to do something like:

```python
inputs = {'some_param': 'some_value'}  # inputs you already have
inputs = {**inputs, **df.to_dict(orient='series')}  # inputs from your dataframe
```

Elijah Ben Izzy

03/16/2023, 5:26 PM

What’s happening is that pandas DataFrames are duck-typeable as a dict, meaning that Hamilton doesn’t actually know the difference (they have most of the same methods).
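A quick illustration of both points, assuming pandas is installed: a DataFrame answers `keys()`, `in` and `[]` the way a dict of columns would, so code that only uses those operations cannot tell them apart, and `to_dict(orient='series')` turns the frame into a real dict of Series that can be merged with other inputs. `lookup_inputs` is a hypothetical helper standing in for dict-style input access; it is not a Hamilton function.

```python
import pandas as pd


def lookup_inputs(inputs, names):
    """Hypothetical stand-in for dict-style input access (not Hamilton's code)."""
    return {name: inputs[name] for name in names if name in inputs}


df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# A DataFrame exposes dict-like keys()/in/[] over its columns...
print(list(df.keys()))  # ['A', 'B']
print(lookup_inputs(df, ["A", "some_param"])["A"].tolist())  # [1, 2]

# ...and converting it to a plain dict of Series lets you merge the
# columns with other, non-column runtime inputs.
inputs = {"some_param": "some_value"}
inputs = {**inputs, **df.to_dict(orient="series")}
print(sorted(inputs))  # ['A', 'B', 'some_param']
```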

James Marvin

03/16/2023, 5:44 PM

Thanks for the help! I'm now unfortunately encountering another issue whereby some nodes present as `inputs` aren't being picked up and I'm getting a ValueError (unknown nodes requested). I'm following the below pattern:

```python
from hamilton import driver

import transforms

# Contains columns/keys e.g. "A", "B", "C"
my_input = my_df.to_dict(orient="series")

# Additional nodes I want to refer to in the DAG
my_config = {"env": "local", "sa_to_impersonate": "something"}

dr = driver.Driver(my_config, transforms)
output_columns = ["A", "B", "C", "D"]

output_data = dr.execute(inputs=my_input, final_vars=output_columns)
```

But I'm getting an error like:

ValueError: Unknown nodes [A, B, C] requested. Check for typos?

Elijah Ben Izzy

03/16/2023, 6:59 PM

Following up: @James Marvin and I chatted a bit offline and figured it out. This was due to the fact that we were passing some nodes through unchanged and transforming others. There also appears to be a bug in which user-defined input nodes can’t be used in the output, but configs can. Will be opening an issue and coming up with a repro soon. The temporary solution was to (1) declare the dataframe as an input and add an `extract_columns`, and (2) rename the columns to be extracted with a `_raw` suffix and transform them, or just rename a few of them. @James Marvin feel free to reach out if you have any more issues! One thing that would help is https://github.com/DAGWorks-Inc/hamilton/issues/65, which would let you rename columns.
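A minimal sketch of that workaround, assuming pandas and assuming the pass-through columns are `A`, `B`, `C`. In Hamilton the first function would carry `@extract_columns("A_raw", "B_raw", "C_raw")` (from `hamilton.function_modifiers`) so each `*_raw` column becomes its own node; the decorator is left as a comment so the sketch runs with pandas alone. `raw_columns`, `PASS_THROUGH` and the column names are hypothetical.

```python
import pandas as pd

PASS_THROUGH = ["A", "B", "C"]  # hypothetical columns that should appear unchanged in the output


# In Hamilton: @extract_columns("A_raw", "B_raw", "C_raw")
def raw_columns(input_df: pd.DataFrame) -> pd.DataFrame:
    """Rename pass-through columns so their names no longer clash with the outputs."""
    return input_df.rename(columns={col: f"{col}_raw" for col in PASS_THROUGH})


# One small function per output column: each reads a `_raw` node and produces
# a node with the original name, which can then be requested in final_vars.
def A(A_raw: pd.Series) -> pd.Series:
    return A_raw


def B(B_raw: pd.Series) -> pd.Series:
    return B_raw


def C(C_raw: pd.Series) -> pd.Series:
    return C_raw


# Outside Hamilton, the wiring the DAG would do can be mimicked by hand:
input_df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
raw = raw_columns(input_df)
outputs = pd.DataFrame({"A": A(raw["A_raw"]), "B": B(raw["B_raw"]), "C": C(raw["C_raw"])})
print(outputs)
```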

James Marvin

03/16/2023, 8:07 PM

Thanks Elijah - much appreciated

🫡 1
