Slackbot
03/16/2023, 4:53 PM
Elijah Ben Izzy
03/16/2023, 4:59 PM
env in config
• some_config_param in config or input
James Marvin
03/16/2023, 4:59 PM
get_auth_token is missing
Elijah Ben Izzy
03/16/2023, 5:00 PM
env into the driver as a config.
Elijah Ben Izzy
03/16/2023, 5:00 PM
James Marvin
03/16/2023, 5:01 PM
config = {
    "env": "local"
}
Elijah Ben Izzy
03/16/2023, 5:01 PM
@config.when(env="deployed", required=True) to make it break if it doesn’t exist. The way we do this internally is to call out to config.get('env'), which returns None if it doesn’t exist (which doesn’t equal anything…).
Elijah Ben Izzy
03/16/2023, 5:02 PM
Elijah Ben Izzy
03/16/2023, 5:02 PM
config to the driver (as the first parameter), right?
James Marvin
03/16/2023, 5:02 PM
James Marvin
03/16/2023, 5:05 PM
config = {
    "env": "local",
    "sa_to_impersonate": "something"
}
dr = driver.Driver(input_df, transforms)
output_columns = [field for field in FeedbackContainer.__fields__]
output_data = dr.execute(inputs=config, final_vars=output_columns)
Transforms using config.when:
@config.when(env="local")
def sentiment_auth_token__local(sentiment_api_url: str, sa_to_impersonate: str) -> str:
    return helpers._impersonated_id_token(
        endpoint=sentiment_api_url,
        sa_to_impersonate=sa_to_impersonate
    )

@config.when(env="deployed")
def sentiment_auth_token__deployed(sentiment_api_url: str) -> str:
    return helpers._default_creds_id_token(endpoint=sentiment_api_url)

def sentiment_request_headers(sentiment_auth_token: str) -> dict:
    return helpers._create_request_headers(
        token=sentiment_auth_token
    )
The error:
_Error: Required input sentiment_auth_token not provided for nodes: ['sentiment_request_headers']._
James Marvin
03/16/2023, 5:05 PM
Elijah Ben Izzy
03/16/2023, 5:06 PM
Elijah Ben Izzy
03/16/2023, 5:07 PM
config = {
    "env": "local",
    "sa_to_impersonate": "something"
}
dr = driver.Driver(input_df, transforms)
output_columns = [field for field in FeedbackContainer.__fields__]
output_data = dr.execute(inputs=config, final_vars=output_columns)
The problem is that you pass config to inputs. The difference is a little subtle.
I’m not sure how you’re using input_df, but the right way to call it is:
config = {
    "env": "local",
    "sa_to_impersonate": "something"
}
dr = driver.Driver(config, transforms)
output_columns = [field for field in FeedbackContainer.__fields__]
# this is if you're using `input_df` as a dict, per-column
output_data = dr.execute(inputs=input_df, final_vars=output_columns)
# if you're referring to it as a field, you might want:
output_data = dr.execute(inputs={'input_df' : input_df}, final_vars=output_columns)
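A toy sketch of why the variant goes missing when env only arrives through inputs. This is plain Python, not Hamilton's real implementation: the when decorator, build_nodes, and the returned token strings are all hypothetical stand-ins. It only assumes what is said above, namely that matching is done with config.get, which returns None for a missing key.

```python
# Toy stand-in for config-based node selection; NOT Hamilton's real code.
def when(**conditions):
    """Tag a function with the config values it requires (hypothetical helper)."""
    def decorator(fn):
        fn.conditions = conditions
        return fn
    return decorator


@when(env="local")
def sentiment_auth_token__local():
    return "local-token"


@when(env="deployed")
def sentiment_auth_token__deployed():
    return "deployed-token"


def build_nodes(config, functions):
    """Keep only functions whose conditions match config; strip the __suffix."""
    nodes = {}
    for fn in functions:
        # config.get(key) is None when the key is absent, so nothing matches.
        if all(config.get(k) == v for k, v in fn.conditions.items()):
            nodes[fn.__name__.split("__")[0]] = fn
    return nodes


functions = [sentiment_auth_token__local, sentiment_auth_token__deployed]

# "env" supplied as config: the local variant becomes the node.
with_config = build_nodes({"env": "local"}, functions)

# "env" supplied only at execute time: the build step sees an empty config.
without_config = build_nodes({}, functions)

print(sorted(with_config))     # ['sentiment_auth_token']
print(sorted(without_config))  # []
```

In the second case no sentiment_auth_token node exists at all, so anything that depends on it can only treat it as a required input that nobody provided.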
James Marvin
03/16/2023, 5:09 PM
Elijah Ben Izzy
03/16/2023, 5:11 PM
1. config
2. inputs
config is used to shape the DAG (e.g. env). This is used when we determine how to map functions to nodes you can visualize.
inputs is like runtime parameters for the DAG. The confusing piece is that anything in config can be referred to as an input would, but nothing in inputs can be used in config. E.g. config.when(param=value) needs you to put param in the config dict, whereas:
def foo(param: int) -> int:
    ...
allows param to be passed in either at config time (dr = Driver({'param' : 1}, …)) or at runtime: dr.execute(vars, inputs={'param' : 1})
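The "either at config time or at runtime" part can be pictured with a small toy model. This is plain Python and not how Hamilton is implemented: run is a hypothetical helper, and letting runtime inputs win on a name clash is an arbitrary choice of this sketch, not a statement about Hamilton's precedence.

```python
# Toy model of parameter lookup; NOT Hamilton's real implementation.
import inspect


def foo(param: int) -> int:
    return param + 1


def run(fn, config, inputs):
    """Call fn, taking each argument from runtime inputs or, failing that, config."""
    available = {**config, **inputs}  # in this toy, runtime inputs win on a clash
    kwargs = {}
    for name in inspect.signature(fn).parameters:
        if name not in available:
            raise ValueError(f"Required input {name} not provided")
        kwargs[name] = available[name]
    return fn(**kwargs)


from_config = run(foo, config={"param": 1}, inputs={})
from_inputs = run(foo, config={}, inputs={"param": 1})
print(from_config, from_inputs)  # 2 2
```

The asymmetry described above is not modelled here: a config.when condition is evaluated when the DAG is built, before any inputs exist, so only the config dict can satisfy it.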
Elijah Ben Izzy
03/16/2023, 5:12 PM
James Marvin
03/16/2023, 5:21 PM
input_df were being automatically interpreted as nodes in the DAG that I could refer to
Elijah Ben Izzy
03/16/2023, 5:24 PM
inputs = {'some_param' : 'some_value'}  # inputs you already have
inputs = {**inputs, **df.to_dict(orient='series')}  # inputs from your dataframe
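One thing to watch with the {**a, **b} merge: if a key appears in both dicts, the value from the second dict silently replaces the first. A minimal sketch of a guard, assuming a hypothetical helper merge_inputs and using plain lists in place of the pandas Series that df.to_dict(orient='series') would return:

```python
def merge_inputs(existing, column_inputs):
    """Merge two input dicts, refusing to silently overwrite a key."""
    clashes = sorted(set(existing) & set(column_inputs))
    if clashes:
        raise KeyError(f"Keys present in both dicts: {clashes}")
    return {**existing, **column_inputs}


inputs = {"some_param": "some_value"}
columns = {"A": [1, 2], "B": [3, 4]}  # stands in for df.to_dict(orient='series')
merged = merge_inputs(inputs, columns)
print(sorted(merged))  # ['A', 'B', 'some_param']
```

A column named some_param in the dataframe would raise here instead of quietly replacing the parameter.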
Elijah Ben Izzy
03/16/2023, 5:26 PM
James Marvin
03/16/2023, 5:44 PM
inputs aren't being picked up and I'm getting a ValueError (unknown nodes requested).
I'm following the below pattern:
from hamilton import driver
import transforms

# Contains columns/keys e.g. "A", "B", "C"
my_input = my_df.to_dict(orient="series")
# Additional nodes I want to refer to in the DAG
my_config = {"env": "local", "sa_to_impersonate": "something"}
dr = driver.Driver(my_config, transforms)
output_columns = ["A", "B", "C", "D"]
output_data = dr.execute(inputs=my_input, final_vars=output_columns)
But I'm getting an error like:
ValueError: Unknown nodes [A, B, C] requested. Check for typos?
Elijah Ben Izzy
03/16/2023, 6:59 PM
extract_columns and (2) rename the columns to be extracted to _raw and transform them or just rename a few of them.
@James Marvin feel free to reach out if you have any more issues! One thing that would help is: https://github.com/DAGWorks-Inc/hamilton/issues/65 (you’d be able to rename columns).
James Marvin
03/16/2023, 8:07 PM