Afternoon Is it clear why `saved formatted data output` is t Hamilton Open Source #hamilton-help


Seth Stokes

03/27/2024, 7:41 PM

Afternoon, is it clear why `saved_formatted_data_output` is typed as a `dict`?


@save_to.excel(
    path=source("path_to_save"), 
    output_name_="saved_formatted_data_output", 
    index=False
)
@config.when(data_product="gui")
@schema.output(
    ("Attribute_A", "str"),
    ("Attribute_B", "str"),
    ("Attribute_C", "str"),
)
def formatted_data_output__gui(data: pd.DataFrame) -> pd.DataFrame: 
    ...


hamilton.function_modifiers.base.InvalidDecoratorException: Node saved_formatted_data_output has type typing.Dict[str, typing.Any] which is not a registered type for a dataset. Registered types are {'pandas': <class 'pandas.core.frame.DataFrame'>, 'polars': <class 'polars.dataframe.frame.DataFrame'>}. If you found this, either (a) ensure you have the right package installed, or (b) reach out to the team to figure out how to add yours.

Thierry Jean

03/27/2024, 7:48 PM

will investigate! I know `pd.DataFrame` is actually just a dictionary of columns. I suspect `@schema` might interact with the returned value of `formatted_data_output`

Seth Stokes

03/27/2024, 7:51 PM

Thank you. Yes, removing `@schema` got past the issue

👍 1

Thierry Jean

03/27/2024, 7:51 PM

if you remove `@schema` does it work?

Thierry Jean

03/27/2024, 7:51 PM

great, will work to fix the bug

❤️ 1

Thierry Jean

03/27/2024, 8:01 PM

in my reproduction, setting `target_` as follows also fixes the issue


@schema.output(
    ("Attribute_A", "str"),
    ("Attribute_B", "str"),
    ("Attribute_C", "str"),
    target_="formatted_data_output",
)

Can you try it on your end?

Seth Stokes

03/27/2024, 8:25 PM

yeah that fixes it, thank you

Elijah Ben Izzy

03/27/2024, 8:27 PM

Yep, `target_` is correct, this makes sense. Nice find @Thierry Jean! This happens because it doesn’t know which node to decorate: `save_to` creates a node, so `schema` needs to know which one to decorate

Thierry Jean

03/27/2024, 8:30 PM

the docs for `schema.output` say that it should defer to the decorated node if nothing is specified. I tracked down the issue to the function `resolve_nodes()` in `hamilton.function_modifiers.base`. More precisely, it keeps track of a list of nodes associated with the function `formatted_data_output`, and in this case the materializer is pushed to position 0 in the list. Therefore, the schema actually receives the materializer output, a metadata dictionary, instead of the dataframe
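
Editor's sketch of the ordering problem described above. This is a toy model in plain Python, not Hamilton's code: `Node`, `resolve_nodes_toy`, `pick_schema_node`, `validate_schema_node` and the stand-in `DataFrame` class are all hypothetical names, and the "fall back to the first node" rule is an assumption taken from the description in this thread.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


class DataFrame:
    """Stand-in for pd.DataFrame so the sketch needs no third-party packages."""


@dataclass
class Node:
    name: str
    type_: Any


def resolve_nodes_toy() -> List[Node]:
    # One decorated function expands into two nodes. As described in the
    # thread, the saver (materializer) node ends up at position 0.
    return [
        Node("saved_formatted_data_output", Dict[str, Any]),  # saver metadata
        Node("formatted_data_output", DataFrame),  # the actual dataframe
    ]


def pick_schema_node(nodes: List[Node], target: Optional[str] = None) -> Node:
    # Assumption for this toy: with no target, use the first node in the list.
    if target is None:
        return nodes[0]
    return next(n for n in nodes if n.name == target)


def validate_schema_node(node: Node) -> None:
    # Mirrors the spirit of the error message: only dataframe types pass.
    if node.type_ is not DataFrame:
        raise TypeError(
            f"Node {node.name} has type {node.type_}, not a dataframe type"
        )


nodes = resolve_nodes_toy()
default_node = pick_schema_node(nodes)
targeted_node = pick_schema_node(nodes, target="formatted_data_output")

# Naming the target selects the dataframe node, so validation passes.
validate_schema_node(targeted_node)
```

In this toy, calling `validate_schema_node(default_node)` raises, because the default pick is the dict-typed saver node. That corresponds to the `InvalidDecoratorException` in the original question, and naming the target corresponds to passing `target_="formatted_data_output"` to `@schema.output`.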

Thierry Jean

03/27/2024, 8:31 PM

@Elijah Ben Izzy should be able to fix this nicely when he has a minute 😅

Thierry Jean

03/27/2024, 8:34 PM

for now, setting `target_` should be a robust solution!

👍 1

Elijah Ben Izzy

03/27/2024, 8:58 PM

Yeah, so this behavior is intended (although confusing) — think we can at least have better documentation here…
