# hamilton-help
s
Afternoon, is it clear why `saved_formatted_data_output` is typed as a `dict`?
@save_to.excel(
    path=source("path_to_save"), 
    output_name_="saved_formatted_data_output", 
    index=False
)
@config.when(data_product="gui")
@schema.output(
    ("Attribute_A", "str"),
    ("Attribute_B", "str"),
    ("Attribute_C", "str"),
)
def formatted_data_output__gui(data: pd.DataFrame) -> pd.DataFrame: 
    ...
hamilton.function_modifiers.base.InvalidDecoratorException: Node saved_formatted_data_output has type typing.Dict[str, typing.Any] which is not a registered type for a dataset. Registered types are {'pandas': <class 'pandas.core.frame.DataFrame'>, 'polars': <class 'polars.dataframe.frame.DataFrame'>}. If you found this, either (a) ensure you have the right package installed, or (b) reach out to the team to figure out how to add yours.
t
will investigate! I know `pd.DataFrame` is actually just a dictionary of columns. I suspect `@schema` might interact with the returned value of `formatted_data_output`
s
Thank you. Yes, removing `@schema` got past the issue
👍 1
t
if you remove `@schema` does it work?
great, will work to fix the bug
❤️ 1
in my reproduction, setting `target_` as follows also fixes the issue
@schema.output(
    ("Attribute_A", "str"),
    ("Attribute_B", "str"),
    ("Attribute_C", "str"),
    target_="formatted_data_output",
)
Can you try it on your end?
s
yeah that fixes it, thank you
e
Yep, target is correct, this makes sense. Nice find @Thierry Jean! So this happens because it doesn't know which node to decorate: `save_to` creates a node, so `schema` needs to know which one to decorate
t
the docs for `schema.output` say that it should defer to the decorated node if nothing is specified. I tracked down the issue to the `resolve_nodes()` function in `hamilton.function_modifiers.base`. More precisely, it keeps track of a list of nodes associated with the function `formatted_data_output`, and in this case the materializer is pushed to position 0 in the list. Therefore, the schema actually receives the materializer output, a metadata dictionary, instead of the dataframe
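For anyone reading along later, here is a rough sketch of the ordering problem described above. It is a toy model only, not Hamilton's actual code: `Node`, `FakeDataFrame`, and `attach_schema` are hypothetical names made up for illustration, and the "pick the first node when no target is given" rule is an assumption based on the description in this thread.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


class FakeDataFrame:
    """Stand-in for pd.DataFrame so the sketch needs no third-party packages."""


@dataclass
class Node:
    """Hypothetical, simplified node: just a name and an output type."""
    name: str
    type_: Any


def attach_schema(nodes: List[Node], target: Optional[str] = None) -> Node:
    """Toy schema decorator: choose a node, then check it is a dataframe type."""
    if target is None:
        # Assumption for this sketch: with no target, the first node is used.
        chosen = nodes[0]
    else:
        # With an explicit target, the node is looked up by name instead.
        chosen = next(n for n in nodes if n.name == target)
    if chosen.type_ is not FakeDataFrame:
        raise TypeError(
            f"Node {chosen.name} has type {chosen.type_} "
            "which is not a registered type for a dataset."
        )
    return chosen


# The materializer node sits at position 0, ahead of the dataframe node.
nodes = [
    Node("saved_formatted_data_output", Dict[str, Any]),
    Node("formatted_data_output", FakeDataFrame),
]

# Without a target, the metadata-dict node is picked and validation fails.
try:
    attach_schema(nodes)
    error_without_target = None
except TypeError as exc:
    error_without_target = str(exc)

# Naming the target selects the dataframe node regardless of list order.
decorated = attach_schema(nodes, target="formatted_data_output")

print(error_without_target)
print(decorated.name)
```

In this toy version, naming the target makes the result independent of where the materializer node lands in the list, which is why `target_` is a sturdier choice than relying on the default.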
@Elijah Ben Izzy should be able to fix this nicely when he has a minute 😅
for now, setting `target_` should be a robust solution!
👍 1
e
Yeah, so this behavior is intended (although confusing) — think we can at least have better documentation here…