# hamilton-help
r
Example of how I'd want to do it if `extract_fields` could handle DFs:
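# data_source.py
import typing
from typing import Any, Callable, Optional, Type

from hamilton.function_modifiers import extract_fields, tag
from pandera.typing import Series  # assumption: the thread never says where Series comes from
from pydantic import BaseModel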

class DataSource(BaseModel):
    customer_id: str
    some_int: int
    another_int: int

    @classmethod
    def all_fields_with_types(cls) -> dict[str, Any]:
        # get fields and their types from model (pydantic, sqlalchemy, etc)
        # essentially: 
        return typing.get_type_hints(cls)

def extract_source_fields(source: Type[DataSource], exclude: Optional[list[str]] = None) -> Callable:
    source_fields = source.all_fields_with_types()
    included_fields = {field: Series[field_type] for field, field_type in source_fields.items() if field not in (exclude or [])}

    def decorator(func: Callable) -> Callable:
        parent_tag = tag(target_=func.__name__, node_type="data_source")
        return parent_tag(tag(node_type="source_feature")(extract_fields(fields=included_fields)(func)))

    return decorator

# data_loaders.py
import pandas as pd

from data_source import DataSource, extract_source_fields

@extract_source_fields(DataSource)
def extract_user() -> pd.DataFrame:
    # build & return df
    return pd.DataFrame(...)

# features.py
from pandera.typing import Series  # assumption: any subscriptable Series type would do here
def add_ints(some_int: Series[int], another_int: Series[int]) -> Series[int]:
    return some_int + another_int
s
Oh interesting! Yep, we could extend `extract_fields` to anything “dict”-like, I think (though that might be easier said than done). Or yeah, introduce a new decorator… hmm. Or we could pull these extra annotations from the dataframe annotation? 🤔
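A rough sketch of what "anything dict-like" could mean in practice, assuming all the extraction step needs is `name in container` and `container[name]` (both of which a pandas DataFrame supports for column names). `extract_from_dict_like` and `ColumnTable` are hypothetical names for illustration, not part of Hamilton:

```python
from typing import Any, Dict, Mapping


def extract_from_dict_like(container: Any, fields: Mapping[str, type]) -> Dict[str, Any]:
    """Pull each named field out of any object supporting `in` and `[]` by key."""
    missing = [name for name in fields if name not in container]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return {name: container[name] for name in fields}


class ColumnTable:
    """Hypothetical stand-in for a dataframe: columns addressed by name."""

    def __init__(self, columns: Dict[str, list]):
        self._columns = dict(columns)

    def __contains__(self, name: str) -> bool:
        return name in self._columns

    def __getitem__(self, name: str) -> list:
        return self._columns[name]


fields = {"some_int": int, "another_int": int}

# The same helper handles a plain dict and the dataframe-like stand-in.
from_dict = extract_from_dict_like({"some_int": 1, "another_int": 2, "extra": 3}, fields)
from_table = extract_from_dict_like(
    ColumnTable({"some_int": [1, 2], "another_int": [3, 4]}), fields
)
```

The declared field types are ignored here; checking them against the actual values would be a separate step.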
e
Hmm this could be part of this — been looking for an excuse to dig in for a while: https://github.com/DAGWorks-Inc/hamilton/issues/121
s
@Ryan Whitten I think we could have a pandas-specific implementation and put it under `plugins/h_pandas.py`; if we can make it generic for all dataframes, then we could have it in general. Otherwise, we’ve talked about using Pandera/Pydantic like you mention (and maybe automatically doing what `@check_output` does). We just need to figure out the UI/UX for providing the schema: either on the return type of the function that outputs the dataframe, or via the decorator. Would you be up for prototyping something, at least on the UI/UX you want (the code here looks like pydantic? not pandera?), and we can figure out where to make it live?
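One possible shape for the "schema on the return type" option, sketched with `typing.Annotated` from the standard library. This is only an illustration of the UX, not an existing Hamilton feature: `Frame` is a stand-in for `pd.DataFrame` so the snippet runs without pandas, and `schema_from_return_type` is a hypothetical helper name.

```python
from typing import Annotated, Dict, get_args, get_origin, get_type_hints


class Frame:
    """Hypothetical stand-in for pd.DataFrame."""


class DataSource:
    customer_id: str
    some_int: int
    another_int: int


def schema_from_return_type(func) -> Dict[str, type]:
    """Return the field types declared via an Annotated[...] return annotation."""
    hints = get_type_hints(func, include_extras=True)  # keep the Annotated metadata
    returns = hints.get("return")
    if get_origin(returns) is not Annotated:
        raise TypeError(f"{func.__name__} does not declare a schema on its return type")
    _base_type, schema = get_args(returns)[:2]
    return get_type_hints(schema)


def extract_user() -> Annotated[Frame, DataSource]:
    return Frame()


print(schema_from_return_type(extract_user))
```

With this shape the decorator would not need the schema passed in; it could read the fields from the function it wraps.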
r
Cool, thanks guys. I will give it some more thought and am happy to provide a prototype for the UX. I do like the idea of using the return type to automatically figure things out and will play around with that. I was able to slightly modify my local copy of hamilton to get `extract_fields` to work with a dataframe. I just had to update the `validate` call to also allow dataframes, and then modify the final `Node`s being built in `transform_node` to set `input_types={node_.name: pd.DataFrame},` instead of `input_types={node_.name: dict},`. Seems like a fairly small change, even if that's all that was expanded, without a change to the interface.
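For reference, a standalone sketch of the kind of check described above: accept a function whose return annotation is either a dict or a dataframe, and hand back the container type so it can be used as the input type of the extracted nodes. It does not reproduce Hamilton's actual `validate` or `transform_node` code; `Frame`, `ALLOWED_OUTPUT_TYPES`, and `validate_extractable` are hypothetical names, with `Frame` standing in for `pd.DataFrame`.

```python
import typing
from typing import Dict


class Frame:
    """Hypothetical stand-in for pd.DataFrame."""


# Previously only dict would be allowed; the change adds the dataframe type.
ALLOWED_OUTPUT_TYPES = (dict, Frame)


def validate_extractable(func) -> type:
    """Check the return annotation and return the container type it declares."""
    returns = typing.get_type_hints(func).get("return")
    # Dict[str, int] has origin dict; plain classes have no origin.
    container_type = typing.get_origin(returns) or returns
    if not (isinstance(container_type, type)
            and issubclass(container_type, ALLOWED_OUTPUT_TYPES)):
        raise TypeError(
            f"{func.__name__} must return one of {ALLOWED_OUTPUT_TYPES}, got {returns}"
        )
    return container_type


def as_dict() -> Dict[str, int]:
    return {"some_int": 1}


def as_frame() -> Frame:
    return Frame()


# Each extracted node would then take the parent's container type as its input.
input_types = {f.__name__: validate_extractable(f) for f in (as_dict, as_frame)}
```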