Welcome & great question!
Some things to keep in mind:
(1) Hamilton’s roots are in lineage: given an output, it should be easy to figure out what code created it. E.g. each function is a named piece of business logic. So for a little extra verbosity you get much simpler maintenance and hand-off.
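To make point (1) concrete, here's a minimal sketch of the Hamilton style in plain Python (no hamilton import needed to read it): the function's name is the output it produces, and its parameter names declare the upstream outputs it depends on. The function and parameter names here are hypothetical, just for illustration.

```python
# Hamilton-style functions: name = output, parameters = dependencies.
# These names (base_price, tax_rate, etc.) are made up for illustration.

def base_price(raw_price: float, tax_rate: float) -> float:
    """Price including tax; a named, documented piece of business logic."""
    return raw_price * (1 + tax_rate)


def discounted_price(base_price: float, discount: float) -> float:
    """Final price; its parameter name says it depends on base_price."""
    return base_price * (1 - discount)
```

Because each output maps to exactly one named function, "what code created this column?" is answered by reading the function with that name.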
(2) It’s up to you how granular you want to operate. E.g. do you want to operate over dataframes, or columns, or both?
With that in mind, generic building blocks like one-hot encoding are typically used within the body of the functions that need them (versus other systems where they’re a step in the pipeline). If there’s logic shared between functions, you can factor it out as usual with “helper functions” or by importing other code modules.
My recommendation is to draw out a granular DAG of the operations you want, and then the code in Hamilton should map pretty closely to it.
For an example using encoders, see our DBT example using the Titanic dataset (look at the diagram in the README and you’ll see how it’s set up).
Otherwise, in the next release (ETA Monday) we’ve got a new decorator, pipe, that will enable you to more explicitly include something generic as part of the “pipeline definition”.
If you have example code, we can ground some of what I said above with concrete options.