# hamilton-help
e
Quick note — we’ve moved docs to hamilton.readthedocs.io :) Awesome question! In the current version Hamilton is more concerned with extracting and transforming, but loading is absolutely going to be in scope shortly — we have a plan and I’ll be building it out soon 🙂 For now, however, the best way to do this is adjacent to the driver.
```python
dr = driver.Driver(config, *modules)
df = dr.execute(vars)
save_df(df)  # user-defined: persist the result however you like
```
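`save_df` above is left to the user; a minimal sketch (the CSV path and pandas-based saving are my assumptions, not anything Hamilton prescribes) might look like:

```python
import os
import tempfile

import pandas as pd

def save_df(df: pd.DataFrame, path: str) -> None:
    """Minimal sketch of the user-supplied save step: write the frame to CSV."""
    df.to_csv(path, index=False)

# quick demonstration with a temporary file
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "training_data.csv")
    save_df(pd.DataFrame({"col1": [1, 2]}), out_path)
```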
In the future (quick teaser, API not locked in stone), we’re thinking something like this:
```python
dr = driver.Driver(config, *modules)
df = dr.materialize(
    SaveToCSV('col1', 'col2', 'col3', path='training_data.csv'),
    CustomMetricLogger('metric1', 'metric2'),
)
```
Which could also be done by defining custom data adapters:
```python
# materialize.py
from typing import Dict

import pandas as pd

@materialize_to(
    adapter=SaveToCSV,
    path=config('training_data_path')  # or hardcoded
)
def training_data(col1: pd.Series, col2: pd.Series, col3: pd.Series) -> pd.DataFrame:
    return pd.DataFrame({'col1': col1, 'col2': col2, 'col3': col3})

@materialize_to(
    adapter=CustomMetricLogger
)
def metrics(metric1: float, metric2: float) -> Dict[str, float]:
    return {'metric1': metric1, 'metric2': metric2}
```
Would love your thoughts on the API above! Getting this built is my plan for this upcoming week, so I’ll be iterating through some APIs. The idea is you’d run the first option (in the driver) for more ad-hoc stuff, then translate it to the second option (in code) once it stabilizes. The classes would be pluggable so you could define your own loaders as well. The implementation would be such that the `materialize` call in the driver basically adds the nodes (both saving + joining) to the end of the DAG before executing, so you can see it in your visualize output.
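To illustrate that idea with a toy sketch (names and structure are purely my assumptions, not Hamilton's internals), materializers would behave like extra terminal nodes that run after the normal outputs are computed:

```python
# Toy sketch only: each materializer becomes an extra node appended to the
# end of the DAG, so it runs after the regular outputs and would show up in
# a visualization. Names here are illustrative, not Hamilton's internals.
def execute_with_materializers(computed_outputs, materializers):
    """computed_outputs: dict of node name -> value (normal DAG results).
    materializers: callables that consume those results as a final step."""
    statuses = {}
    for materializer in materializers:
        statuses[materializer.__name__] = materializer(computed_outputs)
    return computed_outputs, statuses

def save_to_csv(results):
    # a real materializer would join the columns and write them to disk
    return "saved"

outputs, statuses = execute_with_materializers({"col1": [1, 2]}, [save_to_csv])
```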
s
@Luke thanks for the question! +1 to what Elijah said. We’ve been thinking about it. In short, today there isn’t really a standard “hamiltonian way”. You have two paths: either save it yourself after getting the result back from Hamilton, or write a function to do the saving and have Hamilton execute it for you (you’d probably want to switch to a DictResult builder in the driver and then request `save_to_s3` as an output) - e.g.:
```python
def save_to_s3(col1: pd.Series, col2: pd.Series, ..., s3_client: 'Client', s3_path: str) -> dict:
    """Saves a df of the data to S3."""
    _df = pd.DataFrame({"col1": col1, ... })
    result = s3_client.save_dataframe(s3_path, _df)
    return {"status": result}
```
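To make the shape concrete, here's a runnable version with a stubbed client (`FakeS3Client` and its `save_dataframe` method are stand-ins I invented; Hamilton would just execute `save_to_s3` like any other node):

```python
import pandas as pd

class FakeS3Client:
    """Stand-in for a real S3 client, just to show the interface the
    function above assumes (save_dataframe is an invented method name)."""
    def __init__(self):
        self.saved = {}

    def save_dataframe(self, path, df):
        self.saved[path] = df
        return "ok"

def save_to_s3(col1: pd.Series, col2: pd.Series,
               s3_client: FakeS3Client, s3_path: str) -> dict:
    """Saves a df of the given columns to S3 (stubbed here)."""
    _df = pd.DataFrame({"col1": col1, "col2": col2})
    result = s3_client.save_dataframe(s3_path, _df)
    return {"status": result}

client = FakeS3Client()
status = save_to_s3(pd.Series([1, 2]), pd.Series([3, 4]),
                    client, "s3://bucket/training.parquet")
```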
l
Great stuff, guys! It’ll take some experimentation to see which approach works best for my team. I’ll plan on syncing back up once we have a chance to sandbox this some.
> Would love your thoughts on the API above! Getting this built is my plan for this upcoming week, so I’ll be iterating through some APIs.
Are both `SaveToCSV` and `CustomMetricLogger` writing to disk here? Is `CustomMetricLogger` appending the specified metrics to ‘training_data.csv’? This is outside Hamilton’s core concerns, but I’m anticipating how I might use this tentative API to link experiment parameters with experiment artifacts. There are already tools for this (DVC, MLflow, Kedro, etc.), so the solution is probably in integration. In my mind, a good API would allow lightweight integration with these other tools without requiring it. How that would actually work without feature creep for Hamilton is nebulous.
e
Good qs — so yeah, API still TBD, but the idea is they’d each be separate materializations. E.g. `SaveToCSV` would be provided by Hamilton and write to disk, while `CustomMetricLogger` would be something you write, but you could imagine an `MLFlowMetricsLogger` that writes to MLflow. Great point re: feature creep; we specifically don’t want to be in the business of storing data or processing metrics, only throwing them over the wall to potential partners. I’m thinking these would be extendable classes so you could plug into whatever you want, allowing pretty natural data-saving interfaces with the options you mentioned + more. Also, we are planning to build similar adapter technology for loading, so you could load from any of these 🙂 Thoughts?
s
> In my mind, a good API would allow integration with these other tools in a lightweight way without requiring it.
Yep, we got you 😉! For example, we absolutely don’t want Python dependency bloat, so if you want a particular implementation, it’ll be a separate dependency.