# hamilton-help
Elijah
Hey! We don’t have an example on the repo, but adding one is on our shortlist. In the meantime I’ll write out some pseudocode a little later today to demonstrate how it would work — it’s pretty straightforward.
Conor
Thanks very much Elijah!
s
@Conor Molloy just to ask — are you looking at it for data engineering? feature engineering? or?
(I’m fishing for context to put into our eventual example)
Conor
This is more from a data engineering perspective. We already use Hamilton for our features, but we want to centrally locate our ETLs in Prefect.
👍 1
Elijah
OK, so there are a few ways to integrate. The most obvious/simple is to run the Hamilton driver within your Prefect workflow/task. This would be as simple as calling out to the driver from within a Prefect task:
```python
from prefect import task

from hamilton import driver
from my_code import my_module_1, my_module_2

@task()
def do_some_data_processing(some_input: ...):
    # Build a Hamilton driver from your transform modules and run it
    # inside the Prefect task.
    dr = driver.Driver({}, my_module_1, my_module_2)
    return dr.execute([...])
```
And it’s as simple as that. The value add of Hamilton here is “micro” orchestration: Hamilton is not going to handle execution/retries — rather it organizes and runs your transform code for you — while Prefect isn’t going to give you fine-grained data lineage, let you configure outputs, etc…

Now, let’s say you have a pretty big set of Hamilton functions that all work together, and you want to break them up into a set of Prefect tasks. A common use case is some expensive operation (external joins, model training) where you want to store intermediate results. This would look the same as the above, except you’d break your code into more modules:
```python
from prefect import flow, task

from hamilton import driver
from my_code import first_feature_set, second_feature_set, process_all_features

@task()
def feature_calcs_1():
    dr = driver.Driver({}, first_feature_set)
    return dr.execute([...])

@task()
def feature_calcs_2():
    dr = driver.Driver({}, second_feature_set)
    return dr.execute([...])

@task()
def join_and_process(calcs_1: ..., calcs_2: ...):
    # Join/post-process the two intermediate feature sets.
    dr = driver.Driver({}, process_all_features)
    return dr.execute(
        [...],
        inputs={"features_1": calcs_1, "features_2": calcs_2},
    )

@flow(name="full_pipeline")
def full_pipeline():
    features_1 = feature_calcs_1()
    features_2 = feature_calcs_2()
    final_features = join_and_process(features_1, features_2)
    return final_features
```
So, all we’ve done is take the first approach and break it into components. The value add is:
1. Hamilton organizes the micro transformations.
2. Prefect builds/handles the infrastructure, retries, etc…
3. Breaking the pipeline into components lets you rely on Prefect’s caching, …
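To make points 2 and 3 concrete, here’s a minimal sketch of what adding retries and caching to one of those tasks could look like, assuming Prefect 2.x (where the `retries`/`cache_key_fn` options and `task_input_hash` come from):
```python
from datetime import timedelta

from prefect import task
from prefect.tasks import task_input_hash

from hamilton import driver
from my_code import first_feature_set

# Prefect handles the "macro" concerns: retry the task on failure and
# cache its result, keyed on the task's inputs.
@task(
    retries=3,
    retry_delay_seconds=30,
    cache_key_fn=task_input_hash,
    cache_expiration=timedelta(hours=1),
)
def feature_calcs_1():
    # Hamilton handles the "micro" concerns: organizing and running
    # the individual feature transforms.
    dr = driver.Driver({}, first_feature_set)
    return dr.execute([...])
```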
And, finally, we’re thinking of automating the second approach, e.g. building a tool that, given:
• a Hamilton pipeline
• a “target” (in this case Prefect, but it could just as easily be Airflow/Kubeflow)
• a configuration for how to partition/compile to the target
would produce the generated workflow code, so that data scientists could think in Hamilton rather than Prefect/Airflow/whatever. Would love your thoughts as to whether that would be useful 🙂
🙌 1
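To make that idea concrete, here’s a purely hypothetical sketch of what such a compiler’s surface could look like (`PartitionConfig` and `compile_to_target` are illustrative names assumed for this sketch, not real Hamilton APIs):
```python
# Hypothetical sketch: illustrative names only, not a real Hamilton API.
from dataclasses import dataclass

from hamilton import driver
from my_code import first_feature_set, second_feature_set, process_all_features

@dataclass
class PartitionConfig:
    # How to split the Hamilton DAG into target-native tasks,
    # e.g. one task per module, or a split at expensive/cached nodes.
    strategy: str = "per_module"

def compile_to_target(dr: driver.Driver, target: str,
                      partition: PartitionConfig, output_path: str) -> None:
    """Walk the driver's DAG and emit workflow code for the target orchestrator."""
    raise NotImplementedError  # the interesting part would live here

# Data scientists think in Hamilton; the Prefect (or Airflow/Kubeflow)
# scaffolding gets generated.
dr = driver.Driver({}, first_feature_set, second_feature_set, process_all_features)
compile_to_target(dr, target="prefect", partition=PartitionConfig(),
                  output_path="generated_flow.py")
```
The generated file would contain roughly the `@task`/`@flow` code from the second example above, derived from the Hamilton DAG instead of written by hand.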