Slackbot
06/15/2023, 10:06 AM
Elijah Ben Izzy
06/15/2023, 2:38 PM
Conor Molloy
06/15/2023, 3:32 PM
Stefan Krawczyk
06/15/2023, 4:25 PM
Stefan Krawczyk
06/15/2023, 4:25 PM
Conor Molloy
06/15/2023, 4:30 PM
Elijah Ben Izzy
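06/15/2023, 4:38 PM
from hamilton import driver
from prefect import task
from my_code import my_module_1, my_module_2

@task()
def do_some_data_processing(some_input: ...):
    # Build the Hamilton DAG from the functions defined in both modules
    dr = driver.Driver({}, my_module_1, my_module_2)
    # Request the outputs you want; Hamilton runs only what they depend on
    return dr.execute([...])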
And it’s as simple as that. The value add of Hamilton here is that you’re using it for “micro” orchestration. Hamilton is not going to handle execution or retries; rather, it’ll organize and run the code for you. Prefect, in turn, isn’t going to give you fine-grained data lineage, let you configure outputs, etc…
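To make the “micro” orchestration idea concrete, here is a minimal plain-Python sketch of the convention Hamilton builds on: each function defines an output named after the function, and its parameter names declare what it depends on. The `execute` helper is a hypothetical toy stand-in for what `dr.execute([...])` does, not Hamilton's implementation, and the function names (`spend_per_signup`, etc.) are made up for illustration.

```python
import inspect

# Hypothetical "Hamilton-style" functions: the function name is the output,
# the parameter names are the inputs it depends on.
def spend_per_signup(spend: float, signups: float) -> float:
    return spend / signups

def spend_per_signup_rounded(spend_per_signup: float) -> float:
    return round(spend_per_signup, 2)

def execute(functions, outputs, inputs):
    """Toy resolver: compute each requested output by recursively
    computing the parameters its function asks for."""
    nodes = {fn.__name__: fn for fn in functions}
    computed = dict(inputs)  # values supplied directly by the caller

    def resolve(name):
        if name not in computed:
            if name not in nodes:
                raise KeyError(f"no function or input named {name!r}")
            fn = nodes[name]
            kwargs = {p: resolve(p) for p in inspect.signature(fn).parameters}
            computed[name] = fn(**kwargs)
        return computed[name]

    return {name: resolve(name) for name in outputs}

result = execute(
    [spend_per_signup, spend_per_signup_rounded],
    outputs=["spend_per_signup_rounded"],
    inputs={"spend": 10.0, "signups": 3.0},
)
```

Because the dependencies are read from the function signatures, the resolver knows exactly which functions fed each output, which is where the fine-grained lineage comes from.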
Now, let’s say you have a pretty big set of Hamilton functions that all work together, and you want to break it up into a set of Prefect tasks. A common use case is some expensive operation (external joins) or model training, where you want to store intermediate results.
This would look the same as the above, except you’d rely on breaking it into modules more:
from hamilton import driver
from prefect import flow, task
from my_code import first_feature_set, second_feature_set, process_all_features

@task()
def feature_calcs_1():
    dr = driver.Driver({}, first_feature_set)
    return dr.execute([...])

@task()
def feature_calcs_2():
    dr = driver.Driver({}, second_feature_set)
    return dr.execute([...])

@task()
def join_and_process(calcs_1: ..., calcs_2: ...):
    # The final DAG comes from the module that joins and processes both feature sets
    dr = driver.Driver({}, process_all_features)
    # Feed the upstream task results in as inputs to that DAG
    return dr.execute([...], inputs={'features_1': calcs_1, 'features_2': calcs_2})

# Named full_pipeline so the function does not shadow the imported `flow` decorator
@flow(name="full_pipeline")
def full_pipeline():
    features_1 = feature_calcs_1(...)
    features_2 = feature_calcs_2(...)
    final_features = join_and_process(features_1, features_2)
So, all we’ve done is taken the previous example and broken it into components. The value add is:
1. Hamilton to organize micro transformations
2. Prefect to build/handle infrastructure, retries, etc…
3. Broken up into components to rely on Prefect’s caching, …
Elijah Ben Izzy
06/15/2023, 4:40 PM
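Point 3 above can be illustrated with a small plain-Python sketch. The `cached_stage` decorator is a hypothetical stand-in for an orchestrator's task-level caching (it is not Prefect's API): it stores each stage's result keyed on the stage name and its inputs, so a rerun reuses finished stages instead of recomputing them. The stage names mirror the example above; the bodies and the `raw` parameter are placeholders.

```python
import functools
import hashlib
import pickle

_CACHE = {}  # in-memory store; a real orchestrator would persist results
CALLS = []   # records which stages actually ran

def cached_stage(fn):
    """Cache a stage's result, keyed on the stage name and its inputs."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        payload = pickle.dumps((fn.__name__, args, sorted(kwargs.items())))
        key = hashlib.sha256(payload).hexdigest()
        if key not in _CACHE:
            CALLS.append(fn.__name__)
            _CACHE[key] = fn(*args, **kwargs)
        return _CACHE[key]
    return wrapper

@cached_stage
def feature_calcs_1(raw):
    return [x * 2 for x in raw]  # placeholder for an expensive computation

@cached_stage
def feature_calcs_2(raw):
    return [x + 1 for x in raw]  # placeholder for an expensive computation

@cached_stage
def join_and_process(features_1, features_2):
    return [a + b for a, b in zip(features_1, features_2)]

def full_pipeline(raw):
    return join_and_process(feature_calcs_1(raw), feature_calcs_2(raw))

first = full_pipeline([1, 2, 3])   # every stage runs once
second = full_pipeline([1, 2, 3])  # every stage is served from the cache
```

Had the three stages been one monolithic function, a change to the join logic would force the feature calculations to rerun as well; split up, only the stage whose inputs changed needs to be recomputed.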