Slackbot
08/28/2023, 12:50 AM
Stefan Krawczyk
08/28/2023, 1:03 AM
# dataflow.py
import pandas as pd

def source_a(param1: str, param2: str) -> pd.DataFrame:
    df = ...  # code to pull from source
    return df

def filtered_dataframe(source_a: pd.DataFrame) -> pd.DataFrame:
    """This depends on the output of source_a."""
    df = source_a  # ... logic to filter on source_a
    return df
and then in the driver:
import dataflow
from hamilton import driver
invariant_parameters = {"param1": ...}
dr = driver.Driver(invariant_parameters, dataflow)
df = dr.execute(["filtered_dataframe"], inputs={"param2": ...})
It will know to run source_a before filtered_dataframe because that’s how the flow is defined: the parameter name source_a in filtered_dataframe refers to the function source_a. The execute call is where you request the “outputs” you want, and Hamilton will determine the path to take to get that output for you.
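To make the name matching concrete, here is a minimal standard-library sketch of the idea. It is not Hamilton's implementation: the `execute` helper is hypothetical, the input values are made up, and plain lists of dicts stand in for DataFrames so it runs without pandas.

```python
import inspect
from typing import Optional


# Hypothetical stand-ins for dataflow.py; lists of dicts replace DataFrames.
def source_a(param1: str, param2: str) -> list:
    return [{"source": param1, "key": param2, "value": v} for v in (1, 2, 3)]


def filtered_dataframe(source_a: list) -> list:
    """Depends on the output of source_a, matched by parameter name."""
    return [row for row in source_a if row["value"] > 1]


def execute(functions: dict, output: str, inputs: dict, cache: Optional[dict] = None):
    """Compute `output`, first computing every upstream function its parameters name."""
    cache = {} if cache is None else cache
    if output in inputs:  # supplied by the caller, nothing to compute
        return inputs[output]
    if output not in cache:
        fn = functions[output]
        # Each parameter name is either a caller input or another function's name.
        kwargs = {
            name: execute(functions, name, inputs, cache)
            for name in inspect.signature(fn).parameters
        }
        cache[output] = fn(**kwargs)
    return cache[output]


functions = {f.__name__: f for f in (source_a, filtered_dataframe)}
result = execute(functions, "filtered_dataframe", {"param1": "db", "param2": "k"})
print(result)
```

Only `filtered_dataframe` is requested, yet `source_a` runs first because its name appears as a parameter.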
Does that help?
QQ: Are the two SQL servers interchangeable? Or do you join the data from them?
Jarrod Hamilton
08/28/2023, 3:52 AM
Stefan Krawczyk
08/28/2023, 4:22 AM
def source_a(param1: str, param2: str) -> pd.DataFrame:
    df = ...  # code to pull from source
    return df

def source_b(param1: str, param2: str) -> pd.DataFrame:
    df = ...  # code to pull from source
    return df

def joined_data(source_a: pd.DataFrame, source_b: pd.DataFrame) -> pd.DataFrame:
    df = ...  # join dfs
    return df

def filtered_dataframe(joined_data: pd.DataFrame) -> pd.DataFrame:
    """This depends on the output of joined_data."""
    df = joined_data  # ... logic to filter...
    return df
etc
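As a rough illustration of the order this join flow implies, the sketch below derives the dependency graph from the function signatures and sorts it with the standard library's `graphlib`. This is an assumption-laden sketch, not how Hamilton builds its graph internally; the function bodies are elided because only the signatures matter here.

```python
import inspect
from graphlib import TopologicalSorter


# Signatures mirror the flow above; bodies are elided on purpose.
def source_a(param1: str, param2: str): ...
def source_b(param1: str, param2: str): ...
def joined_data(source_a, source_b): ...
def filtered_dataframe(joined_data): ...


nodes = {f.__name__: f for f in (source_a, source_b, joined_data, filtered_dataframe)}

# Map each function to the upstream functions its parameters name.
# Names with no matching function (param1, param2) are external inputs.
graph = {
    name: {p for p in inspect.signature(fn).parameters if p in nodes}
    for name, fn in nodes.items()
}

# static_order() yields each node only after all of its dependencies.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Both sources come first, then `joined_data`, then `filtered_dataframe`.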