Slackbot
10/25/2022, 8:31 PM
Stefan Krawczyk
10/25/2022, 8:33 PM
Stefan Krawczyk
10/25/2022, 9:10 PM
def A(text_column: ...) -> Model:
    # fit topic model
    ...

def B(A: Model) -> pd.Series:
    # create series of topic assignments
    ...

def C(A: Model) -> pd.DataFrame:
    # extract metadata from model
    return get_data_from(A)
Now, when you run this via a driver and want the outputs of both B and C, you probably want to switch to a DictResult builder (because creating a single dataframe doesn’t make sense here, right?):
from hamilton import base, driver

adapter = base.SimplePythonGraphAdapter(base.DictResult())
dr = driver.Driver(dag_config, modules, adapter=adapter)
result_dict = dr.execute(['B', 'C'])
Elijah Ben Izzy
10/25/2022, 9:15 PM
James Marvin
10/25/2022, 9:17 PM
Elijah Ben Izzy
10/25/2022, 9:21 PM
C and B), so yep! I think so (one is a series in the code we supplied…)
Stefan Krawczyk
10/25/2022, 9:22 PM
result_dict = {
    'B': ...,  # output of B
    'C': ...,  # output of C
}
James Marvin
10/25/2022, 9:24 PM
James Marvin
10/25/2022, 9:43 PM
Stefan Krawczyk
10/25/2022, 9:58 PM
Stefan Krawczyk
10/25/2022, 10:00 PM
James Marvin
10/26/2022, 6:32 AM
James Marvin
10/26/2022, 9:50 AM
Elijah Ben Izzy
10/26/2022, 4:11 PM
Elijah Ben Izzy
10/26/2022, 4:12 PM
Stefan Krawczyk
10/26/2022, 5:49 PM
def A(text_column: ...) -> Model:
    # fit topic model
    ...

def B(A: Model) -> pd.Series:
    # create series of topic assignments
    ...

def C(A: Model) -> pd.DataFrame:
    # extract metadata from model
    return get_data_from(A)

def df_result(A: pd.Series, col2: pd.Series, col3: pd.Series, ...) -> pd.DataFrame:
    return pd.DataFrame({'A': A, 'col2': col2, 'col3': col3, ...})
Then the driver is like the following:
adapter = base.SimplePythonGraphAdapter(base.DictResult())
dr = driver.Driver(dag_config, modules, adapter=adapter)
result_dict = dr.execute(['df_result', 'C'])
Hope that makes sense. Previously, the result builder performed the logic in df_result for you, so you didn’t need to specify it. Here we’ve hardcoded it, which means you only request two outputs from the Hamilton driver.
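To make the first approach concrete, here is a minimal sketch of calling a df_result-style function directly on toy pandas data, outside of Hamilton. The three-column signature and all values are made up for illustration; in a real run the Hamilton driver would compute and supply these inputs.

```python
import pandas as pd


def df_result(A: pd.Series, col2: pd.Series, col3: pd.Series) -> pd.DataFrame:
    """Combine individual columns into one dataframe."""
    return pd.DataFrame({'A': A, 'col2': col2, 'col3': col3})


# Toy stand-ins for the series Hamilton would compute (illustrative values only).
toy_A = pd.Series([0, 1, 0])
toy_col2 = pd.Series([1.5, 2.5, 3.5])
toy_col3 = pd.Series(['x', 'y', 'z'])

df = df_result(toy_A, toy_col2, toy_col3)
print(df.shape)  # prints (3, 3)
```

Because df_result is just another node in the graph, requesting it alongside C keeps the dataframe assembly inside the DAG rather than in driver code.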
A second way would be to request all the columns like before, and you create the first dataframe after getting the results from Hamilton:
adapter = base.SimplePythonGraphAdapter(base.DictResult())
dr = driver.Driver(dag_config, modules, adapter=adapter)
result_dict = dr.execute(['A', 'col2', 'col3', ..., 'C'])
df_2 = result_dict['C']
del result_dict['C'] # remove from dict
df_1 = pd.DataFrame(result_dict)  # build dataframe from the remaining columns
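The same split can be tried on a plain dictionary, without Hamilton. In this sketch the dictionary is a hand-built stand-in for what dr.execute(...) returns with a DictResult builder; the keys mirror the example above and the values are invented for illustration.

```python
import pandas as pd

# Hand-built stand-in for the dict a DictResult-based driver would return.
result_dict = {
    'A': pd.Series([0, 1, 0]),
    'col2': pd.Series([1.5, 2.5, 3.5]),
    'C': pd.DataFrame({'topic': [0, 1], 'label': ['sports', 'politics']}),
}

# Pull the metadata dataframe out; pop returns the value and removes the key.
df_2 = result_dict.pop('C')

# Everything left is a column, so the dict maps directly onto a dataframe.
df_1 = pd.DataFrame(result_dict)
```

Note that the dict is passed to pd.DataFrame as a single positional argument, so each key becomes a column name.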
The third way is, as @Elijah Ben Izzy mentioned, to write a custom result builder that encapsulates this logic for you. If that’s of interest, I can provide a gist.
Stefan Krawczyk
10/26/2022, 10:44 PM
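For reference, a rough sketch of what a custom result builder along these lines could look like. It assumes Hamilton result builders expose a static build_result(**outputs) method, as base.DictResult does. The class name SplitResult and the 'df' key are hypothetical, and the class is written as plain Python so it runs without Hamilton installed; in real use it would presumably subclass base.ResultMixin and be passed to base.SimplePythonGraphAdapter.

```python
from typing import Any, Dict

import pandas as pd


class SplitResult:
    """Hypothetical builder: series become one dataframe, everything else passes through."""

    @staticmethod
    def build_result(**outputs: Any) -> Dict[str, Any]:
        # Series-valued outputs are treated as columns of a single dataframe.
        columns = {k: v for k, v in outputs.items() if isinstance(v, pd.Series)}
        # Anything else (e.g. a metadata dataframe) is returned untouched.
        others = {k: v for k, v in outputs.items() if not isinstance(v, pd.Series)}
        # Assumption: no requested output is itself named 'df'.
        return {'df': pd.DataFrame(columns), **others}


# Toy call mimicking what the adapter would pass in (illustrative values only).
result = SplitResult.build_result(
    A=pd.Series([0, 1, 0]),
    col2=pd.Series([1.5, 2.5, 3.5]),
    C=pd.DataFrame({'topic': [0, 1]}),
)
```

Splitting by type keeps the driver call unchanged (request all columns plus C) while moving the dataframe assembly out of user code.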