# hamilton-help
s
This message was deleted.
👀 1
s
thanks for the question. Will get to you after my meeting too!
In your case (if I understand correctly), I think you can create a node C that depends on the model (A in the code below) and builds the dataframe by querying the model. Something like this:
def A(text_column: ...) -> Model:
   # fit topic model
   ...

def B(A: Model) -> pd.Series:
   # create series of topic assignments
   ... 

def C(A: Model) -> pd.DataFrame:
   # extract metadata from model
   return get_data_from(A)
Now when you run this via a driver and you want both the output of C and the output of B, you probably want to switch to a DictResult builder (because creating a single dataframe doesn’t make sense here, right?)
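from hamilton import base, driver

# DictResult hands back each requested output as-is, keyed by node name
adapter = base.SimplePythonGraphAdapter(base.DictResult())
dr = driver.Driver(dag_config, modules, adapter=adapter)
result_dict = dr.execute(['B', 'C'])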
e
To clarify: fn outputs can be anything! We just like series/dataframes quite a bit, but it's entirely up to you.
j
Thanks folks. In the provided example, would my result dict contain my two data frames?
e
So the result dict only contains results for the final vars you queried for (C and B), so yep! I think so (one is a series in the code we supplied…)
s
the result_dict object would look something like:
result_dict = {
   'B': ...,  # output of B (a pd.Series)
   'C': ...,  # output of C (a pd.DataFrame)
}
j
Fantastic! I'll give this a whirl tomorrow and let you know how I get on. Thank you for the help!
🙌 2
One quick follow-up on this... Using the original example, the text column is actually the result of other "upstream" nodes in the graph, so ideally the graph would be able to return two data frames: one that contains the output of the DAG up to and including the generation of the topics, and another summary data frame built by extracting the metadata from the node containing the fitted model... Can I still do that?
s
In short, yes. How depends on whether you had a function to create that dataframe, or whether you were using the Hamilton driver to return that dataframe via the ResultBuilder…
happy to jump on a call and walk through some code.
j
Thanks! In this case I was using the Hamilton driver to build/return dataframe #1, and could then write a function in the DAG which returns dataframe #2 (I guess)?
Sorry - just looked at the result builder docs. Assume the best thing for me to do is to use the dictionary result builder and use the output dict to create the dataframes I want... Let me know if there's a better way!
e
Yep! I think that's the best -- you can have two (unjoined) results.
You can always build custom result builders or do a bunch of other things, but for now I think that's the best way to unblock you.
s
@James Marvin to put it into code you could do:
def A(text_column: ...) -> Model:
   # fit topic model
   ...

def B(A: Model) -> pd.Series:
   # create series of topic assignments
   ... 

def C(A: Model) -> pd.DataFrame:
   # extract metadata from model
   return get_data_from(A)

def df_result(B: pd.Series, col2: pd.Series, col3: pd.Series, ... ) -> pd.DataFrame:
   # join the topic assignments (B) with the other columns; A is the Model, not a series
   return pd.DataFrame({'B': B, 'col2': col2, 'col3': col3, ...})
Then the driver is like the following:
adapter = base.SimplePythonGraphAdapter(base.DictResult())
dr = driver.Driver(dag_config, modules, adapter=adapter)
result_dict = dr.execute(['df_result', 'C'])
If that makes sense. Before, the result builder was performing the logic in df_result for you and you didn’t need to specify it. Here we’ve hardcoded it, which means you only request two outputs from the Hamilton driver. A second way would be to request all the columns like before, and create the first dataframe yourself after getting the results from Hamilton:
adapter = base.SimplePythonGraphAdapter(base.DictResult())
dr = driver.Driver(dag_config, modules, adapter=adapter)
result_dict = dr.execute(['B', 'col2', 'col3', ...,  'C'])
df_2 = result_dict.pop('C')  # take C out of the dict and keep it as the second dataframe
df_1 = pd.DataFrame(result_dict)  # build the first dataframe from the remaining series
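To make the second approach concrete, here is a small pandas-only sketch. The dict is a made-up stand-in for what dr.execute might return with DictResult (the values and the contents of C are hypothetical); the point is only the "pull C out, then build the frame from what is left" step.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by dr.execute([...]) with DictResult.
result_dict = {
    "B": pd.Series([0, 1, 0]),           # topic assignment per row
    "col2": pd.Series([10.0, 20.0, 30.0]),
    "col3": pd.Series(["x", "y", "z"]),
    "C": pd.DataFrame({"topic": [0, 1], "top_words": ["cat dog", "tax law"]}),
}

# pop() returns the metadata frame and removes its key in one step.
df_2 = result_dict.pop("C")

# The remaining values share an index, so the dict keys become column names.
df_1 = pd.DataFrame(result_dict)
```

Note that the dict is passed to pd.DataFrame directly; unpacking it with ** would turn the node names into keyword arguments, which the constructor does not accept.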
The third way is as @Elijah Ben Izzy mentioned, writing a custom result builder to encapsulate this logic for you — if that’s of interest I can provide a gist.
In case it’s helpful here’s a gist https://gist.github.com/skrawcz/18c5ce2347f7dce274d83043bc33f982
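For anyone who can't open the gist, here is a rough sketch of the general idea. It is not copied from the gist and may differ from it. The class name SplitResult and the passthrough set are hypothetical, and the class is written standalone so it runs with pandas alone. To plug it into the adapter it would presumably need to subclass base.ResultMixin, which, as far as I know, expects a static build_result(**outputs).

```python
from typing import Any, Dict, Set

import pandas as pd


class SplitResult:
    """Hypothetical builder: joins series into one frame, passes the rest through."""

    # Output names that should NOT be joined into the dataframe (assumption for this sketch).
    passthrough: Set[str] = {"C"}

    @staticmethod
    def build_result(**outputs: Any) -> Dict[str, Any]:
        # Everything not marked as passthrough becomes a column of the joined frame.
        columns = {k: v for k, v in outputs.items() if k not in SplitResult.passthrough}
        # Passthrough outputs are returned untouched under their own names.
        extras = {k: v for k, v in outputs.items() if k in SplitResult.passthrough}
        return {"df": pd.DataFrame(columns), **extras}


# Calling it by hand with made-up outputs, the way an adapter would after execution.
result = SplitResult.build_result(
    B=pd.Series([0, 1, 0]),
    col2=pd.Series([1.5, 2.5, 3.5]),
    C=pd.DataFrame({"topic": [0, 1]}),
)
```

With this shape the caller gets result["df"] and result["C"] back from a single call, without the manual pop step.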