This message was deleted Hamilton Open Source #hamilton-help

# hamilton-help

Slackbot

10/25/2022, 8:31 PM

This message was deleted.

👀 1

Stefan Krawczyk

10/25/2022, 8:33 PM

thanks for the question. Will get to you after my meeting too!

Stefan Krawczyk

10/25/2022, 9:10 PM

In your case (if I understand correctly, that is), I think you can create a node C that depends on A (the model) and creates (queries) the dataframe from the model. Something like this:

import pandas as pd

# Model and get_data_from are placeholders for your own model type and helper.
def A(text_column: ...) -> Model:
    # fit topic model
    ...

def B(A: Model) -> pd.Series:
    # create series of topic assignments
    ...

def C(A: Model) -> pd.DataFrame:
    # extract metadata from model
    return get_data_from(A)

Now when you run this via a driver and want the outputs of both B and C, you probably want to switch to a DictResult builder (because creating a single dataframe doesn’t make sense here, right?):

from hamilton import base, driver

adapter = base.SimplePythonGraphAdapter(base.DictResult())
dr = driver.Driver(dag_config, modules, adapter=adapter)
result_dict = dr.execute(['B', 'C'])

Elijah Ben Izzy

10/25/2022, 9:15 PM

To clarify: function outputs can be anything! We just like series/dataframes quite a bit, but it's entirely up to you.

James Marvin

10/25/2022, 9:17 PM

Thanks folks. In the provided example, would my result dict contain my two data frames?

Elijah Ben Izzy

10/25/2022, 9:21 PM

So the result dict only contains results for the final vars you queried for (B and C), so yep! I think so (one is a series in the code we supplied…)

Stefan Krawczyk

10/25/2022, 9:22 PM

the result_dict object would look something like:

result_dict = {
    'B': ...,  # output of B (a pd.Series)
    'C': ...,  # output of C (a pd.DataFrame)
}
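To make the shape of that dict concrete, here is a minimal sketch with stand-in values. The topic labels and metadata columns are made up for illustration and do not come from the thread, and no Hamilton driver is involved.

```python
import pandas as pd

# Stand-ins for what B and C might return; all values are hypothetical.
topic_assignments = pd.Series([0, 1, 0, 2], name="topic")  # one row per document
topic_metadata = pd.DataFrame(
    {"topic": [0, 1, 2], "top_words": ["cat dog", "tax law", "goal match"]}
)  # one row per topic

# A dict result keeps each requested output under its own key, unjoined.
result_dict = {"B": topic_assignments, "C": topic_metadata}

# The two outputs have different lengths, which is why joining them
# into a single dataframe would not be meaningful here.
print(len(result_dict["B"]), len(result_dict["C"]))
```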

James Marvin

10/25/2022, 9:24 PM

Fantastic! I'll give this a whirl tomorrow and let you know how I get on. Thank you for the help!

🙌 2

James Marvin

10/25/2022, 9:43 PM

One quick follow-up on this... Using the original example, the text column is actually the result of other "upstream" nodes in the graph, so ideally the graph would be able to return two data frames: one that contains the output of the DAG up to and including the generation of the topics, and another summary data frame based on extracting the metadata from the node containing the fitted model... Can I still do that?

Stefan Krawczyk

10/25/2022, 9:58 PM

In short, yes. How depends on whether you had a function that creates that dataframe, or whether you were using the Hamilton driver to return that dataframe via the ResultBuilder…

Stefan Krawczyk

10/25/2022, 10:00 PM

happy to jump on a call and walk through some code.

James Marvin

10/26/2022, 6:32 AM

Thanks! In this case I was using the Hamilton driver to build/return dataframe #1, and could then write a function in the DAG which returns dataframe #2 (I guess)?

James Marvin

10/26/2022, 9:50 AM

Sorry - just looked at the result builder docs. Assume the best thing for me to do is to use the dictionary result builder and use the output dict to create the dataframes I want... Let me know if there's a better way!

Elijah Ben Izzy

10/26/2022, 4:11 PM

Yep! I think that's the best -- you can have two (unjoined) results.

Elijah Ben Izzy

10/26/2022, 4:12 PM

You can always build custom result builders or do a bunch of other things, but for now I think that's the best way to unblock you.

Stefan Krawczyk

10/26/2022, 5:49 PM

@James Marvin to put it into code you could do:

def A(text_column: ...) -> Model:
    # fit topic model
    ...

def B(A: Model) -> pd.Series:
    # create series of topic assignments
    ...

def C(A: Model) -> pd.DataFrame:
    # extract metadata from model
    return get_data_from(A)

# Takes B (the topic series), not A (the model); `...` stands for the remaining columns.
def df_result(B: pd.Series, col2: pd.Series, col3: pd.Series, ... ) -> pd.DataFrame:
    return pd.DataFrame({'B': B, 'col2': col2, 'col3': col3, ...})

Then the driver is like the following:

adapter = base.SimplePythonGraphAdapter(base.DictResult())
dr = driver.Driver(dag_config, modules, adapter=adapter)
result_dict = dr.execute(['df_result', 'C'])

If that makes sense. Before, the result builder was performing the logic in df_result for you and you didn’t need to specify it. Here we’ve hardcoded it, which results in requesting only two outputs from the Hamilton driver. A second way would be to request all the columns like before and create the first dataframe yourself after getting the results from Hamilton:

adapter = base.SimplePythonGraphAdapter(base.DictResult())
dr = driver.Driver(dag_config, modules, adapter=adapter)
result_dict = dr.execute(['B', 'col2', 'col3', ...,  'C'])
df_2 = result_dict.pop('C')  # take C out of the dict
df_1 = pd.DataFrame(result_dict)  # build dataframe from the remaining columns
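The post-processing in this second approach can be tried without Hamilton. Below is a minimal sketch, assuming the driver returned a plain dict of pandas objects. The helper name split_results, the column names, and the values are hypothetical stand-ins.

```python
import pandas as pd

def split_results(result_dict: dict, summary_key: str = "C"):
    """Separate the summary output from the column outputs (hypothetical helper)."""
    columns = dict(result_dict)         # shallow copy so the caller's dict is untouched
    summary = columns.pop(summary_key)  # the standalone dataframe
    # Passing the dict positionally makes each key a column, aligned on index.
    return pd.DataFrame(columns), summary

# Fake driver output; in practice this would come from dr.execute(...).
fake_results = {
    "topic": pd.Series([0, 1, 0]),
    "col2": pd.Series([10.0, 20.0, 30.0]),
    "C": pd.DataFrame({"topic": [0, 1], "size": [2, 1]}),
}
df_1, df_2 = split_results(fake_results)
```

Note that the dict is passed to pd.DataFrame as a single positional argument. Unpacking it with ** would turn each key into a keyword argument, which the DataFrame constructor does not accept.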

The third way is, as @Elijah Ben Izzy mentioned, writing a custom result builder to encapsulate this logic for you. If that’s of interest I can provide a gist.
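A rough idea of what such a builder could look like, as a sketch only and not the contents of any gist. The class name TwoFrameResult and the SUMMARY_KEY convention are hypothetical. In Hamilton it would presumably subclass base.ResultMixin and be passed to SimplePythonGraphAdapter in place of DictResult(); that wiring is an assumption and is left out so the snippet runs with pandas alone.

```python
from typing import Any, Dict, Tuple

import pandas as pd

class TwoFrameResult:
    """Hypothetical builder: one dataframe of columns plus one standalone output."""

    SUMMARY_KEY = "C"  # assumed name of the node that returns the summary dataframe

    @staticmethod
    def build_result(**outputs: Any) -> Tuple[pd.DataFrame, Any]:
        columns: Dict[str, Any] = dict(outputs)
        # Pull the summary out so it is not forced into the column dataframe.
        summary = columns.pop(TwoFrameResult.SUMMARY_KEY)
        return pd.DataFrame(columns), summary

# Called directly here with made-up outputs to show the return shape.
df_1, df_2 = TwoFrameResult.build_result(
    topic=pd.Series(["a", "b"]),
    col2=pd.Series([1, 2]),
    C=pd.DataFrame({"topic": ["a", "b"], "n_docs": [1, 1]}),
)
```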

Stefan Krawczyk

10/26/2022, 10:44 PM

In case it’s helpful here’s a gist https://gist.github.com/skrawcz/18c5ce2347f7dce274d83043bc33f982
