This message was deleted Hamilton Open Source #hamilton-help


# hamilton-help

Slackbot

10/05/2022, 11:15 PM

This message was deleted.

Elijah Ben Izzy

10/05/2022, 11:17 PM

That’s a great question. Think I know but I want to double-check with a quick test. To clarify, it would be something like this:


import pandas as pd

def df() -> pd.DataFrame:
    return pd.DataFrame.from_records([{'a': 1}])

def a() -> pd.Series:
    return pd.Series([1])

Elijah Ben Izzy

10/05/2022, 11:17 PM

Then in the driver compute both, right?

👍 1

Stefan Krawczyk

10/05/2022, 11:29 PM

@John Herr @Elijah Ben Izzy will show some code. But in short, the driver does not unpack a dataframe passed into it, so it should compute 'a' from the function definition. If you want to short-circuit computation, I think the overrides parameter on `.execute()` is the way to go.

Elijah Ben Izzy

10/05/2022, 11:44 PM

OK, cool, an example to clarify. I think I may have misunderstood at first. To demonstrate overrides, this is how you might approach it. Note that I’m passing in an override for `a` (edited a bit for clarity):


import pandas as pd
from hamilton.ad_hoc_utils import create_temporary_module
from hamilton.driver import Driver
from hamilton.function_modifiers import extract_columns

# Expose columns 'a' and 'b' of the dataframe as individually addressable nodes
@extract_columns('a', 'b')
def df() -> pd.DataFrame:
    return pd.DataFrame.from_records([{'a': 1, 'b': 2}])

def c(a: pd.Series) -> pd.Series:
    return a * 2

# This is smart enough to skip computing "a" and use the override instead
result = Driver({}, create_temporary_module(df, c)).execute(
    final_vars=['c', 'a'],
    overrides={'a': pd.Series([2])},
)

Elijah Ben Izzy

10/05/2022, 11:46 PM

This is because Hamilton thinks of inputs, etc. as distinct items. A dataframe is a dataframe unless you tell it to extract columns. Overrides allow you to short-circuit execution, but the names have to match up. E.g., `a` in this case matches a passed-in series.
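To make the name-matching point concrete, here is a toy model in plain Python. It is not Hamilton's implementation: the `execute` function and the `calls` list are made up for illustration, and the only thing it borrows from the thread is the idea that an override is matched purely by node name.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional


def execute(
    functions: Dict[str, Callable[..., Any]],
    final_vars: List[str],
    overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Toy executor (hypothetical): resolve requested names, preferring overrides."""
    # Overrides are treated as already-computed values, keyed by name.
    computed: Dict[str, Any] = dict(overrides or {})

    def resolve(name: str) -> Any:
        if name not in computed:
            fn = functions[name]
            # Dependencies are the function's parameter names, matched by name only.
            kwargs = {p: resolve(p) for p in inspect.signature(fn).parameters}
            computed[name] = fn(**kwargs)
        return computed[name]

    return {name: resolve(name) for name in final_vars}


calls: List[str] = []  # records which functions actually ran


def a() -> int:
    calls.append("a")
    return 1


def c(a: int) -> int:
    calls.append("c")
    return a * 2


functions = {"a": a, "c": c}

# No override: both functions run.
without_override = execute(functions, ["c", "a"])
calls_without = list(calls)
calls.clear()

# Override named "a": the function a() is skipped entirely.
with_override = execute(functions, ["c", "a"], overrides={"a": 2})
calls_with = list(calls)
calls.clear()

# A key that matches no function name ("df") changes nothing: a() still runs.
mismatched = execute(functions, ["c"], overrides={"df": {"a": 10}})
calls_mismatched = list(calls)
```

In this sketch the container passed under `df` is never opened to look for an `a` inside it, which mirrors the behaviour described above: only an override literally named `a` short-circuits `a`.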

Elijah Ben Izzy

10/06/2022, 12:03 AM

Note that if you happen to pass in a dataframe that has `a` in it as an input, it will not use that.


# this has no knowledge of the fact that the dataframe has the column `a` in it
result = Driver(
    dict(df=pd.DataFrame.from_records([{'a' : 10, 'b': 20}])), 
    create_temporary_module(df, c)).execute(final_vars=['c'])

You’d have to pass the series in as an override to get it to use that. Hope this helps!
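If you already have the dataframe in hand, one way to do that is to pull the relevant columns out yourself and hand them over by name. A minimal sketch, assuming pandas is installed; `columns_as_overrides` is a hypothetical helper, not part of Hamilton:

```python
from typing import Dict, List

import pandas as pd


def columns_as_overrides(frame: pd.DataFrame, names: List[str]) -> Dict[str, pd.Series]:
    """Hypothetical helper: map each requested column name to its series."""
    return {name: frame[name] for name in names}


frame = pd.DataFrame.from_records([{"a": 10, "b": 20}])

# Only "a" is selected, so only the node named "a" would be overridden.
overrides = columns_as_overrides(frame, ["a"])

# Intended usage with the module from the example above (not executed here):
# Driver({}, create_temporary_module(df, c)).execute(final_vars=['c'], overrides=overrides)
```

The helper takes an explicit list of names rather than every column, so you only override nodes you actually mean to replace.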
