Artem
04/20/2024, 5:12 PM
# functions.py - declare and link your transformations as functions....
import pandas as pd

def a(input: pd.Series) -> pd.Series:
    return input % 7

def b(a: pd.Series) -> pd.Series:
    return a * 2

# And run them!
import pandas as pd  # needed for pd.Series in the inputs below
import functions
from hamilton import driver

dr = driver.Driver({}, functions)
result = dr.execute(
    ['a', 'b'],
    inputs={'input': pd.Series([1, 2, 3, 4, 5])}
)
print(result)
I want to define my functions as
def a(input: float) -> float:
    return input % 7

def b(a: float) -> float:
    return a * 2
and apply them to the same input and get the same output as in the example above. My goal is to avoid using pandas in the functions so that I can apply them in a real-time production environment (which does not use pandas), and also apply them to pandas dataframes and Spark dataframes in notebooks while developing those functions.
I would very much appreciate help and recommendations. This is blocking my decision on whether to use Hamilton for our feature development. Thanks.
Stefan Krawczyk
04/20/2024, 5:46 PM
Stefan Krawczyk
04/20/2024, 5:49 PM
Artem
04/20/2024, 5:50 PM
Stefan Krawczyk
04/20/2024, 5:52 PM
Stefan Krawczyk
04/20/2024, 5:54 PM
Stefan Krawczyk
04/20/2024, 5:57 PM
Artem
04/20/2024, 6:05 PM
Stefan Krawczyk
04/20/2024, 6:07 PM
Artem
04/20/2024, 6:19 PM
def add(a: int, b: int) -> int:
    return a + b

df = pd.DataFrame({'a': [1,4], 'b': [3,2]})
Input dataframe:
   a  b
0  1  3
1  4  2
Output dataframe:
   a  b  add
0  1  3    4
1  4  2    6
Stefan Krawczyk
04/20/2024, 10:35 PM
import pandas as pd
from typing import Union

INT = Union[pd.Series, int]
FLOAT = Union[pd.Series, float]

def add(a: INT, b: INT) -> INT:
    return a + b

def bar(add: INT, c: INT) -> INT:
    return add * 2 + c * 3
then run.py
# And run them!
import functions
from hamilton import driver, base
import pandas as pd

dr = (
    driver.Builder()
    .with_config({})
    .with_modules(functions)
    .with_adapters(base.PandasDataFrameResult())
    .build()
)
df = pd.DataFrame({'a': pd.Series([1,2,3,4]),
                   'b': pd.Series([3,5,6,6]),
                   'c': pd.Series([1,1,1,1])
                   })
result = dr.execute(
    ['a', 'b', 'add', 'bar'],
    inputs=df.to_dict(orient='series')
)
print(result)
dr.display_all_functions(
    "graph.dot", orient="TB", show_legend=False)
On the online side you’d just change the adapter to be a dictionary result and pass in primitive values.
Stefan Krawczyk
04/20/2024, 10:38 PM
Artem
04/21/2024, 1:41 AM
Stefan Krawczyk
04/21/2024, 1:44 AM
Artem
04/22/2024, 12:20 PM
Stefan Krawczyk
04/22/2024, 4:35 PM
The main production pipeline serves as the orchestrator with no dependency on Hamilton.
Not having the same implementation can be a source of training-serving skew. But it sounds like you’d be reusing the functions in production, just not using Hamilton to run them? Is that correct?
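To make "reusing the functions without Hamilton" concrete, here is a minimal standard-library sketch. It is not a description of the actual production pipeline. `compute` is a hypothetical helper that wires plain functions together by parameter name and expects them to be listed in dependency order.

```python
import inspect
from typing import Any, Callable, Dict, List


# Same shape as the feature functions in this thread; no pandas, no Hamilton.
def add(a: int, b: int) -> int:
    return a + b


def bar(add: int, c: int) -> int:
    return add * 2 + c * 3


def compute(functions: List[Callable], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: call each function in the given order,
    looking up its arguments by parameter name among the known values."""
    values = dict(inputs)
    for fn in functions:
        kwargs = {name: values[name] for name in inspect.signature(fn).parameters}
        values[fn.__name__] = fn(**kwargs)
    return values


row = compute([add, bar], {"a": 1, "b": 3, "c": 1})
print(row)  # {'a': 1, 'b': 3, 'c': 1, 'add': 4, 'bar': 11}
```

Because the function bodies are shared between the notebook and the serving path, the feature logic itself stays identical. What is not shared is the wiring, so the ordering and input handling would still need their own tests.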
Artem
04/22/2024, 4:50 PM