This message was deleted Hamilton Open Source #hamilton-help

# hamilton-help

Slackbot

11/07/2023, 8:44 AM

This message was deleted.

Elijah Ben Izzy

11/07/2023, 2:52 PM

Hey, a few approaches. AFK but I can show you code samples later.
1. You can do so in the functions, but you have to maintain the join index with the series. Then you may need to write a custom results builder to do the join if there’s funky logic (e.g. you might want to do a chained left merge). I’d recommend testing this out.
2. You can always set the values you don’t want to NaN (or some sentinel value) in the functions so the index stays the same, then add a step to the results builder, or a post-processing step, to remove them.
3. You can write a function that accepts the upstream series and joins them the way you want. This is effectively (1), but you make it part of the DAG rather than leaving it outside.

JVial

11/09/2023, 9:10 PM

Thanks for the approaches :) Codesamples would be really helpful :))

Elijah Ben Izzy

11/10/2023, 12:10 AM

So yeah! Here’s what it looks like (pseudocode):
1. Results builder: see https://hamilton.dagworks.io/en/latest/reference/api-extensions/custom-result-builders/. It should be easy enough to adapt.
2. NaNs:

res = dr.execute(["ID", "customer_id", "first_name"], ...)
res = res.dropna() # You should figure out the best way to do this -- maybe across just some columns?
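
To make the NaN approach concrete, here is a minimal pandas-only sketch. The node name `first_name_filtered`, the sample data, and the hand-built frame standing in for the `dr.execute(...)` result are all assumptions for illustration, not part of the original thread. The idea is that the function masks unwanted values instead of dropping them, so the index stays aligned, and the cleanup happens afterwards on just the columns that matter.

```python
import pandas as pd


def first_name_filtered(first_name: pd.Series, filter_name: str) -> pd.Series:
    """Hypothetical node: non-matching names become NaN, index is unchanged."""
    return first_name.where(first_name == filter_name)


# Made-up inputs sharing one index (stand-ins for upstream nodes).
customer_id = pd.Series([10, 20, 30], index=[1, 2, 3])
first_name = pd.Series(["ann", "bob", "ann"], index=[1, 2, 3])

# Stand-in for the frame that dr.execute(...) would return.
res = pd.DataFrame(
    {
        "customer_id": customer_id,
        "first_name": first_name_filtered(first_name, "ann"),
    }
)

# Drop rows only where the masked column is missing, not where any column is.
res = res.dropna(subset=["first_name"])
```

Using `subset=` keeps rows that happen to have legitimate missing values in other columns.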

3. Joining in functions:

import pandas as pd

def final_result(ID: pd.Series, customer_id: pd.Series, first_name: pd.Series, filter_name: str) -> pd.DataFrame:
    # axis=1 gives one column per series, aligned on the index; keys= names the columns.
    df = pd.concat([ID, customer_id, first_name], axis=1, keys=["ID", "customer_id", "first_name"])
    return df[df.first_name == filter_name]

res = dr.execute(["final_result"], ...)

OTOH, the default results builder might just work! Worth a try. All it does is a concat (I think), and pandas is smart about indices:

>>> a = pd.Series(index=[1,2,3], data=['a','b','c'])
>>> b = pd.Series(index=[2,3,4], data=['e', 'f', 'g'])
>>> pd.concat([a,b], axis=1)
     0    1
1    a  NaN
2    b    e
3    c    f
4  NaN    g
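
As a small follow-up sketch (plain pandas, same two series as above, with `name=` added so the columns are labelled): `pd.concat` defaults to an outer join on the index, which is where the NaN rows come from. Passing `join="inner"` keeps only the index values present in every series.

```python
import pandas as pd

a = pd.Series(index=[1, 2, 3], data=["a", "b", "c"], name="a")
b = pd.Series(index=[2, 3, 4], data=["e", "f", "g"], name="b")

# Default join="outer": union of the indices, gaps filled with NaN.
outer = pd.concat([a, b], axis=1)

# join="inner": only index values present in both series survive.
inner = pd.concat([a, b], axis=1, join="inner")
```

For two series this gives the same rows as calling `.dropna()` on the outer result, but it will not discard rows whose values were already NaN before the join.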

Elijah Ben Izzy

11/10/2023, 12:33 AM

So, if you manage the index carefully, this will just… work:

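>>> import pandas as pd
>>> def a() -> pd.Series:
...     return pd.Series(index=[1,2,3], data=['a','b','c'])
...
>>> def b() -> pd.Series:
...     return pd.Series(index=[2,3,4], data=['e', 'f', 'g'])
...
>>> from hamilton import driver
>>> from hamilton.ad_hoc_utils import create_temporary_module
>>> dr = driver.Driver({}, create_temporary_module(a, b))  # build a driver from the two functions above
>>> dr.execute(["a","b"])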
     a    b
1    a  NaN
2    b    e
3    c    f
4  NaN    g
>>> dr.execute(["a","b"]).dropna()
   a  b
2  b  e
3  c  f

That said, managing indices can be a little tricky, so you may want to consider building your own results builder to handle edge cases and make it more explicit what’s happening.
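
A rough sketch of the join logic such a results builder might hold, written as a plain pandas function so it runs on its own. The function name `chained_left_join` is hypothetical. Wiring it into Hamilton (for example as the body of a `build_result(**outputs)` method, as I understand the custom result builder docs linked earlier) is an assumption to check against those docs.

```python
import pandas as pd


def chained_left_join(**outputs: pd.Series) -> pd.DataFrame:
    """Left-join each series onto the index of the first one passed in."""
    names = list(outputs)
    # The first requested output defines which index values are kept.
    result = outputs[names[0]].rename(names[0]).to_frame()
    for name in names[1:]:
        # how="left" never adds rows; missing matches become NaN.
        result = result.join(outputs[name].rename(name), how="left")
    return result


a = pd.Series(index=[1, 2, 3], data=["a", "b", "c"])
b = pd.Series(index=[2, 3, 4], data=["e", "f", "g"])
joined = chained_left_join(a=a, b=b)
```

Unlike the default concat, the order of the requested outputs matters here: the first one acts as the "spine", which makes the join behaviour explicit.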
