# hamilton-help
e
Heh, I’ve built this exact same pipeline before 🙂 OK, so I think you’re doing this pretty much right. E.g., this is what I get on a simplified example:
In [7]: pd.Series([["a"], ["b", "a", "d"], ["c", "d"]]).apply(lambda x: [item for item in x if item != "d"])
Out[7]:
0       [a]
1    [b, a]
2       [c]
dtype: object
What’s the exact error message you’re getting?
s
AttributeError: 'function' object has no attribute 'apply'
e
Ahh
So the problem is here:
def remove_empty_strs(split_on_tokens : pd.Series) -> pd.Series:
    return split_on_tokens.apply(lambda x: [elem for elem in x if elem])

def remove_stop_words(vendor : pd.Series, stop_words : list) -> pd.Series:
    return remove_empty_strs.apply(lambda x: [elem for elem in x if elem not in stop_words])
You want to declare remove_empty_strs as a parameter to remove_stop_words. Basically, that tells Hamilton to get the result of remove_empty_strs and inject it into remove_stop_words. Otherwise it’s using the name remove_empty_strs from the global namespace, which is a function, not a series.
Like this:
def remove_stop_words(remove_empty_strs: pd.Series, vendor : pd.Series, stop_words : list) -> pd.Series:
    return remove_empty_strs.apply(lambda x: [elem for elem in x if elem not in stop_words])
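If it helps to see the name lookup on its own, here is a rough plain-Python sketch with no Hamilton or pandas involved. Lists stand in for the series, and broken_remove_stop_words / fixed_remove_stop_words are made-up names for illustration only.

```python
def remove_empty_strs(split_on_tokens):
    """Module-level function: drop empty strings from each token list."""
    return [[elem for elem in row if elem] for row in split_on_tokens]


def broken_remove_stop_words(stop_words):
    # remove_empty_strs is not a parameter here, so the name resolves to the
    # module-level function object, which has no .apply attribute.
    return remove_empty_strs.apply(lambda x: x)


def fixed_remove_stop_words(remove_empty_strs, stop_words):
    # The parameter shadows the module-level function, so inside this body the
    # name refers to whatever value the caller passes in.
    return [[elem for elem in row if elem not in stop_words]
            for row in remove_empty_strs]


try:
    broken_remove_stop_words(["the"])
except AttributeError as exc:
    error_message = str(exc)

tokens = remove_empty_strs([["the", "", "acme"], ["", "corp"]])
cleaned = fixed_remove_stop_words(tokens, ["the"])
print(error_message)
print(cleaned)
```

The broken version fails with the same kind of AttributeError as above, while the fixed version works because the value is passed in under the parameter name.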
s
Ah.
So if I have it as an argument in the function, Hamilton goes looking for that argument in the namespace and calls that function with all the input arguments?
e
Pretty much, yeah. Basically Hamilton creates a DAG (directed acyclic graph) consisting of all the functions it’s given, where each function is (pretty much) 1:1 with an “artifact” you want. Then you ask for some set of results. Each function’s parameters tell us what it depends on, those dependencies tell us what they depend on in turn, and so on. That’s how we can create a DAG from it.
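A toy version of that idea, in case it’s useful. This is only a sketch of the general technique (matching parameter names to function names and resolving recursively), not Hamilton’s actual implementation; resolve and the three small functions are hypothetical and use plain lists instead of pandas.

```python
import inspect


def resolve(name, functions, inputs, cache=None):
    """Compute the node `name` by recursively resolving its parameters."""
    cache = {} if cache is None else cache
    if name in inputs:
        # Inputs are supplied directly by the caller.
        return inputs[name]
    if name not in cache:
        fn = functions[name]
        # Each parameter name is looked up as another node (function or input).
        kwargs = {
            param: resolve(param, functions, inputs, cache)
            for param in inspect.signature(fn).parameters
        }
        cache[name] = fn(**kwargs)
    return cache[name]


def lower_case(strings):
    return [s.lower() for s in strings]


def split_on_spaces(lower_case):
    return [s.split(" ") for s in lower_case]


def remove_stop_words(split_on_spaces, stop_words):
    return [[w for w in row if w not in stop_words] for row in split_on_spaces]


functions = {f.__name__: f for f in (lower_case, split_on_spaces, remove_stop_words)}
result = resolve(
    "remove_stop_words",
    functions,
    inputs={"strings": ["The Acme Corp", "A Shop"], "stop_words": ["the", "a"]},
)
print(result)
```

Asking for remove_stop_words pulls in split_on_spaces and lower_case automatically, because their names appear as parameters further up the chain.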
s
Then you traverse the DAG and kick the answer back up, that's brilliant
e
Yep!
I think this might be better at explaining it than I am 🙂 https://www.tryhamilton.dev/
It has some nice interactive visualizations that you can play with
s
Now, I've made the changes you've suggested, with this code:
import pandas as pd
import re

def clean_strings(strings : pd.Series, split_tokens : list, stop_words : list) -> pd.Series:
    return remove_stop_words

def lower_case(vendor : pd.Series) -> pd.Series:
    return vendor.str.lower()

def split_on_tokens(lower_case : pd.Series, split_tokens : list) -> pd.Series:
    split_tokens='|'.join(split_tokens)
    return lower_case.apply(lambda x: re.split(split_tokens, x))

def remove_empty_strs(split_on_tokens : pd.Series) -> pd.Series:
    return split_on_tokens.apply(lambda x: [elem for elem in x if elem])

def remove_stop_words(remove_empty_strs : pd.Series, stop_words : list) -> pd.Series:
    return remove_empty_strs.apply(lambda x: [elem for elem in x if elem not in stop_words])
and it returns a function pointer to remove_stop_words instead of a dataframe
s
def clean_strings(strings : pd.Series, split_tokens : list, stop_words : list) -> pd.Series:
    return remove_stop_words
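I think that’s the issue here ^ clean_strings returns the remove_stop_words function itself, because remove_stop_words isn’t declared as a parameter of clean_strings.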
s
I executed it by running this:
input_data = {'strings' : df['STRINGS'],
             'stop_words' : stop_words,
             'split_tokens' : split_tokens}
vendor = dr.execute(['clean_strings'], inputs=input_data)
e
Yep! What @Stefan Krawczyk said.
s
@Stephen Webb want to jump on a quick call?
might be simpler to talk it through
s
I'm wrestling with a cold and pretty much a mess right now, but I think I'm getting it.
Yeah, I've got it running all the way through and returning what I expect
So the inputs are nodes in the DAG that depend on nothing themselves, and then the DAG just resolves itself bottom up.
e
Yep! That’s exactly it.
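One way to picture the “inputs are nodes that depend on nothing” part is the rough sketch below. It is not Hamilton code: required_inputs is a hypothetical helper, the functions use plain lists rather than pandas, and the first parameter is assumed to be called strings to match the inputs dict. It walks the signatures and collects every name that no function defines, which are the names you would have to pass in as inputs.

```python
import inspect
import re


def lower_case(strings):
    return [s.lower() for s in strings]


def split_on_tokens(lower_case, split_tokens):
    pattern = "|".join(split_tokens)
    return [re.split(pattern, s) for s in lower_case]


def remove_empty_strs(split_on_tokens):
    return [[elem for elem in row if elem] for row in split_on_tokens]


def remove_stop_words(remove_empty_strs, stop_words):
    return [[elem for elem in row if elem not in stop_words]
            for row in remove_empty_strs]


FUNCTIONS = {
    f.__name__: f
    for f in (lower_case, split_on_tokens, remove_empty_strs, remove_stop_words)
}


def required_inputs(target, functions):
    """Return the names `target` transitively needs that no function defines."""
    if target not in functions:
        # Nothing computes this name, so it must be supplied as an input.
        return {target}
    leaves = set()
    for param in inspect.signature(functions[target]).parameters:
        leaves |= required_inputs(param, functions)
    return leaves


needed = required_inputs("remove_stop_words", FUNCTIONS)
print(sorted(needed))
```

Everything else in the chain is derived from the function signatures, so only those leaf names need to be provided.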
s
The function signatures point to the nodes, so no need to ever specify that
Got it! Okay that helps a lot! Thanks so much @Elijah Ben Izzy and @Stefan Krawczyk
s
you’re welcome!