Slackbot
03/17/2023, 8:41 PM
Elijah Ben Izzy
03/17/2023, 8:56 PM
In [7]: pd.Series([["a"], ["b", "a", "d"], ["c", "d"]]).apply(lambda x: [item for item in x if item != "d"])
Out[7]:
0       [a]
1    [b, a]
2       [c]
dtype: object
What’s the exact error message you’re getting?
Stephen Webb
03/17/2023, 8:57 PM
AttributeError: 'function' object has no attribute 'apply'
Elijah Ben Izzy
03/17/2023, 8:57 PM
Elijah Ben Izzy
03/17/2023, 8:58 PM
def remove_empty_strs(split_on_tokens : pd.Series) -> pd.Series:
    return split_on_tokens.apply(lambda x: [elem for elem in x if elem])

def remove_stop_words(vendor : pd.Series, stop_words : list) -> pd.Series:
    return remove_empty_strs.apply(lambda x: [elem for elem in x if elem not in stop_words])
You want to declare remove_empty_strs as a parameter to remove_stop_words. That tells Hamilton to get the result of remove_empty_strs and inject it into remove_stop_words.
Elijah Ben Izzy
03/17/2023, 8:59 PM
As written, the body uses remove_empty_strs from the global namespace, which is a function, not a series
Elijah Ben Izzy
03/17/2023, 8:59 PM
def remove_stop_words(remove_empty_strs: pd.Series, vendor : pd.Series, stop_words : list) -> pd.Series:
    return remove_empty_strs.apply(lambda x: [elem for elem in x if elem not in stop_words])
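A minimal sketch of the point above, under loud assumptions: plain lists stand in for pd.Series, and the `resolve` helper is a hypothetical toy that matches parameter names to inputs or to other functions. It is not Hamilton's actual implementation, only an illustration of why the parameter has to be declared.

```python
import inspect


def remove_empty_strs(split_on_tokens: list) -> list:
    # Drop empty strings from each token list.
    return [[tok for tok in toks if tok] for toks in split_on_tokens]


def remove_stop_words(remove_empty_strs: list, stop_words: list) -> list:
    # `remove_empty_strs` here is the parameter (a computed value); it shadows
    # the module-level function of the same name inside this body.
    return [[tok for tok in toks if tok not in stop_words]
            for toks in remove_empty_strs]


def resolve(name, functions, inputs, cache=None):
    """Toy resolver (hypothetical): compute `name` by matching each parameter
    name to a user input or to another function's result."""
    cache = {} if cache is None else cache
    if name in inputs:
        return inputs[name]
    if name not in cache:
        fn = functions[name]
        kwargs = {param: resolve(param, functions, inputs, cache)
                  for param in inspect.signature(fn).parameters}
        cache[name] = fn(**kwargs)
    return cache[name]


functions = {f.__name__: f for f in (remove_empty_strs, remove_stop_words)}
inputs = {
    "split_on_tokens": [["acme", "", "inc"], ["the", "shop"]],
    "stop_words": ["inc", "the"],
}
result = resolve("remove_stop_words", functions, inputs)

# Without the parameter, the bare name is the function object itself,
# which reproduces the error message reported earlier in the thread.
try:
    remove_empty_strs.apply(lambda x: x)
except AttributeError as err:
    error_message = str(err)
```

With these made-up inputs, `result` holds the filtered token lists, while `error_message` shows what happens when `.apply` is called on the function rather than on an injected value.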
Stephen Webb
03/17/2023, 9:01 PM
Stephen Webb
03/17/2023, 9:01 PM
Elijah Ben Izzy
03/17/2023, 9:03 PM
Stephen Webb
03/17/2023, 9:04 PM
Elijah Ben Izzy
03/17/2023, 9:04 PM
Elijah Ben Izzy
03/17/2023, 9:04 PM
Elijah Ben Izzy
03/17/2023, 9:05 PM
Stephen Webb
03/17/2023, 9:05 PM
import pandas as pd
import re

def clean_strings(strings : pd.Series, split_tokens : list, stop_words : list) -> pd.Series:
    return remove_stop_words

def lower_case(vendor : pd.Series) -> pd.Series:
    return vendor.str.lower()

def split_on_tokens(lower_case : pd.Series, split_tokens : list) -> pd.Series:
    split_tokens='|'.join(split_tokens)
    return lower_case.apply(lambda x: re.split(split_tokens, x))

def remove_empty_strs(split_on_tokens : pd.Series) -> pd.Series:
    return split_on_tokens.apply(lambda x: [elem for elem in x if elem])

def remove_stop_words(remove_empty_strs : pd.Series, stop_words : list) -> pd.Series:
    return remove_empty_strs.apply(lambda x: [elem for elem in x if elem not in stop_words])
and it returns a function pointer to remove_stop_words instead of a dataframe
Stefan Krawczyk
03/17/2023, 9:06 PM
def clean_strings(strings : pd.Series, split_tokens : list, stop_words : list) -> pd.Series:
    return remove_stop_words
I think that’s the issue here ^
Stephen Webb
03/17/2023, 9:06 PM
input_data = {'strings' : df['STRINGS'],
              'stop_words' : stop_words,
              'split_tokens' : split_tokens}
vendor = dr.execute(['clean_strings'], inputs=input_data)
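Stefan's point about clean_strings can be sketched without Hamilton at all. The snippet below is only an illustration: plain lists stand in for pd.Series, `clean_strings_buggy` is a hypothetical name for the version posted in the thread, and the functions are called by hand instead of through a driver.

```python
def remove_stop_words(remove_empty_strs: list, stop_words: list) -> list:
    # Filter stop words out of each token list.
    return [[tok for tok in toks if tok not in stop_words]
            for toks in remove_empty_strs]


def clean_strings_buggy(strings: list, split_tokens: list, stop_words: list):
    # `remove_stop_words` is not a parameter here, so the name resolves to the
    # module-level function and the function object itself is returned.
    return remove_stop_words


def clean_strings(remove_stop_words: list) -> list:
    # Declaring the upstream result as a parameter means a computed value,
    # not the function, is what gets returned.
    return remove_stop_words


# Calling by hand with made-up data to show the difference.
buggy = clean_strings_buggy(["the shop"], [" "], ["the"])
upstream = remove_stop_words([["the", "shop"]], ["the"])
fixed = clean_strings(upstream)
```

Here `buggy` is the function object, which matches the "function pointer" Stephen reported, while `fixed` is the filtered data. This covers only the one node; another option, assuming nothing else depends on the wrapper, would be to drop clean_strings and request remove_stop_words directly in the list passed to dr.execute.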
Elijah Ben Izzy
03/17/2023, 9:06 PM
Stefan Krawczyk
03/17/2023, 9:10 PM
Stefan Krawczyk
03/17/2023, 9:10 PM
Stephen Webb
03/17/2023, 9:11 PM
Stephen Webb
03/17/2023, 9:11 PM
Stephen Webb
03/17/2023, 9:12 PM
Elijah Ben Izzy
03/17/2023, 9:12 PM
Stephen Webb
03/17/2023, 9:13 PM
Stephen Webb
03/17/2023, 9:13 PM
Stefan Krawczyk
03/17/2023, 9:13 PM