This message was deleted Hamilton Open Source #hamilton-help


# hamilton-help

Slackbot

03/17/2023, 8:41 PM

This message was deleted.

Elijah Ben Izzy

03/17/2023, 8:56 PM

Heh, I’ve built this exact same pipeline before 🙂 OK, so I think you’re doing this pretty much right. E.g., this is what I get on a simplified example:


In [7]: pd.Series([["a"], ["b", "a", "d"], ["c", "d"]]).apply(lambda x: [item for item in x if item != "d"])
Out[7]:
0       [a]
1    [b, a]
2       [c]
dtype: object

What’s the exact error message you’re getting?

Stephen Webb

03/17/2023, 8:57 PM

AttributeError: 'function' object has no attribute 'apply'

Elijah Ben Izzy

03/17/2023, 8:57 PM

Ahh

Elijah Ben Izzy

03/17/2023, 8:58 PM

So the problem is here:


def remove_empty_strs(split_on_tokens : pd.Series) -> pd.Series:
    return split_on_tokens.apply(lambda x: [elem for elem in x if elem])

def remove_stop_words(vendor : pd.Series, stop_words : list) -> pd.Series:
    return remove_empty_strs.apply(lambda x: [elem for elem in x if elem not in stop_words])

You want to declare remove_empty_strs as a parameter to remove_stop_words. Basically, that tells Hamilton to get the result of remove_empty_strs and inject it into remove_stop_words.

Elijah Ben Izzy

03/17/2023, 8:59 PM

Otherwise it’s using the function object remove_empty_strs from the global namespace, which is a function, not a Series.

Elijah Ben Izzy

03/17/2023, 8:59 PM

Like this:


def remove_stop_words(remove_empty_strs: pd.Series, vendor : pd.Series, stop_words : list) -> pd.Series:
    return remove_empty_strs.apply(lambda x: [elem for elem in x if elem not in stop_words])

Stephen Webb

03/17/2023, 9:01 PM

Ah.

Stephen Webb

03/17/2023, 9:01 PM

So if I have it as an argument in the function, Hamilton goes looking for that argument in the namespace and calls that function with all the input arguments?

Elijah Ben Izzy

03/17/2023, 9:03 PM

Pretty much, yeah. Basically Hamilton creates a DAG (directed acyclic graph) consisting of all the functions it’s given, where each function is (pretty much) 1:1 with an “artifact” you want. Then, you ask for some set of results. Each function’s parameters tell us which functions it depends on, which in turn tell us what they depend on, and so on. That’s how we can create a DAG from it.
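
To make the name-based wiring described above concrete, here is a minimal toy sketch of the idea in plain Python. It is not Hamilton's actual implementation: the `resolve` helper, the `split_on_spaces` function, and the sample data are hypothetical, and plain lists stand in for pd.Series so the example runs with only the standard library.

```python
import inspect


def resolve(name, functions, inputs, cache=None):
    """Compute node `name` by recursively resolving its parameters by name."""
    cache = {} if cache is None else cache
    if name in inputs:
        # Leaf node: the caller supplied this value directly.
        return inputs[name]
    if name not in cache:
        fn = functions[name]
        # Every parameter name is treated as the name of an upstream node.
        kwargs = {
            param: resolve(param, functions, inputs, cache)
            for param in inspect.signature(fn).parameters
        }
        cache[name] = fn(**kwargs)
    return cache[name]


def lower_case(vendor):
    return [v.lower() for v in vendor]


def split_on_spaces(lower_case):
    return [v.split(" ") for v in lower_case]


def remove_stop_words(split_on_spaces, stop_words):
    return [[w for w in words if w not in stop_words] for words in split_on_spaces]


functions = {f.__name__: f for f in (lower_case, split_on_spaces, remove_stop_words)}
inputs = {"vendor": ["ACME Inc", "The Foo Co"], "stop_words": ["inc", "the", "co"]}

result = resolve("remove_stop_words", functions, inputs)
print(result)
```

Only the final node is requested; the resolver walks back through the parameter names until it reaches values present in `inputs`.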

Stephen Webb

03/17/2023, 9:04 PM

Then you traverse the DAG and kick the answer back up, that's brilliant

❤️ 1

Elijah Ben Izzy

03/17/2023, 9:04 PM

Yep!

Elijah Ben Izzy

03/17/2023, 9:04 PM

I think this might be better at explaining it than I am 🙂 https://www.tryhamilton.dev/

👀 1

Elijah Ben Izzy

03/17/2023, 9:05 PM

It has some nice interactive visualizations that you can play with

Stephen Webb

03/17/2023, 9:05 PM

Now, I've made the changes you've suggested, with this code:


import pandas as pd
import re

def clean_strings(strings : pd.Series, split_tokens : list, stop_words : list) -> pd.Series:
    return remove_stop_words

def lower_case(vendor : pd.Series) -> pd.Series:
    return vendor.str.lower()

def split_on_tokens(lower_case : pd.Series, split_tokens : list) -> pd.Series:
    split_tokens='|'.join(split_tokens)
    return lower_case.apply(lambda x: re.split(split_tokens, x))

def remove_empty_strs(split_on_tokens : pd.Series) -> pd.Series:
    return split_on_tokens.apply(lambda x: [elem for elem in x if elem])

def remove_stop_words(remove_empty_strs : pd.Series, stop_words : list) -> pd.Series:
    return remove_empty_strs.apply(lambda x: [elem for elem in x if elem not in stop_words])

and it returns a function pointer to remove_stop_words instead of a dataframe

Stefan Krawczyk

03/17/2023, 9:06 PM


def clean_strings(strings : pd.Series, split_tokens : list, stop_words : list) -> pd.Series:
    return remove_stop_words

I think that’s the issue here ^

Stephen Webb

03/17/2023, 9:06 PM

executed by running this:


input_data = {'strings' : df['STRINGS'],
              'stop_words' : stop_words,
              'split_tokens' : split_tokens}
vendor = dr.execute(['clean_strings'], inputs=input_data)

Elijah Ben Izzy

03/17/2023, 9:06 PM

Yep! What @Stefan Krawczyk said.
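
The issue flagged above is the same one as before: clean_strings has no parameter named remove_stop_words, so the name resolves to the module-level function. A rough sketch of the difference follows, under stated assumptions: `clean_strings_buggy` is a hypothetical name for the original version, plain lists stand in for pd.Series, and the functions are called by hand to mimic what the framework would inject.

```python
def remove_stop_words(remove_empty_strs: list, stop_words: list) -> list:
    return [[w for w in words if w not in stop_words] for words in remove_empty_strs]


def clean_strings_buggy(strings: list) -> list:
    # No parameter is named remove_stop_words, so this name refers to the
    # module-level function object defined above, not to computed data.
    return remove_stop_words


def clean_strings(remove_stop_words: list) -> list:
    # The parameter shadows the function. A framework that injects by
    # parameter name would pass the upstream node's result in here.
    return remove_stop_words


tokens = [["acme", "inc"], ["the", "foo"]]
upstream = remove_stop_words(tokens, stop_words=["inc", "the"])

print(callable(clean_strings_buggy(tokens)))  # the function itself comes back
print(clean_strings(upstream))                # the cleaned tokens come back
```

With the dependency declared as a parameter, clean_strings no longer needs to list inputs it does not use directly.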

Stefan Krawczyk

03/17/2023, 9:10 PM

@Stephen Webb want to jump on a quick call?

Stefan Krawczyk

03/17/2023, 9:10 PM

might be simpler to talk it through

Stephen Webb

03/17/2023, 9:11 PM

I'm wrestling with a cold and pretty much a mess right now, but I think I'm getting it.

Stephen Webb

03/17/2023, 9:11 PM

Yeah, I've got it running all the way through and returning what I expect

Stephen Webb

03/17/2023, 9:12 PM

So the inputs are nodes in the DAG that depend on nothing themselves, and then the DAG just resolves itself bottom up.

Elijah Ben Izzy

03/17/2023, 9:12 PM

Yep! That’s exactly it.

Stephen Webb

03/17/2023, 9:13 PM

The function signatures point to the nodes, so no need to ever specify that

Stephen Webb

03/17/2023, 9:13 PM

Got it! Okay that helps a lot! Thanks so much @Elijah Ben Izzy and @Stefan Krawczyk

🫡 1

Stefan Krawczyk

03/17/2023, 9:13 PM

you’re welcome!
