Hamilton Open Source

# hamilton-help

Slackbot

01/10/2023, 4:21 PM

This message was deleted.

👀 1

Stefan Krawczyk

01/10/2023, 5:03 PM

are you explicitly setting a name on the series?

Stefan Krawczyk

01/10/2023, 5:04 PM

or?

James Marvin

01/10/2023, 5:06 PM

I'm not

Stefan Krawczyk

01/10/2023, 5:07 PM

can you share the code snippet? and how you’re getting the series back?

Stefan Krawczyk

01/10/2023, 5:08 PM

since you shouldn’t need to explicitly set the name

James Marvin

01/10/2023, 6:08 PM

Looks something like this:

def remove_profanity(
    strip_whitespace: pd.Series, profanity_list_path: pathlib.Path
) -> pd.Series:
    profanity_list = _read_file_as_list(profanity_list_path)
    profanity.load_censor_words(profanity_list)
    return strip_whitespace.apply(profanity.censor)


def dlp_remove_pii(
    remove_profanity: pd.Series, google_dlp_service: GoogleDlpService
) -> pd.Series:
    return google_dlp_service.deidentify_series(remove_profanity)


def response_value(dlp_remove_pii: pd.Series) -> pd.Series:
    # TODO For some reason the rename is required as the name isn't being set - why?
    print(dlp_remove_pii.name)
    return _apply_regex_substitutes(
        dlp_remove_pii, pii_regex
    )  # .rename("response_value")

The GoogleDlpService is responsible for turning the series into a request to a Google API and turning the response back into a series.

James Marvin

01/10/2023, 6:08 PM

When I print(dlp_remove_pii.name) I get None

Stefan Krawczyk

01/10/2023, 6:09 PM

yep that seems to make sense. So while the DAG is executing, the Series objects that are passed don’t have to have a name attached.
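
As a rough illustration of how a name can go missing along the way, here is a minimal sketch in plain pandas. It assumes the series gets rebuilt from raw values at some point, the way a wrapper around an external API might do it; the actual GoogleDlpService code isn't shown in this thread, so that part is a guess, and the values and variable names are made up.

```python
import pandas as pd

named = pd.Series(["a ", " b"], name="response_value")

# Element-wise methods such as apply keep the original name.
stripped = named.apply(str.strip)

# Rebuilding a Series from raw values (as a hypothetical API wrapper might)
# drops the name unless it is passed explicitly.
rebuilt = pd.Series(list(stripped), index=stripped.index)
rebuilt_named = pd.Series(list(stripped), index=stripped.index, name=named.name)

print(stripped.name, rebuilt.name, rebuilt_named.name)
```

So a None name inside a node is not necessarily a sign that anything went wrong upstream.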

Stefan Krawczyk

01/10/2023, 6:10 PM

what’s the driver code?

Stefan Krawczyk

01/10/2023, 6:10 PM

and where is this causing problems for you?

James Marvin

01/10/2023, 6:13 PM

Good question

James Marvin

01/10/2023, 6:13 PM

config = {
    "google_dlp_service": dlp_service,
    "profanity_list_path": profanity_list_path,
    "sentiment_service": sentiment_service,
}

dr = driver.Driver(input_df, transforms)
output_columns = [field for field in FeedbackContainer.__fields__]
output_data = dr.execute(inputs=config, final_vars=output_columns)

That's the code to execute the dag

James Marvin

01/10/2023, 6:15 PM

The problem it causes is that my final transform looks like this:

def _nest_series(**series: pd.Series) -> pd.Series:
    df = pd.concat(my_series, axis=1)
    return df.apply(pd.Series.to_dict, axis=1)

@does(_nest_series)
def feedback(
    prompt_value: pd.Series,
    prompt_type: pd.Series,
    response_type: pd.Series,
    response_value: pd.Series,
    sentiment: pd.Series,
) -> pd.Series:
    pass

James Marvin

01/10/2023, 6:15 PM

The purpose is to turn a given set of series into a single series containing a dictionary containing name:value pairs representing the input series

James Marvin

01/10/2023, 6:16 PM

And the issue is that if the name portion of the name:value pair is missing, I can't export to BigQuery

James Marvin

01/10/2023, 6:19 PM

To give an example of the effect I'm after:

Col A     Col B     Col C           Target output
"A"       "B"       "C"             {"Col A":"A", "Col B":"B", "Col C":"C"}

👍 1

Stefan Krawczyk

01/10/2023, 6:21 PM

We should be able to use the kwarg keys and set the names then

Stefan Krawczyk

01/10/2023, 6:21 PM

let me write some code

James Marvin

01/10/2023, 6:21 PM

Sure thanks mate

Stefan Krawczyk

01/10/2023, 6:27 PM

🤔 this should do as expected (you had a minor typo in the code above)

def _nest_series(**series: pd.Series) -> pd.Series:
    df = pd.concat(series, axis=1)
    return df.apply(pd.Series.to_dict, axis=1)

because this is what it should be getting in:

import pandas as pd

a = pd.Series([1, 2, 3])
b = pd.Series([4, 5, 6])
# first line creates a dataframe
pd.concat({'a': a, 'b': b}, axis=1)
#    a  b
# 0  1  4
# 1  2  5
# 2  3  6
# next line creates a series of dicts, where the dict keys relate to the series/column names
pd.concat({'a': a, 'b': b}, axis=1).apply(pd.Series.to_dict, axis=1)
# 0    {'a': 1, 'b': 4}
# 1    {'a': 2, 'b': 5}
# 2    {'a': 3, 'b': 6}
# dtype: object
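
To tie this back to the unnamed series from earlier in the thread, here is a small sketch that calls _nest_series directly with keyword arguments, on the assumption that this is roughly how the function receives its inputs when used with @does. The sample values are made up. The point is that the dict keys supply the column names, so the name attribute on each series doesn't matter.

```python
import pandas as pd


def _nest_series(**series: pd.Series) -> pd.Series:
    # The kwarg keys become the column names of the intermediate dataframe.
    df = pd.concat(series, axis=1)
    return df.apply(pd.Series.to_dict, axis=1)


# Both series are unnamed, like dlp_remove_pii in the question.
prompt_value = pd.Series(["How was it?", "Any issues?"])
response_value = pd.Series(["Great", "No"])

nested = _nest_series(prompt_value=prompt_value, response_value=response_value)

# For contrast: a plain list of unnamed series falls back to positional labels.
positional = pd.concat([prompt_value, response_value], axis=1)

print(nested.iloc[0])
print(list(positional.columns))
```

With the list form the columns end up labelled by position, which matches the missing-name symptom described above.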

James Marvin

01/10/2023, 6:44 PM

Spot on mate that's fixed it. Thanks!!

👌 1
