Hamilton Open Source

# hamilton-help

Slackbot

09/02/2022, 5:19 PM

This message was deleted.

Stefan Krawczyk

09/02/2022, 5:22 PM

So you want to create multiple functions that output dataframes, and then run `extract_columns` on them to expose those columns?

Ben

09/02/2022, 5:25 PM

Yes, I'm already passing in a list of dicts to `@parameterize` (as an expanded dict comprehension); ideally I could just specify multiple output columns there.

Stefan Krawczyk

09/02/2022, 5:30 PM

🤔 Hmm. Would have to think about this one. The challenge would be to make sure it's still evident what is going on, and thus readable. That said, `extract_columns` is just syntactic sugar for:

```python
import pandas as pd

# Each function exposes one column of my_df as its own output.
def column_a(my_df: pd.DataFrame) -> pd.Series:
    return my_df['column_a']

def column_b(my_df: pd.DataFrame) -> pd.Series:
    return my_df['column_b']
```

Which I think (I'd need to write some code to prove this to myself) you could write as its own separate `@parameterize` function, rather than sticking it all into one `@parameterize` function.
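To make the "syntactic sugar" point concrete, here is a plain-Python sketch (no Hamilton involved) that generates the per-column functions above from a list of column names. `make_column_getter` and `expand_columns` are hypothetical helpers for illustration only; this is not how Hamilton implements `extract_columns` internally.

```python
from typing import Callable, Dict, List

import pandas as pd


def make_column_getter(column: str) -> Callable[[pd.DataFrame], pd.Series]:
    """Build one function that returns a single column, like column_a/column_b above."""

    def getter(my_df: pd.DataFrame) -> pd.Series:
        return my_df[column]

    # Name the generated function after the column it exposes.
    getter.__name__ = column
    return getter


def expand_columns(columns: List[str]) -> Dict[str, Callable[[pd.DataFrame], pd.Series]]:
    """Generate one getter per column, keyed by the output name it would have."""
    return {column: make_column_getter(column) for column in columns}


funcs = expand_columns(["column_a", "column_b"])
df = pd.DataFrame({"column_a": [1, 2], "column_b": [3, 4]})
print(funcs["column_b"](df).tolist())  # [3, 4]
```

Each call to `make_column_getter` closes over its own `column`, so the generated functions don't share state; that is the same "one function per column" shape Stefan suggests expressing as a separate parameterization.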

Elijah Ben Izzy

09/02/2022, 5:36 PM

Building on what @Stefan Krawczyk said, and going from memory: doing it that exact way is currently going to be tricky. Mechanically it should work, but the names will conflict with each other, e.g. we'll extract the same columns on all parameterizations. Does each parameterization produce the same set of columns, or different ones? There are a few approaches I can think of:

1. Split it into two: have a function for each parameterization that's an identity with `extract_columns`.
2. Fold it into a single parameterization where the function only returns the column (as Stefan is suggesting).

Not at my computer now, but I'll be mulling this over.
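The naming conflict can be illustrated without Hamilton: if every parameterization exposes the same fixed list of column names, the resulting output names collide, whereas one output per (parameterization, column) pair can be given a unique name. The helpers and the `{parameterization}_{column}` naming scheme below are hypothetical, chosen only for illustration; they are not Hamilton's naming rules.

```python
from typing import Dict, List, Tuple


def shared_column_names(parameterizations: List[str], columns: List[str]) -> List[str]:
    """Naive combination: every parameterization exposes the same fixed column list."""
    return [column for _ in parameterizations for column in columns]


def duplicates(names: List[str]) -> List[str]:
    """Return each name that appears more than once, in first-seen order."""
    seen, dupes = set(), []
    for name in names:
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    return dupes


def one_node_per_column(parameterizations: Dict[str, List[str]]) -> Dict[str, Tuple[str, str]]:
    """Approach 2: one output per (parameterization, column) pair, each uniquely named."""
    nodes: Dict[str, Tuple[str, str]] = {}
    for param_name, columns in parameterizations.items():
        for column in columns:
            node_name = f"{param_name}_{column}"  # hypothetical naming scheme
            if node_name in nodes:
                raise ValueError(f"duplicate output name: {node_name}")
            nodes[node_name] = (param_name, column)
    return nodes


naive = shared_column_names(["east", "west"], ["revenue", "cost"])
print(duplicates(naive))  # ['revenue', 'cost']
print(sorted(one_node_per_column({"east": ["revenue", "cost"], "west": ["revenue", "cost"]})))
# ['east_cost', 'east_revenue', 'west_cost', 'west_revenue']
```

When, as in Ben's case, each parameterization produces a different set of columns, the naive list has no duplicates and the collision never arises.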

Elijah Ben Izzy

09/02/2022, 5:53 PM

Also highly recommend trying the new `@parameterize` decorator; it allows for both values and inputs :)
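The difference between values and inputs can be modelled with a toy resolver: a value is a constant bound when the function is parameterized, while an input names another node whose result is looked up at run time. In Hamilton these roles are played by `value()` and `source()`; the `Fixed`, `FromNode`, and `call_parameterized` names below are hypothetical stand-ins, not Hamilton's implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Union


@dataclass(frozen=True)
class Fixed:
    """A constant bound when the function is parameterized (the role value() plays)."""
    value: Any


@dataclass(frozen=True)
class FromNode:
    """Another node's name; its result is looked up at run time (the role source() plays)."""
    name: str


Binding = Union[Fixed, FromNode]


def call_parameterized(
    fn: Callable[..., Any], bindings: Dict[str, Binding], results: Dict[str, Any]
) -> Any:
    """Resolve each parameter binding, then call fn with the resolved keyword arguments."""
    kwargs = {}
    for param, binding in bindings.items():
        if isinstance(binding, Fixed):
            kwargs[param] = binding.value  # constant, same on every run
        else:
            kwargs[param] = results[binding.name]  # depends on an upstream result
    return fn(**kwargs)


def scaled(x: float, factor: float) -> float:
    return x * factor


results = {"raw_spend": 10.0}
bindings = {"x": FromNode("raw_spend"), "factor": Fixed(2.5)}
print(call_parameterized(scaled, bindings, results))  # 25.0
```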

Ben

09/02/2022, 6:16 PM

Each parameterization produces a different set of three columns (if that's what you're asking). I'll try some things out and see what I can come up with.

Ben

09/02/2022, 6:19 PM

(I'm already using `@parameterize`, it's great, although the docs are still a little confusing. They talk about `source()` and `value()` initially and then about `upstream()` and `literal()`. Are they the same things? Or is upstream/literal an old way of writing it?)

Elijah Ben Izzy

09/02/2022, 6:27 PM

Yep, that's what I was asking. And ugh, need to fix that! We settled on `source` and `value`; `upstream` and `literal` are the older names. We switched halfway through building it. Will fix the docs, thanks!

Stefan Krawczyk

09/02/2022, 6:27 PM

> (I'm already using `@parameterize`, it's great, although the docs are still a little confusing. They talk about `source()` and `value()` initially and then about `upstream()` and `literal()`. Are they the same things? Or is upstream/literal an old way of writing it?)

[edit] what @Elijah Ben Izzy said [/edit]. If you have time, please feel free to create an issue for this; otherwise I'll try to get to it this afternoon, or early next week at the latest.

Stefan Krawczyk

09/02/2022, 6:28 PM

or @Elijah Ben Izzy do you have this one?

Elijah Ben Izzy

09/02/2022, 6:29 PM

Yep I can handle it soon 👍

Elijah Ben Izzy

09/02/2022, 9:53 PM

OK, tried to make it a little clearer. We still use `literal`/`upstream` internally and in some places to describe it, but the APIs in the documentation are made consistent in this PR: https://github.com/stitchfix/hamilton/pull/192

Stefan Krawczyk

09/02/2022, 10:49 PM

@Elijah Ben Izzy do you need to update gitbook too?

Elijah Ben Izzy

09/03/2022, 2:58 AM

Think I got it, but will check tomorrow.

Elijah Ben Izzy

09/03/2022, 1:36 PM

Gitbook had it right

Michael Cunningham

11/08/2022, 5:29 PM

For those searching for combined `@parameterize_sources` and `@extract_columns` functionality (like I was): https://github.com/stitchfix/hamilton/issues/196

Elijah Ben Izzy

11/08/2022, 7:57 PM

Hey Michael, I'll be digging into that soon. In the meantime, feel free to post your thoughts and use case! The more general we can make it (and the more use cases we think about before building), the happier we'll be.

Michael Cunningham

11/08/2022, 8:10 PM

Hey Elijah, I think the use cases described in the issue were pretty much what I was thinking: a multi-input, multi-output function that is called multiple times through a parameterization. The approach I would take now is to use `@parameterize_sources` with a function that outputs a DataFrame, and then unique functions to extract the columns (there is a good example within the issue that shows this with the `my_disaggregator` functions with `@extract_columns`).
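The shape of this workaround can be sketched in plain pandas, without Hamilton: one multi-input function that returns a DataFrame is evaluated once per parameterization, and each resulting frame's columns are then exposed under unique names. The dictionaries below only stand in for what `@parameterize_sources` and `@extract_columns` would wire up; the function, region, and column names are hypothetical.

```python
import pandas as pd


def summary_frame(sales: pd.Series, costs: pd.Series) -> pd.DataFrame:
    """Multi-input, multi-output step: two inputs in, one DataFrame with several columns out."""
    return pd.DataFrame({"revenue": sales, "margin": sales - costs})


# Hypothetical upstream inputs for two parameterizations ("east" and "west").
inputs = {
    "east_sales": pd.Series([10.0, 20.0]),
    "east_costs": pd.Series([4.0, 5.0]),
    "west_sales": pd.Series([7.0, 9.0]),
    "west_costs": pd.Series([1.0, 2.0]),
}

# Stand-in for @parameterize_sources: the same function evaluated once per
# parameterization, each time wired to different upstream inputs.
frames = {
    f"{region}_summary": summary_frame(inputs[f"{region}_sales"], inputs[f"{region}_costs"])
    for region in ("east", "west")
}

# Stand-in for the extraction step: expose every column under a unique, prefixed name.
columns = {
    f"{frame_name}_{col}": frame[col]
    for frame_name, frame in frames.items()
    for col in frame.columns
}
print(sorted(columns))
```

Prefixing each column with the name of the frame it came from is one way to keep the extracted outputs distinct across parameterizations.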

Stefan Krawczyk

11/18/2022, 12:20 AM

In case anyone is wondering, there's a branch up with functionality for “Using tables/dataframes for parameterization” (issue 196). If you want to play around with it, see this comment; we'd love any feedback.

Elijah Ben Izzy

11/18/2022, 12:55 AM

Yeah! So I'd love for y'all to take the API for a spin. It's not super polished yet, but it would be great to get the community's take on what the API should look like from first principles.
