#hamilton-help

Ben

09/02/2022, 5:19 PM
Hi all, is it possible to combine `@parameterize`/`@parameterize_sources` with `@extract_columns`? (i.e. to get multiple columns back from a parameterized function) I can't wrap my head around how it would work, if it is.

Stefan Krawczyk

09/02/2022, 5:22 PM
So you want to create multiple functions that output dataframes, that you would then want to run extract_columns on to expose those columns?

Ben

09/02/2022, 5:25 PM
Yes, I'm already passing in a list of dicts to `@parameterize` (as an expanded dict comprehension); ideally I could just specify multiple output columns there.

Stefan Krawczyk

09/02/2022, 5:30 PM
🤔 hmm. Would have to think about this one. The challenge would be to make sure it's still evident what is going on, and thus readable. That said, `extract_columns` is just syntactic sugar for:
```python
import pandas as pd

def column_a(my_df: pd.DataFrame) -> pd.Series:
    return my_df['column_a']

def column_b(my_df: pd.DataFrame) -> pd.Series:
    return my_df['column_b']
```
Which I think (I'd need to write some code to prove this to myself) you could write as a separate parameterized function, rather than sticking it all into one `@parameterize` function.

Elijah Ben Izzy

09/02/2022, 5:36 PM
Building on what @Stefan Krawczyk said, and going from memory: doing it that exact way is currently going to be tricky. Mechanically it should work, but the names will conflict with each other, e.g. we'll extract the same columns on all parameterizations. Does each parameterization produce the same set of columns? Or different ones? There are a few approaches I can think of:
1. Split it into two: have a function for each parameterization that's an identity with `extract_columns`.
2. Fold it into a single parameterization where the function only returns the column (as Stefan is suggesting).
Not at my computer now, but I'll be mulling this over.
Also highly recommend trying the new `@parameterize` decorator; it allows for both values and inputs :)

Ben

09/02/2022, 6:16 PM
Each parameterization produces a different set of three columns. (if that's what you're asking?) I'll try stuff out and see what I can come up with.
(I'm already using `@parameterize`, it's great, although the docs are still a little confusing: they talk about `source()` and `value()` initially and then about `upstream()` and `literal()`. Are they the same things? Or is upstream/literal an old way of writing it?)

Elijah Ben Izzy

09/02/2022, 6:27 PM
Yep, that's what I was asking. And ugh, need to fix it! We settled on `source` and `value`; `upstream` and `literal` are the older names. We switched halfway through making it. Will fix the docs, thanks!

Stefan Krawczyk

09/02/2022, 6:27 PM
[edit] What @Elijah Ben Izzy said. [/edit] If you have time, please feel free to create an issue for this; otherwise I'll try to get to it this afternoon, if not early next week.
Or, @Elijah Ben Izzy, do you have this one?

Elijah Ben Izzy

09/02/2022, 6:29 PM
Yep I can handle it soon 👍
OK, tried to make it a little clearer. We still use `literal`/`upstream` internally and in some places to describe it, but the APIs in the documentation are made consistent in this PR: https://github.com/stitchfix/hamilton/pull/192

Stefan Krawczyk

09/02/2022, 10:49 PM
@Elijah Ben Izzy do you need to update gitbook too?

Elijah Ben Izzy

09/03/2022, 2:58 AM
Think I got it, but will check tomorrow
Gitbook had it right

Michael Cunningham

11/08/2022, 5:29 PM
For those searching for combined `@parameterize_sources` and `@extract_columns` functionality (like I was): https://github.com/stitchfix/hamilton/issues/196

Elijah Ben Izzy

11/08/2022, 7:57 PM
Hey Michael -- I'll be digging into that soon. In the meantime, feel free to post your thoughts + use-case! The more general we can make it, and the more use-cases we think about before building, the happier we'll be.

Michael Cunningham

11/08/2022, 8:10 PM
Hey Elijah, I think the use cases described in the issue were pretty much what I was thinking: a multi-input and multi-output function that is called multiple times through a parameterization. The approach I would take now is to use `@parameterize_sources` with a function that outputs a DataFrame, and then unique functions to extract the columns (there is a good example within the issue that shows this with the `my_disaggregator` functions using `@extract_columns`).

Stefan Krawczyk

11/18/2022, 12:20 AM
In case anyone is wondering, there's a branch up with functionality for "Using tables/dataframes for parameterization" (issue 196). If you want to play around with it, see this comment; we'd love any feedback.
e

Elijah Ben Izzy

11/18/2022, 12:55 AM
Yeah! So I'd love for y'all to take the API for a spin. It's not super polished yet, but it would be great to get the community's take on what the API should look like from first principles.