Jan Hurst
05/20/2024, 8:00 AM
import pandas as pd
from hamilton.function_modifiers import load_from, save_to

@load_from.excel(path="gs://bkt/source/asset1.xlsx")
@save_to.parquet(path="gs://bkt/raw/asset1.parquet")
def asset1(df: pd.DataFrame) -> pd.DataFrame:
    return df
(Later I write a @load_from.parquet wrapper in a downstream pipeline...)
Now I actually have a few dozen asset files, so I was tinkering with some sort of parameterization and @resolve magic, but it got me thinking that I might be doing something really dumb here 😞
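A small first step toward DRY, sketched in plain Python under the assumption that every asset follows the same bucket layout as asset1 (gs://bkt/source/<name>.xlsx in, gs://bkt/raw/<name>.parquet out): derive the path pairs from a list of asset names instead of hand-writing them per function. The asset_paths helper and the asset names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Bucket layout assumed from the asset1 example; adjust if some assets differ.
SOURCE_PREFIX = "gs://bkt/source"
RAW_PREFIX = "gs://bkt/raw"


def asset_paths(names: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each (hypothetical) asset name to its (excel source, parquet target) pair."""
    return {
        name: (f"{SOURCE_PREFIX}/{name}.xlsx", f"{RAW_PREFIX}/{name}.parquet")
        for name in names
    }


# Hypothetical names; a real list might come from listing the source bucket.
paths = asset_paths(["asset1", "asset2"])
print(paths["asset2"])
# ('gs://bkt/source/asset2.xlsx', 'gs://bkt/raw/asset2.parquet')
```

A table like this could then feed whichever mechanism generates the nodes (a parameterization config or a decorator factory), so adding an asset becomes a one-line change.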
My actual working code is just a copy-paste of the same function for each asset... any ideas?
Stefan Krawczyk
05/20/2024, 3:42 PM
Jan Hurst
05/20/2024, 3:58 PM
@load_from and building my own load_from-like node that I'm parameterizing out... something I've fallen into before.
So now I have:
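import pandas as pd
from hamilton.function_modifiers import parameterize

@parameterize(**NODE_CONFIG)  # NODE_CONFIG: output node name -> parameter assignments
def asset(path: str, otherstuff: str) -> pd.DataFrame:
    df = pd.read_excel(path)
    # <do stuff>
    return df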
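The parameterized function above is, at heart, one function body reused with many bound argument sets (in Hamilton, @parameterize turns each set into its own node). Here is a framework-free sketch of that idea with functools.partial, offered as an analogy rather than how Hamilton implements it. ASSET_CONFIG and load_asset are hypothetical, and the body returns a string instead of calling pd.read_excel so it runs anywhere.

```python
import functools
from typing import Callable, Dict


def load_asset(path: str, otherstuff: str) -> str:
    """Shared body for every asset; a real version would call pd.read_excel(path)."""
    return f"read {path} ({otherstuff})"


# Hypothetical per-asset settings, playing the role NODE_CONFIG plays above.
ASSET_CONFIG: Dict[str, Dict[str, str]] = {
    "asset1": {"path": "gs://bkt/source/asset1.xlsx", "otherstuff": "opt-a"},
    "asset2": {"path": "gs://bkt/source/asset2.xlsx", "otherstuff": "opt-b"},
}

# One zero-argument callable per asset, all sharing load_asset's body.
loaders: Dict[str, Callable[[], str]] = {
    name: functools.partial(load_asset, **params) for name, params in ASSET_CONFIG.items()
}

print(loaders["asset2"]())  # read gs://bkt/source/asset2.xlsx (opt-b)
```

Note that this only parameterizes the read; it does nothing about where outputs get saved.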
But really this is just working around things; I still have the load/save use case and would like to make it a bit more DRY.
Stefan Krawczyk
05/20/2024, 4:56 PM
Jan Hurst
05/20/2024, 4:58 PM
Elijah Ben Izzy
05/20/2024, 5:54 PM
@move_asset that just calls both load/save is nice. Otherwise, if you need runtime parameterization, you can do something more like what you did (wire in a config).
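from hamilton.function_modifiers import load_from, save_to

def move_asset(from_: str, to_: str):
    # decorator factory, used as @move_asset(from_=..., to_=...)
    def decorator(fn):
        # same as stacking @load_from(...) over @save_to(...); args elided, would come from from_/to_
        return load_from(...)(save_to(...)(fn))
    return decorator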
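The decorator-factory idea can be checked without Hamilton installed. In this sketch, fake_load_from and fake_save_to are hypothetical stand-ins that only record a step on the function object and hand the same function back; move_asset composes them in the same order that stacking the two @ lines would. Swapping in the real load_from/save_to calls, with whatever loader and saver arguments your assets need, is an assumption to verify against the Hamilton docs.

```python
from typing import Callable


def _record(kind: str, path: str) -> Callable:
    def decorator(fn: Callable) -> Callable:
        # Attach metadata and return the same function object (a marker decorator).
        fn.__dict__.setdefault("io_steps", []).append((kind, path))
        return fn
    return decorator


def fake_load_from(path: str) -> Callable:  # hypothetical stand-in for a loader decorator
    return _record("load", path)


def fake_save_to(path: str) -> Callable:  # hypothetical stand-in for a saver decorator
    return _record("save", path)


def move_asset(from_: str, to_: str) -> Callable:
    """Decorator factory: equivalent to stacking @fake_load_from over @fake_save_to."""
    def decorator(fn: Callable) -> Callable:
        # The inner (save) decorator is applied first, exactly as with stacked @ lines.
        return fake_load_from(from_)(fake_save_to(to_)(fn))
    return decorator


@move_asset(from_="gs://bkt/source/asset1.xlsx", to_="gs://bkt/raw/asset1.parquet")
def asset1(df):
    return df


@fake_load_from("gs://bkt/source/asset1.xlsx")
@fake_save_to("gs://bkt/raw/asset1.parquet")
def asset1_stacked(df):
    return df


print(asset1.io_steps)
# [('save', 'gs://bkt/raw/asset1.parquet'), ('load', 'gs://bkt/source/asset1.xlsx')]
print(asset1.io_steps == asset1_stacked.io_steps)  # True
```

Because the factory takes from_ and to_ at decoration time, it suits paths known when the module is written; paths that only arrive at runtime fit the config-driven approach better.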