Slackbot
12/18/2023, 7:58 PM
Stefan Krawczyk
12/18/2023, 8:00 PM
Stefan Krawczyk
12/18/2023, 8:02 PM
Stefan Krawczyk
12/18/2023, 8:08 PM
Garrett Mooney
12/18/2023, 8:13 PM
{prep, train, predict, post-processing} all in pyspark for “historical reasons”. The code is in need of a major refactor, though, so I wanted to move things to polars piecewise where possible. I have a PoC pipeline that uses spark -> arrow -> polars to extract the prep and do the rest in polars, so I wanted to use that as a testing ground for Hamilton.
Stefan Krawczyk
12/18/2023, 8:28 PM
Stefan Krawczyk
12/18/2023, 8:43 PM
def data_set_foo_pyspark(...) -> ps.DataFrame:
    return df  # pyspark dataframe object

def data_set_foo_polars(data_set_foo_pyspark: ps.DataFrame) -> pl.DataFrame:
    # this will be a blocking call and force spark to compute things
    # then it'll bring it into memory and you can do arrow and then to polars ...
    return df
Garrett Mooney
12/18/2023, 9:22 PM
spark -> arrow -> polars for me in the above functions or if I needed to define that conversion myself
Stefan Krawczyk
12/18/2023, 9:27 PM
Garrett Mooney
12/18/2023, 9:29 PM
Stefan Krawczyk
12/18/2023, 9:33 PM