Slackbot
08/02/2023, 6:13 PM
Elijah Ben Izzy
08/02/2023, 6:28 PM
Stefan Krawczyk
08/02/2023, 7:39 PM
Stefan Krawczyk
08/02/2023, 7:46 PM
stephen bias
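08/02/2023, 8:10 PM
# Imports assumed from the names used below.
import dask_bigquery
from dask.distributed import Client, LocalCluster
from hamilton import base, driver
from hamilton.plugins import h_dask

df = dask_bigquery.read_gbq(project_id=project, dataset_id=dataset, table_id=table)
dask_df_map = {c: df[c] for c in df.columns}  # column name -> dask series
cluster = LocalCluster(processes=False)  # threads only, on a single machine
client = Client(cluster)
dga = h_dask.DaskGraphAdapter(client, base.PandasDataFrameResult())
dr = driver.Driver(dask_df_map, features, adapter=dga)
df = dr.execute(OUTPUT_COLUMNS)
*a dict of dask series, as that seemed to be what it wanted
Stefan Krawczyk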
08/02/2023, 8:13 PM
stephen bias
08/02/2023, 8:14 PM
Stefan Krawczyk
08/02/2023, 8:16 PM
stephen bias
08/02/2023, 8:43 PM
Stefan Krawczyk
08/03/2023, 10:14 PM
DaskDataFrameResult. See the new run.py. Otherwise, in a comment in the PR I added a few things to try and understand that we can walk through on the call.
Note: Dask on a single machine might not help with memory issues if you’re running into them with Pandas.
Stefan Krawczyk
08/04/2023, 4:49 PM