Seth Stokes
05/01/2024, 10:52 PM
n Excel files to process, each corresponding to a day's snapshot of data. I could load them with Parallelizable and Collect, yielding over filepaths.
But each file has m=3 sheets that I need to load as separate data sets.
The Parallelizable works on the n items but not the m sheets.
Is there a Hamiltonian idiom for that yet?
Stefan Krawczyk
05/01/2024, 10:54 PM
Stefan Krawczyk
05/01/2024, 10:56 PM
Elijah Ben Izzy
05/01/2024, 10:56 PM
Thierry Jean
05/01/2024, 10:58 PM
from hamilton.htypes import Collect, Parallelizable

def excel_files(file_paths: list[str]) -> list[Excel]:  # don't know the type
    return [load_excel(p) for p in file_paths]

def sheet(excel_files: list[Excel]) -> Parallelizable[Sheet]:
    # yield one item per sheet across all files
    for excel_file in excel_files:
        for sheet in excel_file.sheets:
            yield sheet

def sheet_collection(transformed_sheet: Collect[...]) -> list[...]:
    return list(transformed_sheet)
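The idea in the snippet above is to flatten the two levels (files, then sheets) into one stream, so that whatever fans out over it sees n × m items instead of n. Below is a minimal, library-free sketch of that flattening, with plain generators standing in for Hamilton's Parallelizable and Collect. WORKBOOKS, the file names, and the sum transform are made-up placeholders, not something from the thread.

```python
from typing import Iterator

# Hypothetical stand-in for loaded workbooks: file path -> {sheet name -> rows}.
WORKBOOKS: dict[str, dict[str, list[int]]] = {
    "2024-04-29.xlsx": {"orders": [1, 2], "returns": [3], "stock": [4, 5, 6]},
    "2024-04-30.xlsx": {"orders": [7], "returns": [8, 9], "stock": [10]},
}


def sheet(file_paths: list[str]) -> Iterator[list[int]]:
    """Yield one item per (file, sheet) pair: n files * m sheets items in total."""
    for path in file_paths:
        for rows in WORKBOOKS[path].values():
            yield rows


def transformed_sheet(sheet: list[int]) -> int:
    """Placeholder per-sheet transform (here: sum the rows)."""
    return sum(sheet)


def sheet_collection(transformed: Iterator[int]) -> list[int]:
    """Gather the per-sheet results back into one list."""
    return list(transformed)


results = sheet_collection(transformed_sheet(s) for s in sheet(list(WORKBOOKS)))
print(len(results))  # one result per (file, sheet) pair
```

With 2 files of 3 sheets each, results holds 6 entries. In a Hamilton dataflow the generator would presumably carry the Parallelizable[...] annotation and the gathering function would take a Collect[...] argument.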
Seth Stokes
05/01/2024, 10:59 PM
m dataflows, one for each, and then have a driver for each one that yields over the filepaths?
Thierry Jean
05/01/2024, 10:59 PM
from hamilton.htypes import Parallelizable

def excel_files(file_paths: list[str]) -> list[Excel]:  # don't know the type
    return [load_excel(p) for p in file_paths]

def sheet(excel_files: list[Excel]) -> Parallelizable[dict]:
    # tag each sheet with its file and sheet position so it can be regrouped later
    for file_idx, excel_file in enumerate(excel_files):
        for sheet_idx, sheet in enumerate(excel_file.sheets):
            yield dict(doc_id=file_idx, sheet_idx=sheet_idx, sheet=sheet)
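Tagging each yielded item with doc_id and sheet_idx means the collected results can be regrouped into the m per-sheet data sets afterwards, even if they come back out of order. Here is a rough sketch of that regrouping step in plain Python. The group_by_sheet helper and the sample values are hypothetical, and the "sheet" strings stand in for whatever the transformed sheet actually is.

```python
from collections import defaultdict


def group_by_sheet(collected: list[dict]) -> dict[int, list]:
    """Regroup tagged items into one list per sheet index, ordered by file."""
    grouped: dict[int, list] = defaultdict(list)
    # Sort by doc_id so each per-sheet list follows the original file order.
    for item in sorted(collected, key=lambda d: d["doc_id"]):
        grouped[item["sheet_idx"]].append(item["sheet"])
    return dict(grouped)


# Hypothetical collected output, deliberately out of order.
collected = [
    dict(doc_id=1, sheet_idx=0, sheet="day2-sheet0"),
    dict(doc_id=0, sheet_idx=1, sheet="day1-sheet1"),
    dict(doc_id=0, sheet_idx=0, sheet="day1-sheet0"),
    dict(doc_id=1, sheet_idx=1, sheet="day2-sheet1"),
]

by_sheet = group_by_sheet(collected)
print(sorted(by_sheet))  # the sheet indices that were seen
```

Each value in by_sheet is then one data set spanning all days for that sheet position, which is what the single-dataflow approach needs in place of m separate dataflows.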