Hi all,
Could I get your opinion on data/process architecture?
Many of you work with big data in an S3 bucket or SQL table. I do not have such resources, so I would like to know what you think about the following.
“Data Lake” - Directory on Shared Drive
• Raw data files
ETL
• read csv via pandas
• hamilton to normalize data
• write to parquet
“Data Base” - parquet with normalized historical data
Analysis
• read parquet via dask (anticipating the parquet files will balloon in size)
• processing via Hamilton/dask workflow
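For concreteness, here is a rough sketch of how the ETL and analysis steps above could fit together. It is only an illustration under assumptions: the function names, the normalization rules (tidy column names, drop exact duplicates), and the one-parquet-file-per-raw-file layout are all hypothetical. The functions are written in the Hamilton style, where a parameter name refers to the upstream function of the same name, but the sketch calls them directly instead of going through a Hamilton driver.

```python
from pathlib import Path

import pandas as pd


def raw_df(csv_path: str) -> pd.DataFrame:
    """Read one raw file from the shared-drive 'data lake'."""
    return pd.read_csv(csv_path)


def normalized_df(raw_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical normalization: tidy column names, drop exact duplicates."""
    out = raw_df.copy()
    out.columns = [str(c).strip().lower().replace(" ", "_") for c in out.columns]
    return out.drop_duplicates().reset_index(drop=True)


def write_parquet(normalized_df: pd.DataFrame, out_dir: str, name: str) -> Path:
    """Write one parquet file per raw file (needs pyarrow or fastparquet)."""
    target = Path(out_dir) / f"{name}.parquet"
    target.parent.mkdir(parents=True, exist_ok=True)
    normalized_df.to_parquet(target, index=False)
    return target


def load_history(parquet_dir: str):
    """Lazily read the whole parquet directory with dask."""
    # Imported here so the ETL step can run on a machine without dask.
    import dask.dataframe as dd

    return dd.read_parquet(parquet_dir)
```

Writing one parquet file per raw file keeps the ETL step re-runnable for a single input, and dask can still treat the directory as one dataset once the files grow.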
Lucky
06/27/2024, 6:12 PM
Hey there, sorry, it's been a few months since I've been at the coder camp meeting. I see the venue changed, so I just wanted to ask: is there any place nearby y'all would recommend for parking?
Thank you!