Daniel Lee
07/18/2023, 8:42 AM
In the DataCatalog, I would like to use pandas.ParquetDataset to partition the data by its date column and save each date into its own folder as Parquet, the way spark.SparkDataSet can. Is there a way to partition like this with pandas?
Nok Lam Chan
07/18/2023, 10:22 AM
For ParquetDataSet, I advise using whatever partitioning Parquet itself offers, because it is a native implementation and you often get better predicate pushdown for performance.
As for pandas, any dataset that does not offer partitioning can be partitioned with PartitionedDataSet:
https://docs.kedro.org/en/stable/kedro.io.PartitionedDataset.html#kedro.io.PartitionedDataset
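
A minimal sketch of the PartitionedDataSet route, assuming the DataFrame has a column named `date` (hypothetical) and that the node's output is mapped to a catalog entry of type PartitionedDataSet whose underlying dataset is pandas.ParquetDataSet with `filename_suffix: ".parquet"`. PartitionedDataSet saves a dictionary of partition ID to data, so the node only has to build that dictionary. The function name `split_by_date` and the sample data are illustrative, not part of Kedro.

```python
from typing import Dict

import pandas as pd


def split_by_date(df: pd.DataFrame, date_col: str = "date") -> Dict[str, pd.DataFrame]:
    """Return one DataFrame per calendar date, keyed by a partition ID.

    When a node returns this dictionary to a PartitionedDataSet output,
    each key is used as a partition ID and each value is saved with the
    underlying dataset (assumed here to be pandas.ParquetDataSet).
    """
    # Normalise the date column to YYYY-MM-DD strings to use as partition IDs.
    days = pd.to_datetime(df[date_col]).dt.strftime("%Y-%m-%d")
    return {
        day: part.reset_index(drop=True)
        for day, part in df.groupby(days)
    }


# Hypothetical sample data, only to show the shape of the result.
example = pd.DataFrame(
    {
        "date": ["2023-07-17", "2023-07-17", "2023-07-18"],
        "value": [1, 2, 3],
    }
)
partitions = split_by_date(example)
print(sorted(partitions))
```

With the catalog entry assumed above, each key would become its own Parquet file under the dataset's `path`. If the date-based layout matters mainly for read performance, the native Parquet partitioning mentioned earlier in the thread is likely the better fit.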