Kedro is an open-source Python framework for creating maintainable and modular data science code.

Kedro

Hello everyone! Does anyone have an example of partitioning by columns when saving Parquet files through the Kedro catalog? Thank you very much in advance.

Any chance this may help? <https://github.com/kedro-org/kedro/issues/1567>

Hmm, I can see this issue is still open. I would like to specify the partition columns in the save_args.

What does your existing config look like? If you just need to partition by columns, you should be able to use SparkDataSet itself rather than PartitionedDataSet.


> save_args: Save args passed to Spark DataFrame write options.
> Similar to load_args this is dependent on the selected file
> format. You can pass ``mode`` and ``partitionBy`` to specify
> your overwrite mode and partitioning respectively. You can find
> a list of options for each format in Spark DataFrame
<https://docs.kedro.org/en/stable/kedro.extras.datasets.spark.SparkDataSet.html>
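A minimal sketch of what that could look like, assuming Kedro 0.18.x with the `spark.SparkDataSet` type from `kedro.extras.datasets`. The dataset name `my_partitioned_data`, the file path, and the partition columns `year` and `month` are hypothetical. The catalog entry is written as the Python dict that mirrors a `catalog.yml` entry, with the YAML form shown in the comments. The helper `save_like_spark_dataset` is also hypothetical. It approximates, based on our reading of the 0.18.x source, how SparkDataSet forwards `save_args` as keyword arguments to PySpark's `DataFrameWriter.save(path, format, mode=..., partitionBy=...)`.

```python
# Hypothetical catalog entry. In catalog.yml this would read roughly:
#
#   my_partitioned_data:
#     type: spark.SparkDataSet
#     filepath: data/02_intermediate/my_data.parquet
#     file_format: parquet
#     save_args:
#       mode: overwrite
#       partitionBy: ["year", "month"]
#
catalog_config = {
    "my_partitioned_data": {
        "type": "spark.SparkDataSet",
        "filepath": "data/02_intermediate/my_data.parquet",
        "file_format": "parquet",
        "save_args": {
            "mode": "overwrite",               # Spark save mode
            "partitionBy": ["year", "month"],  # one directory level per column
        },
    }
}


def save_like_spark_dataset(df, entry):
    """Hypothetical helper approximating SparkDataSet's save step.

    save_args are passed straight through as keyword arguments, so for the
    entry above this amounts to:
        df.write.save(path, "parquet", mode="overwrite",
                      partitionBy=["year", "month"])
    """
    df.write.save(entry["filepath"], entry["file_format"], **entry["save_args"])
```

With a real Spark DataFrame, Spark writes one Hive-style subdirectory per partition value, with names like `year=<value>/month=<value>` under the file path. Because the partitioning is handled entirely by Spark's writer, no PartitionedDataSet is needed for this case.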