Nok Lam Chan
05/08/2024, 1:21 PM
spaceflights structure
Create a new repo and open it in a VS Code workspace:
kedro new -n spaceflights --tools=none --example=yes
VS Marketplace Link: https://marketplace.visualstudio.com/items?itemName=kedro.Kedro
Ankita Katiyar
05/10/2024, 3:56 PM
apache-airflow to orchestrate your Kedro projects before.
• Please react with K below if you have used kedro-airflow to create DAGs for deployment on Apache Airflow.
Faced any hurdles? 🚧 Drop a comment in this thread to share the pain points 🔴 you’ve encountered in using / trying to use the kedro-airflow plugin.
Juan Luis
06/05/2024, 9:53 AM
Nok Lam Chan
06/06/2024, 3:42 PM
Tab to get the dataset name without typing out the full thing. What do you think?
Juan Luis
06/18/2024, 10:30 AM
Yury Fedotov
06/27/2024, 6:25 AM
int layer as a typed/concatenated mirror of raw, then pri and feat etc.
And while my raw dataset definitions are quite long and differ from dataset to dataset, e.g. like this:
raw_notifications_multisheet:
  type: pandas.ExcelDataset
  filepath: data/01_raw/...xlsx
  load_args:
    sheet_name: null
    dtype:
      Order: str
      Equipment: str
  <<: *raw_layer
It takes me just 3 dataset definitions to capture an arbitrary number of int, pri and feat layer datasets, all of which I just want to save as a parquet file.
"int_{dataset}":
  type: pandas.ParquetDataset
  filepath: data/02_intermediate/int_{dataset}.parquet
  <<: *intermediate_layer
"pri_{dataset}":
  type: pandas.ParquetDataset
  filepath: data/03_primary/pri_{dataset}.parquet
  <<: *primary_layer
"feat_{dataset}":
  type: pandas.ParquetDataset
  filepath: data/04_feature/feat_{dataset}.parquet
  <<: *feature_layer
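To make the pattern matching concrete, here is a minimal Python sketch of how the three patterns above could turn a dataset name into a concrete entry. It is an illustration only, not Kedro's implementation: the `resolve` helper and the `PATTERNS` dict are hypothetical, and the `<<: *..._layer` merge keys are left out because their anchors are defined elsewhere in the catalog.

```python
import re

# Hypothetical mirror of the three factory patterns from the catalog above.
PATTERNS = {
    "int_{dataset}": {
        "type": "pandas.ParquetDataset",
        "filepath": "data/02_intermediate/int_{dataset}.parquet",
    },
    "pri_{dataset}": {
        "type": "pandas.ParquetDataset",
        "filepath": "data/03_primary/pri_{dataset}.parquet",
    },
    "feat_{dataset}": {
        "type": "pandas.ParquetDataset",
        "filepath": "data/04_feature/feat_{dataset}.parquet",
    },
}


def resolve(name, patterns=PATTERNS):
    """Return the filled-in config of the first pattern matching `name`, else None."""
    for pattern, config in patterns.items():
        # Turn "int_{dataset}" into the regex "int_(?P<dataset>.+)".
        regex = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+)", re.escape(pattern))
        match = re.fullmatch(regex, name)
        if match:
            # Substitute the captured placeholder values into every config string.
            return {
                key: value.format(**match.groupdict())
                for key, value in config.items()
            }
    return None


print(resolve("int_notifications"))
```

Any name that starts with `int_`, `pri_` or `feat_` resolves without being listed in advance, which is the difference from a Jinja loop over a fixed list of datasets.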
Without dataset factories, the catalog YAML would have been incredibly long, or at best I would have to use a Jinja for loop, which requires knowing all datasets in advance of the run.
Juan Luis
07/02/2024, 5:34 PM
Elijah Ko
07/09/2024, 5:35 PM
Nok Lam Chan
07/16/2024, 11:22 AM
Nok Lam Chan
07/22/2024, 1:14 PM
Juan Luis
07/31/2024, 7:29 AM
Deepyaman Datta
08/14/2024, 2:22 PM
Alexey Gravanov
08/23/2024, 12:51 PM
Dmitry Sorokin
09/17/2024, 1:35 PM
Deepyaman Datta
09/24/2024, 5:52 PM
Rashida Kanchwala
10/16/2024, 8:02 AM
Merel
11/04/2024, 5:32 PM
Juan Luis
01/23/2025, 5:42 PM
Ravi Kumar Pilla
01/27/2025, 6:35 PM
Juan Luis
01/31/2025, 8:41 AM
Anu Arora
02/11/2025, 11:43 AM
s3:// support to spark dataset along with s3a:// on a user request. I know s3a is recommended for Spark, but I would love to hear your viewpoint: would you really want that, and for what use case? Is it for EMRFS?
Ariana Leiva
02/27/2025, 3:01 PM
Ariana Leiva
02/27/2025, 3:41 PM
Jitendra Gundaniya
03/04/2025, 12:13 PM
Ean
03/04/2025, 9:22 PM
Rashida Kanchwala
03/05/2025, 8:33 AM
Ankita Katiyar
03/10/2025, 10:21 AM
latest version of the docs here and here.
While kedro-datasets offers various connectors to interact with Delta tables, it doesn’t have support for Iceberg tables currently. We’d like to hear from the Kedro community about what they’d like to see! If you’ve worked with Iceberg tables and Kedro before or would like to in the future, leave your comments under this issue!
Elijah Ko
03/12/2025, 10:03 AM
Elijah Ko
03/27/2025, 4:35 PM
Stephanie Kaiser
03/31/2025, 4:01 PM