# questions
  • Armand Masseau (05/21/2025, 2:04 PM)
    I have another question. Is it possible to link the catalog and params? Currently I am using a globals file from which both params and the catalog source their values, because one of the file names used in the pipeline contains a globals variable. I would like to get rid of the globals file and only use parameters, because parameters support runtime overrides.
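    A minimal sketch of one possible direction, assuming a recent Kedro with OmegaConfigLoader (dataset and parameter names are illustrative): the runtime_params resolver can inject a value passed on the command line straight into a catalog entry, which may remove the need for a globals file.

    # Hedged sketch: filename piece supplied at runtime (illustrative names).
    # Run with: kedro run --params=file_suffix=2025_05_21
    my_dataset:
      type: pandas.CSVDataset
      filepath: data/01_raw/input_${runtime_params:file_suffix,default}.csv  # second arg is a fallback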
  • Adrien Paul (05/21/2025, 2:16 PM)
    Hello, I have a bug with kedro_azureml.dataset.AzureMLAssetDataset in the kedro-azureml plugin. It seems to be related to AzureMachineLearningFileSystem and this issue: https://github.com/Azure/azure-sdk-for-python/issues/37089. Has anyone succeeded in using AzureMLAssetDataset in version 0.9.0?
  • Jonghyun Yun (05/22/2025, 2:30 PM)
    Hi Team, I have a daily job saving several versioned datasets. A downstream process (not written in Kedro) needs <version> (e.g. data/01_raw/company/cars.csv/<version>/cars.csv) so that it can pick up the correct datasets to process. Is there a way to know which <version> is being used by Kedro?
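    A minimal sketch of one way to read the resolved version, assuming Kedro 0.19-style APIs (the dataset name is illustrative and _get_dataset is a protected method, so verify against your version). By default the save version is the session timestamp, so all datasets saved in one run share it.

    # Hedged sketch: resolve the save version Kedro stamps on a versioned dataset.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    bootstrap_project(".")
    with KedroSession.create(project_path=".") as session:
        catalog = session.load_context().catalog
        cars = catalog._get_dataset("cars")    # protected API
        print(cars.resolve_save_version())     # e.g. 2025-05-22T14.30.00.000Z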
  • Richard Asselin (05/22/2025, 2:42 PM)
    Hi there! Just have a quick question re: running kedro-viz from within a virtual environment. For some reason it seems to always pick the version of `kedro-viz` from my main Python and not the one in the virtual env (i.e., I have v11.0.0 in my main Python, but v11.0.1 in my virtual env, and running `kedro viz` from within the virtual env is picking the 11.0.0 version). Is it just something I'm doing incorrectly? Is that the expected behaviour? Thanks!
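    A quick diagnostic sketch (plain Python introspection, nothing Kedro-specific): confirming which interpreter and which kedro-viz installation actually resolve inside the venv usually pinpoints a PATH or console-script shadowing issue.

    # Hedged sketch: check what the active environment really imports.
    import sys
    import kedro_viz

    print(sys.executable)         # interpreter currently in use
    print(kedro_viz.__version__)  # version actually imported
    print(kedro_viz.__file__)     # site-packages it was loaded from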
  • coder xu (05/28/2025, 12:22 AM)
    Why does my parquet file in S3 throw errors like this?
    DatasetError: Failed while loading data from dataset ParquetDataset(filepath=kedro/model_input_table.parquet, load_args={}, protocol=s3, save_args={}).
    Expected checksum PqKP+A== did not match calculated checksum: eqRztQ==
  • coder xu (05/28/2025, 12:23 AM)
    This is my catalog:
    model_input_table:
      type: pandas.ParquetDataset
      filepath: s3://kedro/model_input_table.parquet
    #  type: pandas.CSVDataset
    #  filepath: s3://kedro/model_input_table.csv
    and CSV files work fine.
  • Jamal Sealiti (05/28/2025, 11:32 AM)
    Hi, how can I set up Kedro with Grafana for tracking node/pipeline progress?
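    A minimal sketch of one approach, not an official integration: a project hook that exports node-completion counts as Prometheus metrics, which Grafana can then chart (prometheus_client and the port are assumptions).

    # Hedged sketch: expose Kedro node progress to Prometheus/Grafana.
    from kedro.framework.hooks import hook_impl
    from prometheus_client import Counter, start_http_server

    NODES_DONE = Counter("kedro_nodes_completed_total", "Finished nodes", ["node"])

    class ProgressMetricsHook:
        @hook_impl
        def before_pipeline_run(self, run_params):
            start_http_server(8000)  # endpoint Prometheus scrapes

        @hook_impl
        def after_node_run(self, node):
            NODES_DONE.labels(node=node.name).inc()

    The hook would be registered in settings.py via HOOKS = (ProgressMetricsHook(),).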
  • Jamal Sealiti (05/30/2025, 10:24 AM)
    How does Kedro handle merging two streaming datasets on merge keys? And deleting?
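    For context, a hedged sketch: Kedro itself doesn't merge streams; with Spark streaming inputs the merge logic lives in a node, e.g. a stream-stream join (column and key names are illustrative; deletes would typically be handled by something like a Delta MERGE downstream rather than by Kedro).

    # Hedged sketch: node joining two streaming DataFrames on a key.
    from pyspark.sql import DataFrame

    def merge_streams(left: DataFrame, right: DataFrame) -> DataFrame:
        # watermarks are required for stateful stream-stream joins
        return (
            left.withWatermark("event_time", "10 minutes")
                .join(right.withWatermark("event_time", "10 minutes"),
                      on="merge_key", how="inner")
        )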
  • Jamal Sealiti (05/30/2025, 12:19 PM)
    Is it possible to create a custom Delta table dataset with the change data capture option? And how can I create a table from my custom schema before writeStream?
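    A minimal sketch, assuming the delta-spark package (path and columns are illustrative): change data feed is a table property, and DeltaTableBuilder can create the table from an explicit schema before any writeStream starts.

    # Hedged sketch: create a CDC-enabled Delta table from a schema.
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType, StructField, StructType

    spark = SparkSession.builder.getOrCreate()
    schema = StructType([
        StructField("id", StringType(), False),
        StructField("value", StringType(), True),
    ])

    (DeltaTable.createIfNotExists(spark)
        .location("/data/03_primary/my_table")
        .addColumns(schema)
        .property("delta.enableChangeDataFeed", "true")  # the CDC switch
        .execute())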
  • Yury Fedotov (05/30/2025, 2:27 PM)
    Are small contributions to docs (like typo fixes) being accepted now? Asking as I see you’re migrating to mkdocs, so maybe not the best time in terms of avoiding merge conflicts
  • Trọng Đạt Bùi (06/02/2025, 10:06 AM)
    Hello everyone! Has anyone tried to manually create a pipeline (not Kedro's auto-registered pipelines)?
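    A minimal sketch of manual registration, bypassing find_pipelines() auto-discovery (function and dataset names are illustrative):

    # Hedged sketch: src/<package>/pipeline_registry.py with a hand-built pipeline.
    from kedro.pipeline import Pipeline, node

    def clean(raw):  # hypothetical node function
        return raw

    def register_pipelines() -> dict[str, Pipeline]:
        manual = Pipeline([
            node(clean, inputs="raw_data", outputs="clean_data", name="clean_node"),
        ])
        return {"manual": manual, "__default__": manual}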
  • Ankit K (06/02/2025, 3:19 PM)
    Hi all, I'm working on a Kedro pipeline (using the `kedro-vertexai` plugin, version `0.10.0`) where I need to track each pipeline run in a BigQuery table. We use a table_suffix (typically a date or unique run/session ID) to uniquely identify data and outputs for each pipeline run, ensuring that results from different runs do not overwrite each other and can be traced back to a specific execution.

    The challenge is that the Kedro `session_id` or `KEDRO_CONFIG_RUN_ID` is not available at config load time, so early config logic (like setting a table_suffix) uses a date or placeholder value. This can cause inconsistencies, especially if nodes run on different days or the pipeline is resumed (currently the pipeline takes ~2.5 days to run). We tried generating the table_suffix from the current date at config load time, but if a node runs on a different day or the pipeline is resumed, a new table_suffix is generated, making it hard to track a single pipeline run. We also experimented with different Kedro hooks (such as before_pipeline_run and before_node_run) to set or propagate the run/session ID, but still faced challenges ensuring the value is available everywhere, including during config loading.

    What is the best practice in Kedro (with Vertex AI integration) for generating and propagating a unique run/session ID that is available everywhere (including config loading and all nodes), so that all tracking and table suffixes are consistent for a given run? Should this be set as an environment variable before Kedro starts, or is there a recommended hook or config loader pattern for this? Any advice or examples would be appreciated!
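    One hedged pattern (an assumption, not an official kedro-vertexai feature): mint the run ID in the process environment before Kedro boots, so the config loader's oc.env resolver, hooks, and every node all see one stable value no matter which day a node executes.

    # Hedged sketch: fix the run ID once, in the entrypoint that launches kedro.
    import os
    import uuid

    os.environ.setdefault("PIPELINE_RUN_ID", uuid.uuid4().hex[:12])

    Config can then interpolate it, e.g. table_suffix: ${oc.env:PIPELINE_RUN_ID}, keeping the BigQuery suffix constant for a given submission.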
  • Arnout Verboven (06/03/2025, 11:00 AM)
    Hi! If I have 2 configuration environments (`local` and `prod`), is it possible to know during pipeline creation which environment is being run? Or how should I do this using proper Kedro patterns? E.g. I want to do something like:
    def create_pipeline(env: str = "local") -> Pipeline:
        if env == "prod":
            return create_pipeline_prod()
        else:
            return create_pipeline_local()
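    A hedged workaround: Kedro doesn't pass the environment to create_pipeline(), but the KEDRO_ENV environment variable (which the Kedro CLI also respects) can be read directly; defaulting to "local" here is an assumption, and create_pipeline_prod/create_pipeline_local are the functions from the snippet above.

    # Hedged sketch: branch pipeline creation on KEDRO_ENV.
    import os
    from kedro.pipeline import Pipeline

    def create_pipeline(**kwargs) -> Pipeline:
        env = os.getenv("KEDRO_ENV", "local")
        return create_pipeline_prod() if env == "prod" else create_pipeline_local()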
  • Abhishek Bhatia (06/10/2025, 5:38 AM)
    Hey team! Not a Kedro question per se. What is the go-to tooling for configuration management in data science projects outside of Kedro (with OmegaConf)? Is Hydra the most popular choice? I am looking at the following features:
    1. Global configs
    2. Clear patterns for config type
       a. Static vs dynamic
       b. Global vs granular
       c. Constant vs overridable
    3. Param overriding with globals
    4. Param overriding within a config file
    5. Support for environment variables
    6. Storing environment-wise configs: DEV / STG / UAT / PROD etc.
    7. Interpolation with basic text concat
    8. (Optional) Python functions as resolvers in config (OmegaConf)
    9. Config compilation artifact (i.e. I want to see how my config looks after resolving)
    10. Invoking Python scripts with arbitrary / alternate config paths
    11. Invoking Python scripts with specific param values
    Most of the above features are already there in Kedro, but I need this functionality outside Kedro. Eager to hear the community's recommendations here! 🙂
  • Malek Bouzidi (06/10/2025, 12:34 PM)
    Hi all. I've been trying Kedro for the past few weeks. Everything worked well except for Kedro-Viz: it doesn't display the previews of the datasets. I followed all the instructions in the docs but nothing worked. Can someone help me figure out why it doesn't work?
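    For reference, a hedged sketch of the per-dataset metadata that Kedro-Viz reads for previews (dataset name and nrows are illustrative; check it against the Kedro-Viz version in use):

    # Hedged sketch: catalog.yml entry with preview configuration.
    companies:
      type: pandas.CSVDataset
      filepath: data/01_raw/companies.csv
      metadata:
        kedro-viz:
          preview_args:
            nrows: 5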
  • Sharan Arora (06/10/2025, 5:53 PM)
    Hi, I'm getting an error when doing `kedro run`. Would you be able to help? I have Java 17 installed and I'm unsure why the code gets stuck on that last line; I always have to abort.
  • Sharan Arora (06/11/2025, 1:35 AM)
    Just to follow up: I'm receiving a FileNotFoundError: [WinError 2] "The system cannot find the file specified" error. I've double-checked my path in the environment variables and can't find an issue.
  • Jonghyun Yun (06/11/2025, 9:46 PM)
    Hi Team, I'm using Kedro 0.18.6 and it seems to have a bug. When I create and run a part of a composite pipeline, it actually runs everything in it. For example, running pipe["a"] will trigger pipe["b"] and pipe["c"] too. I don't think this is expected behavior. I cannot upgrade Kedro above 0.18.x. Was there a fix for this issue?
  • Trọng Đạt Bùi (06/12/2025, 6:41 AM)
    Has anyone tried to customize the Spark dataset to read multiple folders in HDFS?
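    A minimal sketch of a custom dataset, relying on spark.read.load accepting a list of paths (class and argument names are illustrative):

    # Hedged sketch: read several HDFS folders into one Spark DataFrame.
    from kedro.io import AbstractDataset
    from pyspark.sql import DataFrame, SparkSession

    class MultiFolderSparkDataset(AbstractDataset[DataFrame, DataFrame]):
        def __init__(self, filepaths: list[str], file_format: str = "parquet"):
            self._filepaths = filepaths
            self._file_format = file_format

        def _load(self) -> DataFrame:
            spark = SparkSession.builder.getOrCreate()
            # DataFrameReader.load accepts a list of paths
            return spark.read.format(self._file_format).load(self._filepaths)

        def _save(self, data: DataFrame) -> None:
            raise NotImplementedError("read-only sketch")

        def _describe(self) -> dict:
            return {"filepaths": self._filepaths, "file_format": self._file_format}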
  • Mattis (06/16/2025, 12:55 PM)
    I have configured a dynamic pipeline (catalog and nodes) with a hooks file. Locally it's running in a Docker container without problems, but when I push it to AzureML and run it there, even though I can see the whole pipeline (and all dynamically created node names), I receive "pipeline does not contain that .. node". How is this even possible? Does anyone have a clue?
  • Wejdan Bagais (06/17/2025, 4:52 PM)
    Hi everyone! 👋 I'm currently exploring how to approach unit testing in Kedro, especially when working with large-scale data pipelines. I'd love to hear your thoughts on a few things:
    • Do you find unit tests valuable in the context of data pipelines?
    • How do you typically implement them in Kedro?
    • Given that data quality checks are often a key focus, how do you handle testing when the input datasets are huge? Creating dummy data for every scenario doesn't always seem practical.
    Any tips, examples, or lessons learned would be greatly appreciated! Thanks in advance 🙏
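    One common pattern, sketched with an illustrative node function: test the plain Python function with a tiny hand-made DataFrame and keep full-size data out of unit tests entirely.

    # Hedged sketch: pytest-style test of a node's underlying function.
    import pandas as pd

    def add_total(df: pd.DataFrame) -> pd.DataFrame:  # hypothetical node
        out = df.copy()
        out["total"] = out["a"] + out["b"]
        return out

    def test_add_total():
        df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
        assert add_total(df)["total"].tolist() == [4, 6]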
  • Sharan Arora (06/18/2025, 7:53 PM)
    Hello, had a question. The pipeline I'm trying to build includes credentials for a PostgreSQL DB. The idea is to pass off a containerized pipeline and facilitate the necessary data cleaning, transformation and storage required for further analytics. In credentials.yml, I have added the following:
    postgresql_connection:
      host: "${oc.env:POSTGRESQL_HOST}"
      username: "${oc.env:POSTGRESQL_USER}"
      password: "${oc.env:POSTGRESQL_PASSWORD}"
      port: "${oc.env:POSTGRESQL_PORT}"
    and each of these values is stored in a .env file in the same `local` folder. However, when I do `kedro run`, postgresql_connection isn't recognized and the actual values provided in the .env file aren't picked up and passed into credentials.yml, which I need since I want this to be dynamic and based on user input. Any idea how to resolve this? Additionally, what is the process for getting Kedro to read credentials.yml? It seems that on `kedro run` it only cares about catalog.yml. Is it just a matter of linking credentials in the catalog? I tried, but then it reads the dynamic string literally.
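    Two hedged notes. First, credentials.yml is only read when a catalog entry references a credentials key, so the linking does happen in the catalog (the dataset below is illustrative; pandas.SQLTableDataset actually expects a con connection string under its credentials key, so the shape would need adapting). Second, as far as I know Kedro does not load a .env file automatically; the variables must already be in the environment (e.g. via docker --env-file or python-dotenv) before oc.env can resolve them.

    # Hedged sketch: the catalog links to credentials.yml by key.
    example_table:
      type: pandas.SQLTableDataset
      table_name: example
      credentials: postgresql_connection   # resolved from credentials.yml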
  • Rachid Cherqaoui (06/20/2025, 11:21 AM)
    Hi everyone! 👋 I'm trying to load specific CSV files from an SFTP connection in Kedro, and I need to filter the files using a wildcard pattern. For example, I'd like to load only files that match something like:
    /doc_20250620*_delta.csv
    But I noticed that YAML interprets `*` as an anchor, and it doesn't seem to behave like a wildcard here. How can I configure a dataset in `catalog.yml` to use a wildcard when loading files from an SFTP path (e.g. to only fetch files starting with a certain prefix and ending with `_delta.csv`)? Is there native support for this kind of pattern in Kedro's SFTPDataSet, or do I need to implement a custom dataset? Any guidance or examples would be super appreciated! 🙏
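    A hedged alternative, since a single filepath can't glob: a PartitionedDataset over the folder, filtered by suffix (names and the credentials key are illustrative); the date-prefix filter (doc_20250620*) would then be applied to the partition keys inside the node.

    # Hedged sketch: suffix-filtered partitions over an SFTP folder.
    delta_files:
      type: partitions.PartitionedDataset
      path: sftp://my-sftp-server/outbox
      dataset: pandas.CSVDataset
      filename_suffix: _delta.csv
      credentials: sftp_creds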
  • Rachid Cherqaoui (06/23/2025, 7:34 AM)
    Hi everyone 👋 I'm currently working with Kedro and trying to load a CSV file hosted on an SFTP server using a `CSVDataset`. Here's the relevant entry from my `catalog.yml`:
    cool_dataset:
      type: pandas.CSVDataSet
      filepath: sftp://my-sftp-server/outbox/DW_Extracts/my_file.csv
      load_args: {}
      save_args:
        index: False
    When I run:
    df = catalog.load("cool_dataset")
    I get the following error: it seems like Kedro/pandas is trying to use `urllib` to open the SFTP URL, which doesn't support the `sftp://` protocol natively. Has anyone successfully used Kedro to load files from SFTP? If so, could you share your config/setup?
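    A hedged config sketch: Kedro's file-based datasets go through fsspec, so sftp:// should be handled by fsspec's SFTP filesystem (which needs paramiko installed) rather than urllib. The credentials shape below is an assumption to verify against fsspec's SFTPFileSystem arguments.

    # Hedged sketch: conf/local/credentials.yml
    sftp_creds:
      username: my_user
      password: my_password

    # catalog.yml
    cool_dataset:
      type: pandas.CSVDataset
      filepath: sftp://my-sftp-server/outbox/DW_Extracts/my_file.csv
      credentials: sftp_creds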
  • Adrien Paul (06/23/2025, 5:02 PM)
    Hello, in the VS Code Kedro plugin, is it possible to run kedro viz with --include-hooks? Thanks guys 🙏
  • Nathan W. (06/25/2025, 7:32 AM)
    Hello guys, I couldn't find any way to store API keys in a `.env` or `credentials.yml` and then use them in my node parameters to make API requests. Are there any simple solutions I missed (without putting the key in `parameters.yml` and then risking pushing it into production...)? Thanks a lot in advance for your response, have a nice day!
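    One hedged option: kedro-datasets ships an api.APIDataset whose auth can come from credentials.yml, so the key never sits in parameters.yml (URL and key names are illustrative; check the credentials format APIDataset accepts in your version).

    # Hedged sketch: API access with the key kept in credentials.yml.
    my_api_data:
      type: api.APIDataset
      url: https://example.com/v1/data
      credentials: my_api_creds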
  • Fazil Topal (06/25/2025, 8:24 AM)
    Hey everyone, I am building a system where I return the key/filepath of the final dataset in the Kedro pipeline. What's the ideal way of doing this? A method that also works for partitioned datasets, where I'd get a list of filepaths? I have a catalog instance, but somehow all the methods are protected, so I'm wondering if I'm missing something obvious here. I was doing catalog._get_dataset(output)._filepath, which works only for non-partitioned datasets.
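    A hedged sketch that leans on Kedro internals (every attribute below is protected, so pin and verify your version): regular file datasets expose _filepath, while a PartitionedDataset keeps a folder in _path plus an fsspec _filesystem that can list it.

    # Hedged sketch: filepath(s) for plain and partitioned datasets.
    from kedro.io import DataCatalog

    def output_paths(catalog: DataCatalog, name: str) -> list[str]:
        ds = catalog._get_dataset(name)
        if hasattr(ds, "_path"):              # PartitionedDataset
            return ds._filesystem.ls(ds._path)
        return [str(ds._filepath)]            # single-file datasets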
  • Jamal Sealiti (06/26/2025, 10:14 AM)
    Hi, placeholders for catalog.yml are not working. I have bootstrap_servers: "localhost:9092" in conf/base/parameters.yml, and in my catalog.yml I'm trying to use a placeholder like ${bootstrap_servers}, but I get this error: InterpolationKeyError: Interpolation key 'bootstrap_servers' not found
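    For reference, a hedged sketch of the documented OmegaConfigLoader pattern: catalog interpolation can't see parameters.yml, but it can read a globals file through the globals resolver (the dataset entry is illustrative).

    # conf/base/globals.yml
    bootstrap_servers: localhost:9092

    # conf/base/catalog.yml
    kafka_stream:
      type: spark.SparkStreamingDataset
      file_format: kafka
      load_args:
        kafka.bootstrap.servers: ${globals:bootstrap_servers}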
  • Rachid Cherqaoui (06/27/2025, 2:20 PM)
    Hello, how can I pass a credentials argument as an input to the pipeline function?
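    A hedged workaround, since credentials deliberately aren't node inputs: a pair of hooks can read them from the config loader and surface them as an in-memory catalog entry (names are illustrative; add_feed_dict exists in Kedro 0.19 but check your version).

    # Hedged sketch: expose a credentials dict as a catalog entry.
    from kedro.framework.hooks import hook_impl

    class CredentialsToCatalogHook:
        @hook_impl
        def after_context_created(self, context):
            self._creds = context.config_loader["credentials"]

        @hook_impl
        def after_catalog_created(self, catalog):
            catalog.add_feed_dict(
                {"postgres_creds": self._creds["postgresql_connection"]}
            )

    A node can then declare "postgres_creds" as an input.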
  • Pradeep Ramayanam (06/27/2025, 5:34 PM)
    Hi all, hope everyone is doing well! I have a weird file structure (as attached) and would love to hear if anyone has solved this before. I tried to solve it as attached, but I am getting the error below: DatasetError: No partitions found in '/data/01_raw/nces_ccd/*/Staff/DataFile' Any help would be much appreciated, thanks in advance!!