# questions
  • Shohatl

    09/15/2025, 5:07 AM
    Hello everyone, is there a way to pass a variable to the create_pipeline() function? My use case is initializing several KedroSessions in a single run, where each session runs an ETL on one table. The ETL can change according to the input: for example, if I have several inputs for an ETL, I want to dynamically create a node per input that loads the table. Is there a way to dynamically create a pipeline based on the input? The input is available only at runtime, so I can't use parameters.yml. I also wanted to know the best practice for running multiple pipelines in parallel. I am currently creating a session per pipeline and running each one in a separate thread.
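    One hedged sketch of the dynamic part (the tables keyword and load_table function are illustrative, not Kedro API): create_pipeline() already accepts **kwargs, so a list of tables known only at runtime can be passed in by whatever code assembles the pipeline, building one node per input.
    Copy code
    from kedro.pipeline import Pipeline, node, pipeline


    def load_table(raw_table):
        # placeholder transformation; replace with the real ETL step
        return raw_table


    def create_pipeline(**kwargs) -> Pipeline:
        tables = kwargs.get("tables", [])  # supplied by the caller at runtime
        return pipeline(
            [
                node(
                    func=load_table,
                    inputs=f"{table}_raw",
                    outputs=f"{table}_loaded",
                    name=f"load_{table}",
                )
                for table in tables
            ]
        )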
  • Júlio Resende

    09/15/2025, 5:00 PM
    Hello everyone! I'm trying to use SparkDataset to read and write to the Azure Data Lake File System using the abfs:// prefix. I noticed that, although the dataset requires credentials to be passed to the __init__ method, these credentials are not used when writing, requiring the Spark session to be configured globally. This seems a bit out of line with the Kedro standard, as it doesn't allow us to have datasets from multiple sources. Shouldn't these credentials be used directly when writing and reading, without relying on the global Spark configuration?
  • Ralf Kowatsch

    09/18/2025, 9:53 AM
    Hi, our team is new to Kedro and we would like to use it as a data engineering tool. Our concerns are:
    • If we work with Ibis or Snowpark, we don't want to define each table/view on the database. As far as I understand, DataSets are the persistence objects that connect the different transformations in the pipeline. Is there a way to get around defining these?
    • How many nodes could we run in parallel? Is there an upper limit if the heavy computing is mainly happening on Snowflake?
    • I understand that the nodes/transformations have to be molded into a pipeline. Is there an option to do that implicitly by referencing another node? (See the sketch below.)
    • Is there a proper way to handle data quality, including generic tests and custom tests?
    • Is there an example project that we could benefit from?
    Thanks for your inputs!
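    A minimal sketch of the implicit chaining asked about above (function and dataset names are illustrative): nodes are linked simply by sharing dataset names, and any name not declared in the catalog is created as an in-memory dataset automatically, so not every intermediate table needs a catalog entry.
    Copy code
    from kedro.pipeline import node, pipeline


    def clean(orders):
        return orders


    def aggregate(orders_clean):
        return orders_clean


    etl = pipeline(
        [
            node(clean, inputs="orders", outputs="orders_clean", name="clean"),
            # "orders_clean" has no catalog entry; Kedro wires it up as a MemoryDataset
            node(aggregate, inputs="orders_clean", outputs="orders_agg", name="aggregate"),
        ]
    )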
  • Emil Marcetta

    09/18/2025, 7:05 PM
    Hi, we are migrating some old v0.18 pipelines to v1.0 and I have a question about parameters.yml and runtime_params (we run into an issue when no runtime_params:begin_date is supplied). We reference runtime_params in the catalog, for example:
    Copy code
    ...
    filepath: "${globals:example_bucket}/i/j/k/date=${runtime_params:begin_date}/"
    only the base/parameters.yml has a definition for begin_date
    Copy code
    begin_date: "2025-01-01"
    and the CONFIG_LOADER_ARGS (in settings.py) does not have an entry for config_patterns (stepping through the OmegaConfigLoader constructor with a debugger confirms the "parameters" patterns are present). We invoke the pipeline with no params (kedro run). The error we receive when the catalog loads is:
    Copy code
    InterpolationResolutionError: Runtime parameter 'begin_date' not found and no default value provided.
    Thank you!
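    For what it's worth, the runtime_params resolver only sees values supplied at run time (kedro run --params ... or KedroSession runtime_params), not values defined in parameters.yml, which would explain the error on a bare kedro run. A hedged sketch of supplying the value programmatically (the project path and date value are illustrative):
    Copy code
    from pathlib import Path

    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    project_path = Path.cwd()
    bootstrap_project(project_path)

    with KedroSession.create(
        project_path=project_path,
        # equivalent to: kedro run --params "begin_date=2025-01-01"
        runtime_params={"begin_date": "2025-01-01"},
    ) as session:
        session.run()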
  • Swift

    09/18/2025, 11:27 PM
    👋 I am just starting with Kedro. I am putting together an example pipeline to make sure I understand all the concepts before I build a larger project with it. One concept I have not been able to fully figure out is how to work with APIs. The simple project I am trying to build is:
    1. get the top N articles on Hacker News
    2. fetch the items to get the URLs
    3. use an API to summarize each URL
    4. save the summary, URL, etc.
    I was easily able to build the HackerNewsTopAPIDataset for getting the top items. However, I am not able to figure out how to get those item ids into the HackerNewsItemsAPIDataset. I am of course able to fill the node function with all kinds of IO and get it to work, which is what I did, but everything I read says this is the wrong approach and that node functions should be purely functional. I stumbled onto this Stack Overflow question, https://stackoverflow.com/questions/73430557/dynamic-parameters-on-datasets-in-kedro, which talks about adding kwargs to _load(). However, I do not see how to pass arguments into the load without explicitly pulling datasets into the node function and calling them directly, which brings me back to doing IO in node functions. Now I am left scratching my head over how to link datasets that require input in order to function. Any insights or pointers would be greatly appreciated.
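    One hedged pattern for keeping the node function free of HTTP plumbing (the class name and endpoint are illustrative, not an existing kedro-datasets class): a small custom dataset whose load() returns a fetcher callable, which a node can then map over the item ids produced upstream while staying a plain function itself.
    Copy code
    import requests
    from kedro.io import AbstractDataset


    class HackerNewsItemFetcherDataset(AbstractDataset):
        """Loads a callable that fetches a single Hacker News item by id."""

        def __init__(self, base_url: str = "https://hacker-news.firebaseio.com/v0"):
            self._base_url = base_url

        def _load(self):
            def fetch_item(item_id: int) -> dict:
                response = requests.get(f"{self._base_url}/item/{item_id}.json", timeout=10)
                response.raise_for_status()
                return response.json()

            return fetch_item

        def _save(self, data) -> None:
            raise NotImplementedError("This dataset is read-only.")

        def _describe(self) -> dict:
            return {"base_url": self._base_url}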
  • Galen Seilis

    09/18/2025, 11:46 PM
    Can a custom resolver be used to fill in a catalog dataset name? At a glance this would appear to conflict with the dataset factory notation, but I want to make sure.
  • Paul Haakma

    09/22/2025, 8:09 AM
    Hi all. Does anyone know of a way to configure a dataset to write out a GeoPandas dataframe to a DuckDb database table, and vice-versa? I tried using ibis.TableDataset to write but it complains that the GeoDataFrame object doesn't have an 'as_table' attribute. Would I have to implement a custom DuckDb dataset? It doesn't look too hard but I don't want to reinvent the wheel if there's already a way to do it...
  • Anton Nikishin

    09/26/2025, 9:46 AM
    Can I use an existing Databricks cluster with kedro-databricks? By default, kedro-databricks tries to create new resources, which my account doesn’t have permission to do. Is there a way to specify an existing cluster ID instead? I tried editing conf/dev/databricks.yml with the following code:
    Copy code
    default:
      tasks:
      - existing_cluster_id: 0924-121047-3jcdtqh1
    But running kedro databricks bundle --overwrite raises an error:
    Copy code
    AssertionError: lookup_key task_key not found in updates: [{'existing_cluster_id': '0924-121047-3jcdtqh1'}]
  • Nik Linnane

    10/01/2025, 6:19 PM
    Hi! I'm trying to deploy a pipeline using the Databricks plugin. I'm able to init, bundle, and deploy (what looks to be) successfully (I can see the job and files created in the UI), but I always get this error when running...
    Copy code
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    File ~/.ipykernel/6689/command--1-4096408574:22
         20 import importlib
         21 module = importlib.import_module("classification_pipeline")
    ---> 22 module.classification_pipeline()
    
    AttributeError: module 'classification_pipeline' has no attribute 'classification_pipeline'
    It looks like there's confusion about the entry point. Some additional details below that may or may not be helpful in debugging...
    • I'm following these instructions
    • my pipeline has dev, qa, and prod environments configured within conf, and I'm trying to deploy qa
    • I've added an existing_cluster_id
    • the commands I've run are:
      ◦ kedro databricks init
      ◦ kedro databricks bundle --env qa --params runner=ThreadRunner
      ◦ kedro databricks deploy --env qa --runtime-params runner=ThreadRunner
      ◦ kedro databricks run classification_pipeline
    • "classification_pipeline" is used for both my package and project names
    Any help is appreciated! @Jens Peder Meldgaard @Nok Lam Chan
  • Shah

    10/02/2025, 10:49 AM
    Hi everyone, I'm new to Kedro and experimenting with my first implementation, trying to parametrize every function to take maximum advantage of the platform. While attempting to access parameters defined in a 'parameters_xxx.yml' file, for example for the 'data_processing' pipeline, I have two questions. But first, a glimpse into my parameters_data_processing.yml file:
    Copy code
    column_rename_params: # Suffix to be added to overlapping columns
        skip_cols: ['Date'] # Columns to skip while renaming
        co2: '_co2'
        motion: '_motion'
        presence: '_presence'
    
    data_clean_params:
      V2_motion: {
            condition: '<0',
            new_val: 0
            }
      V2_presence: {
            condition: '<0',
            new_val: 0
            }
    
      infinite_values:
        infinite_val_remove: true
        infinite_val_conditions:
          - column_name: V2_motion
            lower_bound: -1e10
            upper_bound: 1e10
          - column_name: V2_presence
            lower_bound: -1e10
            upper_bound: 1e10
    I am experimenting with different parameter styles: dictionaries of dictionaries, dictionaries of lists, etc. So the two questions are as follows:
    1. How do I pass second- or third-level dictionary parameters to a node? E.g. how do I pass the value of column_rename_params['co2'] to one node and the value of column_rename_params['motion'] to another? My attempt at passing inputs to a node as inputs=['co2_processed', 'params:column_rename_params:co2', 'params:column_rename_params:skip_cols'] returned a "not found in the DataCatalog" error. Do I need to define these parameters in catalog.yml? Since the parameters are not defined in catalog.yml, yet I can still access the "params:column_rename_params" dictionary, I guess there must be a way to access the next level as well. As a workaround, I have simplified the dictionary, keeping everything on the base level (no nested dictionaries).
    2. Curiosity: why do we write 'params:<key>' instead of 'parameters:<key>'? Just curious, because I do not remember defining any object called 'params'; I was just following the tutorial.
    Thanks ahead, and also thanks for Kedro and this Slack workspace.
  • Shah

    10/03/2025, 6:22 PM
    Hello, I upgraded to Kedro 1.0 after accepting the suggestion from the Kedro CLI output. A lot of things broke, which I managed to get working again thanks to the documentation (0.19 → 1.0), including changing the internal Kedro catalog call from catalog.filter() to catalog.list(). However, I now cannot get other functionality such as 'namespaces' working. Is there support for namespaces in 1.0? If yes, how?
  • Sreekar Reddy

    10/04/2025, 9:56 AM
    Hello, I am Srikar and I am new to open-source contribution. I have started exploring Kedro, solved some Git issues, and know the fixes, but I am not able to get my development setup built; there are so many build issues I am digging into. I have checked the Makefile and the corresponding CONTRIBUTING.md files, but I still cannot build my setup, so I need some help. Can anyone help me?
  • Mamadou SAMBA

    10/06/2025, 3:45 PM
    Hi team, we found an issue with how Kedro handles empty runtime parameters when triggered from Airflow. In our pipeline config, we have something like:
    Copy code
    some_dataset:
      type: spark.SparkDataset
      file_format: delta
      filepath: "gs://<bucket-prefix>${runtime_params:env}-dataset/app_usages"
    Airflow correctly sends an empty string ('') for the env parameter, but Kedro interprets it as None. So the final path becomes:
    Copy code
    gs://<bucket-prefix>None-dataset/
    instead of:
    Copy code
    gs://<bucket-prefix>-dataset/
    Here’s the simplified Airflow call:
    Copy code
    "params": build_kedro_params(
        [
            f"project={get_project_id()}",
            f"env={env_param}",  # env_param = ''
            
        ]
    )
    It looks like Kedro converts empty strings from runtime parameters into None during parsing. Has anyone else run into this issue with Kedro interpreting empty strings as None?
  • Stas

    10/07/2025, 4:15 PM
    Hello team, I need to run a pipeline daily, and my input datasets will be in a different folder every day, like d:\data\{run_date}\dataset1.csv. Is it possible to use a parameter in the catalog.yml file to substitute {run_date} with the actual value? Or is there another way to achieve this?
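    A hedged sketch of one option (the resolver name "today" and the date format are assumptions, not built-ins): register a custom resolver in settings.py and reference it in catalog.yml, e.g. filepath: "d:/data/${today:}/dataset1.csv". A runtime parameter (${runtime_params:run_date}) passed via kedro run --params would work similarly if the date should come from the command line.
    Copy code
    # settings.py
    from datetime import date

    CONFIG_LOADER_ARGS = {
        "custom_resolvers": {
            "today": lambda: date.today().strftime("%Y-%m-%d"),
        },
    }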
  • Stas

    10/09/2025, 11:07 AM
    Hi, is it possible to specify the type of a parameter that is passed to a node function? I have a node like this: Node(func=myfunc, inputs=["params:run_date"]). My function has the signature def myfunc(rundate: datetime.date). It looks like Kedro passes run_date as a str instead of a datetime.date.
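    A minimal sketch of the usual workaround (assuming the value arrives as an ISO-formatted string, which is what YAML/CLI parameters are by default, since Kedro does not coerce types based on annotations): convert inside the function.
    Copy code
    import datetime


    def myfunc(rundate: str):
        run_date = datetime.date.fromisoformat(rundate)  # "2025-10-09" -> datetime.date(2025, 10, 9)
        return run_date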
  • Shah

    10/09/2025, 4:47 PM
    Hi, continuing my experimentation with namespaces and inheriting/extending pipelines, I have a situation. My current workflow is as follows: I have namespaces implemented for each of the demo (train and evaluate an LR model) and extended (train and evaluate an RF model) pipelines. (Continued in the replies to this message...)
  • Gianni Giordano

    10/13/2025, 12:48 PM
    Hello, we have a kedro pipeline with 450+ nodes and, as you can imagine, we're struggling with kedro-viz. It lags, freezes and takes a lot of time for a simple filter. Is there anything we can do to improve kedro-viz performance? Maybe in the settings or in the source code. Thanks
  • Stas

    10/14/2025, 11:13 AM
    Hi, I've created a custom dataset. Now, I can see it in the graph using kedro viz, but it shows the size = 0. What should I add to my custom dataset so that kedro viz will correctly show the size of the dataset?
  • Flavien

    10/15/2025, 12:56 PM
    Hi fellows, I am trying to update an old kedro code base from 0.18.12 to 1.0.0 step by step, starting with version 0.19.15. We had set up a test to be sure that our custom resolvers were working, which reads as:
    Copy code
    from datetime import datetime, timedelta, timezone
    from pathlib import Path

    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project


    def test_custom_resolvers_in_example(
        project_path: Path,
    ) -> None:
        bootstrap_project(project_path=project_path)

        # Default value
        with KedroSession.create(
            project_path=project_path,
            env="example",
        ) as session:
            context = session.load_context()
            catalog = context._get_catalog()
            assert timedelta(days=1) == catalog.load("params:example-duration")
            assert datetime(1970, 1, 1, tzinfo=timezone.utc) == catalog.load(
                "params:example-epoch"
            )
    It turns out this test was passing with version 0.18.12 with CONFIG_LOADER_CLASS = OmegaConfigLoader, but it fails in version 0.19.15. It seems that the environment is not taken into account and that the loader parses all the possible environments (therefore finding duplicates).
    Copy code
    E           ValueError: Duplicate keys found in .../conf/local/catalog.yml and .../conf/production/catalog.yml: hourly_forecasts, output_hourly_forecasts
    Searching Slack for "Duplicate keys" doesn't seem to bring up anything. Please let me know what mistake I made. Thanks in advance!
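    In case it helps narrow things down, a hedged guess at a common migration gap (base_env and default_run_env are real CONFIG_LOADER_ARGS options; whether they are actually missing in this project is an assumption): if the loader is not scoped to a single run environment, it can end up reading every conf/* directory and reporting duplicates across them.
    Copy code
    # settings.py
    from kedro.config import OmegaConfigLoader

    CONFIG_LOADER_CLASS = OmegaConfigLoader
    CONFIG_LOADER_ARGS = {
        "base_env": "base",
        "default_run_env": "local",
    }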
  • Stas

    10/15/2025, 1:17 PM
    Hi, what is the best practice to manage credentials? The docs mention two approaches: using environment variables and local files. But we need to get credentials dynamically, calling an API. Do you have any recommendations for this?
  • Flavien

    10/15/2025, 3:33 PM
    Hi folks, I don't know if I am making a mistake somewhere, but I think there is some issue with the files on PyPI. I have installed 0.19.15 from https://pypi.org/project/kedro/0.19.15/#files. If you download the archive kedro-0.19.15.tar.gz and check KedroSession.create from kedro.framework.session, you will see that the signature has extra_params and not runtime_params. The source code on the GitHub repository for the tag 0.19.15 is correct though (same for 0.19.14). Please let me know if you see the same thing. 😅
  • Stas

    10/16/2025, 1:32 PM
    Hi, I use a hook to load external credentials as per https://docs.kedro.org/en/stable/extend/hooks/common_use_cases/#use-hooks-to-load-external-credentials. In the hook I would like to use some configuration, such as an external URL and an account, to fetch the external credentials. What is the best practice for storing and reading this kind of configuration? Are environment variables the only way to achieve this?
    Copy code
    def after_context_created(self, context):
        creds = get_credentials(url, account)
        context.config_loader["credentials"] = {
            **context.config_loader["credentials"],
            **creds,
        }
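    A hedged sketch of one way to avoid hard-coding those values (get_credentials stands in for your own API call; the parameter names and environment variable are illustrative): keep the endpoint and account in parameters or an environment variable, and read them from the config loader inside the hook.
    Copy code
    import os

    from kedro.framework.hooks import hook_impl


    def get_credentials(url: str, account: str) -> dict:
        # stand-in for the real call to the external credentials API
        return {}


    class ExternalCredentialsHooks:
        @hook_impl
        def after_context_created(self, context):
            params = context.config_loader["parameters"]
            url = params.get("credentials_api_url", os.environ.get("CREDS_API_URL"))
            account = params.get("credentials_api_account")
            creds = get_credentials(url, account)
            context.config_loader["credentials"] = {
                **context.config_loader["credentials"],
                **creds,
            }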
  • Paul Haakma

    10/17/2025, 4:06 AM
    Does anyone know if there is a tutorial/documentation page on creating custom datasets? This link throws a 404: https://docs.kedro.org/en/1.0.0/data/how_to_create_a_custom_dataset.html The link comes from the GitHub site: https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/CONTRIBUTING.md
  • Ayushi

    10/17/2025, 11:04 AM
    Hello, I am trying to connect two pipelines, say P1 and P2. They both have namespaces (n1 and n2) specified, and output from P1 is to be consumed as input in P2. What is the best practice for specifying such pipelines so that we can connect them? Do we always need to explicitly write inputs = "n1.output" in the nodes of P2, or is there another way to do this?
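    A minimal sketch of the usual wiring (pipeline contents and the "features" dataset name are illustrative): give P2 an inputs mapping when applying its namespace, so its free input is bound to P1's namespaced output without every node in P2 having to spell out "n1.<dataset>" itself.
    Copy code
    from kedro.pipeline import node, pipeline


    def make_features(raw):
        return raw


    def train(features):
        return features


    base_p1 = pipeline([node(make_features, "raw", "features", name="make_features")])
    base_p2 = pipeline([node(train, "features", "model", name="train")])

    p1 = pipeline(base_p1, namespace="n1")  # output becomes "n1.features"
    p2 = pipeline(base_p2, namespace="n2", inputs={"features": "n1.features"})

    combined = p1 + p2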
  • Stas

    10/20/2025, 1:58 PM
    Hi, is it possible to pass a node output as an APIDataset parameter? The use case is the following: 1) a node produces a list of ids; 2) the APIDataset uses the list of ids as a parameter to call an API; 3) the next node gets the dataset that the APIDataset returns.
  • Pascal Brokmeier

    10/20/2025, 2:08 PM
    Adding to Hugging Face recently? I have a PoC here: https://github.com/everycure-org/matrix/tree/feat/kedro-hf-dataset/pipelines/huggingface-dataset-demo Curious to get feedback on that and happy to bring it to the kedro-plugins repo, of course. @Merel just FYI (and sorry for sleeping on our Google Sheets dataset, I nudged Laurens to pick that back up)
  • Tim Deller

    10/21/2025, 10:30 AM
    Hi! Are there best practices regarding exploratory data analysis and data cleaning? Do you start with notebooks and move code to kedro nodes later on? Thanks for your suggestions
  • Shu-Chun Wu

    10/24/2025, 2:14 PM
    Hi team, do you have an example or use case of Kedro working with Label Studio? #C03RKP2LW64
  • Mohamed El Guendouz

    10/24/2025, 3:32 PM
    Hi everyone! 🙂 I'm facing an issue with Kedro-Viz. I have a node that performs a merge into a Delta Table. In this node, I pass two inputs:
    • the dataframe to be inserted, and
    • the destination Delta Table itself.
    Inside the node, I execute the merge logic directly. The problem is that Kedro-Viz treats the Delta Table as an input, whereas I'd like it to be represented as the output after the merge, so that the lineage is clearer and reflects the actual data flow. Is there a way to indicate which dataset is the true input and which one should be considered the final output in this kind of use case? Thanks for your help! 🙏
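    A hedged sketch of one way to get the lineage you describe (this is not an existing kedro-datasets class; the class name, merge-condition handling and use of SparkSession.getActiveSession() are assumptions): move the MERGE into a custom dataset's save(), have the node simply return the dataframe, and register the Delta table as that node's output so Kedro-Viz draws it downstream of the node.
    Copy code
    from delta.tables import DeltaTable
    from kedro.io import AbstractDataset
    from pyspark.sql import DataFrame, SparkSession


    class DeltaMergeDataset(AbstractDataset):
        """Saving performs a MERGE INTO the Delta table at `filepath`."""

        def __init__(self, filepath: str, merge_condition: str):
            self._filepath = filepath
            self._merge_condition = merge_condition  # e.g. "target.id = source.id"

        def _load(self) -> DataFrame:
            spark = SparkSession.getActiveSession()
            return spark.read.format("delta").load(self._filepath)

        def _save(self, data: DataFrame) -> None:
            spark = SparkSession.getActiveSession()
            target = DeltaTable.forPath(spark, self._filepath)
            (
                target.alias("target")
                .merge(data.alias("source"), self._merge_condition)
                .whenMatchedUpdateAll()
                .whenNotMatchedInsertAll()
                .execute()
            )

        def _describe(self) -> dict:
            return {"filepath": self._filepath, "merge_condition": self._merge_condition}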
  • Ayushi

    10/27/2025, 6:35 AM
    Hi everyone! Issue description: I am getting an InterpolationResolutionError on kedro run after adding custom resolvers to CONFIG_LOADER_ARGS in settings.py. kedro run works fine if I comment out the custom resolver in settings.py, but if I try to run Kedro with this resolver I get an error saying the globals key is not found. Content of settings.py:
    Copy code
    CONFIG_LOADER_ARGS = {
        "custom_resolvers": {
            "Our_resolver": our_resolver,  # reference to the resolver callable
        },
    }
    Is it necessary to explicitly set config_patterns so that it can find globals and the other configs correctly?