# questions
  • Galen Seilis (07/23/2025, 10:03 PM)
    I just want to double check something. I 'think' when a node function accesses the parameters that it only uses a copy of the parameters. Is that correct? I was tinkering with Kedro v1.0.0 and I was not able to change the value of a given parameter. IMO it is desirable for these node functions to be unable to modify the parameters.
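    A rough, illustrative sketch (not from the thread) of one way to check this: run a tiny in-memory pipeline where one node mutates the parameters and a later node asserts the original value. It relies on MemoryDataset's copy-on-load behaviour; the exact defaults for parameters in 1.0.0 and the runner signature are worth confirming against the docs.
    from kedro.io import DataCatalog, MemoryDataset
    from kedro.pipeline import node, pipeline
    from kedro.runner import SequentialRunner

    def try_to_mutate(params: dict) -> bool:
        params["alpha"] = 999  # attempt an in-place change
        return True

    def read_back(_: bool, params: dict) -> None:
        # Passes only if the node above mutated its own copy of the parameters.
        assert params["alpha"] == 1

    catalog = DataCatalog({"params:model_options": MemoryDataset({"alpha": 1})})
    pipe = pipeline([
        node(try_to_mutate, "params:model_options", "mutated_flag"),
        node(read_back, ["mutated_flag", "params:model_options"], None),
    ])
    SequentialRunner().run(pipe, catalog)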
  • Felipe Monroy (07/25/2025, 3:12 AM)
    Hello! Is there a way to modify a node’s inputs using hooks? I’m not sure if this is the best approach, but I need to perform some operations on AWS Personalize using boto3, so my nodes will require the Personalize client as an input. Ideally, I’d like to inject the client rather than initialize it separately within each node.
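    A pattern that is often suggested for this (a sketch under assumptions, not an official recipe): register the client in the catalog from an after_catalog_created hook and declare it as a normal node input. The dataset name personalize_client is made up, and the dict-style assignment assumes Kedro 1.0 (on 0.19.x it would be catalog.add(...)).
    import boto3
    from kedro.framework.hooks import hook_impl
    from kedro.io import MemoryDataset

    class PersonalizeClientHooks:
        @hook_impl
        def after_catalog_created(self, catalog):
            # copy_mode="assign" hands the same client object to every node
            # instead of attempting to deep-copy it.
            catalog["personalize_client"] = MemoryDataset(
                boto3.client("personalize"), copy_mode="assign"
            )
    Nodes can then list "personalize_client" among their inputs; the hook class still needs to be registered in settings.py under HOOKS.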
  • Filip Isak Mattsson (07/25/2025, 10:12 AM)
    Hello and happy Friday, quick question: when I upgraded from 0.19.14 to 1.0.0, it seems like DataCatalog.add() was removed. This causes problems with how I create a hook with an Abstract Dataclass wrapper for a Snowpark session. What is the new way to handle this? 🙂 All help welcome. Never mind, fixed by using the dictionary assignment haha. Feel free to remove or leave it up for future reference.
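    For future readers, the change amounts to one line (the dataset name is a placeholder):
    catalog.add("snowpark_session", snowpark_dataset)   # Kedro 0.19.x
    catalog["snowpark_session"] = snowpark_dataset      # Kedro 1.0.x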
  • Adam (07/25/2025, 8:43 PM)
    Kedro v1.0.0 is looking 🔥 But I just tried installing kedro-mlflow and it uninstalled v1 and re-installed v0.19 - will it be updated to use v1 at some point?
  • Adam (07/25/2025, 8:46 PM)
    docs.kedro.org/en/1.0.0/integrations-and-plugins/mlflow/ doesn't appear to mention any compatibility issues
  • Ilaria Sartori (07/31/2025, 2:43 PM)
    Hey team! I have a question regarding saving results: is there a way to automatically create versioned outputs so that the same catalog entry will not overwrite the previous one, but create a new one with a timestamp in the name?
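    If the built-in dataset versioning covers this, a minimal sketch (entry name, type and filepath are placeholders): with versioned: true every save goes to a new timestamped sub-path instead of overwriting.
    model_predictions:
      type: pandas.CSVDataset
      filepath: data/07_model_output/model_predictions.csv
      versioned: true
      # each run writes data/07_model_output/model_predictions.csv/<timestamp>/model_predictions.csv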
  • Yolan Honoré-Rougé (08/01/2025, 1:24 PM)
    Is there a recommended way to import a pipeline of my current project in a test folder? I'd like to run it manually for an end-to-end test. KedroSession.create().run() is far too encapsulated because I need some manual data injection, and from kedro.framework.project import pipelines does not work because I am not at the root of the project but in the test folder.
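    A rough sketch of one way to do this from a test file, assuming a standard project layout (the relative path to the project root is a guess to adjust):
    from pathlib import Path

    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project
    from kedro.runner import SequentialRunner

    project_root = Path(__file__).resolve().parents[1]  # tests/ -> project root
    bootstrap_project(project_root)                      # configures the project package

    from kedro.framework.project import pipelines        # resolves after bootstrapping

    def test_end_to_end():
        with KedroSession.create(project_path=project_root) as session:
            catalog = session.load_context().catalog
            # manual data injection can go here before running
            SequentialRunner().run(pipelines["__default__"], catalog)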
  • Yanni (08/05/2025, 2:55 PM)
    Hi! Is there a way to access variables of other pipelines with a different namespace? For example: I have two pipelines. The output of one pipeline should be merged within pipeline two. Both pipelines have different namespaces. It works as expected when I use only one namespace:
    Copy code
    """
    This is a boilerplate pipeline 'data_aggregation'
    generated using Kedro 1.0.0
    """
    
    from kedro.pipeline import Node, Pipeline, node, pipeline  # noqa
    
    from .nodes import (
        add_source,
        dropna,
        merge,
        rename,
    )
    
    
    def create_pipeline(**kwargs) -> Pipeline:
        return transform_123(**kwargs) + transform_ABC(**kwargs)
    
    
    def transform_123(**kwargs) -> Pipeline:
        pipeline_instance = Pipeline(
            [
                node(
                    func=add_source,
                    inputs=["raw_input", "params:source_name"],
                    outputs="with_source",
                    name="add_source",
                ),
                node(
                    func=rename,
                    inputs=["with_source", "params:rename_mapper"],
                    outputs="renamed",
                ),
                node(
                    func=dropna,
                    inputs=["renamed", "params:dropna"],
                    outputs="no_na",
                ),
            ],
            namespace="namespace_123",
        )
        return pipeline_instance
    
    
    def transform_ABC(**kwargs) -> Pipeline:
        pipeline_instance = Pipeline(
            [
                node(
                    func=add_source,
                    inputs=["namespace_ABC.raw_input", "params:namespace_ABC.source_name"],
                    outputs="preprocessed",
                    name="add_source",
                ),
                node(
                    func=merge,
                    inputs=["preprocessed", "namespace_123.no_na"],
                    outputs="merged",
                    name="merge_it",
                ),
            ],
        )
        return pipeline_instance
    But as soon as I use another namespace kedro_viz won't show the correct input.
    Copy code
    """
    This is a boilerplate pipeline 'data_aggregation'
    generated using Kedro 1.0.0
    """
    
    from kedro.pipeline import Node, Pipeline, node, pipeline  # noqa
    
    from .nodes import (
        add_source,
        dropna,
        merge,
        rename,
    )
    
    
    def create_pipeline(**kwargs) -> Pipeline:
        return transform_123(**kwargs) + transform_ABC(**kwargs)
    
    
    def transform_123(**kwargs) -> Pipeline:
        pipeline_instance = Pipeline(
            [
                node(
                    func=add_source,
                    inputs=["raw_input", "params:source_name"],
                    outputs="with_source",
                    name="add_source",
                ),
                node(
                    func=rename,
                    inputs=["with_source", "params:rename_mapper"],
                    outputs="renamed",
                ),
                node(
                    func=dropna,
                    inputs=["renamed", "params:dropna"],
                    outputs="no_na",
                ),
            ],
            namespace="namespace_123",
        )
        return pipeline_instance
    
    
    def transform_ABC(**kwargs) -> Pipeline:
        pipeline_instance = Pipeline(
            [
                node(
                    func=add_source,
                    inputs=["raw_input", "params:source_name"],
                    outputs="preprocessed",
                    name="add_source",
                ),
                node(
                    func=merge,
                    inputs=["preprocessed", "namespace_123.no_na"],
                    outputs="merged",
                    name="merge_it",
                ),
            ],
            namespace="namespace_ABC"
        )
        return pipeline_instance
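    If I remember the modular-pipeline behaviour correctly, every input of a namespaced pipeline gets prefixed with that namespace unless it is listed in the pipeline's inputs argument, so a sketch of the second factory (worth verifying against the 1.0.0 docs) would keep the cross-namespace dataset un-prefixed like this:
    def transform_ABC(**kwargs) -> Pipeline:
        return Pipeline(
            [
                node(
                    func=add_source,
                    inputs=["raw_input", "params:source_name"],
                    outputs="preprocessed",
                    name="add_source",
                ),
                node(
                    func=merge,
                    inputs=["preprocessed", "namespace_123.no_na"],
                    outputs="merged",
                    name="merge_it",
                ),
            ],
            namespace="namespace_ABC",
            # keep this dataset un-prefixed so it still refers to the other pipeline's output
            inputs={"namespace_123.no_na"},
        )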
  • Fabian P (08/07/2025, 8:10 AM)
    Hello, I have a question regarding versioned datasets: I want to save two outputs of my pipeline as separate versioned files, but both files should be in the same folder for each version. In my use case, I want a plot to be right next to the model. Is that possible without hooky workarounds?
  • jeffrey (08/07/2025, 4:03 PM)
    Hello, I have a question. I am using Kedro to develop machine learning models, but we are migrating from Databricks to Azure Fabric, so I would like to know if someone has made a connection from Kedro to Azure Fabric, or whether it is better to use Azure ML.
  • Clément Franger (08/08/2025, 6:20 PM)
    Hello, I have trouble understanding how config patterns work for the OmegaConfigLoader. Based on the documentation (https://docs.kedro.org/en/1.0.0/configure/advanced_configuration/#how-to-ensure-non-default-configuration-files-get-loaded) I have tried to do something similar, but whenever I specify agents and tasks in my node it returns the error: ValueError: Pipeline input(s) {'agents', 'tasks'} not found in the DataCatalogWithCatalogCommandsMixin. The documentation is not clear on how to access the agents and tasks dicts in my node. Thanks for your help
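    For what it's worth, a sketch of how this is sometimes wired up (the keys agents and tasks come from the message; everything else is an assumption): the config patterns only make the files loadable through the config loader, so something still has to put the loaded dicts into the catalog before nodes can declare them as inputs, e.g. a pair of hooks.
    # settings.py
    CONFIG_LOADER_ARGS = {
        "config_patterns": {
            "agents": ["agents*", "agents*/**", "**/agents*"],
            "tasks": ["tasks*", "tasks*/**", "**/tasks*"],
        }
    }

    # hooks.py
    from kedro.framework.hooks import hook_impl
    from kedro.io import MemoryDataset

    class ExposeConfigHooks:
        @hook_impl
        def after_context_created(self, context):
            # the context is created before the catalog, so stash the loaded config here
            self._agents = context.config_loader["agents"]
            self._tasks = context.config_loader["tasks"]

        @hook_impl
        def after_catalog_created(self, catalog):
            catalog["agents"] = MemoryDataset(self._agents)
            catalog["tasks"] = MemoryDataset(self._tasks)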
  • Thiago Valejo (08/12/2025, 3:42 AM)
    Hello everyone, I'm facing a very specific problem with kedro-mlflow. I don't know exactly if this is the right forum, so feel free to point me to another one. I'm failing to load the champion version of a wrapper SklearnPipeline model registered in MLflow. I want to save many experiments to MLflow and to be able to load the champion version for other downstream pipelines. My catalog.yml looks like this:
    Copy code
    model:
      type: kedro_mlflow.io.models.MlflowModelTrackingDataset
      flavor: mlflow.sklearn
      save_args:
        registered_model_name: model

    model_loader:
      type: kedro_mlflow.io.models.MlflowModelRegistryDataset
      flavor: mlflow.sklearn
      model_name: "model"
      alias: "champion"
    If I try to load the model in a new kedro session, it will demand a run_id. If I try to use the model_loader, it will complain that the wrapper SklearnPipeline object doesn't have a run_id, giving this error message:
    Copy code
    DatasetError: Failed while loading data from dataset MlflowModelRegistryDataset(alias=champion, 
    flavor=mlflow.sklearn, model_name=model, 
    model_uri=models:/model@champion, pyfunc_workflow=python_model).
    'dict' object has no attribute 'run_id'
    Does anyone have any idea how I could load the champion model?
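    Not an answer to the kedro-mlflow internals, but as a sanity check outside the catalog, a registered model can be loaded by alias with plain MLflow (assuming the registered name and alias from the snippet, and a recent MLflow version that supports aliases):
    import mlflow.sklearn

    # "models:/<registered_model_name>@<alias>" resolves the alias to a concrete version
    model = mlflow.sklearn.load_model("models:/model@champion")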
  • Yanni (08/12/2025, 12:59 PM)
    Hi, is there any way to display layer names for a MemoryDataset via Kedro-Viz? Something like:
        info.processed:
          type: io.MemoryDataset
          metadata:
            kedro-viz:
              layer: processing
  • Thiago Valejo (08/12/2025, 3:14 PM)
    Following up on my previous question, talking with Rashida I found that the problem is a little bit different, so I'm reposting: I’m failing to load the champion version of a wrapper SklearnPipeline model registered in MLFlow. I want to save many experiments to MLFlow and to be able to load the champion version for other downstream pipelines. My catalog.yml looks like this:
    Copy code
    model:
      type: kedro_mlflow.io.models.MlflowModelTrackingDataset
      flavor: mlflow.sklearn
      save_args:
        registered_model_name: model

    model_loader:
      type: kedro_mlflow.io.models.MlflowModelRegistryDataset
      flavor: mlflow.sklearn
      model_name: "model"
      alias: "champion"
    If I try to load the model in a new kedro session, it will demand a run_id. If I try to use the model_loader, it will complain that the model (the wrapper SklearnPipeline object) doesn't have a metadata attribute, giving this error message:
    Copy code
    /opt/anaconda3/envs/topazDS_2/lib/python3.11/site-packages/kedro_mlflow/io/models/mlflow_model_registry_dataset.py:98 in _load

        95 │   │   # because the same run can be registered under several different names
        96 │   │   #  in the registry. See https://github.com/Galileo-Galilei/kedro-mlflow/issues/5
        97 │   │   import pdb; pdb.set_trace()
     ❱  98 │   │   self._logger.info(f"Loading model from run_id='{model.metadata.run_id}'")
        99 │   │   return model
       100 │
       101 │   def _save(self, model: Any) -> None:
    AttributeError: 'SklearnPipeline' object has no attribute 'metadata'
    
    DatasetError: Failed while loading data from dataset
    kedro_mlflow.io.models.mlflow_model_registry_dataset.MlflowModelRegistryDataset(model_uri='models:/mill1_west_no_wenco_st_model@champion', model_name='mill1_west_no_wenco_st_model', alias='champion', flavor='mlflow.sklearn', pyfunc_workflow='python_model').
    'SklearnPipeline' object has no attribute 'metadata'
    I think that the MlflowModelRegistryDataset class wasn't expecting the model to be a sklearn object. Probably there's a difference in how I'm saving the model (MlflowModelTrackingDataset) and how I'm loading it (MlflowModelRegistryDataset). How could I load the champion model? @Rashida Kanchwala @Ravi Kumar Pilla
  • jeffrey (08/12/2025, 4:46 PM)
    Hello, I have a question. I want to run an experiment with Kedro version 0.19.11 and use the kedro-azureml plugin, but it shows me conflicts due to the Kedro version. So my question is: can it be used with Kedro versions < 0.19.x?
  • Sen (08/13/2025, 2:46 AM)
    Hi everyone. I was able to run Kedro-Viz, but I thought it would be nice if I could color-code it by layer. Is there any way to do that?
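    For reference, a sketch of the catalog metadata Kedro-Viz uses to group datasets into layers (dataset name, type and filepath are placeholders; whether the layer bands count as colour-coding is a matter of taste):
    companies:
      type: pandas.CSVDataset
      filepath: data/01_raw/companies.csv
      metadata:
        kedro-viz:
          layer: raw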
  • Jamal Sealiti (08/19/2025, 10:40 AM)
    Hi, I have a question regarding best practices for deploying a Kedro project in a distributed environment. Currently, I have Kedro running inside a container with the following Spark configuration:
    • spark.submit.deployMode = "cluster"
    • spark.master = "yarn"
    My goal is to run this setup within a data fabric. However, I came across a discussion online stating that Kedro internally uses the PySpark shell to instantiate the SparkSession, which is incompatible with YARN's cluster deploy mode. As cluster mode requires spark-submit rather than interactive shells, this presents a challenge. A suggested workaround involves:
    • Packaging the Kedro project as a Python wheel (.whl) or zip archive.
    • Using spark-submit to deploy the packaged project to the cluster.
    But this workaround may just be a way of avoiding dependency issues... Do you have any recommendations or best practices for this deployment approach? Is there a more streamlined way to integrate Kedro with Spark in cluster mode within a data fabric context?
  • Arnaud Dhaene (08/25/2025, 4:34 PM)
    Hi everyone, I'm working with a cloud platform that only accepts a python entrypoint when setting up a workflow-type job. Is there an elegant / intuitive way to run my Kedro project from the command line using python -m <something> run <pipeline> ...? Perhaps there is a way to bootstrap Kedro in a light-weight wrapper?
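    A light-weight wrapper is certainly possible; a sketch (module name, argument handling and defaults are all assumptions) that bootstraps the project and runs a named pipeline, so the platform can call python -m run_kedro <pipeline>:
    # run_kedro.py, placed next to pyproject.toml
    import sys
    from pathlib import Path

    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    def main() -> None:
        project_root = Path(__file__).resolve().parent
        bootstrap_project(project_root)
        pipeline_name = sys.argv[1] if len(sys.argv) > 1 else "__default__"
        with KedroSession.create(project_path=project_root) as session:
            session.run(pipeline_name=pipeline_name)

    if __name__ == "__main__":
        main()
    It may also be worth checking whether the project's generated __main__.py already lets python -m <package_name> accept run arguments.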
  • Fazil Topal (08/26/2025, 1:52 PM)
    hey everyone, I have a problem with kedro logging. (conf/logging.yml)
    Copy code
    # To enable this custom logging configuration, set KEDRO_LOGGING_CONFIG to the path of this file.
    # More information available at https://docs.kedro.org/en/stable/logging/logging.html
    version: 1
    
    disable_existing_loggers: False
    
    formatters:
      simple:
        format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    
    handlers:
      console:
        class: logging.StreamHandler
        level: INFO
        formatter: simple
        stream: ext://sys.stdout
    
      info_file_handler:
        class: logging.handlers.RotatingFileHandler
        level: INFO
        formatter: simple
        filename: info.log
        maxBytes: 10485760 # 10MB
        backupCount: 20
        encoding: utf8
        delay: True
    
      rich:
        class: kedro.logging.RichHandler
        rich_tracebacks: True
        # Advance options for customisation.
        # See https://docs.kedro.org/en/stable/logging/logging.html#project-side-logging-configuration
        # tracebacks_show_locals: False
    
    loggers:
      kedro:
        level: INFO
    
      text2shots:
        level: INFO
    
    root:
      handlers: [rich]
    According to the documentation, unless I define KEDRO_LOGGING_CONFIG the default will be used (which points here: https://github.com/kedro-org/kedro/blob/main/kedro/framework/project/default_logging.yml). 1- When I run Kedro, I see a log line saying it will use my file by default (it picks it up automatically, which is fine). 2- When my code fails, I can't see the tracebacks properly. After some hours spent, I found the issue (this is the full traceback):
    Copy code
    File "/home/ftopal/Projects/text2shots/.venv/lib/python3.11/site-packages/lmapis/providers/anthropic.py", line 103, in convert_messages
        converted_messages = [_convert_single_message(msg) for msg in messages]
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ftopal/Projects/text2shots/.venv/lib/python3.11/site-packages/lmapis/providers/anthropic.py", line 103, in <listcomp>
        converted_messages = [_convert_single_message(msg) for msg in messages]
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ftopal/Projects/text2shots/.venv/lib/python3.11/site-packages/lmapis/providers/anthropic.py", line 65, in _convert_single_message
        for tool_call in msg["tool_calls"]:
    TypeError: 'NoneType' object is not iterable
    but Kedro only shows me the last part,
    TypeError: 'NoneType' object is not iterable
    and does not even mention the file/line number, so it's incredibly hard to understand where this is coming from. I am using Kedro version 0.19.12. How can I enable full error tracebacks so I don't lose them?
  • Jean Plumail (08/26/2025, 4:04 PM)
    Hi everyone,
  • Galen Seilis (08/26/2025, 9:04 PM)
    I am looking at the Ibis TableDataset. The example in the docs points to a db file, but in my case I would want to connect to a remote TSQL database. Do I provide a connection string the same way I would with a Pandas SQL Table dataset? https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-8.1.0/api/kedro_datasets/ibis.TableDataset/#kedro_datasets.ibis.TableDataset
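    For reference, ibis.TableDataset takes a connection mapping (a backend name plus backend-specific keyword arguments) rather than a single SQLAlchemy-style connection string; a sketch for an MSSQL/T-SQL backend, with every value a placeholder to check against the Ibis backend docs:
    my_table:
      type: ibis.TableDataset
      table_name: my_table
      connection:
        backend: mssql
        host: my-server.example.com
        port: 1433
        database: analytics
        user: my_user
        password: my_password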
  • Pascal Brokmeier (08/27/2025, 6:47 AM)
    Hi friends. We're wondering if we should upgrade our kedro -> argo manual jinja templating kung-fu by leveraging https://hera-workflows.readthedocs.io/ Has anyone thought about this as well? It could be a nice way to give kedro a stable go-to codebase to deploy kedro to k8s clusters
  • Gauthier Pierard (09/01/2025, 3:47 PM)
    hey! Has anyone managed to use files on the Databricks filesystem as a data source? I'm getting DatasetError: No partitions found in '/dbfs/FileStore/myproject/queries/' but the files are there.
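    A sketch of what the catalog entry might look like (dataset type, suffix and path style are assumptions; "No partitions found" is also what appears when filename_suffix doesn't match the files in the folder):
    queries:
      type: partitions.PartitionedDataset
      path: /dbfs/FileStore/myproject/queries/
      dataset:
        type: text.TextDataset
      filename_suffix: ".sql"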
  • Paul Haakma (09/01/2025, 9:05 PM)
    Hi all. Can anyone advise on the best way to resolve a relative path to an absolute one? I want to specify relative paths in parameters, such as 'data/01_raw/myfile', but have a particular tool that requires passing it an absolute path. Does Kedro have a method to resolve this, or perhaps a way to get the absolute project path?
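    One option (a sketch, with the resolver name made up): register a custom OmegaConf resolver in settings.py so parameters can ask for an absolute path; note that Path.cwd() assumes the run is launched from the project root.
    # settings.py
    from pathlib import Path

    from kedro.config import OmegaConfigLoader

    CONFIG_LOADER_CLASS = OmegaConfigLoader
    CONFIG_LOADER_ARGS = {
        "custom_resolvers": {
            "abspath": lambda relative: str((Path.cwd() / relative).resolve()),
        }
    }
    A parameter can then be written as my_tool_input: "${abspath:data/01_raw/myfile}".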
  • Paul Haakma (09/02/2025, 5:31 AM)
    Hi all. If I have a kedro project with two pipelines, i.e. A and B, what is the best way to ensure that A always runs before B, but then allow me to manually run B independently if required? I have tried just setting the output of the last node in A as the input of the first node in B. But then if I try to run just B from the CLI, I get an error like so: ValueError: Pipeline input(s) {'d'} not found in the DataCatalogWithCatalogCommandsMixin. I can't figure out a way to manually give it that first input, or tell it to ignore it somehow.
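    A sketch of the registry pattern that usually gets suggested here (module paths and names are placeholders): keep A and B registered separately plus a combined default, and persist the dataset that joins them so B can also run on its own from the saved copy.
    # pipeline_registry.py
    from kedro.pipeline import Pipeline

    from my_project.pipelines import pipeline_a, pipeline_b  # hypothetical modules

    def register_pipelines() -> dict[str, Pipeline]:
        a = pipeline_a.create_pipeline()
        b = pipeline_b.create_pipeline()
        return {
            "a": a,
            "b": b,
            "__default__": a + b,  # "kedro run" executes A then B
        }
    With the joining dataset ("d" in the error) given a persisted catalog entry (e.g. a pandas.ParquetDataset), "kedro run --pipeline b" loads it from disk instead of reporting a missing input.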
  • Nikola Miszalska (09/03/2025, 8:22 AM)
    Hi :) I have 3 different pipelines and 2 different catalogs: catalog.yaml for the first and second pipeline, and catalog_p3.yaml for the third pipeline. Is there any way to access the catalog filename from e.g. KedroContext during the pipeline run? E.g. if I run the third pipeline I want to access the filename "catalog_p3.yaml" to be able to log this file into mlflow. Is there any way to force pipeline p3 to use only catalog_p3.yaml instead of merging it with catalog.yaml, which is used by pipelines 1 and 2?
  • Leonardo David Treiger Herszenhaut Brettas (09/08/2025, 4:00 AM)
    How do you define a table schema using databricks.ManagedTableDataset? Does anyone know?
  • Víctor Alejandro Hernández Martínez (09/09/2025, 7:24 PM)
    Hello everyone. I have a question related to a particular use case and its best practices. I'm building different pipelines designed to follow the classic lifecycle of building a model, from preprocessing the training data to fine-tuning the model and evaluating its results. However, I'm now concerned about the case in which I intend to use the model to evaluate new subjects. In particular, this scenario has the following characteristics:
    • First, the data doesn't arrive en masse, nor is it expected to do so at any point. The cases to be evaluated are limited.
    • The prediction is generated on-demand and asynchronously (it's understood that for some cases, preprocessing may take time, so the associated routine is executed in a parallel task for the user).
    • The data would come from a server other than the one where Kedro would be running.
    Given this, what would be the most recommended complementary tools to serve the model and its results? What would be the most appropriate functional architecture? I have tools like Airflow at my disposal, but I'm not sure if that's enough, if I should use another tool to set up an API, if Kedro alone is enough to do it all, etc. The possibilities are endless, but I want to avoid "rebuilding the wheel" as much as possible. Any recommendations are welcome. Thanks in advance.
  • Laure Vancau (09/11/2025, 1:41 PM)
    hello 🙂 I would like to use your spark.SparkStreamingDataSet with a Kafka integration. Due to project constraints, I am working with Kedro 0.19.14 and the following jars (spark.jars.packages: org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.4,org.apache.spark:spark-token-provider-kafka-0-10_2.12:3.2.4). From what I have understood, my dataset definition in the catalogue should be something like this:
    Copy code
    data:
      type: spark.SparkStreamingDataSet
      file_format: kafka
      load_args:
        options:
          subscribe: my-topic
          kafka.bootstrap.servers: kafka:0000
          startingOffsets: earliest
    However, I cannot get around this error:
    Copy code
    DatasetError: Failed while loading data from data set 
    SparkStreamingDataset(file_format=kafka, filepath=., load_args={'options': 
    {'kafka.bootstrap.servers': kafka:0000, 'startingOffsets': earliest, 
    'subscribe': my-topic}}, save_args={}).
    schema should be StructType or string
    Would you have any example projects or extra docs to point me to? Thanks a bunch 😊