# questions

    Adam

    07/25/2025, 8:46 PM
    docs.kedro.org/en/1.0.0/integrations-and-plugins/mlflow/ doesn't appear to mention any compatibility issues

    Ilaria Sartori

    07/31/2025, 2:43 PM
    Hey team! I have a question regarding saving of results: is there a way to automatically create versioned outputs, so that the same catalog entry will not overwrite the previous one but instead create a new one with a timestamp in the name?

    Yolan Honoré-Rougé

    08/01/2025, 1:24 PM
    Is there a recommended way to import a pipeline of my current project in a test folder? I'd like to run it manually for an end-to-end test;
    KedroSession.create().run()
    is far too encapsulated because I need some manual data injection, and
    from kedro.framework.project import pipelines
    does not work because I am not at the root of the project but in the test folder.
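    A minimal sketch of one workaround (assumptions: the tests live one level below the project root, the dataset name below is a placeholder for your own, and the Kedro 0.19/1.0 runner API where the hook manager argument is optional): bootstrap the project explicitly so the pipeline registry resolves from anywhere, then run the pipeline with a hand-built catalog.
    from pathlib import Path

    import pandas as pd
    from kedro.framework.project import pipelines
    from kedro.framework.startup import bootstrap_project
    from kedro.io import DataCatalog, MemoryDataset
    from kedro.runner import SequentialRunner


    def test_pipeline_end_to_end():
        project_root = Path(__file__).resolve().parents[1]  # adjust if the tests sit deeper
        bootstrap_project(project_root)  # registers settings so `pipelines` resolves

        # Inject test data manually instead of relying on the real catalog entries.
        # "my_input" is a hypothetical dataset name; parameters may also need to be
        # provided, depending on what the pipeline consumes.
        catalog = DataCatalog({"my_input": MemoryDataset(pd.DataFrame({"a": [1, 2]}))})

        outputs = SequentialRunner().run(pipelines["__default__"], catalog)
        assert outputs  # replace with real assertions on the outputs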

    Yanni

    08/05/2025, 2:55 PM
    Hi! Is there a way to access variables from other pipelines that use a different namespace? For example: I have two pipelines, and the output of pipeline one should be merged inside pipeline two. The two pipelines have different namespaces. It works as expected when I use only one namespace:
    Copy code
    """
    This is a boilerplate pipeline 'data_aggregation'
    generated using Kedro 1.0.0
    """
    
    from kedro.pipeline import Node, Pipeline, node, pipeline  # noqa
    
    from .nodes import (
        add_source,
        dropna,
        merge,
        rename,
    )
    
    
    def create_pipeline(**kwargs) -> Pipeline:
        return transform_123(**kwargs) + transform_ABC(**kwargs)
    
    
    def transform_123(**kwargs) -> Pipeline:
        pipeline_instance = Pipeline(
            [
                node(
                    func=add_source,
                    inputs=["raw_input", "params:source_name"],
                    outputs="with_source",
                    name="add_source",
                ),
                node(
                    func=rename,
                    inputs=["with_source", "params:rename_mapper"],
                    outputs="renamed",
                ),
                node(
                    func=dropna,
                    inputs=["renamed", "params:dropna"],
                    outputs="no_na",
                ),
            ],
            namespace="namespace_123",
        )
        return pipeline_instance
    
    
    def transform_ABC(**kwargs) -> Pipeline:
        pipeline_instance = Pipeline(
            [
                node(
                    func=add_source,
                    inputs=["namespace_ABC.raw_input", "params:namespace_ABC.source_name"],
                    outputs="preprocessed",
                    name="add_source",
                ),
                node(
                    func=merge,
                    inputs=["preprocessed", "namespace_123.no_na"],
                    outputs="merged",
                    name="merge_it",
                ),
            ],
        )
        return pipeline_instance
    But as soon as I use a second namespace, kedro-viz no longer shows the correct input (a possible fix is sketched after this snippet):
    Copy code
    """
    This is a boilerplate pipeline 'data_aggregation'
    generated using Kedro 1.0.0
    """
    
    from kedro.pipeline import Node, Pipeline, node, pipeline  # noqa
    
    from .nodes import (
        add_source,
        dropna,
        merge,
        rename,
    )
    
    
    def create_pipeline(**kwargs) -> Pipeline:
        return transform_123(**kwargs) + transform_ABC(**kwargs)
    
    
    def transform_123(**kwargs) -> Pipeline:
        pipeline_instance = Pipeline(
            [
                node(
                    func=add_source,
                    inputs=["raw_input", "params:source_name"],
                    outputs="with_source",
                    name="add_source",
                ),
                node(
                    func=rename,
                    inputs=["with_source", "params:rename_mapper"],
                    outputs="renamed",
                ),
                node(
                    func=dropna,
                    inputs=["renamed", "params:dropna"],
                    outputs="no_na",
                ),
            ],
            namespace="namespace_123",
        )
        return pipeline_instance
    
    
    def transform_ABC(**kwargs) -> Pipeline:
        pipeline_instance = Pipeline(
            [
                node(
                    func=add_source,
                    inputs=["raw_input", "params:source_name"],
                    outputs="preprocessed",
                    name="add_source",
                ),
                node(
                    func=merge,
                    inputs=["preprocessed", "namespace_123.no_na"],
                    outputs="merged",
                    name="merge_it",
                ),
            ],
            namespace="namespace_ABC"
        )
        return pipeline_instance
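    A possible fix (a sketch reusing the imports from the snippet above, and assuming Kedro 1.0's Pipeline accepts the same inputs mapping that the modular pipeline() wrapper took in 0.19): declare the cross-namespace dataset as an external input so it is not prefixed with namespace_ABC.
    def transform_ABC(**kwargs) -> Pipeline:
        return Pipeline(
            [
                node(
                    func=add_source,
                    inputs=["raw_input", "params:source_name"],
                    outputs="preprocessed",
                    name="add_source",
                ),
                node(
                    func=merge,
                    inputs=["preprocessed", "namespace_123.no_na"],
                    outputs="merged",
                    name="merge_it",
                ),
            ],
            namespace="namespace_ABC",
            # Keep this input un-namespaced so it maps onto namespace_123's output:
            inputs={"namespace_123.no_na"},
        )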

    Fabian P

    08/07/2025, 8:10 AM
    Hello, I have a question regarding versioned datasets: I want to save two outputs of my pipeline as separate versioned files, but both files should be in the same folder for each version. In my use case, I want a plot to be right next to the model. Is that possible without hooky workarounds?

    jeffrey

    08/07/2025, 4:03 PM
    Hello, I have a question. I am using Kedro to develop machine learning models, but we are migrating from Databricks to Azure Fabric, so I would like to know if someone has made a connection from Kedro to Azure Fabric, or whether it is better to use Azure ML.

    Clément Franger

    08/08/2025, 6:20 PM
    Hello, I have trouble understanding how config patterns work for the OmegaConfigLoader. Based on the documentation (https://docs.kedro.org/en/1.0.0/configure/advanced_configuration/#how-to-ensure-non-default-configuration-files-get-loaded) I have tried to do something similar, but whenever I specify agents and tasks in my node it returns the error: ValueError: Pipeline input(s) {'agents', 'tasks'} not found in the DataCatalogWithCatalogCommandsMixin. The documentation is not clear on how to access the agents and tasks dicts in my node. Thanks for your help.
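    One hedged sketch of what usually has to happen here (custom config loaded by the OmegaConfigLoader is not a catalog entry by itself, so a node cannot list it as an input until something puts it in the catalog; the pattern names and file layout below are assumptions, and the exact catalog API differs between Kedro versions):
    # settings.py -- register the extra config patterns and the hook
    CONFIG_LOADER_ARGS = {
        "config_patterns": {
            "agents": ["agents*", "agents*/**"],
            "tasks": ["tasks*", "tasks*/**"],
        }
    }
    # HOOKS = (ConfigToCatalogHooks(),)

    # hooks.py -- expose the loaded dicts as in-memory datasets so nodes can consume them
    from kedro.framework.hooks import hook_impl
    from kedro.io import MemoryDataset


    class ConfigToCatalogHooks:
        @hook_impl
        def after_context_created(self, context):
            # Reads conf/<env>/agents*.yml and tasks*.yml via the patterns above.
            self._agents = context.config_loader["agents"]
            self._tasks = context.config_loader["tasks"]

        @hook_impl
        def after_catalog_created(self, catalog):
            # Newer catalogs support item assignment; older ones use catalog.add(...).
            catalog["agents"] = MemoryDataset(self._agents)
            catalog["tasks"] = MemoryDataset(self._tasks)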

    Thiago Valejo

    08/12/2025, 3:42 AM
    Hello everyone, I’m facing a very specific problem with kedro-mlflow. I don’t know exactly if this is the right forum, so feel free to point me to the another one. I’m failing to load the champion version of a wrapper SklearnPipeline model registered in MLFlow. I want to save many experiments to MLFlow and to be able to load the champion version for other downstream pipelines. My catalog.yml looks like this:
    Copy code
    model:
      type: kedro_mlflow.io.models.MlflowModelTrackingDataset
      flavor: mlflow.sklearn
      save_args:
        registered_model_name: model

    model_loader:
      type: kedro_mlflow.io.models.MlflowModelRegistryDataset
      flavor: mlflow.sklearn
      model_name: "model"
      alias: "champion"
    If I try to load the model in a new Kedro session, it will demand a run_id. If I try to use the model_loader, it will complain that the wrapper SklearnPipeline object doesn't have a run_id, giving this error message:
    Copy code
    DatasetError: Failed while loading data from dataset MlflowModelRegistryDataset(alias=champion, 
    flavor=mlflow.sklearn, model_name=model, 
    model_uri=models:/model@champion, pyfunc_workflow=python_model).
    'dict' object has no attribute 'run_id'
    Does anyone have an idea how I could load the champion model?

    Yanni

    08/12/2025, 12:59 PM
    Hi, is there any way to display layer names for a MemoryDataset in Kedro-Viz? Something like:
    info.processed:
      type: io.MemoryDataset
      metadata:
        kedro-viz:
          layer: processing

    Thiago Valejo

    08/12/2025, 3:14 PM
    Following up on my previous question, talking with Rashida I found that the problem is a little bit different, so I'm reposting: I’m failing to load the champion version of a wrapper SklearnPipeline model registered in MLFlow. I want to save many experiments to MLFlow and to be able to load the champion version for other downstream pipelines. My catalog.yml looks like this:
    Copy code
    model:
      type: kedro_mlflow.io.models.MlflowModelTrackingDataset
      flavor: mlflow.sklearn
      save_args:
        registered_model_name: model

    model_loader:
      type: kedro_mlflow.io.models.MlflowModelRegistryDataset
      flavor: mlflow.sklearn
      model_name: "model"
      alias: "champion"
    If I try to load the model in a new Kedro session, it will demand a run_id. If I try to use the model_loader, it will complain that the model (the wrapper SklearnPipeline object) doesn't have a metadata attribute, giving this error message:
    Copy code
    │ /opt/anaconda3/envs/topazDS_2/lib/python3.11/site-packages/kedro_mlflow/io/models/mlflow_model_r │
    │ egistry_dataset.py:98 in _load                                                                   │
    │                                                                                                  │
    │    95 │   │   # because the same run can be registered under several different names             │
    │    96 │   │   #  in the registry. See https://github.com/Galileo-Galilei/kedro-mlflow/issues/5     │
    │    97 │   │   import pdb; pdb.set_trace()                                                        │
    │ ❱  98 │   │   self._logger.info(f"Loading model from run_id='{model.metadata.run_id}'")            │
    │    99 │   │   return model                                                                       │
    │   100 │                                                                                          │
    │   101 │   def _save(self, model: Any) -> None:                                                   │
    ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
    AttributeError: 'SklearnPipeline' object has no attribute 'metadata'
    
    DatasetError: Failed while loading data from dataset 
    kedro_mlflow.io.models.mlflow_model_registry_dataset.MlflowModelRegistryDataset(model_uri='models:/mill1_west_no_we
    nco_st_model@champion', model_name='mill1_west_no_wenco_st_model', alias='champion', flavor='mlflow.sklearn', 
    pyfunc_workflow='python_model').
    'SklearnPipeline' object has no attribute 'metadata'
    I think the MlflowModelRegistryDataset class wasn't expecting the model to be a plain sklearn object. There is probably a mismatch between how I'm saving the model (MlflowModelTrackingDataset) and how I'm loading it (MlflowModelRegistryDataset). How could I load the champion model? @Rashida Kanchwala @Ravi Kumar Pilla
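    A hedged cross-check outside the dataset wrapper (assuming MLflow 2.x alias support and that the model was logged with the sklearn flavor, as in the catalog above): load the aliased model straight from the registry to confirm the registration itself resolves and to see what object type comes back.
    # Not the kedro-mlflow path -- just a sanity check of the registry alias.
    import mlflow

    model = mlflow.sklearn.load_model("models:/model@champion")
    print(type(model))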

    jeffrey

    08/12/2025, 4:46 PM
    Hello, I have a question. I want to run an experiment with Kedro version 0.19.11 and use the kedro-azureml plugin, but it shows me conflicts due to the Kedro version. So my question is: can it be used with Kedro versions < 0.19.x?

    Sen

    08/13/2025, 2:46 AM
    Hi everyone. I was able to run Kedro-Viz, but I thought it would be nice if I could color-code the nodes by layer. Is there any way to do that?

    Jamal Sealiti

    08/19/2025, 10:40 AM
    Hi, I have a question regarding best practices for deploying a Kedro project in a distributed environment. Currently, I have Kedro running inside a container with the following Spark configuration:
    • spark.submit.deployMode = "cluster"
    • spark.master = "yarn"
    My goal is to run this setup within a data fabric. However, I came across a discussion online stating that Kedro internally uses the PySpark shell to instantiate the SparkSession, which is incompatible with YARN's cluster deploy mode. As cluster mode requires spark-submit rather than interactive shells, this presents a challenge. A suggested workaround involves:
    • Packaging the Kedro project as a Python wheel (.whl) or zip archive.
    • Using spark-submit to deploy the packaged project to the cluster.
    But this workaround is maybe only about avoiding dependency issues... Do you have any recommendations or best practices for this deployment approach? Is there a more streamlined way to integrate Kedro with Spark in cluster mode within a data fabric context?

    Arnaud Dhaene

    08/25/2025, 4:34 PM
    Hi everyone, I'm working with a cloud platform that only accepts a python entrypoint when setting up a workflow-type job. Is there an elegant / intuitive way to run my Kedro project from the command line using
    python -m <something> run <pipeline> ...
    ? Perhaps there is a way to bootstrap Kedro in a lightweight wrapper?
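    Two hedged options: python -m kedro run --pipeline=<name> may already work from the project root, since the kedro package exposes a __main__ module; otherwise a small wrapper script (a sketch below, with a hypothetical scripts/run_pipeline.py path) can bootstrap the project and run a named pipeline.
    # scripts/run_pipeline.py -- hypothetical lightweight wrapper, e.g.
    #   python scripts/run_pipeline.py my_pipeline
    import sys
    from pathlib import Path

    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project


    def main() -> None:
        pipeline_name = sys.argv[1] if len(sys.argv) > 1 else "__default__"
        project_root = Path(__file__).resolve().parents[1]  # adjust to where the script lives
        bootstrap_project(project_root)
        with KedroSession.create(project_path=project_root) as session:
            session.run(pipeline_name=pipeline_name)


    if __name__ == "__main__":
        main()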

    Fazil Topal

    08/26/2025, 1:52 PM
    Hey everyone, I have a problem with Kedro logging (conf/logging.yml):
    Copy code
    # To enable this custom logging configuration, set KEDRO_LOGGING_CONFIG to the path of this file.
    # More information available at https://docs.kedro.org/en/stable/logging/logging.html
    version: 1
    
    disable_existing_loggers: False
    
    formatters:
      simple:
        format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    
    handlers:
      console:
        class: logging.StreamHandler
        level: INFO
        formatter: simple
        stream: ext://sys.stdout
    
      info_file_handler:
        class: logging.handlers.RotatingFileHandler
        level: INFO
        formatter: simple
        filename: info.log
        maxBytes: 10485760 # 10MB
        backupCount: 20
        encoding: utf8
        delay: True
    
      rich:
        class: kedro.logging.RichHandler
        rich_tracebacks: True
        # Advanced options for customisation.
        # See https://docs.kedro.org/en/stable/logging/logging.html#project-side-logging-configuration
        # tracebacks_show_locals: False
    
    loggers:
      kedro:
        level: INFO
    
      text2shots:
        level: INFO
    
    root:
      handlers: [rich]
    According to the documentation, unless I define
    KEDRO_LOGGING_CONFIG
    the default will be used (which points here: https://github.com/kedro-org/kedro/blob/main/kedro/framework/project/default_logging.yml). 1) When I run Kedro, I see a log line saying it will use my file by default (it picks it up automatically, which is fine). 2) When my code fails, I can't see the tracebacks properly. After some hours spent, I found the issue; this is the full traceback:
    Copy code
    File "/home/ftopal/Projects/text2shots/.venv/lib/python3.11/site-packages/lmapis/providers/anthropic.py", line 103, in convert_messages
        converted_messages = [_convert_single_message(msg) for msg in messages]
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ftopal/Projects/text2shots/.venv/lib/python3.11/site-packages/lmapis/providers/anthropic.py", line 103, in <listcomp>
        converted_messages = [_convert_single_message(msg) for msg in messages]
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ftopal/Projects/text2shots/.venv/lib/python3.11/site-packages/lmapis/providers/anthropic.py", line 65, in _convert_single_message
        for tool_call in msg["tool_calls"]:
    TypeError: 'NoneType' object is not iterable
    but Kedro only shows me the last part,
    TypeError: 'NoneType' object is not iterable
    and does not even mention the file/line number, so it's incredibly hard to understand where this is coming from. I am using Kedro version 0.19.12. How can I get the full error tracebacks without losing them?

    Jean Plumail

    08/26/2025, 4:04 PM
    Hi everyone,

    Galen Seilis

    08/26/2025, 9:04 PM
    I am looking at the Ibis TableDataset. The example in the docs points to a db file, but in my case I would want to connect to a remote TSQL database. Do I provide a connection string the same way I would with a Pandas SQL Table dataset? https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-8.1.0/api/kedro_datasets/ibis.TableDataset/#kedro_datasets.ibis.TableDataset

    Pascal Brokmeier

    08/27/2025, 6:47 AM
    Hi friends. We're wondering if we should upgrade our kedro -> argo manual jinja templating kung-fu by leveraging https://hera-workflows.readthedocs.io/ Has anyone thought about this as well? It could be a nice way to give kedro a stable go-to codebase to deploy kedro to k8s clusters

    Gauthier Pierard

    09/01/2025, 3:47 PM
    Hey! Has anyone managed to use files on the Databricks filesystem as a data source? I'm getting DatasetError: No partitions found in '/dbfs/FileStore/myproject/queries/' but the files are there.

    Paul Haakma

    09/01/2025, 9:05 PM
    Hi all. Can anyone advise on the best way to resolve a relative path to an absolute one? I want to specify relative paths in parameters, such as 'data/01_raw/myfile', but a particular tool requires an absolute path. Does Kedro have a method to resolve it, or perhaps a way to get the absolute project path?
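    A minimal sketch, assuming the run is launched from the project root (which kedro run does), so the current working directory is the project path; otherwise the project path is available as context.project_path inside an after_context_created hook.
    from pathlib import Path


    def to_absolute(relative_path: str) -> str:
        """Resolve a project-relative path like 'data/01_raw/myfile' to an absolute path."""
        return str((Path.cwd() / relative_path).resolve())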

    Paul Haakma

    09/02/2025, 5:31 AM
    Hi all. If I have a Kedro project with two pipelines, i.e. A and B, what is the best way to ensure that A always runs before B, but still allow me to run B independently when required? I have tried setting the output of the last node in A as the input of the first node in B, but then if I try to run just B from the CLI, I get an error like: ValueError: Pipeline input(s) {'d'} not found in the DataCatalogWithCatalogCommandsMixin. I can't figure out a way to manually give it that first input, or tell it to ignore it somehow.
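    One hedged option (a sketch, assuming the pipelines are registered under the hypothetical names "a" and "b", and that the dataset shared between them is persisted in the catalog rather than left as a MemoryDataset, so B can load it when run alone): register a combined default pipeline so A and B run together by default, while B stays runnable on its own with kedro run --pipeline=b.
    # pipeline_registry.py -- sketch
    from kedro.framework.project import find_pipelines


    def register_pipelines():
        pipelines = find_pipelines()
        # The shared dataset dependency orders A's nodes before B's in the combined run.
        pipelines["__default__"] = pipelines["a"] + pipelines["b"]
        return pipelines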

    Nikola Miszalska

    09/03/2025, 8:22 AM
    Hi :) I have 3 different pipelines and 2 different catalogs: catalog.yaml for the first and second pipelines and catalog_p3.yaml for the third pipeline. Is there any way to access the catalog filename from e.g. KedroContext during the pipeline run? E.g. if I run the third pipeline, I want to access the filename "catalog_p3.yaml" to be able to log this file to MLflow. Is there any way to force pipeline p3 to use only catalog_p3.yaml instead of merging it with catalog.yaml, which is used by pipelines 1 and 2?

    Leonardo David Treiger Herszenhaut Brettas

    09/08/2025, 4:00 AM
    How do I define a table schema using databricks.ManagedTableDataset? Does someone know?

    Víctor Alejandro Hernández Martínez

    09/09/2025, 7:24 PM
    Hello everyone. I have a question related to a particular use case and its best practices. I'm building different pipelines designed to follow the classic lifecycle of building a model, from preprocessing the training data to fine-tuning the model and evaluating its results. However, I'm now concerned about the case in which I intend to use the model to evaluate new subjects. In particular, this scenario has the following characteristics: - First, the data doesn't arrive en masse, nor is it expected to do so at any point. The cases to be evaluated are limited. - The prediction is generated on-demand and asynchronously (it's understood that for some cases, preprocessing may take time, so the associated routine is executed in a parallel task for the user). - The data would come from a server other than the one where Kedro would be running. Given this, what would be the most recommended complementary tools to serve the model and its results? What would be the most appropriate functional architecture? I have tools like Airflow at my disposal, but I'm not sure if that's enough, if I should use another tool to set up an API, if Kedro alone is enough to do it all, etc. The possibilities are endless, but I want to avoid "rebuilding the wheel" as much as possible. Any recommendations are welcome. Thanks in advance.

    Laure Vancau

    09/11/2025, 1:41 PM
    Hello 🙂 I would like to use your spark.SparkStreamingDataSet with a Kafka integration. Due to project constraints, I am working with Kedro 0.19.14 and the following jars (
    spark.jars.packages: org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.4,org.apache.spark:spark-token-provider-kafka-0-10_2.12:3.2.4
    ). From what I have understood, my dataset definition in the catalogue should be something like this:
    Copy code
    data:
      type: spark.SparkStreamingDataSet
      file_format: kafka
      load_args:
        options:
          subscribe: my-topic
          kafka.bootstrap.servers: kafka:0000
          startingOffsets: earliest
    However, I cannot get around this error:
    Copy code
    DatasetError: Failed while loading data from data set 
    SparkStreamingDataset(file_format=kafka, filepath=., load_args={'options': 
    {'kafka.bootstrap.servers': kafka:0000, 'startingOffsets': earliest, 
    'subscribe': my-topic}}, save_args={}).
    schema should be StructType or string
    Would you have any example projects or extra docs to point me to? Thanks a bunch 😊

    Shohatl

    09/15/2025, 5:07 AM
    Hello everyone, is there a way to pass a variable to the create_pipeline() function? My use case is initialising several KedroSessions in a single run, where each session runs an ETL on one table. The ETL can change according to the input. For example, if I have several inputs for an ETL, I want to dynamically create a node per input that loads the table. Is there a way to dynamically create a pipeline based on the input? The input is available only at runtime, so I can't use parameters.yaml. I also wanted to know the best practice for running multiple pipelines in parallel; I am currently creating a session per pipeline and running each one in a separate thread.
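    One hedged pattern (a sketch with hypothetical names such as load_table and the ETL_TABLES environment variable; it assumes the table list can be resolved at the time register_pipelines() imports and calls create_pipeline(), since that function normally receives no arguments): build one node per table in a loop.
    # pipeline.py -- sketch of a dynamically sized pipeline
    import os

    from kedro.pipeline import Pipeline, node

    from .nodes import load_table  # hypothetical node function


    def create_pipeline(tables=None, **kwargs) -> Pipeline:
        # Fallback: resolve the list at registration time, e.g. from an env var set by the caller.
        tables = tables or [t for t in os.environ.get("ETL_TABLES", "").split(",") if t]
        return Pipeline(
            [
                node(
                    func=load_table,
                    inputs=f"params:{table}_source",  # hypothetical per-table parameter
                    outputs=f"{table}_raw",
                    name=f"load_{table}",
                )
                for table in tables
            ]
        )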

    Júlio Resende

    09/15/2025, 5:00 PM
    Hello everyone! I'm trying to use SparkDataset to read and write to the Azure Datalake File System, using the abfs:// prefix. I noticed that, although the dataset requires credentials to be passed in the init method, these credentials are not used when writing, requiring the Spark section to be configured globally. This seems a bit out of line with the Kedro standard, as it doesn't allow us to have datasets from multiple sources. Shouldn't we be using these credentials directly when writing and reading, without using the global Spark configuration?

    Ralf Kowatsch

    09/18/2025, 9:53 AM
    Hi, our team is new to Kedro and we would like to use it as a data engineering tool. The concerns we have are:
    • If we work with Ibis or Snowpark, we don't want to define each table/view on the database. As far as I understand, the datasets are the persistence objects that connect the different transformations in the pipeline. Is there a way to get around defining these?
    • How many nodes could we run in parallel? Is there an upper limit if the heavy computing is mainly happening on Snowflake?
    • I understand that the nodes/transformations have to be molded into a pipeline. Is there an option to do that implicitly by referencing another node?
    • Is there a proper way to handle data quality, including generic tests and custom tests?
    • Is there an example project that we could benefit from?
    Thanks for your inputs.

    Emil Marcetta

    09/18/2025, 7:05 PM
    Hi, we are migrating some old 0.18 pipelines to v1.0 and I have a question about parameters.yml and runtime_params (we run into an issue when not receiving runtime_params:begin_date). We reference runtime_params in the catalog, for example:
    Copy code
    ...
    filepath: "${globals:example_bucket}/i/j/k/date=${runtime_params:begin_date}/
    only the base/parameters.yml has a definition for begin_date
    Copy code
    begin_date: "2025-01-01"
    and the CONFIG_LOADER_ARGS (in settings.py) does not have an entry for config_patterns (stepping through the OmegaConfigLoader constructor with a debugger confirms the "parameters" patterns are present), and we invoke the pipeline with no params (kedro run). The error we receive when the catalog loads is:
    Copy code
    InterpolationResolutionError: Runtime parameter 'begin_date' not found and no default value provided.
    Thank you!