Galen Seilis
07/23/2025, 10:03 PM
Felipe Monroy
07/25/2025, 3:12 AM
Filip Isak Mattsson
07/25/2025, 10:12 AM
Adam
07/25/2025, 8:43 PM
kedro-mlflow and it uninstalled v1 and re-installed v0.19 - will it be updated to use v1 at some point?
Adam
07/25/2025, 8:46 PM
Max Pardoe
07/31/2025, 11:07 AM
Ilaria Sartori
07/31/2025, 2:43 PM
Yolan Honoré-Rougé
08/01/2025, 1:24 PM
KedroSession().create().run() is far too encapsulated because I need some manual data injection, and from kedro.framework.project import pipelines does not work because I am not at the root of the project but in the test folder.
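For what it's worth, a rough sketch of doing this from a test file (assuming a standard layout where pyproject.toml sits one level above the test folder; the DataFrame and the "__default__" pipeline name are placeholders, not taken from the thread):

import pandas as pd
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

project_root = Path(__file__).resolve().parents[1]  # adjust to wherever pyproject.toml lives
bootstrap_project(project_root)  # after this, `from kedro.framework.project import pipelines` works

from kedro.framework.project import pipelines
from kedro.io import MemoryDataset
from kedro.runner import SequentialRunner

my_test_df = pd.DataFrame({"col": [1, 2]})  # stand-in test data

with KedroSession.create(project_path=project_root) as session:
    catalog = session.load_context().catalog
    # manual data injection: overwrite a catalog entry with an in-memory dataset
    # (on older Kedro versions use catalog.add(..., replace=True) instead of item assignment)
    catalog["raw_input"] = MemoryDataset(my_test_df)
    SequentialRunner().run(pipelines["__default__"], catalog)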
Yanni
08/05/2025, 2:55 PM
"""
This is a boilerplate pipeline 'data_aggregation'
generated using Kedro 1.0.0
"""
from kedro.pipeline import Node, Pipeline, node, pipeline  # noqa

from .nodes import (
    add_source,
    dropna,
    merge,
    rename,
)


def create_pipeline(**kwargs) -> Pipeline:
    return transform_123(**kwargs) + transform_ABC(**kwargs)


def transform_123(**kwargs) -> Pipeline:
    pipeline_instance = Pipeline(
        [
            node(
                func=add_source,
                inputs=["raw_input", "params:source_name"],
                outputs="with_source",
                name="add_source",
            ),
            node(
                func=rename,
                inputs=["with_source", "params:rename_mapper"],
                outputs="renamed",
            ),
            node(
                func=dropna,
                inputs=["renamed", "params:dropna"],
                outputs="no_na",
            ),
        ],
        namespace="namespace_123",
    )
    return pipeline_instance


def transform_ABC(**kwargs) -> Pipeline:
    pipeline_instance = Pipeline(
        [
            node(
                func=add_source,
                inputs=["namespace_ABC.raw_input", "params:namespace_ABC.source_name"],
                outputs="preprocessed",
                name="add_source",
            ),
            node(
                func=merge,
                inputs=["preprocessed", "namespace_123.no_na"],
                outputs="merged",
                name="merge_it",
            ),
        ],
    )
    return pipeline_instance
But as soon as I use another namespace, kedro_viz won't show the correct input.
"""
This is a boilerplate pipeline 'data_aggregation'
generated using Kedro 1.0.0
"""
from kedro.pipeline import Node, Pipeline, node, pipeline # noqa
from .nodes import (
add_source,
dropna,
merge,
rename,
)
def create_pipeline(**kwargs) -> Pipeline:
return transform_123(**kwargs) + transform_ABC(**kwargs)
def transform_123(**kwargs) -> Pipeline:
pipeline_instance = Pipeline(
[
node(
func=add_source,
inputs=["raw_input", "params:source_name"],
outputs="with_source",
name="add_source",
),
node(
func=rename,
inputs=["with_source", "params:rename_mapper"],
outputs="renamed",
),
node(
func=dropna,
inputs=["renamed", "params:dropna"],
outputs="no_na",
),
],
namespace="namespace_123",
)
return pipeline_instance
def transform_ABC(**kwargs) -> Pipeline:
pipeline_instance = Pipeline(
[
node(
func=add_source,
inputs=["raw_input", "params:source_name"],
outputs="preprocessed",
name="add_source",
),
node(
func=merge,
inputs=["preprocessed", "namespace_123.no_na"],
outputs="merged",
name="merge_it",
),
],
namespace="namespace_ABC"
)
return pipeline_instance
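One pattern that may help here (a sketch only, not verified against kedro-viz on 1.0.0): build transform_ABC without hard-coded prefixes and wrap it with the kedro.pipeline.pipeline helper already imported above, declaring the cross-namespace dataset as a free input so the namespace_ABC prefix is not applied to it:

def transform_ABC(**kwargs) -> Pipeline:
    base = Pipeline(
        [
            node(
                func=add_source,
                inputs=["raw_input", "params:source_name"],
                outputs="preprocessed",
                name="add_source",
            ),
            node(
                func=merge,
                inputs=["preprocessed", "namespace_123.no_na"],
                outputs="merged",
                name="merge_it",
            ),
        ]
    )
    # Declaring "namespace_123.no_na" as an explicit input keeps it from being
    # renamed to "namespace_ABC.namespace_123.no_na", so both the run and the
    # viz graph keep pointing at the output of transform_123.
    return pipeline(
        base,
        namespace="namespace_ABC",
        inputs={"namespace_123.no_na"},
    )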
Fabian P
08/07/2025, 8:10 AM
jeffrey
08/07/2025, 4:03 PM
Clément Franger
08/08/2025, 6:20 PM
Thiago Valejo
08/12/2025, 3:42 AM
model:
  type: kedro_mlflow.io.models.MlflowModelTrackingDataset
  flavor: mlflow.sklearn
  save_args:
    registered_model_name:model

model_loader:
  type: kedro_mlflow.io.models.MlflowModelRegistryDataset
  flavor: mlflow.sklearn
  model_name: "model"
  alias: "champion"
If I try to load the model in a new kedro session, it will demand a run_id. If I try to use the model_loader, it will complain that the wrapper SklearnPipeline object doesn't have a run_id, giving this error message:
DatasetError: Failed while loading data from dataset MlflowModelRegistryDataset(alias=champion,
flavor=mlflow.sklearn, model_name=model,
model_uri=models:/model@champion, pyfunc_workflow=python_model).
'dict' object has no attribute 'run_id'
Does any one of you have an idea how I could load the champion model?
Yanni
08/12/2025, 12:59 PM
Thiago Valejo
08/12/2025, 3:14 PM
model:
  type: kedro_mlflow.io.models.MlflowModelTrackingDataset
  flavor: mlflow.sklearn
  save_args:
    registered_model_name:model

model_loader:
  type: kedro_mlflow.io.models.MlflowModelRegistryDataset
  flavor: mlflow.sklearn
  model_name: "model"
  alias: "champion"
If I try to load the model in a new kedro session, it will demand a run_id. If I try to use the model_loader, it will complain that the model (the wrapper SklearnPipeline object) doesn't have a metadata attribute, giving this error message:
/opt/anaconda3/envs/topazDS_2/lib/python3.11/site-packages/kedro_mlflow/io/models/mlflow_model_registry_dataset.py:98 in _load
    95 |         # because the same run can be registered under several different names
    96 |         # in the registry. See https://github.com/Galileo-Galilei/kedro-mlflow/issues/5
    97 |         import pdb; pdb.set_trace()
❱   98 |         self._logger.info(f"Loading model from run_id='{model.metadata.run_id}'")
    99 |         return model
   100 |
   101 |     def _save(self, model: Any) -> None:
AttributeError: 'SklearnPipeline' object has no attribute 'metadata'
DatasetError: Failed while loading data from dataset
kedro_mlflow.io.models.mlflow_model_registry_dataset.MlflowModelRegistryDataset(model_uri='models:/mill1_west_no_wenco_st_model@champion',
model_name='mill1_west_no_wenco_st_model', alias='champion', flavor='mlflow.sklearn', pyfunc_workflow='python_model').
'SklearnPipeline' object has no attribute 'metadata'
I think that the MlflowModelRegistryDataset class wasn't expecting the model to be a sklearn object. Probably there's a difference between how I'm saving the model (MlflowModelTrackingDataset) and how I'm loading it (MlflowModelRegistryDataset).
How could I load the champion model?
@Rashida Kanchwala @Ravi Kumar Pilla
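For reference, a minimal sketch of loading the aliased model straight through the MLflow API (assuming MLflow ≥ 2.3, where registry aliases and models:/<name>@<alias> URIs exist), e.g. from a node or a hook, as a stop-gap outside the catalog:

import mlflow.sklearn

# mlflow.set_tracking_uri(...) / mlflow.set_registry_uri(...) may be needed if
# kedro-mlflow has not already configured them for this process.
champion = mlflow.sklearn.load_model("models:/model@champion")
print(type(champion))  # the underlying sklearn pipeline, ready for .predict(...)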
jeffrey
08/12/2025, 4:46 PM
Sen
08/13/2025, 2:46 AM
Jamal Sealiti
08/19/2025, 10:40 AM
• spark.submit.deployMode = "cluster"
• spark.master = "yarn"
My goal is to run this setup within a datafabric. However, I came across a discussion online stating that Kedro internally uses the PySpark shell to instantiate the SparkSession, which is incompatible with YARN's cluster deploy mode. As cluster mode requires spark-submit rather than interactive shells, this presents a challenge.
A suggested workaround involves:
• Packaging the Kedro project as a Python wheel (.whl) or zip archive.
• Using spark-submit to deploy the packaged project to the cluster.
But this workaround may be avoiding dependency issues...
Do you have any recommendations or best practices for this deployment approach? Is there a more streamlined way to integrate Kedro with Spark in cluster mode within a datafabric context?
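In case it helps, a heavily hedged sketch of what a spark-submit driver entry point could look like for a packaged project; the package name, wheel path and conf location are assumptions, not something Kedro prescribes:

# main.py, submitted e.g. as:
#   spark-submit --master yarn --deploy-mode cluster \
#       --py-files dist/my_project-0.1-py3-none-any.whl main.py
from kedro.framework.project import configure_project
from kedro.framework.session import KedroSession


def main() -> None:
    configure_project("my_project")  # hypothetical package name produced by `kedro package`
    # conf_source must point at wherever the packaged conf/ directory is unpacked
    # on the driver node (e.g. shipped alongside the job via --archives).
    with KedroSession.create(conf_source="conf") as session:
        session.run()


if __name__ == "__main__":
    main()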
Arnaud Dhaene
08/25/2025, 4:34 PM
python entrypoint when setting up a workflow-type job
Is there an elegant / intuitive way to run my Kedro project from the command-line using python -m <something> run <pipeline> ...? Perhaps there is a way to bootstrap Kedro in a light-weight wrapper?
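One option (a sketch, with the module name and layout assumed rather than taken from Kedro docs) is a thin module at the project root that bootstraps the project and forwards a pipeline name, so it can be invoked as python -m run_kedro <pipeline>:

# run_kedro.py, placed next to pyproject.toml
import sys
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

if __name__ == "__main__":
    pipeline_name = sys.argv[1] if len(sys.argv) > 1 else "__default__"
    project_path = Path(__file__).resolve().parent  # assumes this file sits at the project root
    bootstrap_project(project_path)
    with KedroSession.create(project_path=project_path) as session:
        session.run(pipeline_name=pipeline_name)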
Fazil Topal
08/26/2025, 1:52 PM
# To enable this custom logging configuration, set KEDRO_LOGGING_CONFIG to the path of this file.
# More information available at https://docs.kedro.org/en/stable/logging/logging.html
version: 1

disable_existing_loggers: False

formatters:
  simple:
    format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

handlers:
  console:
    class: logging.StreamHandler
    level: INFO
    formatter: simple
    stream: ext://sys.stdout

  info_file_handler:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: simple
    filename: info.log
    maxBytes: 10485760 # 10MB
    backupCount: 20
    encoding: utf8
    delay: True

  rich:
    class: kedro.logging.RichHandler
    rich_tracebacks: True
    # Advance options for customisation.
    # See https://docs.kedro.org/en/stable/logging/logging.html#project-side-logging-configuration
    # tracebacks_show_locals: False

loggers:
  kedro:
    level: INFO

  text2shots:
    level: INFO

root:
  handlers: [rich]
According to the documentation, unless I define the KEDRO_LOGGING_CONFIG the default will be used (which points to here: https://github.com/kedro-org/kedro/blob/main/kedro/framework/project/default_logging.yml).
1- When I run kedro, I see the logging saying it will use my file by default (it picks it up automatically, which is fine).
2- When my code fails, I can't see the tracebacks properly.
After some hours spent, I found the issue (this is the full traceback):
  File "/home/ftopal/Projects/text2shots/.venv/lib/python3.11/site-packages/lmapis/providers/anthropic.py", line 103, in convert_messages
    converted_messages = [_convert_single_message(msg) for msg in messages]
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ftopal/Projects/text2shots/.venv/lib/python3.11/site-packages/lmapis/providers/anthropic.py", line 103, in <listcomp>
    converted_messages = [_convert_single_message(msg) for msg in messages]
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ftopal/Projects/text2shots/.venv/lib/python3.11/site-packages/lmapis/providers/anthropic.py", line 65, in _convert_single_message
    for tool_call in msg["tool_calls"]:
TypeError: 'NoneType' object is not iterable
but kedro only shows the last part to me (TypeError: 'NoneType' object is not iterable) and does not even mention the file/line number, so it's incredibly hard for me to understand where this is coming from.
I am using kedro version 0.19.12.
How can I enable this so I get full error tracebacks without losing them?
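Not a fix for the handler itself, but a generic Python fallback (nothing Kedro-specific assumed) that guarantees the full traceback reaches every configured handler, whatever the rich console decides to print: wrap the failing call inside the node and log with logger.exception before re-raising.

import logging

logger = logging.getLogger(__name__)


def convert(messages):
    # stand-in for the lmapis call that raises inside the node
    for msg in messages:
        for tool_call in msg["tool_calls"]:
            pass


def my_node(messages):
    try:
        return convert(messages)
    except Exception:
        logger.exception("convert failed")  # emits the full traceback to all configured handlers
        raise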
Jean Plumail
08/26/2025, 4:04 PM
Galen Seilis
08/26/2025, 9:04 PM
Pascal Brokmeier
08/27/2025, 6:47 AM
Gauthier Pierard
09/01/2025, 3:47 PM
Paul Haakma
09/01/2025, 9:05 PM
Paul Haakma
09/02/2025, 5:31 AM
Nikola Miszalska
09/03/2025, 8:22 AM
Leonardo David Treiger Herszenhaut Brettas
09/08/2025, 4:00 AM
Víctor Alejandro Hernández Martínez
09/09/2025, 7:24 PM
Laure Vancau
09/11/2025, 1:41 PM
spark.jars.packages: org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.4,org.apache.spark:spark-token-provider-kafka-0-10_2.12:3.2.4)
From what I have understood, my dataset definition in the catalogue should be something like this:
data:
  type: spark.SparkStreamingDataSet
  file_format: kafka
  load_args:
    options:
      subscribe: my-topic
      kafka.bootstrap.servers: kafka:0000
      startingOffsets: earliest
However, I cannot navigate around the error:
DatasetError: Failed while loading data from data set
SparkStreamingDataset(file_format=kafka, filepath=., load_args={'options':
{'kafka.bootstrap.servers': kafka:0000, 'startingOffsets': earliest,
'subscribe': my-topic}}, save_args={}).
schema should be StructType or string
Would you have any example projects or extra docs to point me to?
Thanks a bunch 😊
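For comparison, the raw Structured Streaming equivalent of that catalog entry (assuming an existing SparkSession and the same broker/topic placeholders); running this outside Kedro can confirm whether the Spark/Kafka side is fine and the error comes from the dataset wrapper:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The kafka source needs no explicit schema; it always yields key/value/topic/... columns.
stream_df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:0000")
    .option("subscribe", "my-topic")
    .option("startingOffsets", "earliest")
    .load()
)
stream_df.printSchema()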