Biel Stela
11/11/2025, 10:27 AM
gdal, which is a CLI for a C++ lib (the one used under the hood by rasterio) and can handle the large files without problems, because it does all the streaming and all sorts of nice things under the hood. So I want to integrate this processing into my existing pipeline. Is it a bad idea to have a custom dataset that calls an external program via subprocess or something similar? Have you ever seen a pattern like this before? Will God kill a kitten if I go with this approach?
Thank you!
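No kittens are harmed by this pattern — shelling out to a CLI from a custom dataset is a known approach. A minimal sketch under assumptions: the class name, the COG output format and the path handling are all illustrative, and error handling is just check=True so that a failed gdal call fails the node:
import subprocess
from pathlib import Path

from kedro.io import AbstractDataset


class GdalTranslateDataset(AbstractDataset):
    """Hypothetical dataset whose save step streams a raster through the GDAL CLI."""

    def __init__(self, filepath: str, output_format: str = "COG"):
        self._filepath = Path(filepath)
        self._output_format = output_format

    def _load(self) -> Path:
        # Hand back the path; downstream nodes decide how to read it.
        return self._filepath

    def _save(self, data: Path) -> None:
        # gdal_translate streams the conversion, so the raster never has to
        # fit in memory; check=True raises CalledProcessError on failure.
        subprocess.run(
            ["gdal_translate", "-of", self._output_format, str(data), str(self._filepath)],
            check=True,
        )

    def _describe(self) -> dict:
        return {"filepath": str(self._filepath), "output_format": self._output_format}
GDAL also ships Python bindings (osgeo.gdal.Translate) if you ever want the same thing without the subprocess hop.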
Shah
11/11/2025, 3:33 PM
LinkageError occurred while loading main class org.apache.spark.launcher.Main
java.lang.UnsupportedClassVersionError:
A little Google search told me it's not finding the Java installation. To resolve this, I installed the latest Java (JDK 25). Now the error has changed to:
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. : java.lang.UnsupportedOperationException: getSubject is not supported
I have checked the Java path, and it's pointing to /usr/lib/jvm/java-11-openjdk-amd64/ despite explicitly setting /usr/lib/jvm/jdk-25.0.1-oracle-x64/bin in the environment.
But the main issue seems to be with PySpark itself, which is not launching and throws the same error.
Since I do not need PySpark in this project, is there a way to disable it for the time being, just to test my pipeline? Or else, how could I fix this?
Thanks!
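For what it's worth, `getSubject is not supported` is what very new JDKs throw at Hadoop/Spark's Subject-based security code now that the SecurityManager machinery has been removed, so JDK 25 makes this worse rather than better; Spark 3.x officially supports Java 8, 11 and 17. A sketch of the usual fix — the JDK 17 path is an assumption, adjust for your machine — is to pin JAVA_HOME before anything launches the JVM:
import os

# Must run before the first PySpark import/launch, otherwise the JVM has
# already started against whatever java the PATH resolved to.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-17-openjdk-amd64"  # assumed path
os.environ["PATH"] = f"{os.environ['JAVA_HOME']}/bin:{os.environ['PATH']}"

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
print(spark.version)
As for disabling it: if the project came from a PySpark starter, the SparkSession is typically created by a SparkHooks class registered in settings.py; removing it from HOOKS (and removing spark datasets from the catalog) should let the rest of the pipeline run without a JVM.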
Ralf Kowatsch
11/13/2025, 8:12 AM
Srinivas
11/14/2025, 8:19 AM
with KedroSession.create(project_path=project_path, package_name="package", env="end") as session:
    session.run(node_names=["ds1"])
and the connection details are like this:
ds1:
  type: "${globals:datatypes.csv}"
  filepath: "abfss://<container>@<account_name>.dfs.core.windows.net/raw_data/ds1.csv.gz"
  fs_args:
    account_name: "accountName"
    sas_token: "sas_token"
  layer: raw_data
  load_args:
    sep: ";"
    escapechar: "\\"
    encoding: "utf-8"
    compression: gzip
    # lineterminator: "\n"
    usecols:
The token is fine, but I am getting this exception:
DatasetError: Failed while loading data from data set CSVDataset(filepath=, load_args={}, protocol=abfss, save_args={'index': False}). Operation returned an invalid status 'Server failed to authenticate the request. Please refer to the information in the www-authenticate header.' ErrorCode:NoAuthenticationInformation
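NoAuthenticationInformation means the request reached Azure without usable credentials, so either the token string is off (a leading '?' is a classic) or the options never reach adlfs. One way to test the token outside Kedro, reusing the placeholders from the snippet above:
from adlfs import AzureBlobFileSystem

# The same storage options the catalog entry hands down to fsspec/adlfs.
fs = AzureBlobFileSystem(
    account_name="accountName",
    sas_token="sas_token",  # paste the real token; try with and without a leading '?'
)
print(fs.ls("<container>/raw_data"))
If that works, the usual Kedro arrangement is to move account_name and sas_token into conf/local/credentials.yml and reference them from the catalog entry via a credentials: key instead of inlining them under fs_args.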
Srinivas
11/14/2025, 8:19 AM
Ayushi
11/14/2025, 12:29 PM
cyril verluise
11/17/2025, 7:09 PM
DatasetError: An exception occurred when parsing config for dataset 'summary':
No module named 'tracking'. Please install the missing dependencies for
tracking.MetricsDataset:
https://docs.kedro.org/en/stable/kedro_project_setup/dependencies.html#install-dependencies-related-to-the-data-catalog
Hint: If you are trying to use a dataset from `kedro-datasets`, make sure that
the package is installed in your current environment. You can do so by running
`pip install kedro-datasets` or `pip install kedro-datasets[<dataset-group>]` to
install `kedro-datasets` along with related dependencies for the specific
dataset group.
Any idea of what is happening?
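`No module named 'tracking'` usually means `kedro-datasets` is not importable in the environment Kedro runs in, so the `tracking.MetricsDataset` type string falls back to being imported as a literal top-level module. A quick sanity check, assuming `tracking` is the extras group you need:
# If this import fails, the active environment is missing kedro-datasets;
# install it with the tracking extras, e.g. pip install "kedro-datasets[tracking]".
from kedro_datasets.tracking import MetricsDataset

print(MetricsDataset)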
Fabian P
11/19/2025, 12:50 PM
Layer.call must always be passed.'), <traceback object at 0x0000025E444A4540>)
When debugging, I can save each model individually with model.save(), so I assume the error message is not truly valid.
galenseilis
11/19/2025, 10:30 PM
Yufei Zheng
11/20/2025, 5:35 PM
kedro package and pass these to the Spark executor, thanks! (Tried to run the package command but still hitting `no module named xxx` in the Spark executor.)
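One pattern that may help, sketched under the assumption of Spark 3.x: `kedro package` writes a wheel into dist/, and since a wheel is a zip archive you can usually ship it to the executors yourself so their Python can import your modules. The wheel name below is made up — use whatever landed in dist/:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Distribute the packaged project to every executor; addPyFile accepts
# zip-format archives, which a wheel is.
spark.sparkContext.addPyFile("dist/my_project-0.1.0-py3-none-any.whl")
The spark-submit equivalent would be --py-files dist/my_project-0.1.0-py3-none-any.whl.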
11/21/2025, 12:22 AMuvx kedro new --starter spaceflights-pandas --name spaceflights
cd spaceflights
But the next command
uv run kedro run --pipeline __default__
resulted in these errors
[11/21/25 00:21:49] INFO Using 'conf/logging.yml' as logging configuration. You can change this by setting the KEDRO_LOGGING_CONFIG environment variable accordingly. __init__.py:270
INFO Kedro project spaceflights session.py:330
[11/21/25 00:21:51] INFO Kedro is sending anonymous usage data with the sole purpose of improving the product. No personal data or IP addresses are stored on our side. To opt plugin.py:243
out, set the `KEDRO_DISABLE_TELEMETRY` or `DO_NOT_TRACK` environment variables, or create a `.telemetry` file in the current working directory with the
contents `consent: false`. To hide this message, explicitly grant or deny consent. Read more at
https://docs.kedro.org/en/stable/configuration/telemetry.html
WARNING Workflow tracking is disabled during partial pipeline runs (executed using --from-nodes, --to-nodes, --tags, --pipeline, and more). run_hooks.py:135
`.viz/kedro_pipeline_events.json` will be created only during a full kedro run. See issue https://github.com/kedro-org/kedro-viz/issues/2443 for
more details.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/coder/spaceflights/.venv/lib/python3.13/site-packages/kedro/io/core.py:187 in from_config
│
│   184 │
│   185 │   │   """
│   186 │   │   try:
│ ❱ 187 │   │   │   class_obj, config = parse_dataset_definition(
│   188 │   │   │   │   config, load_version, save_version
│   189 │   │   │   )
│   190 │   │   except Exception as exc:
│
│ /home/coder/spaceflights/.venv/lib/python3.13/site-packages/kedro/io/core.py:578 in
│ parse_dataset_definition
│
│   575 │   │   │   │   "related dependencies for the specific dataset group."
│   576 │   │   │   )
│   577 │   │   │   default_error_msg = f"Class '{dataset_type}' not found, is this a typo?"
│ ❱ 578 │   │   │   raise DatasetError(f"{error_msg if error_msg else default_error_msg}{hint}")
│   579 │
│   580 │   if not class_obj:
│   581 │   │   class_obj = dataset_type
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯
DatasetError: Dataset 'MatplotlibWriter' not found in 'matplotlib'. Make sure the dataset name is correct.
Hint: If you are trying to use a dataset from `kedro-datasets`, make sure that the package is installed in your current environment. You can do so by running `pip install kedro-datasets` or `pip
install kedro-datasets[<dataset-group>]` to install `kedro-datasets` along with related dependencies for the specific dataset group.
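Note this error differs from the missing-package case above: the matplotlib module inside kedro-datasets was found, only the class name was not. That points at version skew — recent kedro-datasets releases renamed MatplotlibWriter to MatplotlibDataset, if I recall correctly, so it is worth probing what your installed version actually exposes:
# List the Matplotlib dataset classes your installed kedro-datasets exposes,
# then make catalog.yml's `type:` match (e.g. matplotlib.MatplotlibDataset).
from kedro_datasets import matplotlib as mpl_datasets

print([name for name in dir(mpl_datasets) if "Matplotlib" in name])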
Jan
11/21/2025, 9:33 AM
Prachee Choudhury
11/22/2025, 3:44 AM
Ahmed Etefy
11/22/2025, 8:58 PM
Basem Khalaf
11/22/2025, 10:26 PM
Ahmed Etefy
11/23/2025, 9:07 PM
Gauthier Pierard
11/24/2025, 1:48 PM
I have an after_context_created hook called AzureSecretsHook that saves some credentials in the context. Can I use these credentials as node inputs?
context.config_loader["credentials"] = {
    **context.config_loader["credentials"],
    **adls_creds,
}
self.credentials = context.config_loader["credentials"]
So far I have only been able to use them by importing AzureSecretsHook and calling AzureSecretsHook.get_creds() directly in the nodes:
@staticmethod
def get_creds():
    return AzureSecretsHook.credentials
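Hooks cannot feed node inputs directly, but a hook can plant the credentials into the catalog so nodes declare them like any other input, which avoids the static-attribute workaround. A sketch of that idea, assuming the classic DataCatalog.add API; the dataset name adls_credentials and the Key Vault fetch are placeholders:
from kedro.framework.hooks import hook_impl
from kedro.io import MemoryDataset


class AzureSecretsHook:
    def __init__(self):
        self._adls_creds = None

    @hook_impl
    def after_context_created(self, context):
        # Fetch from Azure Key Vault as you do today (elided).
        self._adls_creds = {"account_name": "...", "sas_token": "..."}

    @hook_impl
    def after_catalog_created(self, catalog):
        # Runs after after_context_created, so the creds exist by now;
        # nodes can then list "adls_credentials" among their inputs.
        catalog.add("adls_credentials", MemoryDataset(self._adls_creds), replace=True)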
Jonghyun Yun
11/25/2025, 4:31 PM
Gauthier Pierard
11/26/2025, 10:03 AM
Is there an AbstractDataset predefined currently for Polars to Delta tables?
Would something like this do the job?
import polars as pl
from deltalake import write_deltalake
from kedro.io import AbstractDataset


class PolarsDeltaDataset(AbstractDataset):
    def __init__(self, filepath: str, mode: str = "append"):
        self.filepath = filepath
        self.mode = mode

    def _load(self) -> pl.DataFrame:
        return pl.read_delta(self.filepath)

    def _save(self, data: pl.DataFrame) -> None:
        write_deltalake(self.filepath, data, mode=self.mode)

    def _describe(self) -> dict:
        return dict(filepath=self.filepath, mode=self.mode)
Martin van Hensbergen
11/27/2025, 10:56 AM
I use a MemoryDataset as input for the inference pipeline, but I get a "`DatasetError: Data for MemoryDataset has not been saved`" error when running:
with KedroSession.create() as session:
    context = session.load_context()
    context.catalog.get("input").save("mydata")
    session.run(pipeline_name="inference")
1. Is this the proper way to do it?
2. Is this a use case that is supported by Kedro, or should I only use it for the batch training and use the output of those models manually in my service?
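The error is expected with that snippet: session.run() builds a fresh catalog internally, so a value saved on the catalog from load_context() is not the one the run sees. A sketch of a programmatic route that does work — driving the runner directly with a catalog you control; runner signatures vary slightly across Kedro versions, so treat this as an outline:
from kedro.framework.project import pipelines
from kedro.framework.session import KedroSession
from kedro.io import MemoryDataset
from kedro.runner import SequentialRunner

with KedroSession.create() as session:
    context = session.load_context()
    catalog = context.catalog
    # Override the declared "input" entry with an in-memory value.
    catalog.add("input", MemoryDataset("mydata"), replace=True)
    SequentialRunner().run(pipelines["inference"], catalog)
That said, for a long-running service your option 2 is often cleaner: keep Kedro for the batch training and have the service load the persisted model artefacts itself.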
Zubin Roy
11/28/2025, 12:04 PM
timestamp = datetime.utcnow().strftime("%Y-%m-%dT%H-%M-%S")
return {
    f"{timestamp}/national_ftds_ftus_ratio_df": national_ftds_ftus_ratio_df,
    f"{timestamp}/future_ftds_predictions_by_month_df": future_ftds_predictions_by_month_df,
    ...
}
And my catalog entry is:
forecast_outputs:
  type: partitions.PartitionedDataset
  dataset: pandas.CSVDataset
  path: s3://.../forecast/
  filename_suffix: ".csv"
This works, but I'm not sure if I'm using PartitionedDataset in the most "Kedro-native" way or if there's a better-supported pattern for grouping multiple outputs under a single version.
It's a minor problem, but I'd love to hear any thoughts, best practices, or alternative approaches. Thanks!
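One Kedro-native alternative worth comparing: declare each output dataset with versioned: true in the catalog. Every versioned dataset saved within a single run shares one save-version timestamp, so the run-level grouping you build by hand with f-string keys comes for free, at the cost of one catalog entry per output. Roughly what that resolves to in code — the bucket name and timestamp are illustrative:
from kedro.io import Version
from kedro_datasets.pandas import CSVDataset

# `versioned: true` amounts to constructing the dataset with a Version whose
# save timestamp the session generates once per run and reuses everywhere.
ds = CSVDataset(
    filepath="s3://my-bucket/forecast/national_ftds_ftus_ratio_df.csv",
    version=Version(load=None, save="2025-11-28T12.04.00.000Z"),
)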
Lívia Pimentel
12/02/2025, 12:08 AM
I am passing --params at runtime, but Kedro isn't picking it up.
In my parameters.yml I have:
data_ingestion:
  queries:
    queries_folder: "${runtime_params:folder}"
Then, in the pipeline creation:
conf_path = str(settings.CONF_SOURCE)
conf_loader = OmegaConfigLoader(conf_source=conf_path)
params = conf_loader["parameters"]
queries_folder = params["data_ingestion"]["queries"]["queries_folder"]
query_files = [f for f in os.listdir(queries_folder) if f.endswith(".sql")]
When I run:
kedro run -p data_ingestion_s3 --params=folder=custom_folder
I get an error saying "folder" not found, and no default value provided.
Has anyone used runtime parameters inside parameter files like this?
Do you know if this is expected, or should I be loading params differently?
I would appreciate any guidance you could give me! Thanks!
Note: I am using Kedro version 1.0.0.
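The snag is the hand-built loader: OmegaConfigLoader(conf_source=conf_path) knows nothing about the CLI, so ${runtime_params:folder} has nothing to resolve against — --params only reaches the config loader that the session constructs. Reading parameters at pipeline-creation time is also the part Kedro fights you on; the idiomatic route is to let a node receive the resolved parameter. A sketch:
import os

from kedro.pipeline import node, pipeline


def list_query_files(queries_folder: str) -> list[str]:
    # The resolved runtime parameter arrives as an ordinary node input.
    return [f for f in os.listdir(queries_folder) if f.endswith(".sql")]


def create_pipeline(**kwargs):
    return pipeline(
        [
            node(
                list_query_files,
                inputs="params:data_ingestion.queries.queries_folder",
                outputs="query_files",
            ),
        ]
    )
The resolver should also accept a fallback, e.g. "${runtime_params:folder, default_folder}", so runs without --params do not fail.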
Jon Cohen
12/02/2025, 1:35 AM
NAYAN JAIN
12/02/2025, 3:09 PM
Is there a way to run kedro viz or kedro viz build when your catalog expects runtime parameters? I am not able to use these commands without manually deleting the catalog files.
Are there any plans to support --conf-source or --params in the kedro viz command?
Matthias Roels
12/02/2025, 4:39 PM
Anna-Lea
12/03/2025, 2:25 PM
I have a node with a PartitionedDataset as input and a PartitionedDataset as output. So something like this:
def my_node(inputs: dict[str, Callable[[], Any]]) -> dict[str, Any]:
    results = {}
    for key, value in inputs.items():
        response = my_function(value())
        results[key] = response
    return results
Ideally, I would want:
• the internal for loop to run in parallel (see the sketch after this message)
I've noticed that @Guillaume Tauzin mentioned a similar situation.
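The sketch referenced above: parallelising inside the node is the low-friction option, since ParallelRunner parallelises across nodes, not within one. A thread pool suits I/O-bound work; swap in ProcessPoolExecutor if my_function (yours, from the snippet above) is CPU-bound:
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def my_node(inputs: dict[str, Callable[[], Any]]) -> dict[str, Any]:
    def process(item: tuple[str, Callable[[], Any]]) -> tuple[str, Any]:
        key, load = item
        # Each partition is lazily loaded inside its own worker.
        return key, my_function(load())

    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(pool.map(process, inputs.items()))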
olver-dev
12/07/2025, 12:30 PM
Ralf Kowatsch
12/08/2025, 4:07 PM
marrrcin
12/08/2025, 8:17 PM
Marcus Warnerfjord
12/09/2025, 11:43 AM