Mark Druffel
09/13/2024, 6:44 PM
Invalid Input Error: Could not set option "schema" as a global option.
bronze_x:
  type: ibis.TableDataset
  filepath: x.csv
  file_format: csv
  table_name: x
  connection:
    backend: duckdb
    database: data.duckdb
    schema: bronze
I can reproduce this error with vanilla ibis:
con = ibis.duckdb.connect(database="data.duckdb", schema="bronze")
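For comparison, connecting without the schema option works, and the schema can be given per table call; a minimal sketch (assuming an ibis version where table() accepts a database argument, and that the bronze schema and table already exist):
import ibis

# connect without trying to set a global schema option
con = ibis.duckdb.connect(database="data.duckdb")

# reference the table inside the bronze schema explicitly, per call
x = con.table("x", database="bronze")
print(x.schema())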
Found a related question on ibis' GitHub; it sounds like DuckDB can't set the schema globally, so it has to be passed in the table functions instead. Wondering if this would require a change to ibis.TableDataset, and if so, whether this pattern would work the same with other backends?
Deepyaman Datta
09/16/2024, 12:53 PM
pandera.io.deserialize_schema under the hood in its schema resolver, and that seems to be only implemented in pandera for pandas, is that right?
Vishal Pandey
09/18/2024, 4:59 PM
Lívia Pimentel
09/19/2024, 3:30 PM
Vishal Pandey
09/25/2024, 8:47 AM
volume:
  # Storage class - use null (or no value) to use the default storage
  # class deployed on the Kubernetes cluster
  storageclass: # default
  # The size of the volume that is created. Applicable for some storage
  # classes
  size: 1Gi
  # Access mode of the volume used to exchange data. ReadWriteMany is
  # preferred, but it is not supported on some environments (like GKE)
  # Default value: ReadWriteOnce
  #access_modes: [ReadWriteMany]
  # Flag indicating if the data-volume-init step (copying raw data to the
  # fresh volume) should be skipped
  skip_init: False
  # Allows to specify user executing pipelines within containers
  # Default: root user (to avoid issues with volumes in GKE)
  owner: 0
  # Flag indicating if volume for inter-node data exchange should be
  # kept after the pipeline is deleted
  keep: False
2.
# Optional section to allow mounting additional volumes (such as EmptyDir)
# to specific nodes
extra_volumes:
  tensorflow_step:
    - mount_path: /dev/shm
      volume:
        name: shared_memory
        empty_dir:
          cls: V1EmptyDirVolumeSource
          params:
            medium: Memory
Vishal Pandey
09/26/2024, 8:07 AM
--env, --nodes, --pipelines, which we pass using the kedro run command.
So for any given plugin related to deployments, like airflow or kubeflow, how can we supply these arguments?
George p
10/03/2024, 11:53 PM
Alexandre Ouellet
10/15/2024, 5:17 PM
Thiago José Moser Poletto
10/17/2024, 5:25 PM
Mark Druffel
10/18/2024, 7:38 PM
raw_tracks:
  type: ibis.TableDataset
  table_name: raw_tracks
  connection:
    backend: pyspark
    database: comms_media_dev.dart_extensions
def load(self) -> ir.Table:
    return self.connection.table(self._table_name)
I think updating load() seems fairly simple; something like the code below works. But was the initial intent that we could pass a catalog / database through the config here? If yes on the latter, I think perhaps I'm not using the Spark config properly or Databricks is doing something strange... I posted a question about that here for context.
def load(self) -> ir.Table:
    return self.connection.table(name=self._table_name, database=self._database)
Thabo Mphuthi
11/20/2024, 5:49 AM
Nok Lam Chan
11/27/2024, 6:35 AM
Himanshu Sharma
12/12/2024, 10:16 AM
Failed to execute command group with error Container `0341a555koec4794bb36cf074f0386h-execution-wrapper` failed with status code `1` and it was not possible to extract the structured error Container `0341a555koec4794bb36cf074f0386h-execution-wrapper` exited with code 1 due to error None and we couldn't read the error due to GetErrorFromContainerFailed { last_stderr: Some("exec /mnt/azureml/cr/j/0341a555koec4794bb36cf074f0386h/cap/lifecycler/wd/execution-wrapper: no such file or directory\n") }.
Pipeline screenshot from Azure ML:
Guillaume Tauzin
02/10/2025, 4:45 PM
Philipp Dahlke
02/13/2025, 11:03 AM
kedro_mlflow.io.artifacts.MlflowArtifactDataset
I followed the instructions for building the container from the kedro-docker repo, but when running, those artifacts try to access my local Windows path instead of the container's path. Do you guys know what additional settings I have to make? All my settings are pretty much vanilla. The mlflow_tracking_uri is set to null.
"{dataset}.team_lexicon":
type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
dataset:
type: pandas.ParquetDataset
filepath: data/03_primary/{dataset}/team_lexicon.pq
metadata:
kedro-viz:
layer: primary
preview_args:
nrows: 5
Traceback (most recent call last):
kedro.io.core.DatasetError: Failed while saving data to dataset MlflowParquetDataset(filepath=/home/kedro_docker/data/03_primary/D1-24-25/team_lexicon.pq, load_args={}, protocol=file, save_args={}).
[Errno 13] Permission denied: '/C:'
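One thing I'm considering trying is pinning the tracking URI to a location inside the container instead of leaving it at null, in case a path recorded from an earlier local Windows run is leaking in. A rough sketch of the mlflow.yml change (the path is just an example, not verified as the fix):
# conf/local/mlflow.yml (excerpt)
server:
  # point the tracking store at a path that exists inside the container
  mlflow_tracking_uri: file:///home/kedro_docker/mlruns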
Bibo Bobo
02/16/2025, 12:18 PM
It seems there is no support for the log_table method in kedro-mlflow. So I wonder what would be the right way to log additional data from a node, something that is not yet supported by the plugin?
Right now I just do something like this at the end of the node function
mlflow.log_table(data_for_table, output_filename)
But I am concerned, as I am not sure it will always work and always log the data to the correct run, because I was not able to retrieve the active run id from inside the node with mlflow.active_run() (it returns None all the time).
I need this because I want to use the Evaluation tab in the UI to manually compare some outputs of different runs.
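One alternative I'm considering is doing the logging from a hook rather than inside the node body, since kedro-mlflow manages the run lifecycle through hooks. A rough sketch (the "_table" naming convention is just illustrative, and I haven't checked how this behaves with ParallelRunner):
import mlflow
from kedro.framework.hooks import hook_impl


class TableLoggingHook:
    @hook_impl
    def after_node_run(self, node, outputs):
        # log any node output whose name ends with "_table" as an MLflow table
        for name, data in outputs.items():
            if name.endswith("_table"):
                mlflow.log_table(data, artifact_file=f"{name}.json")
It would be registered via HOOKS = (TableLoggingHook(),) in settings.py.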
Yifan
02/20/2025, 2:33 PM
There is an issue with kedro-mlflow 0.14.3 specific to Python 3.9. It seems that a fix is already merged in the repo. When would the fix be released? Thanks!
Ian Whalen
02/25/2025, 3:38 PM
Juan Luis
02/25/2025, 4:58 PM
Juan Luis
03/11/2025, 4:43 PM
kedro-azureml 0.9.0 and kedro-vertexai 0.12.0 are out, with support for the most recent Kedro and Python versions. You can thank GetInData for it 👏🏼
Merel
03/26/2025, 10:39 AM
Kedro 0.19.12 and the changes we did to the databricks starter (https://github.com/kedro-org/kedro-starters/pull/267) might have broken the resource creation for the kedro-databricks plugin @Jens Peder Meldgaard. When I do kedro databricks bundle, the resources folder gets created, but it's empty. (cc: @Sajid Alam)
Merel
03/27/2025, 8:31 AM
I'm looking at how kedro-databricks works and I was wondering whether it makes sense to use any of the other runners (ThreadRunner or ParallelRunner)? As far as I understand, for every node we use these run parameters: --nodes name, --conf-source self.remote_conf_dir, --env self.env. Would it make sense to allow adding a runner type too? Or if you want parallel running, should you use the Databricks cluster setup for that? I'm not very familiar with all the run options in Databricks, so I'm trying to figure out where to use Kedro features and where Databricks. (cc: @Rashida Kanchwala)
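For context, kedro run already accepts a --runner flag, so in principle each generated task command could include it; a sketch of what a single task invocation might look like (node name and paths are made up):
kedro run --nodes preprocess_companies_node --conf-source /dbfs/some/path/conf --env databricks --runner ThreadRunner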
Yury Fedotov
05/28/2025, 7:47 PM
Does kedro-mlflow support custom model flavors in datasets? I'm reading in the docs that yes, but wanted to double check that this is still the case. @Yolan Honoré-Rougé
Yolan Honoré-Rougé
05/28/2025, 8:30 PM
Jens Peder Meldgaard
07/02/2025, 6:37 AM
An issue was raised on kedro-databricks and I am uncertain how to resolve it - anyone who can help me figure out what to do here? 🙏
https://github.com/JenspederM/kedro-databricks/issues/135
A bit of explanation:
The issue occurs when using namespaces for pipelines, as it prepends the namespace to every input and output, resulting in, e.g., ValueError: Pipeline input(s) {'active_modelling_pipeline.X_train', 'active_modelling_pipeline.y_train'} not found in the DataCatalog when using a namespace called active_modelling_pipeline.
When nodes are executed in Databricks, each node is executed in a workflow task with a command similar to kedro run --nodes <node-name> --conf-source <some-path> --env <some-env>. Do I need to add the --namespace <some-namespace> option to the invocation to get it to correctly resolve the catalog paths?
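For concreteness, the invocation I have in mind would look roughly like this (node name and paths are made up, and it assumes a Kedro version whose run command accepts --namespace):
kedro run --nodes active_modelling_pipeline.split_data_node --conf-source /some/path/conf --env dev --namespace active_modelling_pipeline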
Yury Fedotov
07/30/2025, 1:10 PM
Does kedro-pandera support 1.0? @Yolan Honoré-Rougé @Nok Lam Chan
Max Pardoe
07/31/2025, 11:07 AM
SIMON TAMAYO
08/27/2025, 3:02 PM
Guillaume Tauzin
10/31/2025, 4:44 PM
name: Publish and share Kedro Viz
permissions:
  pages: write
  id-token: write
on:
  pull_request:
  push:
    branches:
      - main
  workflow_dispatch:
jobs:
  deploy-viz:
    name: Deploy Kedro Viz
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Install the latest version of uv
        uses: astral-sh/setup-uv@v4
        with:
          enable-cache: true
          cache-dependency-glob: "pyproject.toml"
          github-token: ${{ secrets.GITHUB_TOKEN }}
          python-version: "3.13"
      - name: Sync uv
        run: uv sync --group viz
      - name: Deploy Kedro-Viz to GH Pages
        uses: kedro-org/publish-kedro-viz@v2
Thanks a lot for your help :)
Guillaume Tauzin
11/04/2025, 7:16 AM
definitions.py and use a dagster.yml in your config to specify jobs, schedules, executors, etc.
- Preserves your Kedro hooks intact. In particular, it works seamlessly with kedro-mlflow.
- Experimentally supports Dagster partitions by fanning out Kedro nodes acting on partitioned datasets.
- The example repo is a small but complete project showing how it can be wired up. It makes use of dynamic pipelines and also showcases distributed hyperparameter tuning using a new `optuna.StudyDataset` experimental dataset.
Get started
- Docs: https://kedro-dagster.readthedocs.io/
- Plugin repo: https://github.com/gtauzin/kedro-dagster
- Example repo: https://github.com/gtauzin/kedro-dagster-example
- Kedro-Viz of the example repo: https://gtauzin.github.io/kedro-dagster-example/
I would love your help & feedback
It would mean a lot if you could:
- Try it out in one of your Kedro projects
- Spot issues, missing bits, or docs gaps
- Share how you would use it, or ideas for features/improvements
- Reach out if you would like to contribute!
I am looking forward to hearing what you think and how you might use it! :)