Vishal Pandey
09/05/2024, 11:49 AM
time="2024-09-05T11:37:29.010Z" level=info msg="capturing logs" argo=true
cp: cannot stat '/home/kedro/data/*': No such file or directory
time="2024-09-05T11:37:30.011Z" level=info msg="sub-process exited" argo=true error="<nil>"
Error: exit status 1
@Artur Dobrogowski Can you help?
Vishal Pandey
09/10/2024, 3:34 PM
Mark Druffel
09/13/2024, 6:44 PM
Invalid Input Error: Could not set option "schema" as a global option
bronze_x:
  type: ibis.TableDataset
  filepath: x.csv
  file_format: csv
  table_name: x
  backend: duckdb
  database: data.duckdb
  schema: bronze
I can reproduce this error with vanilla ibis:
con = ibis.duckdb.connect(database="data.duckdb", schema = "bronze")
Found a related question on ibis' GitHub; it sounds like duckdb can't set the schema globally, so it has to be done in the table functions. Wondering if this would require a change to ibis.TableDataset, and if so, would this pattern work the same with other backends?
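For what it's worth, passing the schema per table instead of on connect seems to sidestep the global option - just a rough sketch, and I'm assuming the database argument of con.table accepts the schema name here:
import ibis

con = ibis.duckdb.connect(database="data.duckdb")
# assumption: the schema is supplied per table rather than globally on connect
x = con.table("x", database="bronze")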
Deepyaman Datta
09/16/2024, 12:53 PM
pandera.io.deserialize_schema
under the hood in its schema resolver, and that seems to be only implemented in pandera for pandas, is that right?
09/18/2024, 4:59 PM
Lívia Pimentel
09/19/2024, 3:30 PM
Vishal Pandey
09/25/2024, 8:47 AM
volume:
  # Storage class - use null (or no value) to use the default storage
  # class deployed on the Kubernetes cluster
  storageclass: # default
  # The size of the volume that is created. Applicable for some storage
  # classes
  size: 1Gi
  # Access mode of the volume used to exchange data. ReadWriteMany is
  # preferred, but it is not supported on some environments (like GKE)
  # Default value: ReadWriteOnce
  #access_modes: [ReadWriteMany]
  # Flag indicating if the data-volume-init step (copying raw data to the
  # fresh volume) should be skipped
  skip_init: False
  # Allows to specify user executing pipelines within containers
  # Default: root user (to avoid issues with volumes in GKE)
  owner: 0
  # Flag indicating if volume for inter-node data exchange should be
  # kept after the pipeline is deleted
  keep: False
2.
# Optional section to allow mounting additional volumes (such as EmptyDir)
# to specific nodes
extra_volumes:
  tensorflow_step:
    - mount_path: /dev/shm
      volume:
        name: shared_memory
        empty_dir:
          cls: V1EmptyDirVolumeSource
          params:
            medium: Memory
Vishal Pandey
09/26/2024, 8:07 AM
--env, --nodes, --pipelines
which we pass using the kedro run command.
So for any given deployment-related plugin, like Airflow or Kubeflow, how can we supply these arguments?
George p
10/03/2024, 11:53 PM
Alexandre Ouellet
10/15/2024, 5:17 PM
Thiago José Moser Poletto
10/17/2024, 5:25 PM
Mark Druffel
10/18/2024, 7:38 PM
raw_tracks:
  type: ibis.TableDataset
  table_name: raw_tracks
  connection:
    backend: pyspark
    database: comms_media_dev.dart_extensions
def load(self) -> ir.Table:
    return self.connection.table(self._table_name)
I think updating load() seems fairly simple, and something like the code below works, but was the initial intent that we could pass a catalog / database through the config here? If yes on the latter, I think perhaps I'm not using the Spark config properly or Databricks is doing something strange... posted a question about that here for context.
def load(self) -> ir.Table:
    return self.connection.table(name=self._table_name, database=self._database)
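If the latter was the intent, I'd expect the catalog entry to look roughly like this - purely hypothetical, and the dataset-level database key is the part I'm not sure actually exists:
raw_tracks:
  type: ibis.TableDataset
  table_name: raw_tracks
  database: comms_media_dev.dart_extensions  # hypothetical key that would back self._database
  connection:
    backend: pyspark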
Thabo Mphuthi
11/20/2024, 5:49 AM
Nok Lam Chan
11/27/2024, 6:35 AM
Himanshu Sharma
12/12/2024, 10:16 AM
Failed to execute command group with error Container `0341a555koec4794bb36cf074f0386h-execution-wrapper` failed with status code `1` and it was not possible to extract the structured error Container `0341a555koec4794bb36cf074f0386h-execution-wrapper` exited with code 1 due to error None and we couldn't read the error due to GetErrorFromContainerFailed { last_stderr: Some("exec /mnt/azureml/cr/j/0341a555koec4794bb36cf074f0386h/cap/lifecycler/wd/execution-wrapper: no such file or directory\n") }.
Pipeline screenshot from Azure ML:
Guillaume Tauzin
02/10/2025, 4:45 PM
Philipp Dahlke
02/13/2025, 11:03 AM
kedro_mlflow.io.artifacts.MlflowArtifactDataset
I followed the instructions for building the container from the kedro-docker repo, but when running, those artifacts want to access my local Windows path instead of the container's path. Do you guys know what additional settings I have to make? All my settings are pretty much vanilla. The mlflow_tracking_uri is set to null.
"{dataset}.team_lexicon":
type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
dataset:
type: pandas.ParquetDataset
filepath: data/03_primary/{dataset}/team_lexicon.pq
metadata:
kedro-viz:
layer: primary
preview_args:
nrows: 5
Traceback (most recent call last):
kedro.io.core.DatasetError: Failed while saving data to dataset MlflowParquetDataset(filepath=/home/kedro_docker/data/03_primary/D1-24-25/team_lexicon.pq, load_args={}, protocol=file, save_args={}).
[Errno 13] Permission denied: '/C:'
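One thing I'm considering is pointing the tracking URI explicitly at a path inside the container instead of leaving it at null - just a guess on my part, and the path below is made up:
server:
  mlflow_tracking_uri: file:///home/kedro_docker/mlruns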
Bibo Bobo
02/16/2025, 12:18 PM
log_table
method in kedro-mlflow. So I wonder what will be the right way to log additional data from a node, something that is not yet supported by the plugin?
Right now I just do something like this at the end of the node function:
mlflow.log_table(data_for_table, output_filename)
But I am concerned, as I am not sure it will always work and always log the data to the correct run, because I was not able to retrieve the active run id from inside the node with mlflow.active_run() (it returns None all the time).
I need this because I want to use the Evaluation tab in the UI to manually compare some outputs of different runs.
Yifan
02/20/2025, 2:33 PM
kedro-mlflow 0.14.3 specific to Python 3.9. It seems that a fix is already merged in the repo. When would the fix be released? Thanks!
Ian Whalen
02/25/2025, 3:38 PM
Juan Luis
02/25/2025, 4:58 PM
Juan Luis
03/11/2025, 4:43 PM
kedro-azureml 0.9.0 and kedro-vertexai 0.12.0 with support for the most recent Kedro and Python versions. You can thank GetInData for it 👏🏼
Merel
03/26/2025, 10:39 AM
0.19.12 and the changes we did to the databricks starter (https://github.com/kedro-org/kedro-starters/pull/267) might have broken the resource creation for the kedro-databricks plugin @Jens Peder Meldgaard. When I do kedro databricks bundle the resources folder gets created, but it's empty. (cc: @Sajid Alam)
Merel
03/27/2025, 8:31 AM
kedro-databricks works and I was wondering whether it makes sense to use any of the other runners (ThreadRunner or ParallelRunner)? As far as I understand, for every node we use these run parameters: --nodes name, --conf-source self.remote_conf_dir, --env self.env. Would it make sense to allow for adding the runner type too? Or if you want parallel running, should you use the Databricks cluster setup for that? I'm not very familiar with all the run options in Databricks, so I'm trying to figure out where to use Kedro features and where Databricks. (cc: @Rashida Kanchwala)
Yury Fedotov
05/28/2025, 7:47 PM
Does kedro-mlflow support custom model flavors in datasets? I'm reading in the docs that yes, but wanted to double check that this is relevant. @Yolan Honoré-Rougé
Yolan Honoré-Rougé
05/28/2025, 8:30 PM
Jens Peder Meldgaard
07/02/2025, 6:37 AM
There's an issue in kedro-databricks and I am uncertain of how to resolve it - anyone who can help me figure out what to do here? 🙏
https://github.com/JenspederM/kedro-databricks/issues/135
A bit of explanation:
The issue occurs when using namespaces for pipelines, as it prepends the namespace to any input and output, resulting in, e.g., ValueError: Pipeline input(s) {'active_modelling_pipeline.X_train', 'active_modelling_pipeline.y_train'} not found in the DataCatalog when using a namespace called active_modelling_pipeline.
When nodes are executed in Databricks, each node is executed in a workflow task with a command similar to kedro run --nodes <node-name> --conf-source <some-path> --env <some-env>. Do I need to add the --namespace <some-namespace> option to the invocation to get it to correctly resolve the catalog paths?
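To spell out what I mean, each workflow task would then run something like:
kedro run --nodes <node-name> --conf-source <some-path> --env <some-env> --namespace <some-namespace>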
Yury Fedotov
07/30/2025, 1:10 PM
Does kedro-pandera support 1.0? @Yolan Honoré-Rougé @Nok Lam Chan
Max Pardoe
07/31/2025, 11:07 AM
SIMON TAMAYO
08/27/2025, 3:02 PM