# plugins-integrations
    Vishal Pandey

    09/05/2024, 11:49 AM
    Hello everyone, I am trying to run a toy kedro pipeline on kubeflow. I built a Docker image and executed the pipeline in the container locally, and it runs fine. I have published the pipeline on kubeflow as well. But when I execute the pipeline on kubeflow, I get an error:
    time="2024-09-05T11:37:29.010Z" level=info msg="capturing logs" argo=true
    cp: cannot stat '/home/kedro/data/*': No such file or directory
    time="2024-09-05T11:37:30.011Z" level=info msg="sub-process exited" argo=true error="<nil>"
    Error: exit status 1
    @Artur Dobrogowski Can you help?
    Vishal Pandey

    09/10/2024, 3:34 PM
    @Artur Dobrogowski Do we have any Python SDK for the kubeflow plugin? We are looking for a pythonic way to use the functionality instead of the CLI being offered.
    Mark Druffel

    09/13/2024, 6:44 PM
    Has anyone used ibis.TableDataset with duckdb schemas? If I set a schema on a data catalog entry I get the error:
    Invalid Input Error: Could not set option "schema" as a global option
    bronze_x:
      type: ibis.TableDataset
      filepath: x.csv
      file_format: csv
      table_name: x
      backend: duckdb
      database: data.duckdb
      schema: bronze
    I can reproduce this error with vanilla ibis:
    con = ibis.duckdb.connect(database="data.duckdb", schema="bronze")
    I found a related question on ibis' GitHub; it sounds like duckdb can't set the schema globally, so it has to be done in the table functions. I'm wondering if this would require a change to ibis.TableDataset, and if so, whether this pattern would work the same with other backends?
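    For what it's worth, a minimal sketch of the table-level workaround in vanilla ibis (assuming a recent ibis version in which con.table() accepts a database/schema qualifier):
    import ibis

    # connect without a global schema option
    con = ibis.duckdb.connect(database="data.duckdb")

    # resolve the schema per table instead of globally;
    # here "bronze" addresses the duckdb schema the table lives in
    bronze_x = con.table("x", database="bronze")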
    Deepyaman Datta

    09/16/2024, 12:53 PM
    Am I correct in understanding that Kedro-Pandera will currently only work with pandas schemas? I saw that it uses pandera.io.deserialize_schema under the hood in its schema resolver, and that seems to be implemented in pandera only for pandas. Is that right?
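    For context, a minimal sketch of the round trip the resolver relies on (pandas-backed pandera API, as far as I can tell):
    import pandera as pa
    from pandera.io import deserialize_schema, serialize_schema

    # a pandas DataFrameSchema serializes to a plain dict/YAML spec...
    schema = pa.DataFrameSchema({"x": pa.Column(int)})
    spec = serialize_schema(schema)

    # ...and deserialize_schema rebuilds a pandas schema from that spec,
    # which is why non-pandas schemas don't fit through this path
    restored = deserialize_schema(spec)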
    Vishal Pandey

    09/18/2024, 4:59 PM
    Hello everyone, this is regarding the kubeflow plugin. I wanted to gain some information about how kedro nodes are executed by kubeflow. Does kubeflow run each node in a separate container, or in separate pods, or are all nodes executed in the same container?
    LĂ­via Pimentel

    09/19/2024, 3:30 PM
    Hi, everyone. Can someone confirm whether it's possible to use kedro-azureml with kedro>=0.19? From what I see here it's not, but I wanted to confirm in case the website is outdated.
    Vishal Pandey

    09/25/2024, 8:47 AM
    Hey folks, I am looking for a way to mount an AWS EFS volume into my kedro pipeline, which will be executed by kubeflow (I am using the kubeflow plugin). The config has the two options below for volumes, and I am not sure which one serves what purpose.
    1.
    volume:
    
        # Storage class - use null (or no value) to use the default storage
        # class deployed on the Kubernetes cluster
        storageclass: # default
    
        # The size of the volume that is created. Applicable for some storage
        # classes
        size: 1Gi
    
        # Access mode of the volume used to exchange data. ReadWriteMany is
        # preferred, but it is not supported in some environments (like GKE)
        # Default value: ReadWriteOnce
        #access_modes: [ReadWriteMany]
    
        # Flag indicating if the data-volume-init step (copying raw data to the
        # fresh volume) should be skipped
        skip_init: False
    
        # Allows specifying the user executing pipelines within containers
        # Default: root user (to avoid issues with volumes in GKE)
        owner: 0
    
        # Flag indicating if the volume for inter-node data exchange should be
        # kept after the pipeline is deleted
        keep: False
    2.
      # Optional section to allow mounting additional volumes (such as EmptyDir)
      # to specific nodes
      extra_volumes:
        tensorflow_step:
        - mount_path: /dev/shm
          volume:
            name: shared_memory
            empty_dir:
              cls: V1EmptyDirVolumeSource
              params:
                medium: Memory
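    For EFS specifically, a hedged sketch of the second option (assuming you have already provisioned an EFS-backed PersistentVolumeClaim named efs-claim, and a hypothetical node name train_step; the cls/params pattern mirrors the EmptyDir example above):
      extra_volumes:
        train_step:
        - mount_path: /mnt/efs
          volume:
            name: efs-data
            persistent_volume_claim:
              cls: V1PersistentVolumeClaimVolumeSource
              params:
                claim_name: efs-claim
    The first option, by contrast, provisions the plugin's shared working volume from a storage class, so pointing storageclass at an EFS-backed CSI storage class may be the other route.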
    Vishal Pandey

    09/26/2024, 8:07 AM
    Hey everyone, I wanted to know more about the kedro CLI. There are arguments like --env, --nodes, --pipelines which we pass to the kedro run command. For any given deployment-related plugin, like airflow or kubeflow, how can we supply these arguments?
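    For reference, plugins that shell out per node typically re-create a session with those arguments in each generated task. A rough sketch of the equivalent Python API (the pipeline and node names are illustrative):
    from pathlib import Path

    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    project_path = Path.cwd()
    bootstrap_project(project_path)  # register the project's settings/package

    # roughly: kedro run --env prod --pipeline my_pipeline --nodes my_node
    with KedroSession.create(project_path=project_path, env="prod") as session:
        session.run(pipeline_name="my_pipeline", node_names=["my_node"])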
    George p

    10/03/2024, 11:53 PM
    Hey all. I recently stumbled across "React Flow" (link) while searching for an open-source graph & node drag-and-drop solution. Specifically, I am looking to create a UI for my team that would allow easy no-code pipeline creation by presenting a list of available nodes (on the side, as "prewritten" functions) which can then be dragged, dropped, and connected with the rest of the pipeline. I am unsure whether something like this would... 'play nice' with kedro-viz (now and/or in the future), but has anyone thought about this before? If so, what did you do about it (ideally in combination with kedro/kedro-viz)? [I have posted 2 relevant links below]
    đź‘€ 1
    j
    r
    d
    • 4
    • 7
  • a

    Alexandre Ouellet

    10/15/2024, 5:17 PM
    Hey there! Quick question about kedro-azureml. We are using AzureML, and we'd like to use AzureMLAssetDataset with dataset factories. After a lot of headache and debugging, it seems impossible to use both: the credentials are passed to the AzureMLAssetDataset through a hook (after_catalog_created), but if you use dataset patterns (as in, declaring your dataset as "{name}.csv" or something similar), the hook is called while the patterned dataset is not yet instantiated. After that, before_node_run is called, and then AzureMLAssetDataset._load() is called, but the AzureMLAssetDataset.azure_config setter still hasn't been called (as it is only called in the after_catalog_created hook). At first glance it seems like a kedro-azureml issue, since AzureMLAssetDataset._load() can be called without the setter being called when used with a dataset factory. But it might also be a kedro issue, as I think there should be an obvious way to set up credentials in that specific scenario, and I don't quite see it in the docs on hooks either.
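    To illustrate the gap, a hedged workaround sketch (not an official kedro-azureml mechanism; it assumes the dataset is importable from kedro_azureml.datasets, uses the private catalog._get_dataset API, and elides where the config actually comes from):
    from kedro.framework.hooks import hook_impl
    from kedro_azureml.datasets import AzureMLAssetDataset

    class PatternedAzureMLConfigHook:
        """Re-apply the AzureML config to datasets materialised from
        dataset factory patterns after after_catalog_created has run."""

        def __init__(self):
            self._azure_config = None

        @hook_impl
        def after_catalog_created(self, catalog):
            # capture the config here, however kedro-azureml sources it
            self._azure_config = ...

        @hook_impl
        def before_node_run(self, node, catalog):
            for name in node.inputs:
                dataset = catalog._get_dataset(name)  # materialises patterned entries
                if isinstance(dataset, AzureMLAssetDataset):
                    dataset.azure_config = self._azure_config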
    Thiago José Moser Poletto

    10/17/2024, 5:25 PM
    Hey guys, I would like to know whether anyone has tested the Kedro Vertex AI plugin on its latest version. I'm having some issues with async node runs; for some reason they take a lot longer than when run locally. It might be because I'm allocating a GPU to part of the process, but in my view it shouldn't, so if anyone has any ideas or suggestions, I'd appreciate it...
    Mark Druffel

    10/18/2024, 7:38 PM
    Hey there, another question on the ibis.TableDataset. We're moving a bunch of our local code (duckdb) to databricks and hit a snag. We're using unity catalog (UC). I loaded raw tables into UC manually for simplicity and confirmed I can load them using an ibis connection (see screenshot 1). When I try to load this table using the TableDataset I get an error saying "`raw_tracks` cannot be found" (see screenshot 2). I think this is because the load() method doesn't pull in database from the config...
    raw_tracks:
      type: ibis.TableDataset
      table_name: raw_tracks
      connection:
        backend: pyspark
        database: comms_media_dev.dart_extensions
    def load(self) -> ir.Table:
        return self.connection.table(self._table_name)
    I think updating load() is fairly simple - something like the code below works - but was the initial intent that we could pass a catalog/database through the config here? If so, perhaps I'm not using the spark config properly, or databricks is doing something strange... I posted a question about that here for context.
    def load(self) -> ir.Table:
        return self.connection.table(name=self._table_name, database=self._database)
    Thabo Mphuthi

    11/20/2024, 5:49 AM
    Hey folks, has anyone used the kedro-azureml plugin on an Apple M1 Mac? I seem to be unable to install it locally due to a dependency on packages that are unsupported on M1 chips (azureml-sdk, etc.).
    Nok Lam Chan

    11/27/2024, 6:35 AM
    Stay tuned for upcoming Kedro VSCode releases (this will probably show up in 0.3.0; we will release 0.2.3 first with some bug fixes, including a Windows issue). We are working on improving the static catalog validation: it will validate against the user's virtual environment, so it can detect missing and third-party dependencies.
    Himanshu Sharma

    12/12/2024, 10:16 AM
    Hi Team, I'm getting an issue while using kedro-azureml, following this doc - link. I am able to run all steps without any issues, but when the pipeline runs in Azure ML it gives the following error:
    Failed to execute command group with error Container `0341a555koec4794bb36cf074f0386h-execution-wrapper` failed with status code `1` and it was not possible to extract the structured error Container `0341a555koec4794bb36cf074f0386h-execution-wrapper` exited with code 1 due to error None and we couldn't read the error due to GetErrorFromContainerFailed { last_stderr: Some("exec /mnt/azureml/cr/j/0341a555koec4794bb36cf074f0386h/cap/lifecycler/wd/execution-wrapper: no such file or directory\n") }.
    Pipeline screenshot from Azure ML:
    Guillaume Tauzin

    02/10/2025, 4:45 PM
    Hi Team! Has anyone ever played with hyperparameter tuning frameworks within kedro? I have found several scattered pieces of info on this topic, but no complete solutions. Ultimately, what I would like to set up is a way to have multiple nodes running at the same time, all contributing to the same tuning experiment. I would prefer using optuna, and this is how I would go about it based on what I have found online:
    1. Create a node that creates an optuna study
    2. Create N nodes that each run hyperparameter tuning in parallel. Each of them loads the optuna study, and if using kedro-mlflow, each hyperparameter trial can be logged into its own nested run.
    3. Create a final node that processes the results of all tuning nodes
    Does this sound reasonable to you? Has anyone produced such a kedro workflow already? I would love to see what it looks like. I am also wondering:
    • I am thinking of creating an OptunaStudyDataset for the optuna study. Has anyone attempted this already? (See the sketch below.)
    • For creating N tuning nodes, I am thinking of using the approach presented in the GetInData blog post on dynamic pipelines. Would this be the recommended approach?
    Thanks!
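    On the OptunaStudyDataset idea, a minimal sketch of what such a custom dataset could look like (assuming a shared relational storage backend such as sqlite/postgres so parallel nodes can join the same study):
    import optuna
    from kedro.io import AbstractDataset

    class OptunaStudyDataset(AbstractDataset):
        """Loads an optuna study that lives in a shared storage backend."""

        def __init__(self, study_name: str, storage: str):
            self._study_name = study_name
            self._storage = storage  # e.g. "sqlite:///optuna.db"

        def _save(self, data: optuna.Study) -> None:
            # trials are persisted by the storage backend as they complete;
            # the study only has to exist, so creating it is enough
            optuna.create_study(
                study_name=self._study_name,
                storage=self._storage,
                load_if_exists=True,
            )

        def _load(self) -> optuna.Study:
            return optuna.load_study(study_name=self._study_name, storage=self._storage)

        def _describe(self) -> dict:
            return {"study_name": self._study_name, "storage": self._storage}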
    Philipp Dahlke

    02/13/2025, 11:03 AM
    Hi guys, I am having trouble running my kedro project from a docker build. I'm using MLflow and the kedro_mlflow.io.artifacts.MlflowArtifactDataset. I followed the instructions for building the container from the kedro-docker repo, but when running, those artifacts try to access my local windows path instead of the container's path. Do you know what additional settings I have to make? All my settings are pretty much vanilla; the mlflow_tracking_uri is set to null.
    "{dataset}.team_lexicon":
      type: kedro_mlflow.io.artifacts.MlflowArtifactDataset  
      dataset:
        type: pandas.ParquetDataset  
        filepath: data/03_primary/{dataset}/team_lexicon.pq 
        metadata:
          kedro-viz:
            layer: primary  
            preview_args:
                nrows: 5
    Traceback (most recent call last):
      
    kedro.io.core.DatasetError: Failed while saving data to dataset MlflowParquetDataset(filepath=/home/kedro_docker/data/03_primary/D1-24-25/team_lexicon.pq, load_args={}, protocol=file, save_args={}).
    [Errno 13] Permission denied: '/C:'
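    One thing worth checking (a guess, not a confirmed fix): whether the tracking location baked into your image still points at the Windows host. With kedro-mlflow that lives in the server section of mlflow.yml; a sketch pinning it to a path inside the container:
    server:
      mlflow_tracking_uri: file:///home/kedro_docker/mlruns  # container path, not a host path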
    Bibo Bobo

    02/16/2025, 12:18 PM
    Hello guys, I noticed that there is no support for the log_table method in kedro-mlflow. So I wonder what would be the right way to log additional data from a node - something that is not yet supported by the plugin? Right now I just do something like this at the end of the node function:
    mlflow.log_table(data_for_table, output_filename)
    But I am concerned, as I am not sure this will always work and always log the data to the correct run, because I was not able to retrieve the active run id from inside the node with mlflow.active_run() (it returns None all the time). I need this because I want to use the Evaluation tab in the UI to manually compare some outputs of different runs.
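    One pattern that may help (a sketch, not an official kedro-mlflow feature; the node and output names here are made up): log the table from an after_node_run hook, which executes while the run kedro-mlflow opened for the session is still active:
    import mlflow
    from kedro.framework.hooks import hook_impl

    class LogTableHook:
        @hook_impl
        def after_node_run(self, node, outputs):
            if node.name == "evaluate_node":  # hypothetical node name
                for name, data in outputs.items():
                    # logs into the run that is active for this kedro session
                    mlflow.log_table(data=data, artifact_file=f"{name}.json")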
    Yifan

    02/20/2025, 2:33 PM
    Hello guys! I noticed there is a typing-annotation bug in kedro-mlflow 0.14.3 specific to python 3.9. It seems a fix has already been merged in the repo. When will the fix be released? Thanks!
    Ian Whalen

    02/25/2025, 3:38 PM
    I think this belongs in plugins! If I remember correctly, there was once a PyCharm-friendly version of this: https://github.com/kedro-org/vscode-kedro Does that still exist anywhere?
    Juan Luis

    02/25/2025, 4:58 PM
    hi folks, in case it's useful for anybody, yesterday I quickly hacked a kedro-openlineage integration, and demonstrated it using Marquez. I guess it should work with any OL consumer but you tell me 🙂 https://github.com/astrojuanlu/kedro-openlineage
    Juan Luis

    03/11/2025, 4:43 PM
    happy to announce that @em-pe released kedro-azureml 0.9.0 and kedro-vertexai 0.12.0 with support for the most recent Kedro and Python versions. you can thank GetInData for it 👏🏼
    Merel

    03/26/2025, 10:39 AM
    I think Kedro 0.19.12 and the changes we made to the databricks starter (https://github.com/kedro-org/kedro-starters/pull/267) might have broken the resource creation for the kedro-databricks plugin, @Jens Peder Meldgaard. When I do kedro databricks bundle, the resources folder gets created, but it's empty. (cc: @Sajid Alam)
    Merel

    03/27/2025, 8:31 AM
    Hi @Jens Peder Meldgaard, I'm learning more about how kedro-databricks works, and I was wondering whether it makes sense to use any of the other runners (ThreadRunner or ParallelRunner)? As far as I understand, for every node we use the run parameters --nodes name, --conf-source self.remote_conf_dir, --env self.env. Would it make sense to allow adding the runner type too? Or, if you want parallel runs, should you use the databricks cluster setup for that? I'm not very familiar with all the run options in Databricks, so I'm trying to figure out where to use Kedro features and where Databricks ones. (cc: @Rashida Kanchwala)
    Yury Fedotov

    05/28/2025, 7:47 PM
    Does kedro-mlflow support custom model flavors in datasets? I'm reading in the docs that it does, but I wanted to double-check that this is still accurate. @Yolan Honoré-Rougé
    Yolan Honoré-Rougé

    05/28/2025, 8:30 PM
    (And for the record, kedro-mlflow has a built-in custom model to log an entire kedro pipeline, which may be useful.)
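    For anyone curious, that feature is exposed through pipeline_ml_factory; a minimal sketch (the pipeline objects and input name are illustrative):
    from kedro_mlflow.pipeline import pipeline_ml_factory

    # binds a training pipeline to the inference pipeline it produces, so the
    # whole inference pipeline is logged as one mlflow model when training runs
    pipeline_ml = pipeline_ml_factory(
        training=training_pipeline,    # a kedro Pipeline
        inference=inference_pipeline,  # the pipeline to package as a model
        input_name="instances",        # the inference pipeline's free input
    )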
    Jens Peder Meldgaard

    07/02/2025, 6:37 AM
    Hi, this issue was created for kedro-databricks and I am uncertain how to resolve it - can anyone help me figure out what to do here? 🙏 https://github.com/JenspederM/kedro-databricks/issues/135 A bit of explanation: the issue occurs when using namespaces for pipelines, as the namespace is prepended to every input and output, resulting in, e.g., ValueError: Pipeline input(s) {'active_modelling_pipeline.X_train', 'active_modelling_pipeline.y_train'} not found in the DataCatalog when using a namespace called active_modelling_pipeline. When nodes are executed in Databricks, each node runs in a workflow task with a command similar to kedro run --nodes <node-name> --conf-source <some-path> --env <some-env>. Do I need to add the --namespace <some-namespace> option to the invocation to get it to correctly resolve the catalog paths?
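    For what it's worth, recent Kedro versions do expose such an option on kedro run, so the per-task invocation would become something like (a sketch):
    kedro run --nodes <node-name> --conf-source <some-path> --env <some-env> --namespace <some-namespace>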
    đź‘€ 1
  • y

    Yury Fedotov

    07/30/2025, 1:10 PM
    Hi team, are there plans to make kedro-pandera support 1.0? @Yolan Honoré-Rougé @Nok Lam Chan