# plugins-integrations

    Leonardo David Treiger Herszenhaut Brettas

    08/10/2024, 5:26 PM
    Does anyone know of a plugin for SAP?

    Kacper Leƛniara

    08/13/2024, 7:58 AM
    Hey hey! I tried to add handling of inference parameters in kedro-mlflow's PipelineML model packaging. Shared the fruits in this PR 😉. And thanks @Artur Dobrogowski for the support 🙌
    ❀ 2

    Matt Glover

    08/22/2024, 7:27 AM
    Hi - has anyone written a dataset class for AWS Athena that they could share before I attempt to do it myself?
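    (A minimal sketch of what such a dataset could look like, assuming awswrangler for the Athena calls; the class and parameter names are illustrative, not an existing kedro-datasets API.)
    ```python
    import awswrangler as wr
    import pandas as pd
    from kedro.io import AbstractDataset


    class AthenaQueryDataset(AbstractDataset[pd.DataFrame, pd.DataFrame]):
        """Read-only dataset that loads the result of an Athena SQL query."""

        def __init__(self, sql: str, database: str):
            self._sql = sql
            self._database = database

        def _load(self) -> pd.DataFrame:
            # awswrangler submits the query to Athena and fetches the result set
            return wr.athena.read_sql_query(self._sql, database=self._database)

        def _save(self, data: pd.DataFrame) -> None:
            raise NotImplementedError("This sketch is read-only")

        def _describe(self) -> dict:
            return {"sql": self._sql, "database": self._database}
    ```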

    Mark Druffel

    08/22/2024, 9:31 PM
    Question about ibis.TableDataset. Is there a way to use the pandas backend in a pipeline? It seems like you can't write pandas output to a file or a database. It seems like this is by design and makes sense for a *Table*Dataset, but is that the intent? I really like the Ibis API and would prefer to use it as my primary dataframe library. I mostly work with pyspark and duckdb, so it's a natural fit there, but I'm wondering if there is a long-term plan or willingness to consider adding `to_*` methods (i.e. `to_csv`, `to_delta`, etc.) to the ibis.TableDataset? Or perhaps there should be a different ibis Dataset? Details: I'm trying to pre-process some badly formed csv files in my pipeline. I know I can use a pandas node separately, but I prefer the ibis API, so I tried to use TableDataset. I have the following data catalog entries:
    ```yaml
    raw:
      type: ibis.TableDataset
      filepath: data/01_raw/raw.csv
      file_format: csv
      connection:
        backend: pandas
      load_args:
        sep: ","

    preprocessed:
      type: ibis.TableDataset
      table_name: preprocessed
      connection:
        backend: pandas
        database: test.db
      save_args:
        materialized: table

    standardized:
      type: ibis.TableDataset
      table_name: standardized
      file_format: csv
      connection:
        backend: duckdb
        database: finance.db
      save_args:
        materialized: table
    ```
    The pipeline code looks like this:
    ```python
    def create_pipeline(**kwargs) -> Pipeline:
        return pipeline(
            [
                node(
                    func=preprocess_raw,
                    inputs="raw",
                    outputs="preprocessed",
                    name="preprocess",
                ),
                node(
                    func=standardize,
                    inputs="preprocessed",
                    outputs="standardized",
                    name="standardize",
                ),
            ]
        )
    ```
    I jump into an ipython session with `kedro ipython`, run `catalog.load("preprocessed")`, and get the error `TypeError: BasePandasBackend.do_connect() got an unexpected keyword argument 'database'`, which is coming from Ibis. After looking at the backend setup, I see database isn't a valid argument. I removed database, reran, and got the error `DatasetError: Failed while saving data to data set... Unable to convert <class 'ibis.expr.types.relations.Table'> object to backend type: <class 'pandas.core.frame.DataFrame'>`. I didn't exactly expect this to work, but I wasn't sure...
    ```yaml
    preprocessed:
      type: ibis.TableDataset
      table_name: preprocessed
      connection:
        backend: pandas
    ```
    Then I tried removing table_name as well and got the obvious error that I need a table_name or a filepath: `DatasetError: Must provide at least one of filepath or table_name.` No doubt 😂
    ```yaml
    preprocessed:
      type: ibis.TableDataset
      connection:
        backend: pandas
    ```
    Then I tried adding a filepath and got the error `DatasetError: Must provide table_name for materialization.`, which I can see in TableDataset's `_write` method.
    ```yaml
    preprocessed:
      type: ibis.TableDataset
      filepath: data/02_preprocessed/preprocessed.csv
      connection:
        backend: pandas
    ```
    👍 1
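    (One workaround sketch while the pandas backend can't materialize tables: run the ibis expression back to pandas inside the node and declare the output as a plain pandas dataset, e.g. `type: pandas.CSVDataset` from kedro_datasets; the node name is from the pipeline above.)
    ```python
    def preprocess_raw(raw):
        # ... any ibis expression chain on `raw` ...
        # to_pandas() executes the expression and returns a pandas DataFrame,
        # which a pandas.CSVDataset catalog entry can save as usual.
        return raw.to_pandas()
    ```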

    Bruk Tafesse

    08/27/2024, 11:20 AM
    Hi everyone, I have a dataset configured like the following:
    ```yaml
    predictions:
      type: pandas.GBQTableDataset
      dataset: ...
      table_name: table_name
      project: ....
      save_args:
        if_exists: replace
    ```
    Is there a way to configure the `table_name` when creating a pipeline job using the Vertex AI SDK? I am using compiled pipelines, btw. Thanks
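    (One possible route, assuming the project uses Kedro's OmegaConfigLoader: template the entry with the `runtime_params` resolver and pass the value as a run parameter, e.g. `kedro run --params table_name=my_table`. Whether kedro-vertexai forwards run params through compiled pipelines is a separate question.)
    ```yaml
    predictions:
      type: pandas.GBQTableDataset
      # resolved at run time from --params, with a fallback default
      table_name: ${runtime_params:table_name,default_table}
      save_args:
        if_exists: replace
    ```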

    Lukas Innig

    08/29/2024, 9:11 PM
    Is anyone aware of an integration between kedro and IaC tools such as Terraform or Pulumi?

    Vishal Pandey

    09/05/2024, 11:49 AM
    Hello everyone, I am trying to run a toy kedro pipeline on kubeflow. I built a docker image and executed the pipeline in the container locally, and it runs fine. I have published the pipeline on kubeflow as well, but when I execute the pipeline on kubeflow I get an error:
    ```
    time="2024-09-05T11:37:29.010Z" level=info msg="capturing logs" argo=true
    cp: cannot stat '/home/kedro/data/*': No such file or directory
    time="2024-09-05T11:37:30.011Z" level=info msg="sub-process exited" argo=true error="<nil>"
    Error: exit status 1
    ```
    @Artur Dobrogowski can you help?
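    (A guess at the cause, based on the volume config shown later in this channel: the `cp` comes from the plugin's data-volume-init step, which copies `/home/kedro/data/*` onto the freshly created volume. If the image ships no local data folder, that step can be skipped in the kedro-kubeflow config.)
    ```yaml
    run_config:
      volume:
        skip_init: True   # skip copying raw data onto the volume
    ```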

    Vishal Pandey

    09/10/2024, 3:34 PM
    @Artur Dobrogowski Do we have any Python SDK for the kubeflow plugin? We are looking for a pythonic way to use the functionality instead of the CLI being offered.
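    (If the plugin stays CLI-first, one fallback sketch is to compile the pipeline with the plugin and then drive Kubeflow directly through the plain kfp SDK; the host URL and file name below are placeholders.)
    ```python
    import kfp

    # connect to the Kubeflow Pipelines endpoint (placeholder URL)
    client = kfp.Client(host="http://localhost:8080")

    # submit a pipeline definition compiled earlier, e.g. with `kedro kubeflow compile`
    result = client.create_run_from_pipeline_package(
        "pipeline.yaml",
        arguments={},
        run_name="toy-pipeline",
    )
    print(result.run_id)
    ```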

    Mark Druffel

    09/13/2024, 6:44 PM
    Has anyone used ibis.TableDataset with duckdb schemas? If I set a schema on a data catalog entry, I get the error `Invalid Input Error: Could not set option "schema" as a global option`.
    ```yaml
    bronze_x:
      type: ibis.TableDataset
      filepath: x.csv
      file_format: csv
      table_name: x
      backend: duckdb
      database: data.duckdb
      schema: bronze
    ```
    I can reproduce this error with vanilla ibis:
    ```python
    con = ibis.duckdb.connect(database="data.duckdb", schema="bronze")
    ```
    Found a related question on ibis' GitHub; it sounds like duckdb can't set the schema globally, so it has to be done in the table functions. Wondering if this would require a change to ibis.TableDataset, and if so, would this pattern work the same with other backends?
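    (For reference, recent ibis versions let you qualify the schema per table instead of globally, via the `database` argument of `con.table()`; whether TableDataset should pass that through is exactly the open question.)
    ```python
    import ibis

    # connect without any global schema option
    con = ibis.duckdb.connect(database="data.duckdb")

    # qualify the schema on the table call instead
    t = con.table("x", database="bronze")
    ```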

    Deepyaman Datta

    09/16/2024, 12:53 PM
    Am I correct in understanding that Kedro-Pandera will only work with pandas schemas currently? I saw that it uses `pandera.io.deserialize_schema` under the hood in its schema resolver, and that seems to be implemented in pandera only for pandas. Is that right?

    Vishal Pandey

    09/18/2024, 4:59 PM
    Hello everyone, this is regarding the kubeflow plugin. I wanted to understand how kedro nodes are executed by kubeflow. Does kubeflow run each node in a separate container, or in separate pods, or are all nodes executed in the same container?

    LĂ­via Pimentel

    09/19/2024, 3:30 PM
    Hi, everyone. Can someone confirm whether it's possible to use kedro-azureml with kedro>=0.19? From what I see here it's not, but I wanted to confirm in case the website is outdated.

    Vishal Pandey

    09/25/2024, 8:47 AM
    Hey folks, I am looking for a way to mount an AWS EFS volume to my kedro pipeline, which will be executed by kubeflow. I am using the kubeflow plugin. The config has the below 2 options for volumes, and I am not sure which one serves what purpose: 1.
    ```yaml
    volume:

      # Storage class - use null (or no value) to use the default storage
      # class deployed on the Kubernetes cluster
      storageclass: # default

      # The size of the volume that is created. Applicable for some storage
      # classes
      size: 1Gi

      # Access mode of the volume used to exchange data. ReadWriteMany is
      # preferred, but it is not supported on some environments (like GKE)
      # Default value: ReadWriteOnce
      #access_modes: [ReadWriteMany]

      # Flag indicating if the data-volume-init step (copying raw data to the
      # fresh volume) should be skipped
      skip_init: False

      # Allows specifying the user executing pipelines within containers
      # Default: root user (to avoid issues with volumes in GKE)
      owner: 0

      # Flag indicating if the volume for inter-node data exchange should be
      # kept after the pipeline is deleted
      keep: False
    ```
    2.
    ```yaml
    # Optional section to allow mounting additional volumes (such as EmptyDir)
    # to specific nodes
    extra_volumes:
      tensorflow_step:
      - mount_path: /dev/shm
        volume:
          name: shared_memory
          empty_dir:
            cls: V1EmptyDirVolumeSource
            params:
              medium: Memory
    ```
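    (A hedged reading: the `volume` block provisions the plugin's own volume for inter-node data exchange, while `extra_volumes` mounts arbitrary Kubernetes volume sources on specific nodes. EFS would therefore presumably go into `extra_volumes` through a PersistentVolumeClaim bound to an EFS-backed storage class; the node and claim names below are placeholders.)
    ```yaml
    extra_volumes:
      train_model:                 # node that needs the EFS mount
      - mount_path: /mnt/efs
        volume:
          name: efs_data
          persistent_volume_claim:
            cls: V1PersistentVolumeClaimVolumeSource
            params:
              claim_name: efs-claim   # PVC bound to the EFS storage class
    ```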

    Vishal Pandey

    09/26/2024, 8:07 AM
    Hey everyone, I wanted to know more about the kedro CLI. There are arguments like `--env`, `--nodes`, `--pipelines`, which we pass to the `kedro run` command. For any given deployment plugin, like airflow or kubeflow, how can we supply these arguments?

    George p

    10/03/2024, 11:53 PM
    Hey all. I recently stumbled across "React Flow" (link) while searching for an open-source graph & node drag-and-drop solution. Specifically, I am looking to create a UI for my team which would allow for easy no-code pipeline creation, by presenting a list of available nodes (on the side, as "prewritten" functions) which can then be dragged, dropped, and connected with the rest of the pipeline. I am unsure if something like this would... 'play nice' with kedro-viz (now and/or in the future), but has anyone thought about this before? If so, what did you do about it (ideally in combination with kedro/kedro-viz)? [I have posted 2 relevant links below]
    👀 1

    Alexandre Ouellet

    10/15/2024, 5:17 PM
    Hey there! Quick question about kedro-azureml. We are using AzureML, and we'd like to use AzureMLAssetDataset with dataset factories. After a lot of headache and debugging, it seems impossible to use both: the way credentials are passed to the AzureMLAssetDataset is done through a hook (after_catalog_created), but if you use dataset_patterns (as in, declare your dataset as "{name}.csv" or something similar), the hook is called before the patterned dataset is instantiated. After that, before_node_run is called, and then AzureMLAssetDataset._load() is called, but the AzureMLAssetDataset.azure_config setter hasn't been called yet (as it is called only in the after_catalog_created hook). At first glance, it seems like a kedro-azureml issue, as AzureMLAssetDataset._load() can be called without the setter being called when used via a dataset factory. But it might also be a kedro issue, as I think there should be an obvious way to set up credentials in that specific scenario, and I don't quite see it from the docs on hooks either.

    Thiago José Moser Poletto

    10/17/2024, 5:25 PM
    Hey guys, I would like to know if anyone has tested the Kedro Vertex AI plugin on its latest version. I'm having some issues with async node runs; for some reason it is taking a lot longer than when run locally. It might be because I'm allocating a GPU to part of the process, but in my view it shouldn't, so if anyone has any ideas or suggestions, I'd appreciate it...

    Mark Druffel

    10/18/2024, 7:38 PM
    Hey there, another question on the ibis.TableDataset. We're just moving a bunch of our local code (duckdb) to databricks and hit a snag. We're using unity catalog (UC). I loaded raw tables into UC manually for simplicity and confirmed I can load them using an ibis connection (see screenshot 1). When I try to load this table using the TableDataset, I get an error saying "`raw_tracks` cannot be found" (see screenshot 2). I think this is because the load() method doesn't pull in database from the config...
    ```yaml
    raw_tracks:
      type: ibis.TableDataset
      table_name: raw_tracks
      connection:
        backend: pyspark
        database: comms_media_dev.dart_extensions
    ```
    ```python
    def load(self) -> ir.Table:
        return self.connection.table(self._table_name)
    ```
    I think updating load() seems fairly simple, and something like the code below works, but was the initial intent that we could pass a catalog/database through the config here? If yes, I think perhaps I'm not using the spark config properly, or databricks is doing something strange... posted a question about that here for context.
    ```python
    def load(self) -> ir.Table:
        return self.connection.table(name=self._table_name, database=self._database)
    ```

    Thabo Mphuthi

    11/20/2024, 5:49 AM
    Hey folks, has anyone used the kedro-azureml plugin on an Apple M1 Mac? I seem to be unable to install it locally due to a dependency on packages that are unsupported on M1 chips (azureml-sdk, etc.).

    Nok Lam Chan

    11/27/2024, 6:35 AM
    Stay tuned for upcoming Kedro VSCode releases (this will probably ship in 0.3.0; we will release 0.2.3 first with some bug fixes, including a Windows issue). We are working on improving the static catalog validation: it will validate against the user's virtual environment, so it can detect missing and third-party dependencies.
    🙌 10
    ❀ 5

    Himanshu Sharma

    12/12/2024, 10:16 AM
    Hi Team, I'm running into an issue while using kedro-azureml, following this doc - link. I'm able to run all steps without any issues, but when the pipeline runs in Azure ML it gives the following error:
    ```
    Failed to execute command group with error Container `0341a555koec4794bb36cf074f0386h-execution-wrapper` failed with status code `1` and it was not possible to extract the structured error Container `0341a555koec4794bb36cf074f0386h-execution-wrapper` exited with code 1 due to error None and we couldn't read the error due to GetErrorFromContainerFailed { last_stderr: Some("exec /mnt/azureml/cr/j/0341a555koec4794bb36cf074f0386h/cap/lifecycler/wd/execution-wrapper: no such file or directory\n") }.
    ```
    Pipeline screenshot from Azure ML:

    Guillaume Tauzin

    02/10/2025, 4:45 PM
    Hi Team! Has anyone ever played with hyperparameter tuning frameworks within kedro? I have found several scattered pieces of info related to this topic, but no complete solutions. Ultimately, what I would like to set up is a way to have multiple nodes running at the same time, all contributing to the same tuning experiment. I would prefer using optuna, and this is the way I would go about it based on what I have found online:
    1. Create a node that creates an optuna study.
    2. Create N nodes that each run hyperparameter tuning in parallel. Each of them loads the optuna study, and if using kedro-mlflow, each hyperparameter trial can be logged into its own nested run.
    3. Create a final node that processes the results of all tuning nodes.
    Does this sound reasonable to you? Has anyone produced such a kedro workflow already? I would love to see what it looks like (there is a sketch of steps 1 and 2 right after this message). I am also wondering:
    ‱ I am thinking of creating an OptunaStudyDataset for the optuna study. Has anyone attempted this already?
    ‱ For creating N tuning nodes, I am thinking of using the approach presented in the GetInData blog post on dynamic pipelines. Would this be the recommended approach?
    Thanks!
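    (A minimal sketch of steps 1 and 2, assuming optuna's RDB storage for coordination; the storage URL, study name, and objective are illustrative. Parallel copies of the tuning node attach to the same study, so all trials land in one experiment.)
    ```python
    import optuna

    STORAGE = "sqlite:///data/09_tracking/optuna.db"  # shared storage (illustrative path)


    def create_study() -> str:
        """Node for step 1: create (or reuse) a named study in shared storage."""
        optuna.create_study(
            study_name="tuning",
            storage=STORAGE,
            direction="maximize",
            load_if_exists=True,
        )
        return "tuning"


    def objective(trial: optuna.Trial) -> float:
        # stand-in objective; a real one would train and score a model
        x = trial.suggest_float("x", -10, 10)
        return -((x - 2) ** 2)


    def run_trials(study_name: str) -> dict:
        """Node for step 2: each parallel copy adds trials to the same study."""
        study = optuna.load_study(study_name=study_name, storage=STORAGE)
        study.optimize(objective, n_trials=20)
        return study.best_params
    ```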

    Philipp Dahlke

    02/13/2025, 11:03 AM
    Hi guys, I am having trouble running my kedro project from a docker build. I'm using MLflow and the `kedro_mlflow.io.artifacts.MlflowArtifactDataset`. I followed the instructions for building the container from the kedro-docker repo, but when running, those artifacts try to access my local Windows path instead of the container's path. Do you know what additional settings I have to make? All my settings are pretty much vanilla; the `mlflow_tracking_uri` is set to null.
    ```yaml
    "{dataset}.team_lexicon":
      type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
      dataset:
        type: pandas.ParquetDataset
        filepath: data/03_primary/{dataset}/team_lexicon.pq
        metadata:
          kedro-viz:
            layer: primary
            preview_args:
              nrows: 5
    ```
    ```
    Traceback (most recent call last):

    kedro.io.core.DatasetError: Failed while saving data to dataset MlflowParquetDataset(filepath=/home/kedro_docker/data/03_primary/D1-24-25/team_lexicon.pq, load_args={}, protocol=file, save_args={}).
    [Errno 13] Permission denied: '/C:'
    ```
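    (One hypothesis, not confirmed in the thread: with `mlflow_tracking_uri` set to null, kedro-mlflow falls back to a local mlruns store, and runs first created on the Windows host can carry host paths into the container. A sketch of pinning the store to a container path in conf/base/mlflow.yml; the exact path is a placeholder.)
    ```yaml
    # conf/base/mlflow.yml (sketch)
    server:
      mlflow_tracking_uri: /home/kedro_docker/mlruns   # container path instead of null
    ```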

    Bibo Bobo

    02/16/2025, 12:18 PM
    Hello, guys, I noticed that there is no support for the `log_table` method in kedro-mlflow. So I wonder, what would be the right way to log additional data from a node, something that is not yet supported by the plugin? Right now I just do something like this at the end of the node function:
    ```python
    mlflow.log_table(data_for_table, output_filename)
    ```
    But I am concerned, as I am not sure it will always work and always log the data to the correct run, because I was not able to retrieve the active run id from inside the node with `mlflow.active_run()` (it returns `None` all the time). I need this because I want to use the `Evaluation` tab in the UI to manually compare some outputs of different runs.
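    (One hedged alternative: move the logging out of the node into a project hook, which runs in the main process where kedro-mlflow's run is active; the output-name convention below is made up for illustration.)
    ```python
    import mlflow
    from kedro.framework.hooks import hook_impl


    class LogTableHook:
        @hook_impl
        def after_node_run(self, node, outputs):
            # log any node output whose name marks it as a table (illustrative convention)
            for name, data in outputs.items():
                if name.endswith("_table"):
                    mlflow.log_table(data, artifact_file=f"{name}.json")
    ```
    Registered via `HOOKS = (LogTableHook(),)` in settings.py, this keeps the node itself free of mlflow calls.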

    Yifan

    02/20/2025, 2:33 PM
    Hello guys! Noticed there is a typing-annotation bug in `kedro-mlflow 0.14.3` specific to `python 3.9`. It seems that a fix is already merged in the repo. When will the fix be released? Thanks!

    Ian Whalen

    02/25/2025, 3:38 PM
    I think this belongs in plugins! If I remember correctly, there was once a PyCharm-friendly version of this: https://github.com/kedro-org/vscode-kedro Does that still exist anywhere?

    Juan Luis

    02/25/2025, 4:58 PM
    hi folks, in case it's useful for anybody, yesterday I quickly hacked a kedro-openlineage integration, and demonstrated it using Marquez. I guess it should work with any OL consumer but you tell me 🙂 https://github.com/astrojuanlu/kedro-openlineage
    ❀ 4

    Juan Luis

    03/11/2025, 4:43 PM
    happy to announce that @em-pe released `kedro-azureml` 0.9.0 and `kedro-vertexai` 0.12.0 with support for the most recent Kedro and Python versions. you can thank GetInData for it đŸ‘đŸŒ
    K 6
    đŸ„ł 4
    vertex ai 5
    azure 6

    Merel

    03/26/2025, 10:39 AM
    I think Kedro `0.19.12` and the changes we made to the databricks starter (https://github.com/kedro-org/kedro-starters/pull/267) might have broken the resource creation for the `kedro-databricks` plugin, @Jens Peder Meldgaard. When I do `kedro databricks bundle`, the resources folder gets created, but it's empty. (cc: @Sajid Alam)

    Merel

    03/27/2025, 8:31 AM
    Hi @Jens Peder Meldgaard, I'm learning more about how `kedro-databricks` works, and I was wondering whether it makes sense to use any of the other runners (`ThreadRunner` or `ParallelRunner`)? As far as I understand, for every node we use the run parameters `--nodes name, --conf-source self.remote_conf_dir, --env self.env`. Would it make sense to allow adding a runner type too? Or, if you want parallel runs, should you use the Databricks cluster setup for that? I'm not very familiar with all the run options in Databricks, so I'm trying to figure out where to use Kedro features and where Databricks ones. (cc: @Rashida Kanchwala)
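    (For context, the plain CLI already accepts a runner choice via `kedro run --runner`, e.g. `kedro run --nodes my_node --env databricks --runner ThreadRunner`, so forwarding it would presumably just mean extending the per-task argument list above.)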