# questions

  • Caroline Lei (03/08/2023, 9:09 AM)

    Hi team, I am using the kedro-mlflow plugin in my pipeline. I have defined a conf/local/mlflow.yml file. I am wondering if I can pass in the run name via the Kedro command line. I tried
    kedro run --params=tracking.run.name:"test_name"
    but it didn’t work.
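
    A hedged sketch of one possible workaround: `--params` only overrides Kedro parameters, not kedro-mlflow's mlflow.yml, so a custom hook could read a runtime parameter and rename the active MLflow run via MLflow's system tag. The parameter name `run_name` and the hook wiring are assumptions, not kedro-mlflow API:

    ```python
    # sketch: rename the active MLflow run from a runtime parameter; assumes
    # kedro-mlflow has already started a run by the time the pipeline runs,
    # and that you pass `kedro run --params=run_name:test_name`
    import mlflow
    from kedro.framework.hooks import hook_impl

    class RunNameHook:
        @hook_impl
        def before_pipeline_run(self, run_params):
            run_name = (run_params.get("extra_params") or {}).get("run_name")
            if run_name and mlflow.active_run():
                mlflow.set_tag("mlflow.runName", run_name)  # MLflow's system tag
    ```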

  • Juan Luis (03/08/2023, 10:42 AM)

    hi folks, is there a CLI command to show the available pipelines? something like
    kedro pipeline list
    or similar
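
    For what it's worth, `kedro registry list` prints the registered pipeline names, and `kedro registry describe <name>` shows a pipeline's nodes. Programmatically, something like this should work from the project root (usage here is a sketch):

    ```python
    # sketch: list registered pipelines from Python instead of the CLI
    from pathlib import Path

    from kedro.framework.startup import bootstrap_project
    from kedro.framework.project import pipelines

    bootstrap_project(Path.cwd())  # run from the project root
    print(list(pipelines))         # e.g. ['__default__', 'data_processing', ...]
    ```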

  • Ola Cupriak (03/08/2023, 4:28 PM)

    Hi! I need to find some bioinformatics studies (e.g. with ConvNets for analysis of medical images, or ML models for analysis of sequencing data to diagnose diseases) in which the authors used Kedro. Do you know of any interesting papers? Thanks for your help! 🙂

  • Suryansh Soni (03/08/2023, 6:33 PM)

    Hello everyone! I have a question regarding a Kedro pipeline for a forecasting solution. Is there a way to run a Kedro sub-pipeline inside a loop so that it generates the forecast? The catch: the output of one iteration is the input for the next. Please let me know.
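
    One hedged pattern for a fixed horizon: instantiate the sub-pipeline once per step with namespaces, wiring each step's output to the next step's input. Truly data-dependent loop counts are harder in Kedro's static DAG model. All names here are illustrative:

    ```python
    # sketch: unroll a recursive forecast over a fixed horizon by chaining
    # namespaced copies of the same sub-pipeline
    from kedro.pipeline import Pipeline, node, pipeline

    def forecast_step(state):
        ...  # produce the next state/forecast from the previous one

    step = Pipeline([node(forecast_step, inputs="state_in", outputs="state_out")])

    def create_pipeline(horizon: int = 12) -> Pipeline:
        chained = Pipeline([])
        for i in range(horizon):
            chained += pipeline(
                step,
                namespace=f"step_{i}",
                inputs={"state_in": "initial_state" if i == 0 else f"step_{i-1}.state_out"},
            )
        return chained
    ```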

  • Ryan Ng (03/09/2023, 8:37 AM)

    Hi everyone, I am trying to log the version of a versioned model that is used to make an inference, and then add that alongside the inference output as either a column or a separate key. Do you know if this is possible without a complicated workaround? For example, a model is trained and saved as a versioned dataset; then, in a different pipeline run, the model is loaded to make an inference, and we would like to log that version timestamp as a column in the predicted score table. The goal is transparency: being able to note which model version was used to make an inference score.
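
    One hedged idea: versioned datasets expose `resolve_load_version()` (on `AbstractVersionedDataSet`), so a hook could look up the version string before the run and inject it as an extra input. The dataset name is made up, and `_get_dataset` is a private API, so treat this as a sketch only:

    ```python
    # sketch: capture the model's resolved load version and stash it in the
    # catalog so a downstream node can add it as a column
    from kedro.framework.hooks import hook_impl
    from kedro.io import MemoryDataSet

    class ModelVersionHook:
        @hook_impl
        def before_pipeline_run(self, run_params, pipeline, catalog):
            model_ds = catalog._get_dataset("trained_model")  # hypothetical name
            version = model_ds.resolve_load_version()  # e.g. "2023-03-09T08.37.00.000Z"
            catalog.add("model_version", MemoryDataSet(version), replace=True)
    ```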

  • Juan Luis (03/09/2023, 12:24 PM)

    hi folks, I'm finding some interesting behavior of paths in the catalog when working from notebooks. my catalog entry looks like this:
    ```yaml
    openrepair-0_3-events-raw:
      type: polars.CSVDataSet
      filepath: data/01_raw/OpenRepairData_v0.3_aggregate_202210.csv
    ```
    but if I try to load the data from a notebook in notebooks/ with this code:
    ```python
    conf_loader = ConfigLoader("../conf")
    conf_catalog = conf_loader.get("catalog.yml")
    catalog = DataCatalog.from_config(conf_catalog)

    catalog.load("openrepair-0_3-events-raw")
    ```
    then I get a "file not found" error. however, if I change the filepath: to ../data/..., or I move the notebook one directory up, or I use the kedro.ipython extension, the error goes away. my aim is to show how to gradually move from non-Kedro to Kedro, and as an intermediate stage I'm loading the catalog manually. I suppose there's some extra magic happening under the hood that properly resolves the paths?
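
    The "magic", as far as I understand it: relative filepaths are resolved against the current working directory, and a Kedro session always runs from the project root. A hedged workaround for the manual intermediate stage is to absolutize the paths before building the catalog:

    ```python
    # sketch: resolve relative catalog paths against the project root when
    # loading the catalog by hand from notebooks/
    from pathlib import Path

    from kedro.config import ConfigLoader
    from kedro.io import DataCatalog

    project_root = Path.cwd().parent  # the notebook lives in notebooks/
    conf_loader = ConfigLoader(str(project_root / "conf"))
    conf_catalog = conf_loader.get("catalog*")

    for entry in conf_catalog.values():
        fp = entry.get("filepath")
        if fp and not Path(fp).is_absolute():
            entry["filepath"] = str(project_root / fp)

    catalog = DataCatalog.from_config(conf_catalog)
    catalog.load("openrepair-0_3-events-raw")
    ```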

  • Juan Luis (03/09/2023, 12:37 PM)

    in other news, I'm having trouble passing dtypes to the upcoming polars.CSVDataSet. not sure if there's a way to specify non-primitive types in the catalog YAML? https://github.com/kedro-org/kedro-plugins/issues/124
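
    Until the YAML question is settled, a hedged fallback is to construct the dataset in code, where real polars dtype objects can be passed directly. This assumes the dataset forwards `load_args` to `polars.read_csv`, and the column name is made up:

    ```python
    # sketch: pass non-primitive polars dtypes by building the dataset in code
    import polars as pl
    from kedro_datasets.polars import CSVDataSet

    ds = CSVDataSet(
        filepath="data/01_raw/OpenRepairData_v0.3_aggregate_202210.csv",
        load_args={"dtypes": {"product_age": pl.Float64}},  # hypothetical column
    )
    df = ds.load()
    ```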

  • Ana Man (03/09/2023, 4:33 PM)

    Hi everyone! Is there any documentation on creating custom config loaders that extend the AbstractConfigLoader class? There seem to be some necessary defaults and conventions which I'm finding through trial and error (and by looking at other loader implementations), but I'm trying to understand the bare minimum I need for a custom loader to work in a session.
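
    Not documentation, but a heavily hedged guess at the bare minimum for kedro 0.18.x: AbstractConfigLoader is a UserDict, and the session looks configuration up by key, so a subclass that accepts `(conf_source, env, runtime_params)` and answers `__getitem__` calls for keys like "catalog" and "parameters" is roughly the floor:

    ```python
    # minimal custom loader sketch: no env merging, no glob patterns, just
    # conf/base/<key>.yml lookups, to illustrate the required surface
    from pathlib import Path

    import yaml
    from kedro.config import AbstractConfigLoader

    class MyConfigLoader(AbstractConfigLoader):
        def __init__(self, conf_source, env=None, runtime_params=None, **kwargs):
            super().__init__(conf_source=conf_source, env=env, runtime_params=runtime_params)

        def __getitem__(self, key):
            path = Path(self.conf_source) / "base" / f"{key}.yml"
            return yaml.safe_load(path.read_text()) if path.exists() else {}
    ```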

  • Rebecca Solcia (03/09/2023, 4:55 PM)

    Hello! Is there any functionality that allows running Kedro pipelines in debug mode?
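
    One pattern from Kedro's documentation on debugging (paraphrased from memory, so treat the hook arguments as assumptions): a hook that drops into pdb when a node fails. Register an instance in HOOKS in settings.py:

    ```python
    # sketch: post-mortem debugging hook for failing nodes
    import pdb
    import sys
    import traceback

    from kedro.framework.hooks import hook_impl

    class PDBNodeDebugHook:
        @hook_impl
        def on_node_error(self):
            # print the traceback, then drop into the failing frame
            _, _, traceback_object = sys.exc_info()
            traceback.print_exc()
            pdb.post_mortem(traceback_object)
    ```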

  • Brandon Meek (03/09/2023, 5:55 PM)

    Hey all, if I wanted to request that a part of Kedro be moved to a plugin so people could install it as a standalone tool, would I do that in Kedro or in Kedro-Plugins?

  • Andrew Stewart (03/10/2023, 12:19 AM)

    If I'm packaging and distributing a pipeline as a wheel file, and then go to run it as follows:
    ```python
    from mypipeline.__main__ import main

    main()
    ```
    ...can anyone think of a reason why any custom datasets under mypipeline.extras.datasets.MyDataSet would not be installed along with the wheel?
    ```
    kedro.io.core.DataSetError: Class 'mypipeline.extras.datasets.MyDataSet' not found or one of its dependencies has not been installed.
    ```
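
    A hedged first diagnostic: check whether the subpackage actually made it into the wheel. A missing __init__.py under extras/datasets/ would silently drop it from the build. The wheel filename below is illustrative:

    ```python
    # sketch: list the wheel's contents to confirm the custom dataset is packaged
    from zipfile import ZipFile

    names = ZipFile("dist/mypipeline-0.1-py3-none-any.whl").namelist()
    print([n for n in names if "extras" in n])  # expect mypipeline/extras/datasets/...
    ```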

  • Ana Man (03/10/2023, 1:22 PM)

    Hi again! I have a quick question about OmegaConfigLoader. Apart from adding
    CONFIG_LOADER_CLASS = OmegaConfigLoader
    to settings.py, what other minimum changes are needed in a project to use this loader? I'm having issues running it 'out of the box' (btw, I'm relatively new to the Kedro ecosystem).
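
    A hedged checklist for kedro 0.18.x: the import itself, plus removing anything that relies on TemplatedConfigLoader features (e.g. ${...} globals), since OmegaConfigLoader handles interpolation differently. Whether you need CONFIG_LOADER_ARGS depends on your file layout:

    ```python
    # settings.py sketch (kedro 0.18.x assumed)
    from kedro.config import OmegaConfigLoader

    CONFIG_LOADER_CLASS = OmegaConfigLoader
    # only needed if your config files don't match the default patterns:
    CONFIG_LOADER_ARGS = {
        # "config_patterns": {"catalog": ["catalog*", "catalog*/**", "**/catalog*"]},
    }
    ```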

  • Jorge sendino (03/10/2023, 4:55 PM)

    Hey everyone! Is there a way in Kedro-Viz to hide datasets by default, similarly to how parameters are hidden? In large pipelines this would declutter the visualization a lot.

  • Ricardo Araújo (03/10/2023, 7:21 PM)

    Hey y'all. A tough one, for me at least: say my data is a monthly time series and I want to train one model per month. I can do it easily with a for loop, but that won't allow me to run in parallel. Is there a kedro-esque way to do this, maybe using modular pipelines? I think I know how to do it if there were a fixed number of months, but that is not the case.
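
    A hedged sketch of the usual workaround: the pipeline structure must be fixed at registration time, but it can be *generated* then, e.g. from a parameter or a quick scan of the data folder. One namespaced instance per month runs independently, so `kedro run --runner=ParallelRunner` can parallelise them. All names are illustrative:

    ```python
    # sketch: one namespaced training pipeline per month, built at registration time
    from kedro.pipeline import Pipeline, node, pipeline

    def train_model(monthly_data):
        ...  # fit and return one model

    base = Pipeline([node(train_model, inputs="monthly_data", outputs="model")])

    def create_pipeline(months) -> Pipeline:
        # months could come from a config file or a listing of data partitions
        result = Pipeline([])
        for month in months:
            result += pipeline(base, namespace=month)  # e.g. "2023_01.monthly_data"
        return result
    ```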

  • ed johnson (03/10/2023, 9:47 PM)

    Hello, what is the recommended way to run multiple pipelines sequentially? An obvious approach is just multiple
    kedro run --pipeline <pipeline_i>
    commands defined inside a shell script, but I'm wondering if there is a better way, perhaps using the run config.yml capability?
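
    One hedged alternative: Pipeline objects support `+`, so registering a combined pipeline lets a single `kedro run --pipeline all_in_sequence` do it. Note that node order is still driven by dataset dependencies, not by the order of the sum. Factory names below are made up:

    ```python
    # pipeline_registry.py sketch: one combined entry runs both pipelines
    from kedro.pipeline import Pipeline

    def register_pipelines() -> dict:
        p1 = create_pipeline_one()  # your existing factories (hypothetical names)
        p2 = create_pipeline_two()
        return {
            "pipe1": p1,
            "pipe2": p2,
            "all_in_sequence": p1 + p2,
            "__default__": p1 + p2,
        }
    ```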

  • Andrew Stewart (03/10/2023, 11:45 PM)

    Anyone interested in testing an AthenaDataSet dataset (or even just code reviewing)?

  • Sebastian Cardona Lozano (03/11/2023, 12:26 AM)

    Hi all. I'm trying to set up the version option for a SparkDataSet in the catalog, but I get the following error when the node tries to save the dataset as a .parquet file in Google Cloud Storage:
    ```
    VersionNotFoundError: Did not find any versions for SparkDataSet(file_format=parquet,
    filepath=gs://bdb-gcp-cds-pr-ac-ba-analitica-avanzada/banca-masiva/599_profundizacion/data/05_model_input/master_model_input.parquet,
    load_args={'header': True, 'inferSchema': True}, save_args={}, version=Version(load=None,
    save='2023-03-10T23.44.07.085Z'))
    ```
    In the catalog.yml I have this:
    ```yaml
    master_model_input:
        type: spark.SparkDataSet
        filepath: gs://bdb-gcp-cds-pr-ac-ba-analitica-avanzada/banca-masiva/599_profundizacion/data/05_model_input/master_model_input.parquet  # gs:// URI in Cloud Storage
        file_format: parquet
        layer: model_input
        versioned: True
        load_args:
            header: True
            inferSchema: True
    ```
    However, the parquet file is generated correctly in GCS (see the image attached). Thanks for your help! 🙂

  • Sebastian Cardona Lozano (03/11/2023, 1:09 AM)

    Hi again. If I want to save an ML model built with PySpark/MLlib, which dataset type do I have to use in the catalog.yml? Thanks! 🙂
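
    To my knowledge Kedro 0.18 ships no built-in MLlib model dataset, so one hedged option is a small custom dataset around PipelineModel's own save/load; path handling here is deliberately simplified:

    ```python
    # sketch: custom Kedro dataset for a fitted pyspark.ml PipelineModel
    from kedro.io import AbstractDataSet
    from pyspark.ml import PipelineModel

    class SparkMLlibModelDataSet(AbstractDataSet):
        def __init__(self, filepath: str):
            self._filepath = filepath

        def _load(self) -> PipelineModel:
            return PipelineModel.load(self._filepath)

        def _save(self, model: PipelineModel) -> None:
            model.write().overwrite().save(self._filepath)

        def _describe(self) -> dict:
            return {"filepath": self._filepath}
    ```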

  • rss (03/12/2023, 12:58 AM)

    Colorful notebook output with the rich library: can someone tell me how to get back the colorful output in Jupyter notebooks without using rich.print? I use VSCode. I had this feature with kedro==0.18.4 and lost it with kedro==0.18.5. Kedro requires rich as a dependency.

    https://i.stack.imgur.com/KeeZJ.png

    I think it was a rich bug, because I lost this feature after updating the dependencies in my project. ;) Previous similar topic with this bug: <a...
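
    A hedged guess at a workaround, using plain rich APIs (independent of whatever Kedro changed between 0.18.4 and 0.18.5):

    ```python
    # sketch: re-enable rich's pretty reprs and tracebacks in the notebook yourself
    from rich import pretty, traceback

    pretty.install()     # colorful reprs for cell outputs
    traceback.install()  # colorful tracebacks
    ```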

  • Rebecca Solcia (03/13/2023, 12:03 PM)

    Good morning! Quick question. I have saved a dataset with the following configuration:
    ```yaml
    05_07_FocusDatasource_PKL:
      type: kedro.extras.datasets.pickle.PickleDataSet
      filepath: data/02_intermediate/05_07_FocusDatasource.pkl
    ```
    But when I call
    catalog.load('05_07_FocusDatasource_PKL')
    it tells me that it is a function:
    ```
    <function focus_pickle at 0x7fbff82af040>
    ```
    Any suggestions on how I can load that dataset?
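
    The pickle itself looks healthy; a hedged guess is that the function object is what got saved, e.g. a node returned `focus_pickle` instead of its result. Pickling a function round-trips without complaint, so the mistake only surfaces at load time:

    ```python
    # sketch of the suspected bug (the function name comes from the error message)
    def make_focus_datasource(raw_data):
        return focus_pickle  # bug: saves the function object itself
        # return focus_pickle(raw_data)  # intended: saves the computed data
    ```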

  • Shubham Agrawal (03/13/2023, 2:31 PM)

    Hi! I have a Kedro pipeline which I want to obfuscate or convert to a wheel file and then deploy on a cluster. Does anyone know if this is possible, or are there any tools that could help me do that?

  • Robertqs (03/14/2023, 6:30 AM)

    Hi guys, is it possible to share a data catalog across multiple projects? Or expose it through APIs? What I'm trying to achieve is a metadata store for our team, so people can query and get information about datasets. Thanks.
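
    As a hedged starting point: catalog.yml is just YAML, so a shared file can be loaded by any project or service; the path below is made up:

    ```python
    # sketch: load a shared catalog.yml from a common location and query it
    import yaml
    from kedro.io import DataCatalog

    with open("/shared/conf/catalog.yml") as f:
        conf_catalog = yaml.safe_load(f)

    catalog = DataCatalog.from_config(conf_catalog)
    print(catalog.list())  # dataset names available to every project
    ```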

  • Michal Szlupowicz (03/14/2023, 11:57 AM)

    Hi guys. I'm trying to load data from Snowflake into a SparkDataSet using the data catalog. We thought SparkJDBCDataSet was the proper way of doing that, but I'm struggling to set up the connection drivers. Could someone advise me on how to set it up, or suggest another solution?
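
    A hedged sketch in code form (the same arguments should translate to catalog YAML): the URL format and driver class are standard Snowflake JDBC values, but you also need the snowflake-jdbc jar on the Spark classpath, e.g. via spark.jars.packages. Account, table, and credentials below are placeholders:

    ```python
    # sketch: Snowflake via SparkJDBCDataSet; assumes the snowflake-jdbc jar
    # is already on the Spark classpath
    from kedro.extras.datasets.spark import SparkJDBCDataSet

    ds = SparkJDBCDataSet(
        url="jdbc:snowflake://<account>.snowflakecomputing.com",
        table="MY_SCHEMA.MY_TABLE",
        load_args={
            "properties": {
                "driver": "net.snowflake.client.jdbc.SnowflakeDriver",
                "user": "...",
                "password": "...",
            }
        },
    )
    df = ds.load()  # returns a PySpark DataFrame
    ```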

  • Jan (03/14/2023, 1:09 PM)

    Hi! I am using Kedro locally on my machine and was wondering how to implement this example of hooks to monitor execution time. Is it possible to install Grafana locally as well, then? How was this dashboard created? Is there a quick way to set up Grafana and get this dashboard, or will I have to deep-dive into how Grafana works?
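
    Before going down the Grafana route, a minimal hedged alternative: the same hook pair can simply log per-node durations locally, no dashboard required:

    ```python
    # sketch: time each node with hooks and print the duration
    import time

    from kedro.framework.hooks import hook_impl

    class TimingHooks:
        def __init__(self):
            self._starts = {}

        @hook_impl
        def before_node_run(self, node):
            self._starts[node.name] = time.time()

        @hook_impl
        def after_node_run(self, node):
            print(f"{node.name} took {time.time() - self._starts[node.name]:.2f}s")
    ```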

  • Ana Man (03/14/2023, 4:20 PM)

    Hi everyone! I have a scenario and I wanted to see how people resolve this in their projects. Let's say you have a modular pipeline package with a pipeline of 9 nodes (called pipe1). You want to amend its functionality to accommodate two conditions. Condition 1 relies on the pipeline as it is; condition 2 requires a small change: the addition of 2 nodes to the pipeline. What would be the best-practice way to extend this pipeline (ensuring backward compatibility)?
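
    One hedged approach: keep a single factory with a flag (or two thin wrappers around it), so condition 1 callers see no change. Node names and functions below are illustrative:

    ```python
    # sketch: optional extension of pipe1 behind a flag, keeping defaults intact
    from kedro.pipeline import Pipeline, node

    def step_extra_1(x):
        return x  # placeholder for the first additional node's logic

    def step_extra_2(y):
        return y  # placeholder for the second additional node's logic

    def create_pipeline(base: Pipeline, extended: bool = False) -> Pipeline:
        if not extended:
            return base  # condition 1: pipe1 exactly as it is today
        extra = Pipeline([
            node(step_extra_1, inputs="pipe1_output", outputs="extra_intermediate"),
            node(step_extra_2, inputs="extra_intermediate", outputs="extended_output"),
        ])
        return base + extra  # condition 2: pipe1 plus the two new nodes
    ```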

  • Dharmesh Soni (03/14/2023, 5:25 PM)

    Hi everyone! There are zip files containing data as text files stored in the cloud. Is there any native Kedro or PySpark solution to read these zip files and, eventually, the text files? Structure of the zip files:
    ```
    ├── main_folder.zip
    │   ├── folder1
    │   │   └── text_file.txt
    │   └── text_file.txt
    ```
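
    Not Kedro-native, but a hedged option via fsspec's chained URLs (the same machinery Kedro's datasets use for remote paths); the bucket and protocol are made up, and gcsfs (or the equivalent for your cloud) must be installed:

    ```python
    # sketch: read a text file inside a zip that itself lives on cloud storage
    import fsspec

    with fsspec.open("zip://folder1/text_file.txt::gs://my-bucket/main_folder.zip", "rt") as f:
        print(f.read())
    ```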

  • Walber Moreira (03/14/2023, 8:11 PM)

    Guys, is it possible to use Jinja templating and globals params together? Like: {% for country in "${countries}" %} ?

  • Tom C (03/15/2023, 6:29 AM)

    Has anyone had issues with corrupted session DBs for experiment tracking? My Kedro runs are encoding some of what's dumped into the runs table as a string instead of nested JSON. This causes an error when attempting to visualise the runs in Viz. I've created a ticket, but I wanted to ask here for people who don't follow the issue boards.

  • Jonas Kemper (03/15/2023, 12:03 PM)

    Hi friends, when I have a parameters.yml
    ```yaml
    data_science:
      active_modelling_pipeline:
        model_options:
          test_size: 0.2...
    ```
    and I load it via
    ```python
    conf_loader = kedro.config.ConfigLoader(".")
    parameters = conf_loader['parameters']
    ```
    that returns me
    ```
    {'data_science': {'active_modelling_pipeline': {'model_options': {'test_size': 0.2,
    ```
    When in another place I do
    ```python
    data_catalog = DataCatalog.from_config(catalog, credentials)
    data_catalog.add_feed_dict(parameters)
    ```
    this won't work, because eventually that'll land me at
    ```
    ValueError: Pipeline input(s) {'params:data_science.candidate_modelling_pipeline.model_options.random_state', ...} not found in the DataCatalog
    ```
    What's the intermediate step that I'm missing?
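
    The missing step, as far as I understand Kedro's internals: the context doesn't feed the raw parameters dict in directly. It registers a "parameters" entry plus one flattened "params:<dotted.path>" entry per nested key, which is what nodes declare as inputs. A hedged reimplementation:

    ```python
    # sketch: flatten parameters into the "params:" keys nodes actually consume
    def params_feed_dict(parameters: dict) -> dict:
        feed = {"parameters": parameters}

        def _add(name, value):
            feed[f"params:{name}"] = value
            if isinstance(value, dict):
                for key, val in value.items():
                    _add(f"{name}.{key}", val)

        for key, value in parameters.items():
            _add(key, value)
        return feed

    data_catalog.add_feed_dict(params_feed_dict(parameters))
    ```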