# questions

  • Caroline Lei (03/08/2023, 9:09 AM)

    Hi team, I am using the kedro-mlflow plugin in my pipeline. I have defined a conf/local/mlflow.yml file. I am wondering if I can pass in the run name via the Kedro command line. I tried
    kedro run --params=tracking.run.name:"test_name"
    but it didn’t work.
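
    A hedged sketch of one possible workaround: `--params` only overrides Kedro parameters, not kedro-mlflow's mlflow.yml, so a custom hook could read a runtime parameter and rename the active MLflow run via MLflow's system tag. The parameter name `run_name` and the hook wiring are assumptions, not kedro-mlflow API:

    ```python
    # sketch: rename the active MLflow run from a runtime parameter; assumes
    # kedro-mlflow has already started a run by the time the pipeline runs,
    # and that you pass `kedro run --params=run_name:test_name`
    import mlflow
    from kedro.framework.hooks import hook_impl

    class RunNameHook:
        @hook_impl
        def before_pipeline_run(self, run_params):
            run_name = (run_params.get("extra_params") or {}).get("run_name")
            if run_name and mlflow.active_run():
                mlflow.set_tag("mlflow.runName", run_name)  # MLflow's system tag
    ```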

  • Juan Luis (03/08/2023, 10:42 AM)

    hi folks, is there a CLI command to show the available pipelines? something like
    kedro pipeline list
    or similar
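
    For what it's worth, `kedro registry list` prints the registered pipeline names, and `kedro registry describe <name>` shows a pipeline's nodes. Programmatically, something like this should work from the project root (usage here is a sketch):

    ```python
    # sketch: list registered pipelines from Python instead of the CLI
    from pathlib import Path

    from kedro.framework.startup import bootstrap_project
    from kedro.framework.project import pipelines

    bootstrap_project(Path.cwd())  # run from the project root
    print(list(pipelines))         # e.g. ['__default__', 'data_processing', ...]
    ```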

  • Ola Cupriak (03/08/2023, 4:28 PM)

    Hi! I need to find some bioinformatics studies (e.g. with ConvNets for analysis of medical images, or ML models for analysis of sequencing data to diagnose diseases) in which the authors used Kedro. Do you know of any interesting papers? Thanks for your help! 🙂

  • Suryansh Soni (03/08/2023, 6:33 PM)

    Hello everyone! I have a question regarding a Kedro pipeline for a forecasting solution. Is there a way to run a Kedro sub-pipeline inside a loop so that it generates the forecast? The catch: the output of one iteration is the input for the next. Please let me know.
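
    One hedged pattern for a fixed horizon: instantiate the sub-pipeline once per step with namespaces, wiring each step's output to the next step's input. Truly data-dependent loop counts are harder in Kedro's static DAG model. All names here are illustrative:

    ```python
    # sketch: unroll a recursive forecast over a fixed horizon by chaining
    # namespaced copies of the same sub-pipeline
    from kedro.pipeline import Pipeline, node, pipeline

    def forecast_step(state):
        ...  # produce the next state/forecast from the previous one

    step = Pipeline([node(forecast_step, inputs="state_in", outputs="state_out")])

    def create_pipeline(horizon: int = 12) -> Pipeline:
        chained = Pipeline([])
        for i in range(horizon):
            chained += pipeline(
                step,
                namespace=f"step_{i}",
                inputs={"state_in": "initial_state" if i == 0 else f"step_{i-1}.state_out"},
            )
        return chained
    ```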

  • Ryan Ng (03/09/2023, 8:37 AM)

    Hi everyone, I am trying to log the version of a versioned model that is used to make an inference, and then add that alongside the inference output as either a column or a separate key. Do you know if this is possible without a complicated workaround? For example, a model is trained and saved as a versioned dataset; then, in a different pipeline run, the model is loaded to make an inference, and we would like to log that version timestamp as a column in the predicted score table. The goal is transparency: being able to note which model version was used to make an inference score.
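
    One hedged idea: versioned datasets expose `resolve_load_version()` (on `AbstractVersionedDataSet`), so a hook could look up the version string before the run and inject it as an extra input. The dataset name is made up, and `_get_dataset` is a private API, so treat this as a sketch only:

    ```python
    # sketch: capture the model's resolved load version and stash it in the
    # catalog so a downstream node can add it as a column
    from kedro.framework.hooks import hook_impl
    from kedro.io import MemoryDataSet

    class ModelVersionHook:
        @hook_impl
        def before_pipeline_run(self, run_params, pipeline, catalog):
            model_ds = catalog._get_dataset("trained_model")  # hypothetical name
            version = model_ds.resolve_load_version()  # e.g. "2023-03-09T08.37.00.000Z"
            catalog.add("model_version", MemoryDataSet(version), replace=True)
    ```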

  • Juan Luis (03/09/2023, 12:24 PM)

    hi folks, I'm finding some interesting behavior of paths in the catalog when working from notebooks. my catalog entry looks like this:
    ```yaml
    openrepair-0_3-events-raw:
      type: polars.CSVDataSet
      filepath: data/01_raw/OpenRepairData_v0.3_aggregate_202210.csv
    ```
    but if I try to load the data from a notebook in notebooks/ with this code:
    ```python
    conf_loader = ConfigLoader("../conf")
    conf_catalog = conf_loader.get("catalog.yml")
    catalog = DataCatalog.from_config(conf_catalog)

    catalog.load("openrepair-0_3-events-raw")
    ```
    then I get a "file not found" error. however, if I change the filepath: to ../data/..., or I move the notebook one directory up, or I use the kedro.ipython extension, the error goes away. my aim is to show how to gradually move from non-Kedro to Kedro, and as an intermediate stage I'm loading the catalog manually. I suppose there's some extra magic happening under the hood that properly resolves the paths?
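
    The "magic", as far as I understand it: relative filepaths are resolved against the current working directory, and a Kedro session always runs from the project root. A hedged workaround for the manual intermediate stage is to absolutize the paths before building the catalog:

    ```python
    # sketch: resolve relative catalog paths against the project root when
    # loading the catalog by hand from notebooks/
    from pathlib import Path

    from kedro.config import ConfigLoader
    from kedro.io import DataCatalog

    project_root = Path.cwd().parent  # the notebook lives in notebooks/
    conf_loader = ConfigLoader(str(project_root / "conf"))
    conf_catalog = conf_loader.get("catalog*")

    for entry in conf_catalog.values():
        fp = entry.get("filepath")
        if fp and not Path(fp).is_absolute():
            entry["filepath"] = str(project_root / fp)

    catalog = DataCatalog.from_config(conf_catalog)
    catalog.load("openrepair-0_3-events-raw")
    ```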

  • Juan Luis (03/09/2023, 12:37 PM)

    in other news, I'm having trouble passing dtypes to the upcoming polars.CSVDataSet. not sure if there's a way to specify non-primitive types in the catalog YAML? https://github.com/kedro-org/kedro-plugins/issues/124
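
    Until the YAML question is settled, a hedged fallback is to construct the dataset in code, where real polars dtype objects can be passed directly. This assumes the dataset forwards `load_args` to `polars.read_csv`, and the column name is made up:

    ```python
    # sketch: pass non-primitive polars dtypes by building the dataset in code
    import polars as pl
    from kedro_datasets.polars import CSVDataSet

    ds = CSVDataSet(
        filepath="data/01_raw/OpenRepairData_v0.3_aggregate_202210.csv",
        load_args={"dtypes": {"product_age": pl.Float64}},  # hypothetical column
    )
    df = ds.load()
    ```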

  • Ana Man (03/09/2023, 4:33 PM)

    Hi everyone! Is there any documentation on creating custom config loaders that extend the AbstractConfigLoader class? There seem to be some necessary defaults and conventions which I'm finding through trial and error (and by looking at other loader implementations), but I'm trying to understand the bare minimum I need for a custom loader to work in a session.
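
    Not documentation, but a heavily hedged guess at the bare minimum for kedro 0.18.x: AbstractConfigLoader is a UserDict, and the session looks configuration up by key, so a subclass that accepts `(conf_source, env, runtime_params)` and answers `__getitem__` calls for keys like "catalog" and "parameters" is roughly the floor:

    ```python
    # minimal custom loader sketch: no env merging, no glob patterns, just
    # conf/base/<key>.yml lookups, to illustrate the required surface
    from pathlib import Path

    import yaml
    from kedro.config import AbstractConfigLoader

    class MyConfigLoader(AbstractConfigLoader):
        def __init__(self, conf_source, env=None, runtime_params=None, **kwargs):
            super().__init__(conf_source=conf_source, env=env, runtime_params=runtime_params)

        def __getitem__(self, key):
            path = Path(self.conf_source) / "base" / f"{key}.yml"
            return yaml.safe_load(path.read_text()) if path.exists() else {}
    ```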

  • Rebecca Solcia (03/09/2023, 4:55 PM)

    Hello! Is there any functionality that allows running Kedro pipelines in debug mode?
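
    One pattern from Kedro's documentation on debugging (paraphrased from memory, so treat the hook arguments as assumptions): a hook that drops into pdb when a node fails. Register an instance in HOOKS in settings.py:

    ```python
    # sketch: post-mortem debugging hook for failing nodes
    import pdb
    import sys
    import traceback

    from kedro.framework.hooks import hook_impl

    class PDBNodeDebugHook:
        @hook_impl
        def on_node_error(self):
            # print the traceback, then drop into the failing frame
            _, _, traceback_object = sys.exc_info()
            traceback.print_exc()
            pdb.post_mortem(traceback_object)
    ```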

  • Brandon Meek (03/09/2023, 5:55 PM)

    Hey all, if I wanted to request that a part of Kedro be moved to a plugin so people could install it as a standalone tool, would I do that in Kedro or in Kedro-Plugins?

  • Andrew Stewart (03/10/2023, 12:19 AM)

    If I'm packaging and distributing a pipeline as a wheel file, and then go to run it as follows:
    ```python
    from mypipeline.__main__ import main

    main()
    ```
    ...can anyone think of a reason why any custom datasets under mypipeline.extras.datasets.MyDataSet would not be installed along with the wheel?
    ```
    kedro.io.core.DataSetError: Class 'mypipeline.extras.datasets.MyDataSet' not found or one of its dependencies has not been installed.
    ```
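
    A hedged first diagnostic: check whether the subpackage actually made it into the wheel. A missing __init__.py under extras/datasets/ would silently drop it from the build. The wheel filename below is illustrative:

    ```python
    # sketch: list the wheel's contents to confirm the custom dataset is packaged
    from zipfile import ZipFile

    names = ZipFile("dist/mypipeline-0.1-py3-none-any.whl").namelist()
    print([n for n in names if "extras" in n])  # expect mypipeline/extras/datasets/...
    ```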

  • Ana Man (03/10/2023, 1:22 PM)

    Hi again! I have a quick question about OmegaConfigLoader. Apart from adding
    CONFIG_LOADER_CLASS = OmegaConfigLoader
    to settings.py, what other minimum changes are needed in a project to use this loader? I'm having issues running it 'out of the box' (btw, I'm relatively new to the Kedro ecosystem).
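
    A hedged checklist for kedro 0.18.x: the import itself, plus removing anything that relies on TemplatedConfigLoader features (e.g. ${...} globals), since OmegaConfigLoader handles interpolation differently. Whether you need CONFIG_LOADER_ARGS depends on your file layout:

    ```python
    # settings.py sketch (kedro 0.18.x assumed)
    from kedro.config import OmegaConfigLoader

    CONFIG_LOADER_CLASS = OmegaConfigLoader
    # only needed if your config files don't match the default patterns:
    CONFIG_LOADER_ARGS = {
        # "config_patterns": {"catalog": ["catalog*", "catalog*/**", "**/catalog*"]},
    }
    ```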

  • Jorge sendino (03/10/2023, 4:55 PM)

    Hey everyone! Is there a way in Kedro-Viz to hide datasets by default, similarly to how parameters are hidden? In large pipelines this would declutter the visualization a lot.

  • Ricardo Araújo (03/10/2023, 7:21 PM)

    Hey y'all. A tough one, for me at least: say my data is a monthly time series and I want to train one model per month. I can do it easily with a for loop, but that won't allow me to run in parallel. Is there a kedro-esque way to do this, maybe using modular pipelines? I think I know how to do it if there were a fixed number of months, but that is not the case.
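
    A hedged sketch of the usual workaround: the pipeline structure must be fixed at registration time, but it can be *generated* then, e.g. from a parameter or a quick scan of the data folder. One namespaced instance per month runs independently, so `kedro run --runner=ParallelRunner` can parallelise them. All names are illustrative:

    ```python
    # sketch: one namespaced training pipeline per month, built at registration time
    from kedro.pipeline import Pipeline, node, pipeline

    def train_model(monthly_data):
        ...  # fit and return one model

    base = Pipeline([node(train_model, inputs="monthly_data", outputs="model")])

    def create_pipeline(months) -> Pipeline:
        # months could come from a config file or a listing of data partitions
        result = Pipeline([])
        for month in months:
            result += pipeline(base, namespace=month)  # e.g. "2023_01.monthly_data"
        return result
    ```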

  • ed johnson (03/10/2023, 9:47 PM)

    Hello, what is the recommended way to run multiple pipelines sequentially? An obvious approach is just multiple
    kedro run --pipeline <pipeline_i>
    commands defined inside a shell script, but I'm wondering if there is a better way, perhaps using the run config.yml capability?
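
    One hedged alternative: Pipeline objects support `+`, so registering a combined pipeline lets a single `kedro run --pipeline all_in_sequence` do it. Note that node order is still driven by dataset dependencies, not by the order of the sum. Factory names below are made up:

    ```python
    # pipeline_registry.py sketch: one combined entry runs both pipelines
    from kedro.pipeline import Pipeline

    def register_pipelines() -> dict:
        p1 = create_pipeline_one()  # your existing factories (hypothetical names)
        p2 = create_pipeline_two()
        return {
            "pipe1": p1,
            "pipe2": p2,
            "all_in_sequence": p1 + p2,
            "__default__": p1 + p2,
        }
    ```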

  • Andrew Stewart (03/10/2023, 11:45 PM)

    Anyone interested in testing an AthenaDataSet dataset (or even just code reviewing)?

  • Sebastian Cardona Lozano (03/11/2023, 12:26 AM)

    Hi all. I'm trying to set up the version option for a SparkDataSet in the catalog, but I get the following error when the node tries to save the dataset as a .parquet file in Google Cloud Storage:
    ```
    VersionNotFoundError: Did not find any versions for SparkDataSet(file_format=parquet,
    filepath=gs://bdb-gcp-cds-pr-ac-ba-analitica-avanzada/banca-masiva/599_profundizacion/data/05_model_input/master_model_input.parquet,
    load_args={'header': True, 'inferSchema': True}, save_args={}, version=Version(load=None,
    save='2023-03-10T23.44.07.085Z'))
    ```
    In the catalog.yml I have this:
    ```yaml
    master_model_input:
        type: spark.SparkDataSet
        filepath: gs://bdb-gcp-cds-pr-ac-ba-analitica-avanzada/banca-masiva/599_profundizacion/data/05_model_input/master_model_input.parquet  # gs:// URI in Cloud Storage
        file_format: parquet
        layer: model_input
        versioned: True
        load_args:
            header: True
            inferSchema: True
    ```
    However, the parquet file is generated correctly in GCS (see the image attached). Thanks for your help! 🙂

  • Sebastian Cardona Lozano (03/11/2023, 1:09 AM)

    Hi again. If I want to save an ML model built with PySpark/MLlib, which dataset type do I have to use in the catalog.yml? Thanks! 🙂
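
    To my knowledge Kedro 0.18 ships no built-in MLlib model dataset, so one hedged option is a small custom dataset around PipelineModel's own save/load; path handling here is deliberately simplified:

    ```python
    # sketch: custom Kedro dataset for a fitted pyspark.ml PipelineModel
    from kedro.io import AbstractDataSet
    from pyspark.ml import PipelineModel

    class SparkMLlibModelDataSet(AbstractDataSet):
        def __init__(self, filepath: str):
            self._filepath = filepath

        def _load(self) -> PipelineModel:
            return PipelineModel.load(self._filepath)

        def _save(self, model: PipelineModel) -> None:
            model.write().overwrite().save(self._filepath)

        def _describe(self) -> dict:
            return {"filepath": self._filepath}
    ```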

  • rss (03/12/2023, 12:58 AM)

    Colorful notebook output with the rich library: can someone tell me how to get back the colorful output in Jupyter notebooks without using rich.print? I use VSCode. I had this feature with kedro==0.18.4 and lost it with kedro==0.18.5. Kedro requires rich as a dependency.

    https://i.stack.imgur.com/KeeZJ.png

    I think it was a rich bug, because I lost this feature after updating the dependencies in my project. ;) Previous similar topic with this bug: <a...
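
    A hedged guess at a workaround, using plain rich APIs (independent of whatever Kedro changed between 0.18.4 and 0.18.5):

    ```python
    # sketch: re-enable rich's pretty reprs and tracebacks in the notebook yourself
    from rich import pretty, traceback

    pretty.install()     # colorful reprs for cell outputs
    traceback.install()  # colorful tracebacks
    ```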

  • Rebecca Solcia (03/13/2023, 12:03 PM)

    Good morning! Quick question. I have saved a dataset with the following configuration:
    ```yaml
    05_07_FocusDatasource_PKL:
      type: kedro.extras.datasets.pickle.PickleDataSet
      filepath: data/02_intermediate/05_07_FocusDatasource.pkl
    ```
    But when I call
    catalog.load('05_07_FocusDatasource_PKL')
    it tells me that it is a function:
    ```
    <function focus_pickle at 0x7fbff82af040>
    ```
    Any suggestions on how I can load that dataset?
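
    The pickle itself looks healthy; a hedged guess is that the function object is what got saved, e.g. a node returned `focus_pickle` instead of its result. Pickling a function round-trips without complaint, so the mistake only surfaces at load time:

    ```python
    # sketch of the suspected bug (the function name comes from the error message)
    def make_focus_datasource(raw_data):
        return focus_pickle  # bug: saves the function object itself
        # return focus_pickle(raw_data)  # intended: saves the computed data
    ```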

  • Shubham Agrawal (03/13/2023, 2:31 PM)

    Hi! I have a Kedro pipeline which I want to obfuscate or convert to a wheel file and then deploy on a cluster. Does anyone know if this is possible, or are there any tools that could help me do that?

  • Robertqs (03/14/2023, 6:30 AM)

    Hi guys, is it possible to share a data catalog across multiple projects? Or expose it through APIs? What I'm trying to achieve is a metadata store for our team, so people can query and get information about datasets. Thanks.
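
    As a hedged starting point: catalog.yml is just YAML, so a shared file can be loaded by any project or service; the path below is made up:

    ```python
    # sketch: load a shared catalog.yml from a common location and query it
    import yaml
    from kedro.io import DataCatalog

    with open("/shared/conf/catalog.yml") as f:
        conf_catalog = yaml.safe_load(f)

    catalog = DataCatalog.from_config(conf_catalog)
    print(catalog.list())  # dataset names available to every project
    ```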

  • Michal Szlupowicz (03/14/2023, 11:57 AM)

    Hi guys. I'm trying to load data from Snowflake into a SparkDataSet using the data catalog. We thought SparkJDBCDataSet was the proper way of doing that, but I'm struggling to set up the connection drivers. Could someone advise me on how to set it up, or suggest another solution?
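
    A hedged sketch in code form (the same arguments should translate to catalog YAML): the URL format and driver class are standard Snowflake JDBC values, but you also need the snowflake-jdbc jar on the Spark classpath, e.g. via spark.jars.packages. Account, table, and credentials below are placeholders:

    ```python
    # sketch: Snowflake via SparkJDBCDataSet; assumes the snowflake-jdbc jar
    # is already on the Spark classpath
    from kedro.extras.datasets.spark import SparkJDBCDataSet

    ds = SparkJDBCDataSet(
        url="jdbc:snowflake://<account>.snowflakecomputing.com",
        table="MY_SCHEMA.MY_TABLE",
        load_args={
            "properties": {
                "driver": "net.snowflake.client.jdbc.SnowflakeDriver",
                "user": "...",
                "password": "...",
            }
        },
    )
    df = ds.load()  # returns a PySpark DataFrame
    ```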

  • Jan (03/14/2023, 1:09 PM)

    Hi! I am using Kedro locally on my machine and was wondering how to implement this example of hooks to monitor execution time. Is it possible to install Grafana locally as well, then? How was this dashboard created? Is there a quick way to set up Grafana and get this dashboard, or will I have to deep-dive into how Grafana works?
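
    Before going down the Grafana route, a minimal hedged alternative: the same hook pair can simply log per-node durations locally, no dashboard required:

    ```python
    # sketch: time each node with hooks and print the duration
    import time

    from kedro.framework.hooks import hook_impl

    class TimingHooks:
        def __init__(self):
            self._starts = {}

        @hook_impl
        def before_node_run(self, node):
            self._starts[node.name] = time.time()

        @hook_impl
        def after_node_run(self, node):
            print(f"{node.name} took {time.time() - self._starts[node.name]:.2f}s")
    ```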

  • Ana Man (03/14/2023, 4:20 PM)

    Hi everyone! I have a scenario and I wanted to see how people resolve this in their projects. Let's say you have a modular pipeline package with a pipeline of 9 nodes (called pipe1). You want to amend its functionality to accommodate two conditions. Condition 1 relies on the pipeline as it is; condition 2 requires a small change: the addition of 2 nodes to the pipeline. What would be the best-practice way to extend this pipeline (ensuring backward compatibility)?
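
    One hedged approach: keep a single factory with a flag (or two thin wrappers around it), so condition 1 callers see no change. Node names and functions below are illustrative:

    ```python
    # sketch: optional extension of pipe1 behind a flag, keeping defaults intact
    from kedro.pipeline import Pipeline, node

    def step_extra_1(x):
        return x  # placeholder for the first additional node's logic

    def step_extra_2(y):
        return y  # placeholder for the second additional node's logic

    def create_pipeline(base: Pipeline, extended: bool = False) -> Pipeline:
        if not extended:
            return base  # condition 1: pipe1 exactly as it is today
        extra = Pipeline([
            node(step_extra_1, inputs="pipe1_output", outputs="extra_intermediate"),
            node(step_extra_2, inputs="extra_intermediate", outputs="extended_output"),
        ])
        return base + extra  # condition 2: pipe1 plus the two new nodes
    ```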

  • Dharmesh Soni (03/14/2023, 5:25 PM)

    Hi everyone! There are zip files containing data as text files stored in the cloud. Is there any native Kedro or PySpark solution to read these zip files and, eventually, the text files? Structure of the zip files:
    ```
    ├── main_folder.zip
    │   ├── folder1
    │   │   └── text_file.txt
    │   └── text_file.txt
    ```
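
    Not Kedro-native, but a hedged option via fsspec's chained URLs (the same machinery Kedro's datasets use for remote paths); the bucket and protocol are made up, and gcsfs (or the equivalent for your cloud) must be installed:

    ```python
    # sketch: read a text file inside a zip that itself lives on cloud storage
    import fsspec

    with fsspec.open("zip://folder1/text_file.txt::gs://my-bucket/main_folder.zip", "rt") as f:
        print(f.read())
    ```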

  • Walber Moreira (03/14/2023, 8:11 PM)

    Guys, is it possible to use Jinja templating and globals params together? Like: {% for country in "${countries}" %} ?

  • Tom C (03/15/2023, 6:29 AM)

    Has anyone had issues with corrupted session DBs for experiment tracking? My Kedro runs are encoding some of what's dumped into the runs table as a string instead of nested JSON. This causes an error when attempting to visualise the runs in Viz. I've created a ticket, but I wanted to ask here for people who don't follow the issue boards.

  • Jonas Kemper (03/15/2023, 12:03 PM)

    Hi friends, when I have a parameters.yml
    ```yaml
    data_science:
      active_modelling_pipeline:
        model_options:
          test_size: 0.2...
    ```
    and I load it via
    ```python
    conf_loader = kedro.config.ConfigLoader(".")
    parameters = conf_loader['parameters']
    ```
    that returns me
    ```
    {'data_science': {'active_modelling_pipeline': {'model_options': {'test_size': 0.2,
    ```
    When in another place I do
    ```python
    data_catalog = DataCatalog.from_config(catalog, credentials)
    data_catalog.add_feed_dict(parameters)
    ```
    this won't work, because eventually that'll land me at
    ```
    ValueError: Pipeline input(s) {'params:data_science.candidate_modelling_pipeline.model_options.random_state', ...} not found in the DataCatalog
    ```
    What's the intermediate step that I'm missing?
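
    The missing step, as far as I understand Kedro's internals: the context doesn't feed the raw parameters dict in directly. It registers a "parameters" entry plus one flattened "params:<dotted.path>" entry per nested key, which is what nodes declare as inputs. A hedged reimplementation:

    ```python
    # sketch: flatten parameters into the "params:" keys nodes actually consume
    def params_feed_dict(parameters: dict) -> dict:
        feed = {"parameters": parameters}

        def _add(name, value):
            feed[f"params:{name}"] = value
            if isinstance(value, dict):
                for key, val in value.items():
                    _add(f"{name}.{key}", val)

        for key, value in parameters.items():
            _add(key, value)
        return feed

    data_catalog.add_feed_dict(params_feed_dict(parameters))
    ```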