Kedro #questions

Simon Wolf

03/24/2023, 9:27 AM

Hi, I am using Kedro with Jupyter notebooks in VS Code. I load the kedro.ipython extension and it works fine, but for the catalog or session variables I get a '"catalog" is not defined' error from Pylance. How can I prevent Pylance from giving me this warning?

Tomás Rojas

03/24/2023, 1:33 PM

Hi, does anyone know how to skip one pipeline when running `kedro run`? I have a model training pipeline which I want to skip since I already have a working model

Julien Witty

03/24/2023, 4:00 PM

Hey, I hope you are doing well. Our team is evaluating Kedro as a framework. I was wondering if some of you have experience integrating Kedro with the Hugging Face library, and how you are handling models after training. My first reflex was to implement a custom dataset for this use case, though I'm not sure how much effort that is. Otherwise, maybe use a string as the model path (not very clean) for the next node.

Tomás Rojas

03/26/2023, 9:45 PM

Hi team, how can I completely remove mlflow from a project safely?

Maxime Steinmetz

03/27/2023, 1:06 PM

Does Kedro natively support parameters versioning? I’d like to know which parameters were used in a run for reproducibility

Sanjeev

03/27/2023, 4:09 PM

Hi team, has anyone worked on invoking a Kedro pipeline asynchronously from a Flask API?

Dotun O

03/27/2023, 4:58 PM

Hi team, is there a command to see the list of unrun Kedro nodes (after a pipeline run fails)? For now I get a run command with --from-nodes "" as part of my error message, but I would like to access the actual list. Thanks cc @datajoely

Alfonso Licir

03/27/2023, 10:17 PM

Hi team, how have you been? I have been trying to use the Snowflake dataset on kedro-datasets 1.2.0, but I get a dependency error. I already ran pip install "kedro-datasets[snowflake]" but I get the message: WARNING: kedro-datasets 1.2.0 does not provide the extra 'snowflake'

Alfonso Licir

03/27/2023, 10:17 PM

What should I review?

Sergei Benkovich

03/28/2023, 8:29 AM

We are discussing Kedro integration in the team and several questions arose; we would appreciate any guidance :)
• Given we have a flow but only want to run the parts where the configuration/params changed, is that possible, to avoid running all processes? I know I can run selected pipelines, but that's not what I'm looking for.
• MLflow / W&B integration?
  ◦ I see this package, is it the way to go or is there any other native way?
• Data versioning / model versioning: is it done using MLflow, or does any other option exist?
  ◦ How mature is the experiment tracking in Kedro and what is planned for the future?

Alexandre Ouellet

03/28/2023, 2:09 PM

Hey there! Has anyone ever tried training a YOLO model with Kedro? I struggle a bit with it, as YOLO requires a path to its dataset folder and handles all of the file opening through a PyTorch dataloader. Is there a way in Kedro to treat a "folder" as a dataset, and leave it as a folder?

Ziren Lin

03/28/2023, 3:44 PM

Hi team, do we support customized input from parameters into the SQL query/file? I tried the following files, but Kedro couldn't read the parameter and substitute it into the SQL query.

#globals.yml
order_number: 'abc'

#catalog.yml
sql:
  type: pandas.SQLQueryDataset
  sql: "SELECT * FROM table WHERE column = ${order_number}"
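
One thing worth checking with a query like the one above: `${order_number}` is substituted as plain text, so a string value ends up unquoted in the SQL. A minimal sketch, using Python's `string.Template` purely as a stand-in for Kedro's config templating (an assumption; the real substitution is done by the config loader, which also has to be set up to read globals.yml):

```python
from string import Template

# Value as it would be read from globals.yml (YAML strips the quotes).
global_values = {"order_number": "abc"}

# Same placeholder, without and with SQL quotes around it.
unquoted = Template("SELECT * FROM table WHERE column = ${order_number}")
quoted = Template("SELECT * FROM table WHERE column = '${order_number}'")

unquoted_sql = unquoted.substitute(global_values)
quoted_sql = quoted.substitute(global_values)

print(unquoted_sql)  # abc appears bare, so the database reads it as an identifier
print(quoted_sql)    # 'abc' appears as a string literal
```

If the placeholder is not being replaced at all, the templating setup itself is the more likely culprit than the quoting.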

Filip Panovski

03/29/2023, 8:46 AM

Hi everyone. I have an issue in Kedro 0.18.4 with transcoded datasets that I don't quite understand:

ValueError: The following datasets are used with transcoding, but were referenced without the separator: typed_invoices
Please specify a transcoding option or rename the datasets.

Details within thread.

Christianne Rio Ortega

03/29/2023, 10:02 AM

Hello there! Is there a limitation in terms of the naming convention in the 'src' folder structure? I was planning to organize some of my nodes based on the order of execution/flow, e.g.
src
└ engine
  └ nodes
    └ 000-raw
    └ 100-cleanse
    └ 200-refine
Can Kedro parse ###-AAA as a prefix?
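
For context on the naming question: node and pipeline code under src is ordinary Python, and names like 000-raw are not valid Python identifiers, so they cannot appear in a plain import statement. A quick check, where the alternative names are only hypothetical suggestions:

```python
import keyword

# Folder names from the question, plus two hypothetical alternatives.
candidate_names = ["000-raw", "100-cleanse", "raw_000", "s000_raw"]


def usable_in_import_statement(name: str) -> bool:
    # `import pkg.<name>` only parses when <name> is a valid identifier
    # (no leading digit, no hyphen) and is not a reserved keyword.
    return name.isidentifier() and not keyword.iskeyword(name)


results = {name: usable_in_import_statement(name) for name in candidate_names}
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'not importable with an import statement'}")
```

Keeping the numeric ordering after a letter prefix or at the end of the name preserves the sort order while staying importable.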

Ana Man

03/29/2023, 10:31 AM

~~Hi Everyone, Is there updated docs (using kedro 0.18.+) on building kedro pipelines with pyspark?~~

03/29/2023, 2:04 PM

I am using Kedro Viz 6.0.0 and I notice that some of the functions and metrics are grayed out. They also do not show up in the graph visualization. The missing metrics however can be seen in the experiment tracking tab.

Zoran

03/29/2023, 6:10 PM

Hi there, is this possible, or am I doing something wrong?

Miguel Angel Ortiz Marin

03/29/2023, 8:10 PM

Hi team! Wondering about pinning the load version of a specific dataset using a conf.yml file. The docs include a reference to this, but the linked example doesn't show how to achieve it:

Miguel Angel Ortiz Marin

03/29/2023, 8:12 PM

Juan Luis

03/30/2023, 8:40 AM

hi folks, I created a custom dataset to see if I could understand the documentation and how it works, but I feel I'm doing some unconventional things and I'd need some advice:

Nok Lam Chan

03/30/2023, 9:09 AM

https://docs.kedro.org/en/latest/kedro.datasets.html I can’t find the documentation for the Snowflake dataset here; is something going wrong?

Iñigo Hidalgo

03/30/2023, 9:58 AM

Hello! I'm looking into ways to add data validation to our pipelines at runtime and came across this really cool example project using Great Expectations by (I assume) @Erwin: https://github.com/erwinpaillacan/kedro-great-expectations-example It seems like a good way forward, using the hooks to run the validations if the dataset has some validations mapped to it in config, but I was wondering if anybody has done it a different way, by treating the Great Expectations outputs as Kedro datasets themselves. I ask this because we have all our blob connectors implemented as Kedro custom datasets, and the easiest way for us to save these validations would be by treating them as outputs from Kedro nodes. I'm not interested in the HTML report output; I'm only interested in the JSON outputs, as we would then want to send alerts based on those.

Andreas Zeitler

03/30/2023, 11:47 AM

Hi guys! I'm currently having trouble with the execution order of catalog.save after a node has run and the after_node_run hook, which I use to trigger MLflow documentation. Node:

node(
    modeloutput.predict,
    inputs=["estimator", "modelinput_x_" + name],
    outputs="modeloutput_" + name,
    tags=["output", "output_" + name] + tags,
    name="predict_" + name
)

The output is configured in the data catalog. Hook after_node_run:

if node.name == 'predict_dach_testsplit_test':

    # This is the output of the node:
    y_pred_prob_comb_test = catalog.load('modeloutput_dach_testsplit_test')

    [..]

In the logs, it seems like Kedro tries to load the data in the hook before it was written by the catalog. Is this possible, and is it meant to act like that? It could be fixed by using outputs['modeloutput_dach_testsplit_test'] instead of catalog.load, but in my understanding that should not be necessary. Thanks in advance!
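
A sketch of the workaround mentioned above, reading the prediction from the hook's outputs argument instead of the catalog. The class name and the captured attribute are hypothetical. In a real project the method would also need the @hook_impl decorator from kedro.framework.hooks, which is left out here so the snippet runs without Kedro installed:

```python
from types import SimpleNamespace


class PredictionLoggingHooks:
    """Hypothetical hook class; add @hook_impl to after_node_run in a Kedro project."""

    def __init__(self):
        self.captured = None

    def after_node_run(self, node, outputs):
        # `outputs` maps output dataset names to the data the node just returned,
        # so nothing has to be loaded back from the catalog here.
        if node.name == "predict_dach_testsplit_test":
            self.captured = outputs["modeloutput_dach_testsplit_test"]


# Stand-in objects to exercise the hook outside of a Kedro run.
fake_node = SimpleNamespace(name="predict_dach_testsplit_test")
hooks = PredictionLoggingHooks()
hooks.after_node_run(fake_node, {"modeloutput_dach_testsplit_test": [0.1, 0.9]})
print(hooks.captured)
```

Reading from outputs avoids any dependency on whether the dataset has already been written when the hook fires.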

Priyanka Patil

03/30/2023, 12:08 PM

Hi team!! I’m trying to convert a fairly large spark dataframe to pandas. This is a super expensive operation, so I’m trying to convert chunks of spark dataframe to pandas dataframes. Is there a kedro dataset that allows us to save multiple pandas dataframes to csv part files? thanks!

Nikola Shahpazov

03/30/2023, 12:28 PM

Hi guys, Quick question. The standard tutorial for deploying kedro nodes to aws step functions uses aws_ecr. Is there a way to use a docker_hub image?

Andrej Zachar

03/30/2023, 6:10 PM

Hey everyone, I'd like to know if I can pass a "context" to my Kedro node function, which would allow me to access the names of input variables. For instance, if I'm generating a PDF report, I'd like to be able to identify the classifier and input versions I used. Is this possible? Thank you!

Massinissa Saïdi

03/31/2023, 12:29 PM

Hello, is it possible to save a sklearn Pipeline object as a pickle? I get this error:

DataSetError: <class 'sklearn.pipeline.Pipeline'> was not serialised due to: Can't pickle local object 'fit_best_model.<locals>.<lambda>'

I just return a partitioned pickle dataset like this:

return {'model_' + parameters['model']: pipeline}

and I define the dataset in catalog.yml like this:

models_partionned:
  type: PartitionedDataSet
  path: data/06_models/${date}/${target}/
  filename_suffix: ".pkl"
  dataset:
    type: pickle.PickleDataSet
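
The error message points at a lambda defined inside fit_best_model. Pickle stores functions by their importable name, which a local lambda does not have. A minimal sketch of the problem and one possible fix, using plain dicts as a stand-in for the sklearn pipeline (the function names here are hypothetical, not taken from the project above):

```python
import operator
import pickle
from functools import partial


def build_steps_with_lambda():
    # A lambda created inside a function is a "local object":
    # pickle cannot look it up by name, so serialisation fails.
    return {"scale": lambda x: x * 2}


def build_steps_with_partial():
    # functools.partial over an importable function pickles fine.
    return {"scale": partial(operator.mul, 2)}


def can_pickle(obj) -> bool:
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


print(can_pickle(build_steps_with_lambda()))
print(can_pickle(build_steps_with_partial()))
```

The same idea applies to any callable stored inside the pipeline: replacing the lambda with a module-level function or a partial over one is usually enough for the default pickle backend.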

Olivier Ho

03/31/2023, 1:14 PM

hello, two small questions:
• any reason why a `PartitionedDataSet` returns a dictionary of callables that enable lazy loading, while `IncrementalDataSet`, which inherits from `PartitionedDataSet`, returns a dictionary of the content?
• how does the `IncrementalDataSet` work if you use it as an input of a node? I do not see the call to `confirm`, so I don't understand when the checkpoint is created

Sebastian Cardona Lozano

04/01/2023, 2:31 PM

Hi everyone. Has anyone used Kedro with TensorFlow? If yes, how was your experience? Or is it maybe better to use another framework like TFX? Thanks!