# questions
  • user

    01/12/2023, 2:49 PM
    kedro jupyter notebook in command prompt returns "'kedro.framework.cli.jupyter.SingleKernelSpecManager' could not be imported". I have been trying to activate Jupyter notebooks in a Kedro context for over 24 hours now and I receive the same error every time. I have searched around and no one seems to be able to solve this problem. I have created a jupyter_notebook_config.json as recommended by some and deleted it as recommended by others, and there is no change. I have installed ipython and run $ python3 -m ipykernel install --user --name=myvenv, which successfully installed a kernelspec within my venv, but still when I...
  • Afaque Ahmad

    01/13/2023, 7:03 AM
    Hi Team, I'm trying to run Kedro on AWS Managed Airflow. I've used the `kedro-airflow` plugin to generate the DAGs. Is there a guide I can follow for a step-by-step process to get the DAG up and running on Airflow? Do I need to put the `.whl` file anywhere after running `kedro package`?
  • user

    01/13/2023, 1:28 PM
    Python: kedro viz SQLAlchemy DeprecationWarning. I tried to work with Kedro and started with the spaceflights tutorial. I installed src/requirements.txt in a .venv. When running kedro viz (or kedro run or even kedro --version), I get lots of deprecation warnings. One of them is the following (relating to kedro viz): kedro_viz\models\experiment_tracking.py:16 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent...
  • Simen Husøy

    01/15/2023, 3:02 PM
    Hi, I want to use Kedro viz to visualize images made in a Kedro pipeline. The examples I've seen so far show how to use `plotly.PlotlyDataSet` to make bar plots etc., but I am having a hard time figuring out how to plot an image, similar to how you would do it with `plt.imshow(...)`, in kedro viz. Does anyone here know how to do this?
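One possible approach, sketched here rather than taken from the thread: render the image in a node with matplotlib and save the figure through `MatplotlibWriter`; recent kedro-viz versions can preview image outputs, though that is worth verifying. The node name and file path below are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
from kedro.extras.datasets.matplotlib import MatplotlibWriter


def plot_image(image_array: np.ndarray):
    """Node that renders a 2-D array as an image, similar to plt.imshow(...)."""
    fig, ax = plt.subplots()
    ax.imshow(image_array)
    return fig


# Python-API equivalent of a catalog entry with type: matplotlib.MatplotlibWriter.
image_plot = MatplotlibWriter(filepath="data/08_reporting/image_plot.png")
image_plot.save(plot_image(np.random.rand(28, 28)))
```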
  • Dustin

    01/17/2023, 1:49 AM
    hi team, just a quick question. There is one step in my existing pipeline (which I'm aiming to migrate to Kedro) that converts a pandas DataFrame to a Hugging Face Dataset in order to call the Hugging Face trainer.
  • Dustin

    01/17/2023, 1:50 AM
    Wondering if there is any support for Dataset from the Kedro catalog perspective? How do I define the output if the catalog doesn't support this data format?
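One way to make a Hugging Face `Dataset` a first-class catalog entry is a small custom dataset class; a minimal sketch, assuming the `datasets` library's `save_to_disk`/`load_from_disk` round-trip (the class name and path are made up):

```python
from pathlib import Path
from typing import Any, Dict

from datasets import Dataset
from kedro.io import AbstractDataSet


class HFDiskDataSet(AbstractDataSet):
    """Hypothetical catalog dataset wrapping datasets.Dataset on local disk."""

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def _load(self) -> Dataset:
        return Dataset.load_from_disk(str(self._filepath))

    def _save(self, data: Dataset) -> None:
        data.save_to_disk(str(self._filepath))

    def _describe(self) -> Dict[str, Any]:
        return {"filepath": str(self._filepath)}
```

If the `Dataset` is only passed between nodes within a single run, it can also simply be left out of `catalog.yml` and handled as a `MemoryDataSet`.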
  • Gaetan

    01/17/2023, 10:43 AM
    Hello, I'm evaluating Kedro for my company; it is currently one of the closest tools to what we need. But I have a question about something very common in our workflow, and I'm not sure how we would implement it in Kedro. Some of our pipelines start with something like this:
    - Download a dataset (between 20 and 100 GB)
    - Create a local index of the data in a temporary folder (with Lucene, for example) using a bash command
    - Use the index to extract a dataset, using a bash command
    - Remove the temporary local index
    - Use the dataset in subsequent steps (after that step Kedro seems to handle our needs)
    It is similar in some ways to this: https://docs.dagster.io/tutorial/assets/non-argument-deps
    To summarize:
    - Doing operations outside the graph by using the local filesystem
    - Another thing: instead of loading the data into memory and letting Kedro serialize it to store it on S3 for example, being able to give Kedro a local path where the data is stored, and letting it pick up the local path to upload it to S3
    Thanks!
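A rough sketch of how the shell-based indexing step could be wrapped inside one node; `build-index` and `query-index` are placeholders for the real bash commands, and the output path is an assumption:

```python
import subprocess
import tempfile
from pathlib import Path


def build_and_extract(raw_data_path: str) -> str:
    """Builds a temporary local index, extracts a dataset from it, and returns
    the path of the extracted file; the index is deleted on exit."""
    output_path = Path("data/02_intermediate/extracted.parquet")
    with tempfile.TemporaryDirectory() as index_dir:
        subprocess.run(["build-index", raw_data_path, index_dir], check=True)
        subprocess.run(["query-index", index_dir, str(output_path)], check=True)
    return str(output_path)
```

Downstream nodes can consume the returned path directly; having Kedro upload the file to S3 without loading it into memory would likely need a custom dataset whose save step copies the file with fsspec rather than serialising an in-memory object.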
  • user

    01/17/2023, 10:58 AM
    Does the Kedro data catalog accept .arrow files? While using Kedro I want to load some data and work with it. To do that, one has to register the data in a conf/base/catalog.yml file. The Kedro documentation of the Data Catalog explains how one can register data for Kedro to load. However, there is little to no information on how to load a...
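Feather v2 files use the Arrow IPC file format, so an `.arrow` file can often be registered with the pandas Feather dataset; a small sketch via the Python API (the file path is an assumption, and the catalog.yml equivalent would use `type: pandas.FeatherDataSet`):

```python
from kedro.extras.datasets.pandas import FeatherDataSet

# Loads the Arrow/Feather file into a pandas DataFrame.
dataset = FeatherDataSet(filepath="data/01_raw/my_table.arrow")
df = dataset.load()
```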
  • Simen Husøy

    01/17/2023, 12:23 PM
    I have one more question for you guys. I have a pipeline, `pipeline1`, that uses a dataset `x` as data input. This dataset is a custom dataset class that downloads a set of data from a REST API we have. Multiple nodes use `x` as input. I want to make a test pipeline that wraps `pipeline1` by loading a different dataset (still from a REST API, but with different query parameters), together with additional test nodes that run performance metrics on the results from `pipeline1`. I have implemented this by using the override functionality of pipelines: wrapping `pipeline1` in a new pipeline function and giving it an override dictionary to use the test dataset instead of the original dataset, `inputs={x: test_x}`. This seems to work, but I notice that it downloads the data multiple times, which is not preferable since it takes some time to download the dataset from the API each time. It seems like each node that uses `x` in `pipeline1` downloads (loads) the dataset itself, instead of it being loaded once for the whole test pipeline. Do you know how to prevent the dataset from being loaded for each node? (code in the comments)
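One option worth trying, sketched here with a stand-in dataset rather than the custom REST-API class: wrap the dataset in `CachedDataSet`, which loads the underlying dataset once per run and serves the in-memory copy to every node that consumes it.

```python
from kedro.extras.datasets.pandas import CSVDataSet
from kedro.io import CachedDataSet

# The first load() hits the source; later loads in the same run reuse the
# cached copy instead of downloading again.
cached_x = CachedDataSet(CSVDataSet(filepath="data/01_raw/example.csv"))
```

The same wrapping can be declared in catalog.yml with `type: CachedDataSet` and a nested `dataset` block.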
  • Miguel Angel Ortiz Marin

    01/17/2023, 3:24 PM
    Hi team, wondering about some pointers for working with Jinja2 templating. Facing the following pain point:
    • We're importing .j2 files that keep macros and some variables; however, we can only import .j2 files that are in the same folder or in subfolders:
      ◦ I can do {% from "./countries.j2" import countries %} with no problem
      ◦ I can't do {% from "../countries.j2" import countries %}, which ends up giving an error
    • Ideally I'd keep a "global" templates folder from which macros and variables can be imported
    • Not sure if this is directly a Kedro question. Wondering if some subclassing of TemplatedConfigLoader could do the trick
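Not a Kedro-specific answer, but as an illustration of one direction: Jinja2's `FileSystemLoader` accepts several search roots, so a "global" templates folder can sit alongside the per-pipeline folders and macros can then be imported by bare name instead of with `../`; wiring such a loader into config loading would indeed mean subclassing `TemplatedConfigLoader`. The paths below are assumptions.

```python
from jinja2 import Environment, FileSystemLoader

# "countries.j2" resolves from either root, wherever the importing template lives.
env = Environment(loader=FileSystemLoader(["conf/base", "conf/templates"]))
template = env.get_template("countries.j2")
```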
  • Linda Sun

    01/17/2023, 9:42 PM
    Hi Kedro team, I've used Kedro in my project. In terms of the data catalog, I have Snowflake data which needs to be read into / written from a Spark dataset. I implemented this Snowflake connector functionality as an extra dataset. Just wondering if there is a need for this in the Kedro codebase, so that I can help to contribute this part? Thank you.
  • Vici

    01/18/2023, 9:51 AM
    Hi everyone. Due to many "test runs" to see how well plots turn out and the like, I've accumulated a huge number of irrelevant runs in my experiment tracking panel, which makes it much more painful to use. Is there a way to:
    1. Delete experiment runs
    2. Turn off experiment tracking for an instance of "kedro run", e.g. via some command line argument that I might have missed?
    This question is kind of related to Reason 9 from this GitHub issue, but I don't know whether a fix exists by now... Thank you!
  • Damian Fiłonowicz

    01/18/2023, 10:14 AM
    Hey, I have a quick kedro-viz question. When I try to deploy a static, updated kedro-viz of the pipeline on the machine along with the project's API, I get pip dependency conflicts with fastapi and uvicorn because kedro-viz requires older versions:
```
my app requires fastapi==0.81.0, but you have fastapi 0.66.1 which is incompatible.
my app requires uvicorn[standard]==0.18.3, but you have uvicorn 0.17.6 which is incompatible.
```
    I also see that the kedro-static-viz plugin has been dead for about two years already: https://github.com/WaylonWalker/kedro-static-viz Hence, what is an advised way of deploying this viz with the latest versions? Does anybody run it in a small container, provide it with the project's code and/or the JSON file, and start it with the --load-file FILE argument? If not, is there any nice solution to this? 🙂
  • Vaibhav

    01/18/2023, 11:15 AM
    Hi, is it possible to raise / remove the ceiling for pyarrow? It is currently pinned to <7.0 and we want to use Kedro with some libraries which need pyarrow 8. Thank you!
  • Simen Husøy

    01/18/2023, 3:11 PM
    Hi, after upgrading to kedro-viz 5.2.0 I get the following error:
```
kedro.framework.cli.utils.KedroCliError: not enough values to unpack (expected 3, got 1)
Run with --verbose to see the full exception
Error: not enough values to unpack (expected 3, got 1)
```
    It worked with the previous version; does anyone know why this happens? (full stack trace in comments)
  • João Areias

    01/18/2023, 7:10 PM
    Hi, I was wondering if anyone has used Kedro with Quarto notebooks (https://quarto.org/)? They are similar to R Markdown. Do any of you know if they work together?
  • William Caicedo

    01/19/2023, 4:49 AM
    Is anybody aware of any issues with the `reload_kedro` magic and Kedro 0.18.4?
  • datajoely

    01/19/2023, 8:55 AM
    Also, apologies everyone: we're not sure why these Kotlin questions have come through. The RSS feed we're pointing to should just be this: https://stackoverflow.com/feeds/tag/kedro
  • Afaque Ahmad

    01/19/2023, 9:10 AM
    Hi Kedro folks. I'm trying to create a `LivyRunner` to be able to submit jobs to an EMR cluster using Livy. I'm using Kedro `0.18.4`. I need to pass the code as a string to Livy. Has anyone created something similar? Any help is really appreciated. I'm trying to pass the code in `_run` to Livy. How do I figure out which pipeline and node to run? We do have the following parameters in the `_run` function, but they cannot be passed in the string:
```python
def _run(
    self,
    pipeline: Pipeline,
    catalog: DataCatalog,
    hook_manager: PluginManager,
    session_id: str = None,
) -> None:
```
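One hedged way around this, not an official Kedro API: serialise only the pipeline and node names inside `_run` (e.g. `[n.name for n in pipeline.nodes]`), build a small driver script as a string, and post it to Livy; the remote script re-creates the Kedro session itself. The helper name and the Livy endpoint layout below are assumptions.

```python
import textwrap

import requests


def submit_to_livy(livy_session_url: str, pipeline_name: str, node_names: list) -> None:
    """Hypothetical helper: POST a driver script to an already-created Livy session."""
    code = textwrap.dedent(
        f"""
        from pathlib import Path

        from kedro.framework.session import KedroSession
        from kedro.framework.startup import bootstrap_project

        bootstrap_project(Path.cwd())
        with KedroSession.create(project_path=Path.cwd()) as session:
            session.run(pipeline_name={pipeline_name!r}, node_names={node_names!r})
        """
    )
    # Assumes the Livy session was created beforehand via POST /sessions.
    requests.post(f"{livy_session_url}/statements", json={"code": code}, timeout=30)
```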
  • Iñigo Hidalgo

    01/19/2023, 9:17 AM
    Hey all, simple question: is it possible to pass both positional arguments and keyword arguments to a Kedro node? My example use case is sklearn's train_test_split function, which takes an arbitrary number of arrays passed positionally and then keyword arguments like `test_size` that need to be passed by name. It would need a combination of passing an iterable as well as a dictionary to the `inputs` of the node, which as far as I know isn't doable. If it's not possible, how would you suggest I proceed? My objective is to be able to feed outputs from different nodes into that function, whose output then goes into a train node.
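A common workaround, sketched with assumed dataset and parameter names: wrap `train_test_split` in a function whose arguments all have names, then pass the node's `inputs` as a dictionary so every value is supplied by keyword.

```python
from kedro.pipeline import node
from sklearn.model_selection import train_test_split


def split_data(features, target, test_size: float):
    """Wrapper so the arrays and test_size can all be mapped by name."""
    return train_test_split(features, target, test_size=test_size, random_state=42)


split_node = node(
    split_data,
    inputs={
        "features": "model_input_features",  # assumed dataset names
        "target": "model_input_target",
        "test_size": "params:test_size",
    },
    outputs=["X_train", "X_test", "y_train", "y_test"],
)
```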
  • Balazs Konig

    01/19/2023, 10:55 AM
    🦜 Hi Team! 🦜 Quick question about running Kedro pipelines in Jenkins CI. We have pipelines with fabricated data that use the same nodes as pipelines with real data, and it would already be a great integration test to run all our fabricated pipelines after unit tests in our CI. Are there case studies / examples of how to do this, e.g. how to handle the pipeline output? Also, do we need to remove the fabricated pipeline output from the catalog to keep it a MemoryDataSet for CI to access, if we don't want to write to disk every time CI runs? Thanks! 🙏
  • Juan Marin

    01/19/2023, 12:32 PM
    Hey folks! Just started using kedro. Is there any `kedro` command to import datasets from a path into my data directory in the project? Thanks!
  • Simen Husøy

    01/19/2023, 2:08 PM
    Does anyone know if the neptune-kedro package is working at the moment for Kedro? I have tried it, but am not able to get it to log plots. It reports this at the end without any progress:
```
Waiting for the remaining 582 operations to synchronize with Neptune. Do not kill this process.
Still waiting for the remaining 582 operations (0.00% done). Please wait.
```
  • Brandon Meek

    01/19/2023, 8:05 PM
    Hey all, so by default running `kedro run` will load the configuration from `conf/base` and then overwrite it with `conf/local`, and you can use the `--env` argument to use a different environment instead of `conf/local`. But I was wondering if there is a way to use the `--env` argument to waterfall instead of just overwrite? So if you ran `kedro run --env=dev` it would go `conf/base` -> `conf/dev` -> `conf/local`.
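This waterfall merge is not the default behaviour; as an illustration of the semantics being asked about (base -> dev -> local), independent of Kedro's own config loader:

```python
from pathlib import Path

import yaml


def waterfall_load(filename: str, envs=("base", "dev", "local")) -> dict:
    """Merge the same config file across environments, later ones winning."""
    merged: dict = {}
    for env in envs:
        path = Path("conf") / env / filename
        if path.exists():
            merged.update(yaml.safe_load(path.read_text()) or {})
    return merged
```

Getting this behaviour inside a project would likely mean a custom config loader registered through `CONFIG_LOADER_CLASS` in `settings.py`.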
  • Dustin

    01/20/2023, 4:05 AM
    Hi team, I would like to discuss a feature idea (or is this already implemented?) to seek your thoughts :)

    Context: it is common in practice to want to know the time consumed by the whole pipeline and by each node in the pipeline. I assume stakeholders/engineers would like to understand the performance of the pipeline and which parts can be optimized.

    Features:
    1. Is it possible to show the time consumed (in seconds/minutes) by each node in the pipeline?
        1.1 By default it is shown in the console as part of the logging, and you can configure it to turn it off.
    2. Given feature 1, is it possible to show the time consumed by each pipeline?
        2.1 By default it is shown in the console as part of the logging at the end of each pipeline run.
        2.2 In case there are multiple pipelines, show it for each pipeline; you can configure it to turn it off.
    3. Given feature 2, is it possible to show the time consumed by all pipelines in total?
        3.1 By default it is shown in the console at the end of all pipeline runs, and you can configure it.
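Much of this can already be approximated with project hooks; a sketch (the class name is made up) that logs the elapsed time of each node and of the whole pipeline run:

```python
import logging
import time

from kedro.framework.hooks import hook_impl

logger = logging.getLogger(__name__)


class TimingHooks:
    """Logs how long each node and each pipeline run takes."""

    def __init__(self):
        self._node_start = {}
        self._run_start = None

    @hook_impl
    def before_node_run(self, node):
        self._node_start[node.name] = time.perf_counter()

    @hook_impl
    def after_node_run(self, node):
        elapsed = time.perf_counter() - self._node_start.pop(node.name)
        logger.info("Node %s took %.2f s", node.name, elapsed)

    @hook_impl
    def before_pipeline_run(self):
        self._run_start = time.perf_counter()

    @hook_impl
    def after_pipeline_run(self):
        logger.info("Pipeline run took %.2f s", time.perf_counter() - self._run_start)
```

It would be registered in `src/<package>/settings.py` with `HOOKS = (TimingHooks(),)`.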
  • Dustin

    01/20/2023, 4:08 AM
    Understood that you can calculate them from the console log, but it would be handy to see it in a specific log: "xxx pipeline/node took xxxx seconds".
  • Artur Dobrogowski

    01/20/2023, 12:52 PM
    Hello, I'm a beginner in Kedro and trying to get familiar with it. I've seen that in newly created projects there's a `setup.py` present in `src/`. I can't find info in the documentation on what it is used for. Is the Kedro pipeline built as a Python package for some portability features? I'd like to know what's going on, if someone can shed some light here 🙂
  • Massinissa Saïdi

    01/20/2023, 3:50 PM
    Hello Kedro community, I have a question regarding the management of environment variables. Is there a way to use environment variables (e.g. MYSQLUSER, MYSQLDB, ...) in Kedro config files (credentials.yml, parameters.yml, ...)? Thank you very much 🙂
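One documented pattern, sketched here with the variable names from the question: inject environment variables through `TemplatedConfigLoader`'s `globals_dict` in `settings.py`, then reference them as `${MYSQLUSER}` etc. in the config files.

```python
# src/<package>/settings.py  (package path is an assumption)
import os

from kedro.config import TemplatedConfigLoader

CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
    "globals_dict": {
        "MYSQLUSER": os.environ.get("MYSQLUSER", ""),
        "MYSQLDB": os.environ.get("MYSQLDB", ""),
    },
}
```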
  • Raghav Gupta

    01/21/2023, 7:22 PM
    Hello Kedro team! Can we use the same output for multiple nodes? I have asynchronous Kedro pipelines updating specific columns of the same dataset at different frequencies. If not, are there other approaches to consider?