# questions
  • user

    01/12/2023, 2:49 PM
    kedro jupyter notebook in command prompt returns "'kedro.framework.cli.jupyter.SingleKernelSpecManager' could not be imported". I have been trying to activate Jupyter notebooks in a Kedro context for over 24 hours now and I receive the same error every time. I have searched around and no one seems to be able to solve this problem. I have created a jupyter_notebook_config.json as recommended by some and deleted it as recommended by others, and there is no change. I have installed ipython and run $ python3 -m ipykernel install --user --name=myvenv, which successfully installed a kernelspec within my venv, but still when I...
  • Afaque Ahmad

    01/13/2023, 7:03 AM
    Hi Team, I'm trying to run Kedro on AWS Managed Airflow. I've used the `kedro-airflow` plugin to generate the DAGs. Is there a guide I can follow for a step-by-step process to get the DAG up and running on Airflow? Do I need to put the `.whl` file anywhere after running `kedro package`?
  • user

    01/13/2023, 1:28 PM
    Python: kedro viz SQLAlchemy DeprecationWarning. I tried to work with Kedro and started with the spaceflights tutorial. I installed src/requirements.txt in a .venv. When running kedro viz (or kedro run or even kedro --version), I get lots of deprecation warnings. One of them is the following (relating to kedro viz): kedro_viz\models\experiment_tracking.py:16 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent...
  • Simen Husøy

    01/15/2023, 3:02 PM
    Hi, I want to use Kedro viz to visualize images made in a Kedro pipeline. The examples I've seen so far show how to use `plotly.PlotlyDataSet` to make bar plots etc., but I am having a hard time figuring out how to plot an image, similar to how you would do it with `plt.imshow(...)`, in kedro viz. Does anyone here know how to do this?
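One possible approach, sketched here rather than taken from the thread: render the image in a node with matplotlib and save the figure through `MatplotlibWriter`; recent kedro-viz versions can preview image outputs, though that is worth verifying. The node name and file path below are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
from kedro.extras.datasets.matplotlib import MatplotlibWriter


def plot_image(image_array: np.ndarray):
    """Node that renders a 2-D array as an image, similar to plt.imshow(...)."""
    fig, ax = plt.subplots()
    ax.imshow(image_array)
    return fig


# Python-API equivalent of a catalog entry with type: matplotlib.MatplotlibWriter.
image_plot = MatplotlibWriter(filepath="data/08_reporting/image_plot.png")
image_plot.save(plot_image(np.random.rand(28, 28)))
```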
  • Dustin

    01/17/2023, 1:49 AM
    hi team, just a quick question. There is one step in my existing pipeline (which I'm aiming to migrate to Kedro) that converts a pandas DataFrame to a Hugging Face Dataset in order to call the Hugging Face trainer.
  • Dustin

    01/17/2023, 1:50 AM
    Wondering if there is any support for Dataset from the Kedro catalog perspective? How do I define the output if the catalog doesn't support this data format?
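One way to make a Hugging Face `Dataset` a first-class catalog entry is a small custom dataset class; a minimal sketch, assuming the `datasets` library's `save_to_disk`/`load_from_disk` round-trip (the class name and path are made up):

```python
from pathlib import Path
from typing import Any, Dict

from datasets import Dataset
from kedro.io import AbstractDataSet


class HFDiskDataSet(AbstractDataSet):
    """Hypothetical catalog dataset wrapping datasets.Dataset on local disk."""

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def _load(self) -> Dataset:
        return Dataset.load_from_disk(str(self._filepath))

    def _save(self, data: Dataset) -> None:
        data.save_to_disk(str(self._filepath))

    def _describe(self) -> Dict[str, Any]:
        return {"filepath": str(self._filepath)}
```

If the `Dataset` is only passed between nodes within a single run, it can also simply be left out of `catalog.yml` and handled as a `MemoryDataSet`.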
  • Gaetan

    01/17/2023, 10:43 AM
    Hello, I'm evaluating Kedro for my company; it is currently one of the closest tools to what we need. But I have a question about something very common in our workflow, and I'm not sure how we would implement it in Kedro. Some of our pipelines start with something like this:
    - Download a dataset (between 20 and 100 GB)
    - Create a local index of the data in a temporary folder (with Lucene, for example) using a bash command
    - Use the index to extract a dataset, using a bash command
    - Remove the temporary local index
    - Use the dataset in subsequent steps (after that step Kedro seems to handle our needs)
    It is similar in some ways to this: https://docs.dagster.io/tutorial/assets/non-argument-deps
    To summarize:
    - Doing operations outside the graph by using the local filesystem
    - Another thing: instead of loading the data into memory and letting Kedro serialize it to store it on S3 for example, being able to give Kedro a local path where the data is stored, and letting it pick up the local path to upload it to S3
    Thanks!
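A rough sketch of how the shell-based indexing step could be wrapped inside one node; `build-index` and `query-index` are placeholders for the real bash commands, and the output path is an assumption:

```python
import subprocess
import tempfile
from pathlib import Path


def build_and_extract(raw_data_path: str) -> str:
    """Builds a temporary local index, extracts a dataset from it, and returns
    the path of the extracted file; the index is deleted on exit."""
    output_path = Path("data/02_intermediate/extracted.parquet")
    with tempfile.TemporaryDirectory() as index_dir:
        subprocess.run(["build-index", raw_data_path, index_dir], check=True)
        subprocess.run(["query-index", index_dir, str(output_path)], check=True)
    return str(output_path)
```

Downstream nodes can consume the returned path directly; having Kedro upload the file to S3 without loading it into memory would likely need a custom dataset whose save step copies the file with fsspec rather than serialising an in-memory object.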
  • user

    01/17/2023, 10:58 AM
    Does the Kedro data catalog accept .arrow files? While using Kedro I want to load some data and work with it. To do that, one has to register the data in a conf/base/catalog.yml file. The Kedro documentation of the Data Catalog explains how one can register data for Kedro to load. However, there is little to no information on how to load a...
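Feather v2 files use the Arrow IPC file format, so an `.arrow` file can often be registered with the pandas Feather dataset; a small sketch via the Python API (the file path is an assumption, and the catalog.yml equivalent would use `type: pandas.FeatherDataSet`):

```python
from kedro.extras.datasets.pandas import FeatherDataSet

# Loads the Arrow/Feather file into a pandas DataFrame.
dataset = FeatherDataSet(filepath="data/01_raw/my_table.arrow")
df = dataset.load()
```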
  • Simen Husøy

    01/17/2023, 12:23 PM
    I have one more question for you guys. I have a pipeline, `pipeline1`, that uses a dataset `x` as data input. This dataset is a custom dataset class that downloads a set of data from a REST API we have. Multiple nodes use `x` as input. I want to make a test pipeline that wraps `pipeline1` by loading a different dataset (still from a REST API, but with different query parameters), together with additional test nodes that run performance metrics on the results from `pipeline1`. I have implemented this by using the override functionality of pipelines: wrapping `pipeline1` in a new pipeline function and giving it an override dictionary to use the test dataset instead of the original dataset, `inputs={x: test_x}`. This seems to work, but I notice that it downloads the data multiple times, which is not preferable since it takes some time to download the dataset from the API each time. It seems like each node that uses `x` in `pipeline1` downloads (loads) the dataset itself, instead of it being loaded once for the whole test pipeline. Do you know how to prevent the dataset from being loaded for each node? (code in the comments)
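One option worth trying, sketched here with a stand-in dataset rather than the custom REST-API class: wrap the dataset in `CachedDataSet`, which loads the underlying dataset once per run and serves the in-memory copy to every node that consumes it.

```python
from kedro.extras.datasets.pandas import CSVDataSet
from kedro.io import CachedDataSet

# The first load() hits the source; later loads in the same run reuse the
# cached copy instead of downloading again.
cached_x = CachedDataSet(CSVDataSet(filepath="data/01_raw/example.csv"))
```

The same wrapping can be declared in catalog.yml with `type: CachedDataSet` and a nested `dataset` block.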
  • Miguel Angel Ortiz Marin

    01/17/2023, 3:24 PM
    Hi team, wondering about some pointers for working with Jinja2 templating. Facing the following pain point:
    • We're importing .j2 files that keep macros and some variables; however, we can only import .j2 files that are in the same folder or in subfolders:
      ◦ I can do {% from "./countries.j2" import countries %} with no problem
      ◦ I can't do {% from "../countries.j2" import countries %}, which ends up giving an error
    • Ideally I'd keep a "global" templates folder from which macros and variables can be imported
    • Not sure if this is directly a Kedro question. Wondering if some subclassing of TemplatedConfigLoader could do the trick
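Not a Kedro-specific answer, but as an illustration of one direction: Jinja2's `FileSystemLoader` accepts several search roots, so a "global" templates folder can sit alongside the per-pipeline folders and macros can then be imported by bare name instead of with `../`; wiring such a loader into config loading would indeed mean subclassing `TemplatedConfigLoader`. The paths below are assumptions.

```python
from jinja2 import Environment, FileSystemLoader

# "countries.j2" resolves from either root, wherever the importing template lives.
env = Environment(loader=FileSystemLoader(["conf/base", "conf/templates"]))
template = env.get_template("countries.j2")
```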
  • Linda Sun

    01/17/2023, 9:42 PM
    Hi Kedro team, I've used Kedro in my project. In terms of the data catalog, I have Snowflake data which needs to be read into / written from a Spark dataset. I implemented this Snowflake connector functionality as an extra dataset. Just wondering if there is a need for this in the Kedro codebase, so that I can help to contribute this part? Thank you.
  • Vici

    01/18/2023, 9:51 AM
    Hi everyone. Due to many "test runs" to see how well plots turn out and the like, I've accumulated a huge number of irrelevant runs in my experiment tracking panel, which makes it much more painful to use. Is there a way to:
    1. Delete experiment runs
    2. Turn off experiment tracking for an instance of "kedro run", e.g. via some command line argument that I might have missed?
    This question is kind of related to Reason 9 from this GitHub issue, but I don't know whether a fix exists by now... Thank you!
  • Damian Fiłonowicz

    01/18/2023, 10:14 AM
    Hey, I have a quick kedro-viz question. When I try to deploy a static, updated kedro-viz of the pipeline on the machine along with the project's API, I get pip dependency conflicts with fastapi and uvicorn because kedro-viz requires older versions:
```
my app requires fastapi==0.81.0, but you have fastapi 0.66.1 which is incompatible.
my app requires uvicorn[standard]==0.18.3, but you have uvicorn 0.17.6 which is incompatible.
```
    I also see that the kedro-static-viz plugin has been dead for about two years already: https://github.com/WaylonWalker/kedro-static-viz Hence, what is an advised way of deploying this viz with the latest versions? Does anybody run it in a small container, provide it with the project's code and/or the JSON file, and start it with the --load-file FILE argument? If not, is there any nice solution to this? 🙂
  • Vaibhav

    01/18/2023, 11:15 AM
    Hi, is it possible to raise / remove the ceiling for pyarrow? It is currently pinned to <7.0 and we want to use Kedro with some libraries which need pyarrow 8. Thank you!
  • Simen Husøy

    01/18/2023, 3:11 PM
    Hi, after upgrading to kedro-viz 5.2.0 I get the following error:
```
kedro.framework.cli.utils.KedroCliError: not enough values to unpack (expected 3, got 1)
Run with --verbose to see the full exception
Error: not enough values to unpack (expected 3, got 1)
```
    It worked with the previous version; does anyone know why this happens? (full stack trace in comments)
  • João Areias

    01/18/2023, 7:10 PM
    Hi, I was wondering if anyone has used Kedro with Quarto notebooks (https://quarto.org/)? They are similar to R Markdown. Do any of you know if they work together?
  • William Caicedo

    01/19/2023, 4:49 AM
    Is anybody aware of any issues with the `reload_kedro` magic and Kedro 0.18.4?
  • datajoely

    01/19/2023, 8:55 AM
    Also, apologies everyone: we're not sure why these Kotlin questions have come through. The RSS feed we're pointing to should just be this: https://stackoverflow.com/feeds/tag/kedro
  • Afaque Ahmad

    01/19/2023, 9:10 AM
    Hi Kedro folks. I'm trying to create a `LivyRunner` to be able to submit jobs to an EMR cluster using Livy. I'm using Kedro `0.18.4`. I need to pass the code as a string to Livy. Has anyone created something similar? Any help is really appreciated. I'm trying to pass the code in `_run` to Livy. How do I figure out which pipeline and node to run? We do have the following parameters in the `_run` function, but they cannot be passed in the string:
```python
def _run(
    self,
    pipeline: Pipeline,
    catalog: DataCatalog,
    hook_manager: PluginManager,
    session_id: str = None,
) -> None:
```
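One hedged way around this, not an official Kedro API: serialise only the pipeline and node names inside `_run` (e.g. `[n.name for n in pipeline.nodes]`), build a small driver script as a string, and post it to Livy; the remote script re-creates the Kedro session itself. The helper name and the Livy endpoint layout below are assumptions.

```python
import textwrap

import requests


def submit_to_livy(livy_session_url: str, pipeline_name: str, node_names: list) -> None:
    """Hypothetical helper: POST a driver script to an already-created Livy session."""
    code = textwrap.dedent(
        f"""
        from pathlib import Path

        from kedro.framework.session import KedroSession
        from kedro.framework.startup import bootstrap_project

        bootstrap_project(Path.cwd())
        with KedroSession.create(project_path=Path.cwd()) as session:
            session.run(pipeline_name={pipeline_name!r}, node_names={node_names!r})
        """
    )
    # Assumes the Livy session was created beforehand via POST /sessions.
    requests.post(f"{livy_session_url}/statements", json={"code": code}, timeout=30)
```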
  • Iñigo Hidalgo

    01/19/2023, 9:17 AM
    Hey all, simple question: is it possible to pass both positional arguments and keyword arguments to a Kedro node? My example use case is sklearn's train_test_split function, which takes an arbitrary number of arrays passed positionally and then keyword arguments like `test_size` that need to be passed by name. It would need a combination of passing an iterable as well as a dictionary to the `inputs` of the node, which as far as I know isn't doable. If it's not possible, how would you suggest I proceed? My objective is to be able to feed outputs from different nodes into that function, whose output then goes into a train node.
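A common workaround, sketched with assumed dataset and parameter names: wrap `train_test_split` in a function whose arguments all have names, then pass the node's `inputs` as a dictionary so every value is supplied by keyword.

```python
from kedro.pipeline import node
from sklearn.model_selection import train_test_split


def split_data(features, target, test_size: float):
    """Wrapper so the arrays and test_size can all be mapped by name."""
    return train_test_split(features, target, test_size=test_size, random_state=42)


split_node = node(
    split_data,
    inputs={
        "features": "model_input_features",  # assumed dataset names
        "target": "model_input_target",
        "test_size": "params:test_size",
    },
    outputs=["X_train", "X_test", "y_train", "y_test"],
)
```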
  • Balazs Konig

    01/19/2023, 10:55 AM
    🦜 Hi Team! 🦜 Quick question about running Kedro pipelines in Jenkins CI. We have pipelines with fabricated data that use the same nodes as pipelines with real data, and it would already be a great integration test to run all our fabricated pipelines after unit tests in our CI. Are there case studies / examples of how to do this, e.g. how to handle the pipeline output? Also, do we need to remove the fabricated pipeline output from the catalog to keep it a MemoryDataSet for CI to access, if we don't want to write to disk every time CI runs? Thanks! 🙏
  • Juan Marin

    01/19/2023, 12:32 PM
    Hey folks! Just started using kedro. Is there any `kedro` command to import datasets from a path into my data directory in the project? Thanks!
  • Simen Husøy

    01/19/2023, 2:08 PM
    Does anyone know if the neptune-kedro package is working at the moment for Kedro? I have tried it, but am not able to get it to log plots. It reports this at the end without any progress:
```
Waiting for the remaining 582 operations to synchronize with Neptune. Do not kill this process.
Still waiting for the remaining 582 operations (0.00% done). Please wait.
```
  • Brandon Meek

    01/19/2023, 8:05 PM
    Hey all, so by default running `kedro run` will load the configuration from `conf/base` and then overwrite it with `conf/local`, and you can use the `--env` argument to use a different environment instead of `conf/local`. But I was wondering if there is a way to use the `--env` argument to waterfall instead of just overwrite? So if you ran `kedro run --env=dev` it would go `conf/base` -> `conf/dev` -> `conf/local`.
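This waterfall merge is not the default behaviour; as an illustration of the semantics being asked about (base -> dev -> local), independent of Kedro's own config loader:

```python
from pathlib import Path

import yaml


def waterfall_load(filename: str, envs=("base", "dev", "local")) -> dict:
    """Merge the same config file across environments, later ones winning."""
    merged: dict = {}
    for env in envs:
        path = Path("conf") / env / filename
        if path.exists():
            merged.update(yaml.safe_load(path.read_text()) or {})
    return merged
```

Getting this behaviour inside a project would likely mean a custom config loader registered through `CONFIG_LOADER_CLASS` in `settings.py`.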
  • Dustin

    01/20/2023, 4:05 AM
    Hi team, I would like to discuss a feature idea (or is this already implemented?) to seek your thoughts :)

    Context: it is common in practice to want to know the time consumed by the whole pipeline and by each node in the pipeline. I assume stakeholders/engineers would like to understand the performance of the pipeline and which parts can be optimized.

    Features:
    1. Is it possible to show the time consumed (in seconds/minutes) by each node in the pipeline?
        1.1 By default it is shown in the console as part of the logging, and you can configure it to turn it off.
    2. Given feature 1, is it possible to show the time consumed by each pipeline?
        2.1 By default it is shown in the console as part of the logging at the end of each pipeline run.
        2.2 In case there are multiple pipelines, show it for each pipeline; you can configure it to turn it off.
    3. Given feature 2, is it possible to show the time consumed by all pipelines in total?
        3.1 By default it is shown in the console at the end of all pipeline runs, and you can configure it.
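Much of this can already be approximated with project hooks; a sketch (the class name is made up) that logs the elapsed time of each node and of the whole pipeline run:

```python
import logging
import time

from kedro.framework.hooks import hook_impl

logger = logging.getLogger(__name__)


class TimingHooks:
    """Logs how long each node and each pipeline run takes."""

    def __init__(self):
        self._node_start = {}
        self._run_start = None

    @hook_impl
    def before_node_run(self, node):
        self._node_start[node.name] = time.perf_counter()

    @hook_impl
    def after_node_run(self, node):
        elapsed = time.perf_counter() - self._node_start.pop(node.name)
        logger.info("Node %s took %.2f s", node.name, elapsed)

    @hook_impl
    def before_pipeline_run(self):
        self._run_start = time.perf_counter()

    @hook_impl
    def after_pipeline_run(self):
        logger.info("Pipeline run took %.2f s", time.perf_counter() - self._run_start)
```

It would be registered in `src/<package>/settings.py` with `HOOKS = (TimingHooks(),)`.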
  • Dustin

    01/20/2023, 4:08 AM
    Understood that you can calculate them from the console log, but it would be handy to see it in a specific log: "xxx pipeline/node took xxxx seconds".
  • Artur Dobrogowski

    01/20/2023, 12:52 PM
    Hello, I'm a beginner in Kedro and trying to get familiar with it. I've seen that in newly created projects there's a `setup.py` present in `src/`. I can't find info in the documentation on what it is used for. Is the Kedro pipeline built as a Python package for some portability features? I'd like to know what's going on, if someone can shed some light here 🙂
  • Massinissa Saïdi

    01/20/2023, 3:50 PM
    Hello Kedro community, I have a question regarding the management of environment variables. Is there a way to use environment variables (e.g. MYSQLUSER, MYSQLDB, ...) in Kedro config files (credentials.yml, parameters.yml, ...)? Thank you very much 🙂
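One documented pattern, sketched here with the variable names from the question: inject environment variables through `TemplatedConfigLoader`'s `globals_dict` in `settings.py`, then reference them as `${MYSQLUSER}` etc. in the config files.

```python
# src/<package>/settings.py  (package path is an assumption)
import os

from kedro.config import TemplatedConfigLoader

CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
    "globals_dict": {
        "MYSQLUSER": os.environ.get("MYSQLUSER", ""),
        "MYSQLDB": os.environ.get("MYSQLDB", ""),
    },
}
```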
  • Raghav Gupta

    01/21/2023, 7:22 PM
    Hello Kedro team! Can we use the same output for multiple nodes? I have asynchronous Kedro pipelines updating specific columns of the same dataset at different frequencies. If not, are there other approaches to consider?