# questions

    Jo Stichbury

    12/01/2022, 11:14 AM
    Hi team! I have a pair of questions about the plotly chart visualisation of the spaceflights tutorial, described in the docs for 18.3 here. Context: I'm working on a revision of those docs to make it a bit more straightforward by adding a new pipeline for
    reporting
    . More context: I took the basic spaceflights starter as it will be after 18.4, which means I stripped out the namespaces/modular pipelines, so the example code is more straightforward. You can see the starter on the repo here (when we put out release 18.4 it'll be merged and available immediately for access via
    kedro new --starter=spaceflights
    ) but right now you'll need to use
    kedro new --starter=spaceflights --checkout=68a27db42335366b07f9362f677d69684ec4e942
    OK, so here's my example code with a reporting pipeline but when I
    kedro run
    and then
    kedro viz
    I see a different graphic to the one in the docs: TL;DR -- what are the questions? Q1: Is this viz correct? If it is not supposed to look like this, please roast my pipeline. Q2: I tried to save my visualisation with
    kedro viz --save-file my_shareable_pipeline.json
    but when I then reload it with
    kedro viz --load-file my_shareable_pipeline.json
    I don't see the chart. So question 2 is: what's wrong with my viz?
    Thanks in advance for any advice. LMK if you need more information.
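For context on Q1, the reporting pipeline in the 0.18.3 spaceflights docs declares the chart as a plotly dataset in the catalog, so kedro-viz can render a preview. A sketch of such an entry (dataset name, columns and titles here are illustrative, following the docs' pattern):

```yaml
shuttle_passenger_capacity_plot:
  type: plotly.PlotlyDataSet
  filepath: data/08_reporting/shuttle_passenger_capacity_plot.json
  versioned: true
  plotly_args:
    type: bar
    fig:
      x: shuttle_type
      y: passenger_capacity
    layout:
      xaxis_title: Shuttle type
      yaxis_title: Average passenger capacity
```

If the chart node or dataset is missing from the reporting pipeline, the graph will differ from the docs' screenshot.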

    shawn

    12/01/2022, 3:34 PM
    Hey Everyone,

    shawn

    12/01/2022, 3:38 PM
Context: I am trying to run a job in Databricks based off the .whl file packaged by Kedro. Kedro version: 0.18.3. Error:
    ValueError: Given configuration path either does not exist or is not a valid directory: /databricks/driver/conf/base
Q1: Is the issue due to the .whl file itself or the way I am configuring the job? Q2: Could this be a permissions issue in the environment I am using?

    shawn

    12/01/2022, 3:38 PM
    Thank you so much for your help in advance!!

    Jan

    12/02/2022, 8:17 AM
Hi! I'm trying to run this run_only_missing example. However, the docs say I need to supply a _hook_manager_. I set up hooks (even though I don't need them at the moment), but I don't know what exactly to supply to the
    run_only_missing
    as _hook_manager._ Can anyone assist? 🙂
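When there are no hooks to register, the usual pattern (a sketch, assuming Kedro 0.18.x; `_create_hook_manager` is a private helper and may move between versions) is to pass an empty hook manager:

```python
def run_missing(pipeline, catalog):
    """Run only the nodes whose outputs are not already persisted (sketch)."""
    # Local imports so the sketch is readable without a full Kedro project.
    from kedro.framework.hooks.manager import _create_hook_manager
    from kedro.runner import SequentialRunner

    # An "empty" hook manager satisfies the required argument; if you do have
    # hooks, register them first with hook_manager.register(MyHooks()).
    hook_manager = _create_hook_manager()
    return SequentialRunner().run_only_missing(pipeline, catalog, hook_manager)
```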

    Anu Arora

    12/02/2022, 3:58 PM
Hi Team, one quick question: are you aware of a better way to orchestrate a Kedro pipeline on Databricks using ADF? So far I have been orchestrating a notebook with ADF, where the Databricks notebook contains the code to unzip the wheel contents of the Kedro project -> install the libraries through requirements.txt -> then run the Kedro pipeline.

    Eugene P

    12/02/2022, 4:41 PM
    Hi kedroids! Sorry for noob question. I’m working with sql database as source of data and pandas.SQLQueryDataSet works well
    sample_sql_query_data:
      type: pandas.SQLQueryDataSet
      credentials: postgres_re_db
      sql: SELECT * FROM rr_norm.sample_gov_torgi
Unfortunately, the number of queries grows fast and catalog.yaml starts bloating with long query strings. It also doesn't seem like a good idea to keep SQL query strings inside catalog.yaml itself, for reproducibility. What would be the most Kedroic/Pythonic approach to extracting queries from catalog.yaml into a separate folder/module? AFAIK (or as I understood from googling), YAML doesn't natively have include/import features?
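One option worth checking against your Kedro version: recent 0.18.x releases added a `filepath` argument to `pandas.SQLQueryDataSet` (mutually exclusive with `sql`), which lets the query live in its own `.sql` file. A sketch (the `queries/` folder is an arbitrary choice):

```yaml
sample_sql_query_data:
  type: pandas.SQLQueryDataSet
  credentials: postgres_re_db
  filepath: queries/sample_gov_torgi.sql   # query text lives in its own file
```

The catalog then stays small, and the SQL files can be reviewed and versioned like any other source code.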

    shawn

    12/05/2022, 3:07 PM
Hey everyone! I am getting the following error with Kedro, and I am not sure why conf/base is expected under site-packages. Context: I am trying to run a job in Databricks based off the .whl file packaged by Kedro. Kedro version: 0.18.3. Error:
    ValueError: Given configuration path either does not exist or is not a valid directory: /databricks/driver/conf/base

    marrrcin

    12/06/2022, 8:45 AM
    How is the release cycle of Kedro coordinated? Right now, kedro
    0.18.4
    is already in PyPI, but starters are not tagged yet, making our CI/CD pipelines fail:
    kedro.framework.cli.utils.KedroCliError: Kedro project template not found at git+https://github.com/kedro-org/kedro-starters.git. Specified tag 0.18.4. The following tags are available: 0.17.0, 0.17.1, 0.17.2, 0.17.3, 0.17.4, 0.17.5, 0.17.6, 0.17.7, 0.18.0, 0.18.1, 0.18.2, 0.18.3.
    Can we expect tagging today? 🤔 Maybe there should be some fallback mechanism for kedro starters to use versioning similar to Python (e.g.
    ~=0.18.0
    but for tags).

    Yifan

    12/06/2022, 10:40 AM
    Hey everyone! I would like to know if there is a tool or Kedro module capable of profiling each node in a pipeline? Basically I want to analyse the execution time of each node (from loading the first input dataset to the end of saving the last chunk of output to the storage) in my pipeline, and I am aware of the possibility of using log files. However, for a pipeline with hundreds of nodes, manually analysing the log files is almost impossible. Do you have any suggestions? Thank you!
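One Kedro-native approach is a small hooks class that records per-node wall-clock time. A sketch (the `try/except` fallback only exists so the timing logic can be read and run outside a Kedro installation):

```python
import time
from collections import defaultdict

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch runs without kedro installed
    def hook_impl(func):
        return func


class NodeTimerHooks:
    """Record wall-clock duration of every node via node-run hooks."""

    def __init__(self):
        self._starts = {}
        self.durations = defaultdict(float)

    @hook_impl
    def before_node_run(self, node):
        # Hook implementations may declare a subset of the spec's arguments.
        self._starts[node.name] = time.perf_counter()

    @hook_impl
    def after_node_run(self, node):
        self.durations[node.name] += time.perf_counter() - self._starts.pop(node.name)
```

Register it in `src/<package>/settings.py` with `HOOKS = (NodeTimerHooks(),)`, then log or dump `durations` at the end of the run. Note that node-run hooks bracket only the node function itself; to include dataset I/O, the analogous `before_dataset_loaded`/`after_dataset_loaded` and `before_dataset_saved`/`after_dataset_saved` hook pairs can be timed the same way.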

    Pallavi Kumari

    12/06/2022, 11:41 AM
Hi everyone, I have to call Kedro nodes or pipelines in my Django project, e.g. for simulation we need to call a Kedro pipeline and use its output as input to Django APIs. Please suggest a solution for this.
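The usual way to drive Kedro from external code such as a Django view is `KedroSession`. A sketch, assuming Kedro 0.18.x; the project path is hypothetical:

```python
from pathlib import Path


def run_kedro_pipeline(pipeline_name="__default__"):
    """Run a Kedro pipeline from non-Kedro code such as a Django view (sketch)."""
    # Local imports keep Django importable even where kedro is not installed.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    project_path = Path("/srv/my-kedro-project")  # hypothetical location
    bootstrap_project(project_path)

    with KedroSession.create(project_path=project_path) as session:
        # session.run returns a dict of the pipeline's free (unsaved) outputs,
        # which the Django API can serialise into its response.
        return session.run(pipeline_name=pipeline_name)
```

A Django view would then call `run_kedro_pipeline("simulation")` and build its JSON response from the returned dict.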

    user

    12/06/2022, 12:18 PM
How to call Kedro pipelines or nodes in the Django framework? I have to call Kedro nodes or pipelines from a Django API and use their output as input in the API. I'm not finding any solution; please suggest one.

    Fabian

    12/07/2022, 9:14 AM
Hi Team, is it possible to save the same output in two different catalog entries? I want to save my data to a Parquet file for further usage and also as a CSV. Is that possible without modifying my nodes?
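One way to do this without touching the existing nodes is a tiny fan-out node that returns the same object twice, bound to two catalog entries (one Parquet, one CSV). A sketch; all dataset and node names are illustrative:

```python
def fan_out(data):
    """Return the same object twice so each copy gets its own catalog entry."""
    return data, data


def create_fanout_pipeline():
    # Local import so the sketch can be read without kedro installed.
    from kedro.pipeline import node, pipeline

    return pipeline([
        node(
            fan_out,
            inputs="model_output",             # the node's existing output
            outputs=["model_output_parquet",   # -> a pandas.ParquetDataSet entry
                     "model_output_csv"],      # -> a pandas.CSVDataSet entry
            name="fan_out_model_output",
        )
    ])
```

Kedro then persists each output through its own catalog entry, so the two file formats come from one run with no change to the original node functions.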

    Jan

    12/07/2022, 9:30 AM
    Hello! Is it possible to register a data catalog entry as a versioned file (versioned=True) via kedro.io.DataCatalog ? I only find information about how to do this in the yml file.
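In the Python API the equivalent of `versioned: true` is passing a `Version` to the dataset constructor. A sketch, assuming Kedro 0.18.x import paths (newer versions use `kedro_datasets`); the filepath is illustrative:

```python
def make_versioned_catalog():
    # Local imports so the sketch can be read without kedro installed.
    from kedro.extras.datasets.pandas import CSVDataSet
    from kedro.io import DataCatalog, Version

    dataset = CSVDataSet(
        filepath="data/01_raw/example.csv",
        # Version(None, None): load the latest version, save under a newly
        # generated timestamp, mirroring versioned: true in YAML.
        version=Version(load=None, save=None),
    )
    return DataCatalog({"example": dataset})
```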

    Fabian

    12/07/2022, 12:06 PM
Hello everyone, a kedro viz question: I have a modular pipeline with two outputs: some intermediate data that is further processed within the pipeline, and the final data. I instantiate the pipeline with a namespace and added both datasets to the catalog. In kedro viz, only the final data is shown as output of my modular pipeline; the intermediate data is shown separately, without any connections. However, when I expand the modular pipeline, the intermediate data is shown as output of the specific node. I want the intermediate data to be shown as a result of my unexpanded modular pipeline, especially when using it as input for other pipelines, but that is not the case. Is this the intended behaviour, and what could I do to change it?
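One thing worth trying (a sketch; names are illustrative): list the intermediate dataset in the modular pipeline's `outputs` mapping. That keeps its name un-namespaced, marking it as part of the pipeline boundary, which other pipelines can consume and which kedro-viz can draw on the collapsed view:

```python
def create_namespaced_pipeline(base_pipeline):
    # Local import so the sketch can be read without kedro installed.
    from kedro.pipeline import pipeline

    # Datasets listed in `outputs` keep their original names instead of being
    # prefixed with the namespace, so they sit at the modular-pipeline boundary.
    return pipeline(
        base_pipeline,
        namespace="my_namespace",
        outputs={"intermediate_data", "final_data"},
    )
```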

    Olivia Lihn

    12/07/2022, 12:29 PM
Hi everyone! I am running a kedro pipeline in a Databricks Repo, following the kedro docs. The pipeline runs end-to-end but I encountered an error:
    OperationalError: (sqlite3.OperationalError) unable to open database file
    (Background on this error at: <https://sqlalche.me/e/14/e3q8>)
My guess is that the run session info cannot be saved because of writing permissions on the Databricks Repo. We have deleted
logging.yml
and, to be honest, this is more of an annoying error (as the pipeline runs). Any ideas on how we can avoid this?
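If `settings.py` enables kedro-viz's `SQLiteStore` (as the experiment-tracking docs suggest), the store tries to open a SQLite file in a location that a Databricks Repo keeps read-only. Two workaround sketches, shown as a settings fragment (paths and package name are illustrative):

```python
# src/<your_package>/settings.py (sketch)
from kedro.framework.session.store import BaseSessionStore

# Option A: keep session data in memory; nothing is written to disk.
SESSION_STORE_CLASS = BaseSessionStore

# Option B (keep the SQLite store): point it at a writable path instead, e.g.
# from kedro_viz.integrations.kedro.sqlite_store import SQLiteStore
# SESSION_STORE_CLASS = SQLiteStore
# SESSION_STORE_ARGS = {"path": "/tmp/kedro_sessions"}
```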

    Maurits

    12/07/2022, 5:29 PM
    Hi all, I'm facing a
    java.lang.OutOfMemoryError: Java heap space
error when storing a JSON file of 2.5M rows on AWS S3 via a Kedro pipeline. The ECS compute already has 104 GB of memory. Any recommendations on how to configure this? Any repartitioning experience? Spark config? Or a way to work around it?
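Two levers to try, sketched below; the values are illustrative and cluster-specific. First, raise the driver-side limits in the project's Spark config (a PySpark Kedro project typically loads `conf/base/spark.yml` in its context hooks):

```yaml
# conf/base/spark.yml (sketch; tune to your cluster)
spark.driver.memory: 16g
spark.driver.maxResultSize: 8g          # large single-driver writes can hit this
spark.sql.shuffle.partitions: 400       # more, smaller partitions per write
```

Second, repartition inside the node before saving (e.g. `df.repartition(200)`), so no single JVM has to materialise the whole JSON output at once.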

    Olga Chumakova

    12/07/2022, 9:33 PM
Hi all! Do Kedro nodes allow optional inputs and outputs? I have an evaluation function with in-time and out-of-time testing. However, I want to run both tests only for certain models and apply in-time testing for the rest. Do I need to build separate functions for these two scenarios, or can I set the out-of-time inputs/outputs as optional?

    Tooba Mukhtar

    12/07/2022, 9:53 PM
Hi team, I am trying to set up the layer functionality in kedro viz. I have defined all the layers in the yml files, but two of the layers are not being displayed in Kedro-Viz. I can see the nodes being displayed, but they are assigned to incorrect layers (for example, model output instead of reporting). What could be the reason for this?

    Jaakko

    12/08/2022, 8:53 AM
    The documentation still instructs to use
    kedro build-reqs
    but when running
    kedro build-reqs
    I get the following deprecation warning:
    DeprecationWarning: Command 'kedro build-reqs' is deprecated and will not be available from Kedro 0.19.0.
    How should project dependencies be managed after
    build-reqs
    is not available anymore? Can the documentation be updated accordingly?

    Jo Stichbury

    12/08/2022, 10:39 AM
    Please could I get a bit of help with an issue that's been reported over on GitHub, but looks more like it's a question for here? I've directed the user to come over here for some further help but thought I'd highlight it now to get the ball rolling: https://github.com/kedro-org/kedro/issues/2104

    Shreyas Nc

    12/08/2022, 1:09 PM
Hi, I want to use pillow.ImageDataSet but I am getting an error. Pasting the changes here. The documentation doesn't describe the YAML API either; am I missing something?
imageset:
  type: PartitionedDataSet
  dataset:
    type: pillow.ImageDataSet
  path: <path_to_data>
  filename_suffix: ".jpg"
    
    getting below error:
    
    kedro.io.core.DataSetError:
    Object 'ImageDataSet' cannot be loaded from 'kedro.extras.datasets.pillow'. Please see the documentation on how to install relevant dependencies for kedro.extras.datasets.pillow.ImageDataSet:
    <https://kedro.readthedocs.io/en/stable/kedro_project_setup/dependencies.html>.
    Failed to instantiate DataSet 'imageset' of type 'kedro.io.partitioned_dataset.PartitionedDataSet'.
    kedro.framework.cli.utils.KedroCliError:
    Object 'ImageDataSet' cannot be loaded from 'kedro.extras.datasets.pillow'. Please see the documentation on how to install relevant dependencies for kedro.extras.datasets.pillow.ImageDataSet:
    <https://kedro.readthedocs.io/en/stable/kedro_project_setup/dependencies.html>.
    Failed to instantiate DataSet 'imageset' of type 'kedro.io.partitioned_dataset.PartitionedDataSet'.
    Run with --verbose to see the full exception
    Error:
    Object 'ImageDataSet' cannot be loaded from 'kedro.extras.datasets.pillow'. Please see the documentation on how to install relevant dependencies for kedro.extras.datasets.pillow.ImageDataSet:
    <https://kedro.readthedocs.io/en/stable/kedro_project_setup/dependencies.html>.
    Failed to instantiate DataSet 'imageset' of type 'kedro.io.partitioned_dataset.PartitionedDataSet'.
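This error usually just means the dataset's optional dependency is missing from the environment; the dependencies page linked in the traceback suggests installing the extras group (e.g. `pip install "kedro[pillow.ImageDataSet]"` on 0.18.x), which for this dataset boils down to:

```shell
# pillow.ImageDataSet needs the Pillow library in the same environment as kedro
pip install Pillow
```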

    Manilson António Lussati

    12/09/2022, 2:19 AM
Hello everyone, I have been studying ways to use dbx with the Kedro template. Have any of you gone through this?

    Sebastian Pehle

    12/09/2022, 9:37 AM
Hello everyone. Let's say I created a reporting pipeline in a notebook (pull data, compute columns, export Excel/CSV). I then packaged everything into a Kedro project and everything is fine. Then the customer wants some alterations to the reports, new columns or something like that. How would I proceed to "develop" inside Kedro? Transferring dirty notebook code into clean nodes is one thing, but how would I proceed to develop once everything is a node in a pipeline? In Jupyter notebooks or regular .py files I can run the code until some point and then alter my dataframes as I wish. How would I approach this in the Kedro framework? I hope this makes sense ;)
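The usual workflow for this is Kedro's notebook integration, which gives interactive access to the project's catalog so intermediate datasets can be pulled and reshaped exactly as in a plain notebook. A sketch (dataset and function names are illustrative; available in 0.18.3):

```python
# In a notebook launched from the project root (e.g. `kedro jupyter notebook`):
%load_ext kedro.ipython        # injects catalog, context, pipelines, session

df = catalog.load("report_input")        # pull any intermediate dataset
df_new = add_new_columns(df)             # iterate on the node function itself
catalog.save("report_output", df_new)    # optionally persist for inspection
```

After editing the node's source, `%reload_kedro` refreshes the injected objects, so the loop "edit node function, re-run on real catalog data" replaces the old "run the notebook up to a cell" workflow.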

    Max S

    12/09/2022, 10:26 AM
    Hey Team, QQ regarding versioning. I think I am clear regarding versioned datasets. Searching the docs I could not find anything regarding versioned parameters. Given that I trigger a pipeline run, I create versioned datasets (if I choose to do so), but can I also create a versioned save of the used parameters (from one or more
    yaml
    files?) Or am I thinking about this the wrong way and there is a good reason that this is not possible? Thanks!

    Balazs Konig

    12/09/2022, 12:19 PM
    Hi Team! 🦜 Quick question hopefully: How can I specify
    schema
    for a
    SparkDataSet
    in the catalog entry itself? What’s the best practice to represent the
    StructType()
    object in yaml? EDIT: or is the best practice to always save the schema to a separate params file and add just the
    file_path
    to the catalog entry?

    Adam_D

    12/09/2022, 3:49 PM
    Hey Team! I am newer to AWS and I have followed the Kedro AWS Batch Deployment Guide but I am getting a dataset error like this stackoverflow question. I do not want to put datasets into the docker container. I want to be able to read from and write to S3. The AWS tutorial puts the S3 URL as an environment variable. Do I need to do this for each dataset? I'm really looking for how to connect my docker container to S3 to run in a kedro pipeline. Thanks in advance for your help and let me know if I need to provide more detail.
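For reference, the container itself does not need dataset files if the catalog points straight at S3; fsspec handles `s3://` paths. A sketch (bucket, entry names and credentials key are illustrative):

```yaml
# conf/base/catalog.yml (sketch)
companies:
  type: pandas.CSVDataSet
  filepath: s3://my-bucket/data/01_raw/companies.csv
  credentials: dev_s3

preprocessed_companies:
  type: pandas.ParquetDataSet
  filepath: s3://my-bucket/data/02_intermediate/preprocessed_companies.parquet
  credentials: dev_s3
```

`dev_s3` would hold the key/secret in `conf/local/credentials.yml`; alternatively, omit `credentials` entirely and let the AWS Batch task role supply access, so no per-dataset environment variables are needed.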

    John Melendowski

    12/10/2022, 12:41 AM
    Any plans to make a conda feedstock for kedro-viz?

    Mathilde Lavacquery

    12/12/2022, 2:54 PM
Hi Kedro Team, what would be the best practice for passing parameters both in the pipeline registry and in the catalog? E.g., I have a pipeline that runs for different countries and different brands; some pipelines/datasets are at country level, some are at country x brand level. All my pipelines use namespacing to deal with the "scope" (i.e. the countries/brands). My pipeline registry looks like this:
def register_pipelines():
    
        countries = ["a", "b"]
        brands = ["1", "2", "3"]
        return {
            "preprocess_macro": preprocess_macro_pipeline(countries=countries),
            "preprocess_brand": preprocess_brand_pipeline(countries=countries, brands=brands),
            "train_model": train_model_pipeline(countries=countries, brands=brands),
        }
and my catalog looks like this:
    {% for country in ["a", "b"] %}
    {% for brand in ["1", "2", "3"] %}
    
    {{ country }}.pre_master_macro:
        ...
    
    {{ country }}.{{ brand }}.master:
        ...
    
    {{ country }}.{{ brand }}.model:
        ...
Would there be a way to pass countries/brands once and have both pick them up? The use case is that we are developing a generic pipeline that can be replicated in different regions / for different brands according to the client.
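One way to get a single source of truth is to keep the scopes in one YAML file that both sides read. A sketch (`globals.yml` and all names are hypothetical); the registry loads it directly:

```python
from pathlib import Path

import yaml


def load_scopes(path="conf/base/globals.yml"):
    """Read the shared country/brand lists from one file so the pipeline
    registry and the catalog template use a single source of truth."""
    scopes = yaml.safe_load(Path(path).read_text())
    return scopes["countries"], scopes["brands"]


def register_pipelines():
    countries, brands = load_scopes()
    # Build the namespaced pipelines exactly as before, e.g.:
    # return {
    #     "preprocess_macro": preprocess_macro_pipeline(countries=countries),
    #     "preprocess_brand": preprocess_brand_pipeline(countries=countries, brands=brands),
    # }
```

On the catalog side, one option (worth verifying against your config-loader version) is `TemplatedConfigLoader` with a `globals_pattern` such as `*globals.yml`, so the template values come from the same file instead of being hard-coded in the Jinja loops.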