# questions
  • user
    08/05/2022, 6:28 PM
    How do I use kedro.versioning in the latest version of kedro? I previously used kedro 0.17.6 in my project. Now I have upgraded to 0.18.2, but the latest version of kedro has no kedro.versioning module, so I am getting an error that the module is not found. Can anyone please suggest something?
  • user
    08/06/2022, 7:38 AM
    ModuleNotFoundError: No module named 'kedro.versioning'. I have upgraded my kedro to the latest version, but my project uses kedro.versioning, and the latest kedro has no module with this name. Can anyone please suggest anything?
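    For context, kedro 0.18 removed the kedro.versioning module (including its Journal class) outright. A minimal sketch of the surviving API, assuming what you need is dataset versioning rather than the journal; the filepath and timestamp below are illustrative:

        # kedro 0.18.x: dataset versioning lives in kedro.io
        from kedro.io import Version
        from kedro.extras.datasets.pandas import CSVDataSet

        dataset = CSVDataSet(
            filepath="data/01_raw/iris.csv",
            # load a specific timestamped version; save a fresh one
            version=Version(load="2022-08-05T18.00.00.000Z", save=None),
        )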
  • Tom Taylor-Vigrass
    08/11/2022, 1:09 PM
    Has anyone seen this error on kedro-viz before? Just upgraded to 5.0 (wasn't seeing the error on 4.7.2):

        AttributeError: 'TranscodedDataNode' object has no attribute 'original_version'
  • user
    08/16/2022, 8:08 AM
    Is it possible to automate creating README content using Sphinx in kedro? Kedro uses Sphinx formatting already, and when creating a pipeline it automatically creates a README.md file. Sphinx can automate building documentation, so I want to know whether, and how, it is possible to make Sphinx write the README files automatically.
  • user
    08/20/2022, 10:58 PM
    Dynamic parameters on datasets in Kedro. I would like to call an API to enrich an existing dataset. My approach would be to wrap the API client in an APIDataSet; the other dataset is just a CSVDataSet. Then I'd use both as inputs to a node. For example, I've got keywords in the CSVDataSet and would like to enrich them with Google News using an API. I need the keywords from...
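    A minimal sketch of the two-inputs-one-node shape described above; the dataset names, the keyword column, and enrich_keywords are all hypothetical, and note that kedro's APIDataSet.load() hands the node a requests.Response:

        import pandas as pd
        import requests
        from kedro.pipeline import node

        def enrich_keywords(keywords: pd.DataFrame, response: requests.Response) -> pd.DataFrame:
            # Join whatever the API returned onto the keyword table.
            hits = response.json()
            return keywords.assign(news=keywords["keyword"].map(hits))

        enrich_node = node(
            func=enrich_keywords,
            inputs=["keywords", "google_news_api"],
            outputs="enriched_keywords",
        )

    One caveat with this shape: an APIDataSet's URL is fixed in the catalog, so if each request must depend on the keywords themselves, the API call may need to move inside the node instead.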
  • user
    08/23/2022, 2:28 PM
    How to use generators with kedro? Thanks to David Beazley's slides on generators, I'm quite taken with using generators for data processing in order to keep memory consumption minimal. Now I'm working on my first kedro project, and my question is how I can use generators in kedro. When I have a node that yields a generator and then run it with kedro run --node=example_node, I get the following error: DataSetError: Failed while saving data to data set...
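    The save fails because the output dataset receives a raw generator object. One hedged workaround is to keep the generator inside the node and return a concrete object at the end; transform and the chunk count below are illustrative:

        import numpy as np
        import pandas as pd

        def transform(chunk: pd.DataFrame) -> pd.DataFrame:  # stand-in for real work
            return chunk.assign(processed=True)

        def process(raw: pd.DataFrame) -> pd.DataFrame:
            # The generator expression processes one chunk at a time, but the
            # node still returns a single DataFrame that any dataset can save.
            chunks = (transform(c) for c in np.array_split(raw, 100))
            return pd.concat(chunks)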
  • Mavis Tian
    08/25/2022, 4:29 PM
    Hi everyone!
  • Mavis Tian
    08/25/2022, 4:30 PM
    Does anyone know how to avoid getting a software installation form while running a kedro command?
  • Andrew Stewart
    08/31/2022, 6:34 PM
    Quick question: where should one be managing the version number of a kedro project? project_version in pyproject.toml seems to correspond to the version of Kedro, not the actual project at hand. Is the package version in src/setup.py the right place, or is that being controlled by some higher-level process?
  • Faisal Malik
    09/07/2022, 9:13 AM
    Hi, quick question: I currently use kedro 0.17.4, but we want to convert our kedro pipeline into a Prefect flow using this approach. I notice this approach is only available starting from kedro 0.18.0, while on kedro 0.17 it's not present. I tried to install Prefect in the same environment as my kedro 0.17.4, but it looks like it causes a dependency issue. Should I upgrade my kedro? And if so, how hard will it be to upgrade from kedro 0.17.4 to kedro 0.18.2?
  • Toni
    09/09/2022, 1:21 PM
    Hello kedro team! I have a kedro issue, let's see if you can help me... We have a kedro pipeline that trains a model and generates a dataframe as output. The problem is that we now need to loop that pipeline to generate multiple dataframes (which, at the end, we want to concatenate into a single table). Given a parameter set_targets = ['a', 'b', 'c'], is it possible to loop the same pipeline over each value of that list without "copying" the pipeline? That set_of_targets may vary in length and names, so we want to avoid manual work... Also, we need the outputs to have "dynamic" names in the catalog in order to save all of them (score_{target}... score_a, score_b, score_c)... I think this could be done with jinja, but I have no idea where to start... Thank you very much!
  • user
    09/14/2022, 5:48 AM
    How to do SQL-like querying of parquet files in kedro? I'm new to kedro, and I'm just wondering if I could query the parquet files with SQL instead of using DataFrame APIs. Please help me out if there is a way. Thanks in advance!
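    Not a built-in kedro feature, but one option is to run the SQL inside a node with DuckDB, which queries parquet files directly; the path and query are illustrative:

        import duckdb

        # DuckDB runs SQL straight over parquet files on disk.
        df = duckdb.query(
            "SELECT gear, COUNT(*) AS n FROM 'data/01_raw/cars.parquet' GROUP BY gear"
        ).to_df()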
  • Yetunde
    09/14/2022, 8:43 AM
    I copied @Ashish Verma's question from the #C03RKNSN3U0 channel to here: Hey team, I am still struggling with Kedro + Databricks integration. After resolving all the package conflicts, I am encountering a never-seen-before error. While creating the Kedro session, I am facing a Py4JSecurityException. Error and screenshot below for reference: py4j.security.Py4JSecurityException: Constructor public org.apache.spark.SparkConf(boolean) is not whitelisted. Can you please help me with this? The solutions I found on Google say to create a new cluster, which is not an option for us. I also tried removing the context.py that initializes the custom Spark context, but this is not working either. Let me know if there is something else I need to do, thanks. 🙂 Thanks, Ashish Verma
  • Toni
    09/14/2022, 9:38 AM
    Hi! Quick question: if an entry in the data catalog uses versioned: True, when I use catalog.load(...) in a notebook, does it always load the latest version of that entry? How can I indicate the version to load? Thank you!
  • Riley Brady
    09/14/2022, 8:16 PM
    (0.18.1) It seems that kedro run --tag some_tag1,some_tag2 will run any nodes with some_tag1 OR some_tag2. Is there any functionality to use AND instead of OR? My workaround right now is to create a custom tag some_tag1-some_tag2 and then call that directly. It would be nice if I could list out a few tags and only run nodes that have all of them, but I understand why OR is the default.
  • Kasper Janehag
    09/15/2022, 9:42 AM
    (0.17.7) Hi! I have some problems running Kedro on a self-hosted Hadoop cluster. As part of a pipeline, I have a transcoded registered dataset table@pandas and a table@spark, with the following settings:

        ...table@pandas:
          type: "${datasets.parquet}"
          filepath: "${base_path_spark}/…/master_table"

        ..._table@spark:
          <<: *pq
          filepath: "${base_path_spark}/…/master_table"
    The base_path_spark is an HDFS location. These are then used in a pipeline in the following manner:

        spark_to_pandas = pipeline(
            pipe=Pipeline(
                [
                    node(
                        func=spark_utils.to_pandas,
                        …
                        outputs="..._table@spark",
                    )
                ]
            )
        )

        data_cleaning = pipeline(
            pipe=Pipeline(
                [
                    node(
                        func=enforce_schema_using_dict,
                        inputs={
                            "data": "..._table@pandas",
                        },
                        …
                    )
                ]
            )
        )
    The data_cleaning node is supposed to pick up the output from the spark_to_pandas node via the transcoded dataset. However, a DataSetError is raised with the following message:

        Exception has occurred: DataSetError
        [Errno 2] No such file or directory: 'hadoop': 'hadoop'
        Failed to instantiate Dataset 'telco_churn.master_table@pandas' of type 'kedro.extras.datasets.pandas.parquet_dataset.ParquetDataSet'.

    If we remove the transcoding in the DataCatalog and register the datasets individually, the error disappears. Does anyone know how to proceed from this kind of error? Could it be related to the client-specific Hadoop environment? How can we proceed with troubleshooting?
  • Toni
    09/16/2022, 9:24 AM
    Hi team! How can I save an np.array with the catalog? Is there a way to save an np.array as CSV "easily"? I cannot use pandas.CSVDataSet because it is not a dataframe. I think this could be done with transcoding datasets, but I do not know if there is a dataset for np.arrays in kedro.
  • user
    09/17/2022, 10:58 AM
    DataSetError in Docker Kedro deployment. I am trying to deploy the example Kedro starter project (pandas-iris). I successfully ran it locally (kedro run) and then, having installed kedro-docker, initialised Docker, built the image, and pushed it to my registry. Unfortunately, both kedro docker run and docker run myDockerID/iris_image generate the same error: DataSetError: Failed while loading data from data set CSVDataSet(filepath=/home/kedro/data/01_raw/iris.csv, load_args={}, protocol=file, save_args={'index': False}). [Errno 2] No such file or...
  • Olivia Lihn
    09/20/2022, 11:11 PM
    Hi everyone! We are trying to deploy a kedro pipeline on spark, using --master yarn and --deploy-mode cluster, not locally or in client mode. Has anyone tried this? If so, what extra files/code did you add to make spark-submit work?
  • Jonas Kemper
    09/27/2022, 10:30 AM
    Hi friends, has anyone ever deployed kedro projects behind some kind of lightweight HTTP API? I'm thinking one POST request to start a run and then a GET request to poll the run status, etc.? Is there any reference material that you could point me to?
  • user
    10/03/2022, 1:48 PM
    How to run a kedro pipeline interactively, like a function? I would like to run kedro pipelines in a jupyter notebook with different inputs, something like this: data = catalog.load('my_dataset'); params = catalog.load('params:my_params'); pipelines['my_pipeline'](data=data, params=params). Is there a way to do this? Also, if I have to feed inputs to nodes other than the starting one (for example the second node), how would this be done?
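    Pipelines are not callable, but a hedged near-equivalent is to hand a runner an ad-hoc in-memory catalog; any free input of any node (not just the first) can be injected under the name the node declares:

        from kedro.io import DataCatalog, MemoryDataSet
        from kedro.runner import SequentialRunner

        data = catalog.load("my_dataset")
        params = catalog.load("params:my_params")

        ad_hoc = DataCatalog({
            "my_dataset": MemoryDataSet(data),
            "params:my_params": MemoryDataSet(params),
        })
        outputs = SequentialRunner().run(pipelines["my_pipeline"], ad_hoc)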
  • user
    10/07/2022, 7:58 AM
    How to change the kedro configuration environment in a jupyter notebook? I want to run a kedro pipeline in the base env using a jupyter notebook. I do it the following way: %reload_kedro --env=base, then session.run(pipeline_name='dpfm1'). Doing this, the %reload_kedro command raises the following error: RuntimeError: Could not find the project configuration file 'pyproject.toml' in --env=base. If you have created your project with Kedro version >> kedro, version 0.18.2. What's the matter here?
  • user
    10/07/2022, 2:18 PM
    Is there a way to include an Azure Databricks Lakehouse query as a DataCatalog dataset in kedro? We want to use kedro to control our ML pipelines in Azure Databricks. We are querying (and joining) relatively large tables in Databricks' Lakehouse, so we would like to include those joins in the DataCatalog without bringing the full precedent tables into memory. Something like:

        scooters_query:
          type: pandas.SQLQueryDataSet
          credentials: scooters_credentials
          sql: select * from cars where gear = 4
          load_args:
            index_col: [name]

    Is there a way to perform this in Databricks?
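    Pending a catalog-level answer, one hedged alternative is to push the join/filter down to Spark inside a node so only the result is ever collected; the query is illustrative:

        from pyspark.sql import DataFrame, SparkSession

        def load_scooters() -> DataFrame:
            # Executes on the Databricks cluster; nothing is pulled into pandas here.
            spark = SparkSession.builder.getOrCreate()
            return spark.sql("select * from cars where gear = 4")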
  • user
    10/07/2022, 4:58 PM
    import fsspec throws an error: AttributeError: 'EntryPoints' object has no attribute 'get'.
  • user
    10/08/2022, 6:38 PM
    Kedro on Databricks: cannot import SparkDataSet in Databricks using from kedro.extras.datasets.spark import SparkDataSet (screenshot: https://i.stack.imgur.com/wkDIJ.jpg).
  • user
    10/13/2022, 8:18 AM
    Kedro template configuration does not load the globals.yml configuration into catalog.yml for Jupyter Lab. It works from the CLI but not from Jupyter Lab. I have just recently upgraded from 0.17.1 to 0.18.3 and have made changes to settings.py to use the TemplatedConfigLoader. I have copied the content of https://github.com/kedro-org/kedro/blob/main/kedro/ipython/__init__.py to .ipython/profile_default/startup/00-kedro-init.py and I am still seeing the Jupyter notebook trying to read...
  • user
    10/13/2022, 2:38 PM
    TypeError: __init__() got an unexpected keyword argument 'config_loader'. I am getting this error while running Kedro with session.run() on Databricks.
  • user
    10/14/2022, 8:58 AM
    kedro PartitionedDataSet lazy writing to spare memory? I am working with PartitionedDataSet in kedro. One of the datasets is of type pillow.ImageDataSet:

        raw_images:
          type: PartitionedDataSet
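    PartitionedDataSet supports lazy saving: if the node returns a dict whose values are callables, each partition is materialised only at write time. A sketch, with the resizing standing in for real processing:

        from typing import Callable, Dict

        from PIL import Image

        def thumbnails(
            raw_images: Dict[str, Callable[[], Image.Image]]
        ) -> Dict[str, Callable[[], Image.Image]]:
            # Loading and processing happen one partition at a time at save
            # time, so only a single image is in memory at once.
            return {
                key: (lambda load=load: load().resize((128, 128)))
                for key, load in raw_images.items()
            }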
  • Maren Eckhoff
    10/14/2022, 6:17 PM
    Hi team, is it possible to pass a constant into a kedro node? Something like this:

        node(
            my_fun,
            inputs={"input_data": "my_data", "input_params": "params:my_params", "constant": 4},
            outputs="output_data",
        )
  • user
    10/17/2022, 3:18 PM
    Include Quarto rendering in a kedro pipeline and pass it inputs/outputs. I am using kedro to do some comparative analysis. In a Quarto report I have some chunks containing evaluations of output_var1 and output_var2, for example plot_function(output_var1) and plot_function(output_var2). At the end of the pipeline, I would like to compute my report with Quarto using the outcome of my pipeline, without saving it to the data catalog. from quarto import render def create_pipeline(**kwargs) -> Pipeline: return pipeline([node(func=function1,...