# questions
  • Jo Stichbury
    11/20/2022, 4:11 PM
    Just bumping this to the top of your Monday morning so it's visible (I think it got a bit lost on Friday!)
  • user
    11/20/2022, 9:38 PM
    Grouping raw datasets in a Kedro visualization: I am looking for a way to group all of the raw datasets in a Kedro pipeline visualization into one collapsible/expandable "node", similar to the way that namespaces are collapsible/expandable. In order to do this with a namespace, however, it seems that you need a function with inputs and outputs, which obviously would not be applicable at the raw data stage. Here is my current visualization: https://i.stack.imgur.com/DQBO8.png …
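    For reference, a minimal sketch of the namespace mechanism the question refers to (the function and dataset names are illustrative, not from the original project). Wrapping an existing pipeline with the modular `pipeline()` helper and a `namespace` is what makes it collapsible in kedro-viz; datasets mapped in `inputs` stay outside the namespace.
    ```python
    from kedro.pipeline import Pipeline, node
    from kedro.pipeline.modular_pipeline import pipeline


    def preprocess_companies(companies):
        return companies


    base = Pipeline(
        [node(preprocess_companies, inputs="companies", outputs="preprocessed_companies")]
    )

    # The namespace is what kedro-viz collapses into a single expandable node;
    # "companies" is mapped to itself so it is not prefixed with the namespace.
    ingestion = pipeline(base, namespace="ingestion", inputs={"companies": "companies"})
    ```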
  • Leo Casarsa
    11/21/2022, 1:57 PM
    ❓ What is Kedro's view/opinion on the structure of projects which contain a frontend component ❓ For example, let's assume that the model outputs of my Kedro pipeline are fed into a Python Dash application - where should the source code for the Dash application live?
  • Ahmed Afify
    11/21/2022, 3:40 PM
    Hi everyone, I have just started using Kedro and am still learning the basics. I followed this documentation (How to integrate Amazon SageMaker into your Kedro pipeline: https://kedro.readthedocs.io/en/stable/deployment/aws_sagemaker.html), which executes the Spaceflights tutorial on AWS SageMaker, but I was only able to run the first 3 nodes, as the pipeline failed at split_data([model_input_table,parameters]) -> [X_train@pickle,X_test,y_train,y_test]. The error is KeyError: 'features'. I noticed as well that S3 was not updated with any dataset, although they are present in the catalog.yml as instructed in the documentation. Please advise.
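    For context, a KeyError of 'features' usually means the parameters dictionary passed to split_data has no top-level "features" key. A hedged sketch of what conf/base/parameters.yml looks like in that version of the Spaceflights tutorial (the feature list below is abbreviated and should be checked against the tutorial):
    ```yaml
    # split_data indexes parameters["features"], so a top-level "features" list is required
    test_size: 0.2
    random_state: 3
    features:
      - engines
      - passenger_capacity
      - crew
    ```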
  • Francisca Grandón
    11/21/2022, 7:16 PM
    Hi everyone! I need some help with the Kedro debugger. I was trying to set up the launch.json file from the documentation for debugging, and I was wondering if it is possible to integrate this with a Docker Python debugger. In other words, I want to start the debugger from the container terminal - is this possible? I also posted the question on Stack Overflow in case it's not clear here. I would really appreciate some help!
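    Not an official recipe, but one common pattern is to run Kedro under debugpy inside the container and attach VS Code to it; the port, paths and image details below are illustrative assumptions, and the container must publish the debug port (e.g. `-p 5678:5678`).
    ```bash
    # Inside the container: start Kedro under debugpy and wait for the IDE to attach
    pip install debugpy
    python -m debugpy --listen 0.0.0.0:5678 --wait-for-client -m kedro run
    ```
    ```json
    {
      "name": "Attach to Kedro in Docker",
      "type": "python",
      "request": "attach",
      "connect": { "host": "localhost", "port": 5678 },
      "pathMappings": [
        { "localRoot": "${workspaceFolder}", "remoteRoot": "/home/kedro" }
      ]
    }
    ```
    Here `remoteRoot` is assumed to be the project directory inside the container; adjust it to wherever your project is copied.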
  • user
    11/21/2022, 7:48 PM
    Integrate Kedro debugger with Docker container: I'm currently trying to set up the debugger for Kedro in VS Code, adding the following launch.json as the documentation suggests. The thing is, I have 2 different Docker containers that I would like to use to debug my Kedro pipelines, so this launch.json file does not work for me, because it executes the debugger in the normal terminal, not inside the Docker...
  • Zihao Xu
    11/21/2022, 11:16 PM
    Hi team, we are trying to use the experiment tracking feature of Kedro within Databricks, but are running into the following error:
    ```
    INFO     Loading data from 'modeling.model_best_params_' (JSONDataSet)...   data_catalog.py:343

    DataSetError: Loading not supported for 'JSONDataSet'
    ```
    where we have the following catalog entry:
    ```yaml
    modeling.model_best_params_:
      type: tracking.JSONDataSet
      filepath: "${folders.tracking}/model_best_params.json"
      layer: reporting
    ```
    The same code runs completely fine locally, but is failing within Databricks. Could you please help us understand why?
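    For reference: Kedro's tracking datasets are save-only, so this exact error appears whenever something tries to load the entry, for example a node that lists it as an input or an interactive `catalog.load(...)` call. A minimal sketch of the intended usage (function and upstream dataset names are illustrative):
    ```python
    from kedro.pipeline import node


    def select_best_params(search_results: dict) -> dict:  # illustrative
        return search_results["best_params"]


    # Tracking datasets should only ever appear as node *outputs*; anything that
    # reads the entry again raises "Loading not supported for 'JSONDataSet'".
    log_best_params = node(
        select_best_params,
        inputs="modeling.model_search_results",  # illustrative upstream dataset
        outputs="modeling.model_best_params_",   # the tracking.JSONDataSet entry above
    )
    ```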
  • Moinak Ghosal
    11/22/2022, 8:30 AM
    Hi Team. Can you please help me understand why my catalog shows only parameters and not the datasets?
  • Ankar Yadav
    11/22/2022, 11:49 AM
    Hi Team, is there a way to increase verbosity in Kedro? If I am running a model (say TensorFlow) and want to see the metrics of each epoch, I am currently unable to see the per-epoch metrics when the model runs as part of a pipeline. Something like this:
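    One thing that sometimes helps (an assumption about the setup, not a Kedro feature): Keras only prints per-epoch metrics if the fit call asks for them, and `verbose=2` writes one plain log line per epoch, which survives redirected/non-interactive output better than the default progress bar. A sketch of a training node (model and parameter names are illustrative):
    ```python
    import tensorflow as tf


    def train_model(x_train, y_train, params: dict) -> tf.keras.Model:
        model = tf.keras.Sequential(
            [tf.keras.layers.Dense(1, input_shape=(x_train.shape[1],))]
        )
        model.compile(optimizer="adam", loss="mse", metrics=["mae"])
        # verbose=2 prints one line of metrics per epoch instead of a progress bar,
        # which shows up more reliably when stdout is captured by a pipeline runner
        model.fit(x_train, y_train, epochs=params["epochs"], verbose=2)
        return model
    ```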
  • Ankar Yadav
    11/22/2022, 1:04 PM
    Hi Team, also, is there a way to drop certain data frames from pipelines which aren't required anymore, without defining them in the catalog? My assumption is that if you don't define a dataset in the catalog it stays in a MemoryDataSet, right?
  • Andreas Adamides
    11/23/2022, 12:09 PM
    Hi, I can see that in the 0.18.2 release this has been added:
    "Kedro now uses the Rich library to format terminal logs and tracebacks"
    Is there any way to revert to plain console logging and not use Rich logging when running a Kedro pipeline with the SequentialRunner from the API, rather than via the kedro CLI?
    ```python
    runner = SequentialRunner()
    runner.run(pipeline_object, catalog, hook_manager)
    ```
    I tried to look for configuration, but I believe you can only add configuration if you are in a Kedro project and intend to run with the Kedro CLI. Any ideas?
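    Not an officially documented switch, but since the run here goes through the API rather than a project's logging.yml, one workaround is to reset Python logging to a plain console handler before calling the runner. A sketch under that assumption (it changes log formatting only; Rich tracebacks, if installed, are a separate concern):
    ```python
    import logging

    # Replace whatever handlers Kedro's default logging attached (e.g. a RichHandler)
    # with a plain console handler before running the pipeline programmatically.
    logging.basicConfig(
        format="%(asctime)s %(levelname)s %(name)s - %(message)s",
        level=logging.INFO,
        force=True,  # Python 3.8+: removes existing root handlers first
    )

    # pipeline_object, catalog and hook_manager as in the snippet above
    runner = SequentialRunner()
    runner.run(pipeline_object, catalog, hook_manager)
    ```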
  • Afaque Ahmad
    11/24/2022, 9:43 AM
    Hi Team, I'm working on a use-case wherein I need to make certain values from a `cache` available inside the `_load` method of multiple Kedro datasets. How should I go about it? Can we use hooks, or is there anything simpler?
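    A minimal sketch of one simple option (everything here is illustrative, not an established Kedro API): keep the cache in a small shared module that the custom datasets import, and populate it from wherever is convenient, e.g. a hook that runs before the pipeline.
    ```python
    # shared_cache.py -- illustrative module-level cache
    _CACHE: dict = {}


    def put(key, value):
        _CACHE[key] = value


    def get(key, default=None):
        return _CACHE.get(key, default)


    # my_dataset.py -- a custom dataset whose _load reads from the shared cache
    from kedro.io import AbstractDataSet


    class MyDataSet(AbstractDataSet):
        def __init__(self, filepath: str):
            self._filepath = filepath

        def _load(self):
            run_date = get("run_date")  # value published earlier, e.g. from a hook
            # ... use run_date when reading from self._filepath ...
            return {"filepath": self._filepath, "run_date": run_date}

        def _save(self, data):
            raise NotImplementedError("illustrative sketch only")

        def _describe(self):
            return {"filepath": self._filepath}
    ```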
  • Fabian
    11/24/2022, 12:13 PM
    Hi Team, I'm just getting started with Kedro in the PyCharm IDE. For this, I set up a new project and added the Python scripts of my previous project as a source root. I managed to set up a first running pipeline and to run it within a Jupyter notebook. Now the problem: when I want to visualize the pipeline from the command line (kedro viz) inside the project folder, apparently the imported source root is not found. However, I can visualize it with the line magic %kedro_viz from inside the notebook. I feel like both ways of visualization should work. Did I set up the project in a wrong way?
  • Jose Alejandro Montaña Cortes
    11/24/2022, 7:40 PM
    Hi everyone, I am currently developing a project which uses GCP credentials. The problem I am facing is that I want to deploy a container of this pipeline, but the secrets should not be in the container. I want to know whether, when using the kedro-docker package, the secrets are kept out of the Docker container - and, in case they are added, what I can do to handle these credentials with the Docker deployment 😄 thanks
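    For reference, a hedged sketch assuming the .dockerignore generated by `kedro docker init` excludes conf/local (where credentials.yml normally lives) - worth double-checking in your project - so credentials are not baked into the image and can be supplied only at run time. Paths below assume kedro-docker's default /home/kedro working directory; adjust if yours differs.
    ```bash
    # Mount the local config directory (with credentials.yml) into the container:
    docker run -v "$(pwd)/conf/local:/home/kedro/conf/local" my-kedro-image kedro run

    # Or pass an individual secret file and point to it via an environment variable:
    docker run -e GOOGLE_APPLICATION_CREDENTIALS=/tmp/key.json \
        -v "$(pwd)/key.json:/tmp/key.json" my-kedro-image kedro run
    ```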
  • Afaque Ahmad
    11/25/2022, 6:53 AM
    Hi Team, I've created a method called `get_spark` inside the `ProjectContext` which I need to access in the `register_catalog` hook. How can I access that function?
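    One low-friction option (a sketch, not the only way): move the Spark helper into a plain module so that both the ProjectContext and the hook implementation can import it, instead of reaching into the context from the hook. The module and function names are illustrative.
    ```python
    # src/<package>/spark_utils.py -- illustrative helper module
    from pyspark.sql import SparkSession


    def get_spark() -> SparkSession:
        return SparkSession.builder.getOrCreate()
    ```
    Both `ProjectContext` and the hook class can then simply `from <package>.spark_utils import get_spark`.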
  • Elias
    11/25/2022, 10:13 AM
    I get a weird DataSetError:
    ```
    kedro.io.core.DataSetError:
    __init__() got an unexpected keyword argument 'table_name'.
    DataSet 'inspection_output' must only contain arguments valid for the constructor of `kedro.extras.datasets.pandas.sql_dataset.SQLQueryDataSet`.
    ```
  • Elias
    11/25/2022, 10:13 AM
    catalog.yml:
    ```yaml
    inspection_output:
      type: pandas.SQLQueryDataSet
      credentials: postgresql_credentials
      table_name: shuttles
      layer: model_output
      save_args:
        index: true
    ```
  • Elias
    11/25/2022, 10:13 AM
    according to the documentation, table_name is the correct keyword: https://kedro.readthedocs.io/en/stable/kedro.extras.datasets.pandas.SQLTableDataSet.html
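    Worth noting for readers: the linked page documents SQLTableDataSet, while the catalog entry above uses pandas.SQLQueryDataSet, whose constructor takes `sql` rather than `table_name`. A sketch of the table-based variant of the same entry (credentials name reused from above):
    ```yaml
    inspection_output:
      type: pandas.SQLTableDataSet   # the table-based dataset accepts table_name
      credentials: postgresql_credentials
      table_name: shuttles
      layer: model_output
      save_args:
        index: true
    ```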
  • Shreyas Nc
    11/25/2022, 10:26 AM
    Hi, I created a catalog.py under src/my_test/catalog.py and added changes as in the documentation:
    ```python
    from kedro.io import DataCatalog
    from kedro.extras.datasets.pillow import ImageDataSet

    io = DataCatalog(
        {
            "cauliflower": ImageDataSet(filepath="data/01_raw/cauliflower"),
        }
    )
    ```
    But I don't see this in the catalog, and I get an error when I reference this in the pipeline node saying the entry doesn't exist in the catalog. Am I missing something here? Note: this is on the latest version of Kedro (0.18.3). I just joined the channel; if I am not using the right format or channel to ask this question, please let me know. Thanks in advance!
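    For reference, in a standard Kedro 0.18.x project the catalog that `kedro run` and the pipelines see is declared in conf/base/catalog.yml rather than built in a catalog.py. A minimal sketch for the same dataset (filepath reused from the snippet above):
    ```yaml
    cauliflower:
      type: pillow.ImageDataSet
      filepath: data/01_raw/cauliflower
    ```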
  • Anu Arora
    11/25/2022, 1:45 PM
    Hi Team, I am trying to make dbx work with Kedro 0.18 using a wheel file. I have resolved the majority of the issues, but I am stuck on one (hopefully the last): while executing `dbx execute <workflow-name> --cluster-id=<cluster-id>`, Kedro fails with the error below.
    ```
    /local_disk0/.ephemeral_nfs/envs/pythonEnv-f0037269-19cc-4c81-9dc2-43bcd22cd8ff/lib/python3.8/site-packages/kedro/framework/startup.py in _get_project_metadata(project_path)
         64 
         65     if not pyproject_toml.is_file():
    ---> 66         raise RuntimeError(
         67             f"Could not find the project configuration file '{_PYPROJECT}' in {project_path}. "
         68             f"If you have created your project with Kedro "

    RuntimeError: Could not find the project configuration file 'pyproject.toml' in /databricks/driver.
    ```
    I can see that the file was never packaged, but I am not sure whether it was supposed to be packaged or not. Plus, it is somehow pointing to /databricks/driver as the working directory. Below is the Python file I am running as the spark_python_task:
    ```python
    from kedro.framework.project import configure_project
    from kedro.framework.session import KedroSession

    package_name = "project_comm"

    configure_project(package_name)


    with KedroSession.create(package_name, env="base") as session:
        session.run()
    ```
    Any help would be great!! PS: I have tried with dbx deploy and launch as well and am still facing the same issue.
  • Karl
    11/26/2022, 12:27 AM
    Good afternoon Kedro team, My group is evaluating Kedro as an ETL framework. So far it's working quite well - thank you for building and supporting this tool. I have some questions about best practices that I can ask in separate threads, but more immediately: is there a standard way to silence the DeprecationWarnings? The DeprecationWarning logging messages clutter up the log and make the CLI difficult to use. I tried the strategy posted here in GitHub using Hooks but this didn't work. Is there a standard way to silence these warnings?
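    One generic approach that is sometimes enough (plain Python, not a Kedro-specific switch): add a warnings filter early in the project's settings.py, which is imported during project startup, or set the PYTHONWARNINGS environment variable for the CLI session.
    ```python
    # src/<package>/settings.py -- sketch of a plain warnings filter
    import warnings

    warnings.filterwarnings("ignore", category=DeprecationWarning)
    ```
    The shell-level equivalent is `export PYTHONWARNINGS="ignore::DeprecationWarning"` before running the CLI.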
  • Fabian
    11/26/2022, 1:27 PM
    Hi Team, I am experimenting with modular pipelines. The pipeline template takes 5 parameters, of which 2 vary between the different namespaces. The other 3 parameters are static in my current use case, but might require adaptation in the future. Therefore, I would like to define the 3 static parameters as constants for the pipeline template, without having to re-define them for each namespaced pipeline. However, because each instantiation is namespaced, the pipelines do not find the 3 static parameters unless I define them for each namespace. How can I do this in a proper way?
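    A sketch of one way to do this with the `parameters` argument of the modular `pipeline()` wrapper (parameter names are illustrative): mapping a parameter to itself keeps it outside the namespace, so every instance resolves to the same shared entry in conf/base/parameters.yml.
    ```python
    from kedro.pipeline.modular_pipeline import pipeline


    def new_instance(template, namespace: str):
        return pipeline(
            template,
            namespace=namespace,
            parameters={
                # illustrative names: map the three static parameters back to the
                # shared, non-namespaced entries so they are not prefixed
                "params:static_a": "params:static_a",
                "params:static_b": "params:static_b",
                "params:static_c": "params:static_c",
            },
        )
    ```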
  • Yousri
    11/28/2022, 3:27 PM
    Hello Kedro team, I'm currently working on a churn prediction project; I have finished the whole pipeline and the job works fine. I work on Kedro 0.16.5 because it was compatible with some packages in our environment. After packaging the project, I am now able to run it from the command line with:
    ```bash
    python3 -m project_name.run
    ```
    But I have a question about parameters. When I run the packaged project I can no longer pass parameters to the project or modify parameters.yml, so my question is: how do I pass arguments when I run a packaged Kedro project?
  • Afaque Ahmad
    11/29/2022, 6:56 AM
    Hi Team, is there a way I can access the `catalog` in the `after_node_run` hook?
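    For reference, a sketch assuming a recent 0.18.x release, where the catalog is an argument of the `after_node_run` hook spec itself (pluggy matches hook arguments by name, so only the ones you need have to be declared):
    ```python
    from kedro.framework.hooks import hook_impl


    class CatalogAccessHooks:
        @hook_impl
        def after_node_run(self, node, catalog, outputs):
            # the DataCatalog is passed straight into the hook; no session digging needed
            self._last_outputs = outputs
    ```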
  • Fabian
    11/29/2022, 9:59 AM
    Hi Team,
    another beginner's question: I have created a pipeline that nicely analyzes my DataFrame. Now, I add a new level of complexity to my DataFrame and want to execute the pipeline on each level, similar to a function in groupby.apply.
    Can I do this without modifying the pipeline itself? E.g., splitting the DataFrame ahead of the pipeline and re-merging it afterwards, while leaving the existing pipeline as it is?
  • Ankar Yadav
    11/29/2022, 11:37 AM
    Hi Team, can I add an optional node to a pipeline that is executed only if a specific parameter is set?
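    Kedro pipelines are declared statically, so one common workaround (a sketch, not an official feature; function names are illustrative) is to register two pipeline variants and choose between them at run time with `kedro run --pipeline ...`, or to read a flag when assembling the pipelines in the registry.
    ```python
    # pipeline_registry.py -- illustrative sketch
    def register_pipelines():
        base = create_base_pipeline()
        optional = create_optional_node_pipeline()  # the node(s) to toggle
        return {
            "__default__": base,
            "with_optional_step": base + optional,  # kedro run --pipeline with_optional_step
        }
    ```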
  • Balazs Konig
    11/29/2022, 3:09 PM
    Hi Team 🦜, how can I limit the loaded config for the latest native Kedro in 2 dimensions, e.g. `kedro run --pipeline dc_xyz --env dev`:
    1. by env (`conf/dev/`)
    2. by pipeline (`conf/base/data_connectors/xyz/`)
    Is there a simple way to achieve this double filter without much hacking?
  • Jan
    11/30/2022, 10:29 AM
    Hi all! Is there a way to run an environment exclusively, meaning that `conf/base` will not be loaded? I would like to do something like `kedro run --env=prod`, and in the `prod` env I have a catalog that is prefixed (e.g. `file: data/prod/01_raw/file.txt`) so that I can keep the prod data separated. I would like to avoid leakage of development data into the prod env. For example, if I add a new step and create a new entry in the data catalog (`base`) and forget to add this entry to the prod catalog, it will later be used in the prod environment by default because it is not overwritten. Instead I would like to get an error or implicitly use a MemoryDataset; in other words: don't load `conf/base`. Does this make sense? 😄 Edit: Just realizing that this behaviour would be possible if I just use `conf/base` as the prod env and always develop in a `conf/dev` env. However, ideally I would like to use `conf/base` by default and only work in prod by specifying it explicitly, to avoid mistakenly changing something there 🤔
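    A sketch of one knob that may help with the "Edit" idea (worth verifying for the exact Kedro version): the config loader's source environments can be changed in settings.py, so the always-on layer and the default overlay are whatever you choose. The environment names below are assumptions.
    ```python
    # settings.py -- illustrative sketch
    from kedro.config import ConfigLoader

    CONFIG_LOADER_CLASS = ConfigLoader
    CONFIG_LOADER_ARGS = {
        "base_env": "base",        # the layer that is always loaded
        "default_run_env": "dev",  # the overlay used when --env is not given
    }
    ```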
  • Qiuyi Chen
    11/30/2022, 6:35 PM
    Hi team, hope this message finds you well. I am trying to pass a list of dataframes as an input to a Kedro pipeline. Here is what I did, but it is not working when I try to pass multiple dataframes - can you help me with it? Thank you very much.
    ```python
    from typing import Dict

    import pandas as pd
    from pyspark.sql import DataFrame


    def function_a(params: Dict, *df_lst: DataFrame):
        report = pd.DataFrame()
        for df in df_lst:
            temp = function(df, params)
            report = pd.concat([report, temp])
        return report
    ```
    I can run the function like this:
    ```python
    function_a(params, df1, df2, df3)
    ```
    But in the pipeline, how can I define the node and the catalog in this situation? Here is what I did; please let me know where I went wrong:
    ```python
    def create_pipeline(**kwargs):
        return Pipeline(
            [
                node(
                    func=function_a,
                    inputs=["params", "df_lst"],
                    outputs="report",
                ),
            ]
        )


    catalog = DataCatalog(
        data_sets={"df_lst": df1},
        feed_dict={"params": params},
    )
    ```
    I can only run the pipeline when df_lst is just one dataframe, but I want it to be something like "df_lst": df_1, df_2, df_3 … df_n (n > 3).
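    For reference, a sketch of how variadic node functions are usually wired (dataset names are illustrative): a node's `inputs` list maps positionally onto the function's arguments, so each dataframe can be listed as its own catalog entry rather than as a single "df_lst" entry.
    ```python
    from kedro.pipeline import Pipeline, node

    # function_a as defined in the message above


    def create_pipeline(**kwargs):
        return Pipeline(
            [
                node(
                    func=function_a,
                    # positional mapping: "params" fills the first argument, then
                    # each dataframe becomes one element of *df_lst
                    inputs=["params", "df1", "df2", "df3"],
                    outputs="report",
                ),
            ]
        )
    ```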
  • Fabian
    12/01/2022, 10:59 AM
    Hi Team, is there an example of how to programmatically create pipelines? In my use case I want to apply the same pipeline to a variable number of datasets. The output names of my pipeline should depend on the input filenames.
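    A minimal sketch of one common pattern (all names are illustrative): build one pipeline per input dataset in a loop, derive the output names from the input names, and sum the pieces into a single Pipeline.
    ```python
    from kedro.pipeline import Pipeline, node


    def analyse(df):
        return df.describe()  # illustrative analysis step


    def make_pipeline_for(dataset_name: str) -> Pipeline:
        return Pipeline(
            [
                node(
                    analyse,
                    inputs=dataset_name,
                    outputs=f"{dataset_name}_report",
                    name=f"analyse_{dataset_name}",
                ),
            ]
        )


    datasets = ["sales_2020", "sales_2021", "sales_2022"]
    full_pipeline = sum((make_pipeline_for(d) for d in datasets), Pipeline([]))
    ```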