# questions
  • charles

    04/20/2023, 12:35 PM
    Another probably 🤦‍♂️ question. Could someone help me understand why the env parameter in my catalog isn't being injected into the catalog from my local/parameters.yml file? Catalog entry:
    parsed_documents:  # Just one document for now.
      type: json.JSONDataSet
      filepath: s3://mybucket/${env}/myjson.json
    local/parameters.yml entry:
    env: "main"
    In kedro ipython, trying to load it I am getting:
    DataSetError: Failed while loading data from data set JSONDataSet(filepath=mybucket/${env}/myjson.json, protocol=s3, save_args={'indent': 2}).
    mybucket/${env}/myjson.json
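A likely cause, for reference (this is one reading, not confirmed in the thread): in Kedro 0.18, `${...}` placeholders in catalog.yml are resolved by the TemplatedConfigLoader from the globals file, not from parameters.yml. A minimal sketch, assuming TemplatedConfigLoader is enabled in settings.py with `CONFIG_LOADER_CLASS = TemplatedConfigLoader` and `CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}`:

```yaml
# conf/base/globals.yml (or conf/local/globals.yml to vary it per environment)
env: "main"

# conf/base/catalog.yml -- ${env} is now filled in from globals, not parameters
parsed_documents:
  type: json.JSONDataSet
  filepath: s3://mybucket/${env}/myjson.json
```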
  • Leo Cunha

    04/20/2023, 12:56 PM
    Hello! Is there a way I can add a flag to kedro run using the plugin framework, without having to override the whole cli.py?
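For reference, a minimal sketch of the Kedro plugin mechanism (module and command names here are illustrative, not from the thread): a click group exposed through the `kedro.project_commands` entry point is merged into the `kedro` CLI, so extra commands and flags can be added without touching the project's cli.py.

```python
# my_plugin/cli.py -- a hypothetical plugin module.
# Register it in setup.py / pyproject.toml, e.g.:
#   entry_points={"kedro.project_commands": ["my_plugin = my_plugin.cli:commands"]}
import click


@click.group(name="my_plugin")
def commands():
    """Command group discovered by Kedro's plugin framework."""


@commands.command(name="myrun")
@click.option("--my-flag", is_flag=True, help="Illustrative extra flag.")
def myrun(my_flag):
    """A thin run-style command; the body could create a KedroSession and run."""
    click.echo(f"my-flag={my_flag}")
```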
  • Merel

    04/20/2023, 3:17 PM
    Can anyone with Spark skills lend a hand and help fix the Kedro pyspark-iris starter? pyspark 3.4.0 was released on the 13th of April and has broken our pyspark-iris starter. I've written up my findings so far in an issue: https://github.com/kedro-org/kedro-starters/issues/123, but it could be that I've been approaching this all wrong, and I've now reached the point where I could really use some help figuring out what is going on 🙏
  • Beltra909

    04/21/2023, 7:06 AM
    Hello, first-time Kedro user here. I have started experimenting with my own data sources and I am facing some issues. I have a pandas DataFrame that I would like to save as a parquet file in my NetApp StorageGRID S3. Everything goes smoothly until the next node in the pipeline tries to load the file from S3. I can see the file is present in the bucket. However, I get this exception:
    DataSetError: Failed while loading data from data set
    ParquetDataSet(filepath=<my file_path>,
    load_args={'engine': pyarrow}, protocol=s3, save_args={'engine': pyarrow}).
    AioSession.__init__() got an unexpected keyword argument 'target_options'
    I have tried different versions of fsspec, s3fs, kedro and Python and I get the same issue. Here is what I am using currently: Python 3.10.10, Kedro 0.18.7, s3fs 2023.3.0, fsspec 2023.3.0, aiobotocore 2.4.2, pandas 1.5.3. pip check does not show any broken requirements. Has anyone experienced this problem before? Extensive googling didn't show any results...
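For reference (an educated guess, not a confirmed diagnosis): unexpected-keyword errors in AioSession usually mean some option is being forwarded from the catalog entry through s3fs into aiobotocore. Everything under `fs_args`/`credentials` is passed to `S3FileSystem`, so a stray key there can surface exactly like this. A typical S3 ParquetDataSet entry for a custom endpoint such as StorageGRID looks roughly like this (bucket, endpoint and keys are placeholders):

```yaml
# conf/base/catalog.yml
my_table:
  type: pandas.ParquetDataSet
  filepath: s3://my-bucket/path/my_table.parquet
  credentials: my_s3_creds
  load_args:
    engine: pyarrow
  save_args:
    engine: pyarrow

# conf/local/credentials.yml
my_s3_creds:
  key: MY_ACCESS_KEY
  secret: MY_SECRET_KEY
  client_kwargs:
    endpoint_url: https://storagegrid.example.com
```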
  • Si Yan

    04/21/2023, 8:11 PM
    Hi all, I am new to Kedro. I need to load data from Snowflake in Kedro. I searched some previous posts and found that a Snowflake dataset is now available in Kedro 0.18.7, but I can't find any documentation showing how to use it. Can I write a SQL query like with SQLQueryDataSet? How do I define the credentials? Could someone give an example? Thanks!
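One way to query Snowflake that does not depend on a dedicated Snowflake dataset (a sketch, assuming the snowflake-sqlalchemy package is installed; account, names and credentials below are placeholders) is `pandas.SQLQueryDataSet` with a Snowflake SQLAlchemy connection string supplied through credentials:

```yaml
# conf/base/catalog.yml
snowflake_orders:
  type: pandas.SQLQueryDataSet
  sql: SELECT * FROM MY_DB.MY_SCHEMA.ORDERS
  credentials: snowflake_creds

# conf/local/credentials.yml
snowflake_creds:
  con: snowflake://MY_USER:MY_PASSWORD@MY_ACCOUNT/MY_DB/MY_SCHEMA?warehouse=MY_WH&role=MY_ROLE
```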
  • Rob

    04/22/2023, 6:09 PM
    Hi everyone, I'm trying to use Jinja2 syntax on Kedro 0.18.4 to dynamically define the variable storage_type; this is how my globals YAML looks:
    storage_mode: "local"

    storage:
      local: "data/"
      gcp: "gs://my-bucket/data/"

    data:
      {% if storage_mode == 'local' %}
      storage_type: ${storage.local}
      {% elif storage_mode == 'gcp' %}
      storage_type: ${storage.gcp}
      {% endif %}
      player_tags: ${storage_type}/01_player_tags
      raw_battlelogs: ${storage_type}/02_raw_battlelogs
      raw_metadata: ${storage_type}/03_raw_metadata
      enriched_data: ${storage_type}/04_enriched_data
      curated_data: ${storage_type}/05_curated_data
      viz_data: ${storage_type}/06_viz_data
      feature_store: ${storage_type}/07_feature_store
      model_registry: ${storage_type}/08_model_registry
    I'm not familiar with this type of syntax, and I'm getting a ScannerError.
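For context: the globals file read by TemplatedConfigLoader is parsed as plain YAML, so raw Jinja2 blocks like `{% if %}` make it invalid YAML, which is what a ScannerError points at. One hedged workaround that avoids Jinja2 altogether is to move the switch into per-environment globals files (the environment name below is illustrative):

```yaml
# conf/base/globals.yml -- default: local storage
storage_type: "data/"

# conf/gcp/globals.yml -- selected with `kedro run --env gcp`
storage_type: "gs://my-bucket/data/"
```

The `${storage_type}` references in the data section then resolve without any conditional logic.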
  • Jason

    04/24/2023, 1:33 PM
    Hi everyone, I have a Kedro pipeline and want to run it on multiple datasets (the raw input data are different but follow the same structure; I also want to keep the outputs in the same folder structure). What is the best practice in Kedro for dealing with this kind of problem?
    dataset1
    |--01_raw
    |--02_intermediate
    |--03_primary
    |--...
    dataset2
    |--01_raw
    |--02_intermediate
    |--03_primary
    |--...
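One common pattern for this (a sketch of Kedro's modular-pipelines feature, not the only option) is to instantiate the same pipeline once per dataset with a namespace, e.g. `pipeline(base_pipeline, namespace="dataset1")`, so every dataset name is prefixed; the catalog then maps each prefixed name into the matching folder tree (entry names and paths below are illustrative):

```yaml
# conf/base/catalog.yml
dataset1.raw_data:
  type: pandas.CSVDataSet
  filepath: data/dataset1/01_raw/data.csv

dataset1.primary_table:
  type: pandas.ParquetDataSet
  filepath: data/dataset1/03_primary/table.parquet

dataset2.raw_data:
  type: pandas.CSVDataSet
  filepath: data/dataset2/01_raw/data.csv

dataset2.primary_table:
  type: pandas.ParquetDataSet
  filepath: data/dataset2/03_primary/table.parquet
```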
  • Giulio Morina

    04/25/2023, 10:51 AM
    Hello everyone! Is there a line magic or something similar to load a kedro-viz visualisation inside a jupyter notebook?
  • Balazs Konig

    04/25/2023, 4:49 PM
    Hi Team 🦜 I have a quite complex Kedro pipeline that spends several minutes getting through config loading when it starts to run. In itself this is fine, but I'm struggling to spin up the Kedro kernel in a Jupyter notebook or JupyterLab, because it times out. Is there a way to increase the timeout in the CLI or a config file I missed? Also, is my assumption wrong that this could cause timeout errors? (I'm guessing that because other pipelines with less config-loader lead time can spin up their kernels in an otherwise identical environment.) Thanks!
  • Claire BAUDIER

    04/26/2023, 8:47 AM
    Hello everyone, I have a question concerning parameters. In a project I'm working on we are using the Kedro framework. We are developing several pipelines and would like to create different parameter files for simplicity: as we are using a lot of different parameters for different pipelines, the parameters file can quickly become messy. I was wondering if there was a way to keep using Kedro's parameter system for calling parameters with "params:", but using a file different from the default parameters.yml file. Here is what I have in mind, based on one of the documentation examples:
    from kedro.config import ConfigLoader
    from kedro.framework.project import settings

    conf_path = str(project_path / settings.CONF_SOURCE)
    conf_loader = ConfigLoader(conf_source=conf_path, env="local")

    params = conf_loader.get("other_parameters_file.yml")
    
    # in node definition
    def increase_volume(volume, step):
      return volume + step
    
    # in pipeline definition
    node(
      func=increase_volume,
      inputs=["input_volume", "params:step_size"],
      outputs="output_volume",
    )
    And the parameter step_size would be in other_parameters_file.yml. Is this feasible with Kedro, and if so, how should it be done? Thanks a lot for your help!
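For reference: in Kedro 0.18 no custom ConfigLoader code is needed for this; the default loader already merges every file matching `parameters*.yml` (including files under a `conf/base/parameters/` folder) into the single params namespace, so `params:step_size` keeps working unchanged. A sketch (file names are illustrative):

```yaml
# conf/base/parameters/data_processing.yml
step_size: 1

# conf/base/parameters/modelling.yml
learning_rate: 0.01
```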
  • Iñigo Hidalgo

    04/26/2023, 3:16 PM
    Hi 🙂 I am running a simple pipeline which has the following config in a YAML file:
    simple_conn_pt_model_filter_predict:
        date_column: date
        window_length: 0d
        gap: 0d
        check_groups: null
        continue_if_missing: true
    I am trying to edit the parameter gap through kedro run --pipeline ... --params=..., but it seems I need to overwrite the whole dictionary.
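For context (check your Kedro version's docs, as this changed across releases): `kedro run --params` accepts dot-separated keys such as `--params "simple_conn_pt_model_filter_predict.gap:7d"`, which should update just that nested key. Conceptually the override is a recursive merge rather than a replacement, something like this illustrative sketch:

```python
def nested_update(base: dict, overrides: dict) -> dict:
    """Recursively merge overrides into base, keeping untouched keys."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = nested_update(merged[key], value)
        else:
            merged[key] = value
    return merged


params = {
    "simple_conn_pt_model_filter_predict": {
        "date_column": "date",
        "window_length": "0d",
        "gap": "0d",
        "check_groups": None,
        "continue_if_missing": True,
    }
}

# "simple_conn_pt_model_filter_predict.gap:7d" parses into this override:
override = {"simple_conn_pt_model_filter_predict": {"gap": "7d"}}

merged = nested_update(params, override)
print(merged["simple_conn_pt_model_filter_predict"]["gap"])          # 7d
print(merged["simple_conn_pt_model_filter_predict"]["date_column"])  # date
```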
  • Juan Diego

    04/26/2023, 3:42 PM
    Hi team! Any suggestions on how to extract the Kedro version that was used to build a wheel via kedro package? It would be useful for raising an error when the version doesn't meet the one expected by a launcher.
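One hedged approach at runtime (a sketch; it assumes the launcher can import the packaged environment): read the installed kedro distribution's version via `importlib.metadata` and compare it to the expected one. `parse_version` below is a deliberately naive "X.Y.Z" parser, not a full PEP 440 implementation:

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple:
    """Naive parser for 'X.Y.Z'-style versions (no pre-release handling)."""
    return tuple(int(part) for part in v.split("."))


def check_kedro_version(expected: str) -> None:
    """Raise if the installed kedro version differs from the expected one."""
    try:
        installed = version("kedro")
    except PackageNotFoundError:
        raise RuntimeError("kedro is not installed")
    if parse_version(installed) != parse_version(expected):
        raise RuntimeError(f"Expected kedro {expected}, found {installed}")
```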
  • Agnaldo Luiz

    04/27/2023, 12:04 PM
    Hi team, quick question: how do I use parameters from my local/credentials.yml file in my base/catalog.yml file? For example,
    #credentials.yml
    win_user: 'user01'
    #catalog.yml
    data:
      type: pandas.ExcelDataSet
      filepath: C:\Users\${win_user}\data.xlsx
  • Rishabh Kasat

    04/27/2023, 2:08 PM
    Hi, when I try to run the kedro viz command I get the error below. Any idea how to resolve it? There is no pyspark_llap module on pip either:
    kedro.framework.cli.utils.KedroCliError: No module named 'pyspark_llap'
    Run with --verbose to see the full exception
    Error: No module named 'pyspark_llap'
  • Season Yang

    04/27/2023, 4:03 PM
    Hi team, we are encountering a package dependency conflict between kedro and kedro-starters for ipython and would love to get help from the team. Under the same release 0.18.7 for both kedro and kedro-starters with Python 3.8, kedro provides ipython~=8.1 (https://github.com/kedro-org/kedro/blob/main/test_requirements.txt#L22) while kedro-starters' pyspark starter restricts ipython>=7.31.1, <8.0 (https://github.com/kedro-org/kedro-starters/blob/main/pyspark/%7B%7B%20cookiecutter.repo_name%20%7D%7D/src/requirements.txt#L3). Would really appreciate any help on this! Thank you in advance!
  • Kelsey Sorrels

    04/27/2023, 10:56 PM
    Hi, I've been using the Kedro+Grafana example hook, but I want to extend it to capture not only node timings but also (in certain cases) operations/sec. This of course depends on forming a notion of how many "operations" occurred during the execution of a node. I can think of a bunch of wrong ways to approach this, but I'm interested in hearing folks' thoughts on a "right" way to capture operation counts inside nodes so they can be used by the hook after nodes are executed.
  • Jo Stichbury

    04/28/2023, 4:11 PM
    ❓ What data science/ML articles have you been reading recently? Have any blog posts, tutorials or newsletters "brought you joy"? 👀 Have you watched any useful training videos or listened to any podcasts 🎧 about analytics that you want to share? I'm putting together a regular roundup of what the Kedro community has found online (not just on the Kedro blog). I'd love to share your greatest hits. Feel free to share here or DM me. Thank you 🙏
  • Darshan

    04/29/2023, 5:55 AM
    I am trying to deploy the Kedro package on AWS following the steps provided in the documentation, but when I run the Step Function it fails with an error (attached for reference). The Kedro package was developed on 0.18.7 and the Python environment is 3.10. Can you suggest what the resolution for this error could be?
  • Rob

    04/29/2023, 10:01 PM
    Hello everyone, happy weekend! Does anyone have an example of how to set GCP bucket credentials from the catalog.yml for a parquet of type spark.SparkDataSet? I'm trying to use the .json key file from Google Cloud, but I don't know how to define it in the catalog. Thanks in advance 🙂
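For reference (hedged, since this depends on the cluster setup): spark.SparkDataSet reads gs:// paths through the Hadoop GCS connector, so the service-account .json is usually configured on the SparkSession rather than in credentials.yml. With the pyspark starter's spark.yml hook pattern, that looks roughly like this (the key-file path and bucket are placeholders):

```yaml
# conf/base/spark.yml
spark.hadoop.google.cloud.auth.service.account.enable: "true"
spark.hadoop.google.cloud.auth.service.account.json.keyfile: /path/to/service-account.json

# conf/base/catalog.yml
my_parquet:
  type: spark.SparkDataSet
  filepath: gs://my-bucket/data/my_parquet
  file_format: parquet
```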
  • Darshan

    04/30/2023, 6:51 AM
    companies:
      type: pandas.CSVDataSet
      filepath: s3://<your-bucket>/companies.csv
    This is a sample provided by Kedro with the aws step function, might be useful.
  • Sebastian Cardona Lozano

    05/01/2023, 5:03 PM
    Hi all. Is there a way to write the log files externally, to a Google Cloud Storage bucket?
  • Vandana Malik

    05/02/2023, 9:34 AM
    Hi Team, I am using Kedro version 0.17.3. I have created custom hooks; I am able to run the pipeline, but the hooks are not running for me. settings.py:
    HOOKS = (ProjectHooks(), DataValidationHook())
    CONTEXT_CLASS = ProjectContext
    context.py:
    class ProjectContext(KedroContext):
        """Project context.
    
        Users can override the remaining methods from the parent class here,
        or create new ones (e.g. as required by plugins)
        """
    
        hooks = ProjectHooks()
    
        def __init__(
            self,
            package_name: str,
            project_path: Union[Path, str],
            env: str = None,
            extra_params: Dict[str, Any] = None,
        ):
            """Init class."""
            super().__init__(package_name, project_path, env, extra_params)
            self.hooks = DataValidationHook()
            self._spark_session = None
            self._experiment_tracker = None
            self._setup_env_variables()
            self._init_common_env_vars()
            self.init_spark_session()
    Can you guide me on where to look or what to modify in order to work out why the hooks are not running?
  • Jordan

    05/02/2023, 11:16 AM
    I am facing an issue where a MetricsDataSet successfully loads from the catalog in a notebook, where the catalog is created with %load_ext kedro.ipython. However, in a standalone file I am creating the catalog as follows:
    from pathlib import Path

    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project
    
    project_path = Path(".").resolve()
    metadata = bootstrap_project(project_path)
    with KedroSession.create(metadata.package_name, project_path) as session:
        context = session.load_context()
        catalog = context.catalog
    
    data = catalog.load("my_metrics")
    I get the following error:
    DataSetError: Loading not supported for 'MetricsDataSet'
    If this is true, why does it load in a notebook?
  • Adrien

    05/02/2023, 11:34 AM
    Hello! I got this error when deploying a pipeline via Vertex AI:
    com.google.cloud.ai.platform.common.errors.AiPlatformException: code=RESOURCE_EXHAUSTED, message=The following quota metrics exceed quota limits: aiplatform.googleapis.com/custom_model_training_cpus, cause=null; Failed to create custom job for the task. Task: Project number: 496232377396, Job id: 1189445081858310144, Task id: 6444159035313750016, Task name: preprocess-shuttles-node, Task state: DRIVER_SUCCEEDED, Execution name: projects/496232377396/locations/europe-west1/metadataStores/default/executions/14295685814278275726; Failed to create external task or refresh its state. Task: Project number: 496232377396, Job id: 1189445081858310144, Task id: 6444159035313750016, Task name: preprocess-shuttles-node, Task state: DRIVER_SUCCEEDED, Execution name: projects/496232377396/locations/europe-west1/metadataStores/default/executions/14295685814278275726; Failed to handle the pipeline task. Task: Project number: 496232377396, Job id: 1189445081858310144, Task id: 6444159035313750016, Task name: preprocess-shuttles-node, Task state: DRIVER_SUCCEEDED, Execution name: projects/496232377396/locations/europe-west1/metadataStores/default/executions/14295685814278275726
    I checked the quotas specified, but that's not the problem: the limit is set to 1 and I specify 0.2 CPUs for each node (kedro-vertexai starter guide). I think it comes from GCP, but I don't know which configuration to update. Has anyone faced the same bug or have an explanation? I've been on this issue for days and I can't find the solution...
  • Thaiza

    05/02/2023, 11:54 AM
    Guys, have you ever seen an error like this when running a specific pipeline in Kedro? I just did a normal kedro run --pipeline SA and this error is reproduced. I don't see any significant difference between this pipeline and the others that run normally... Any help is highly appreciated.
  • Afaque Ahmad

    05/02/2023, 11:59 AM
    Hi Kedro folks, I'm migrating from Kedro v0.16.x to 0.18.7. Is there a checklist of steps that I can follow for a smooth migration?
  • fmfreeze

    05/02/2023, 5:22 PM
    Hi kedronistas :) I have a question about customization: in my company we have our own cookiecutter template with its own folder structure and naming conventions, and I am struggling to integrate Kedro capabilities into our existing template. E.g. we don't follow the "src" naming convention but use "<pkg_name>". How can I configure Kedro so it knows about that and looks for e.g. the kedro_cli.py in there? So to wrap up: is it possible, and if yes, what is best practice, to configure Kedro to integrate well into an existing repo structure without losing Kedro functionality?
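One relevant knob, for reference (hedged; verify against your Kedro version's docs): the `[tool.kedro]` section of pyproject.toml accepts a `source_dir` key, so the package can live somewhere other than src/ (values below are illustrative):

```toml
# pyproject.toml at the repo root
[tool.kedro]
package_name = "my_pkg"
project_name = "My Project"
kedro_init_version = "0.18.7"
source_dir = "my_pkg_root"  # instead of the default "src"
```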
  • Flavien

    05/03/2023, 10:26 AM
    Hi fellows, I am running a kedro project on Databricks (and have good hope of convincing my team to go for kedro). The documentation is very well written, thanks for that. Scrolling through the messages in Slack, I did not find a way to directly use the spark object, the SparkSession provided directly in Databricks notebooks. Is there any way to do so?
  • Vandana Malik

    05/03/2023, 10:37 AM
    kedro run is able to run the hooks, but when I trigger the pipeline through the API it runs the nodes and not the hooks. Using Kedro version 0.17.7. __main__.py code:
    import os
    
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project
    from kedro.runner import SequentialRunner
    from hooks import ControlTableHooks
    
    if __name__ == "__main__":
        bootstrap_project(os.path.abspath(os.environ.get("PROJECT_PATH")))
        os.chdir(os.environ.get("PROJECT_PATH"))
        with KedroSession.create(env=os.environ.get("kedro_environment")) as session:
            runner = SequentialRunner()
            context = session.load_context()
            pipeline = context.pipelines[os.environ.get("pipeline_name")]
            catalog = context.catalog
            runner.run(pipeline, catalog)
            result_dict = {"message": "Success"}
    any help
  • Pavan Naidu

    05/03/2023, 10:10 PM
    kedro gurus: has anyone encountered this Python interpreter error? I had to re-open VSCode in the project folder, sigh.