# questions
  • e

    Elior Cohen

    01/22/2023, 11:45 AM
Is there a way to create templates (like starters) for pipelines? I'd imagine something like
    kedro pipeline create my_pipeline --template my_awesome_template
    which will include template code for the pipeline
  • m

    Massinissa Saïdi

    01/23/2023, 9:26 AM
Hello, it's me again 🙂 To write a file (CSV, pandas or other) with the kedro dataset API, such as MatplotlibWriter (or another), we should specify the credentials. In the documentation, credentials should be written like this:
Copy code
credentials: Credentials required to get access to the underlying filesystem.
                E.g. for ``S3FileSystem`` it should look like:
                `{'key': '<id>', 'secret': '<key>'}`
But is it possible to add the endpoint_url, like this:
{'key': '<id>', 'secret': '<key>', 'client_kwargs': {'endpoint_url': 'http://myurl:9000'}}
? When I use the API directly it doesn't work, but when I use the catalog it works.
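(For context: Kedro forwards a dataset's credentials dict to the underlying fsspec filesystem, so nesting client_kwargs inside the credentials entry usually works for S3-compatible stores. A sketch of conf/local/credentials.yml; the entry name and endpoint below are hypothetical:)

```yaml
# conf/local/credentials.yml -- sketch; assumes an S3-compatible store
# (e.g. MinIO) reachable at the hypothetical endpoint below
minio_creds:
  key: <id>
  secret: <key>
  client_kwargs:
    endpoint_url: http://myurl:9000
```

A catalog entry would then reference it with credentials: minio_creds; when constructing a dataset directly in code, the same dict can be passed as the credentials argument.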
  • p

    Prachi Jain

    01/23/2023, 12:56 PM
Hi team, I am new to kedro. I was looking at the kedro spaceflights tutorial project. I updated the nodes.py file and pipeline.py file as per the tutorial, but when I run
kedro run
it gives me an error saying
Pipeline contains no nodes after applying all provided filters
Can someone help here? I am using the latest version of kedro.
  • s

    Safouane Chergui

    01/23/2023, 2:06 PM
Hello, I’d like to know if there is a way to have Kedro return None instead of raising an exception when loading an entry from the data catalog (catalog.yml) fails. Thanks
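(A common workaround, since Kedro itself raises on a failed load, is a small wrapper around catalog.load() in the calling code; a minimal sketch:)

```python
def safe_load(catalog, name):
    """Load `name` from a Kedro-style catalog, returning None on failure.

    A workaround sketch: rather than changing catalog.yml, the caller
    wraps catalog.load(). In a real project you would catch
    kedro.io.DataSetError instead of the broad Exception used here.
    """
    try:
        return catalog.load(name)
    except Exception:
        return None
```
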
  • r

    Rob

    01/23/2023, 3:43 PM
    Hi everyone, I'm using Kedro 0.17.4 and I'm having this issue:
  • b

    Brandon Meek

    01/23/2023, 7:41 PM
Hey everyone, I'm looking for the "Kedro" way of doing a Monte Carlo simulation. I have a very large dataset in Presto, and I want to repeatedly pull samples from it, run each group of samples through a pipeline, and then roll up all of the pipeline results. Currently I'm thinking of calling the pipeline from outside the Kedro project.
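(One way the outside-the-project orchestration could look, as a sketch; draw_sample and run_pipeline are hypothetical stand-ins for the Presto query and the per-sample Kedro run:)

```python
def monte_carlo(draw_sample, run_pipeline, n_runs):
    """Repeatedly sample, run the pipeline per sample, collect results.

    draw_sample and run_pipeline are hypothetical stand-ins: in practice
    draw_sample would query Presto, and run_pipeline would invoke a
    KedroSession.run() (or `kedro run` via subprocess) for one sample.
    """
    results = []
    for i in range(n_runs):
        sample = draw_sample(i)
        results.append(run_pipeline(sample))
    return results  # roll these up however the analysis requires
```
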
  • m

    MarioFeynman

    01/23/2023, 8:39 PM
Hi! Is there any reason why Kedro doesn't have a 1.x.x version?
  • a

    Alex Ofori-Boahen

    01/23/2023, 9:18 PM
Hi there, after packaging my app for deployment using kedro package and running pip install, I see the module is installed when I do a pip list check. However, when I run python -m (package-name), it says no module named (package name). How can I resolve this issue?
  • i

    Ivan Danov

    01/24/2023, 11:34 AM
    Has anyone used Kedro with Apache Beam or Google Cloud Dataflow?
  • d

    Dustin

    01/25/2023, 12:26 AM
Hi team, I have been trying to play with hooks. I followed your docs to implement both the memory profile and pipeline timing hooks (I just copied your scripts from the docs) and registered them in settings.py, but no hook-related information is shown in the console log with
kedro run
(no error, but the same console output as without hooks). Just wondering, do I need to do something to 'reload' settings?
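(For reference, the registration the docs describe is just instantiating the hook classes in HOOKS in settings.py; the module path below is hypothetical. If HOOKS is already set like this and the console stays silent, a common next thing to check is the project's logging config, since the example hooks log at INFO level:)

```python
# src/<package_name>/settings.py -- sketch; assumes the docs' example hook
# classes were copied into src/<package_name>/hooks.py (hypothetical path)
from <package_name>.hooks import MemoryProfilingHooks, ProjectHooks

HOOKS = (MemoryProfilingHooks(), ProjectHooks())
```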
  • j

    Joel Ramirez

    01/25/2023, 2:33 PM
    Hello
  • j

    Joel Ramirez

    01/25/2023, 2:33 PM
I am getting this error when I try to run the data science pipeline
  • j

    Joel Ramirez

    01/25/2023, 2:33 PM
    Failed to find the pipeline named 'data_science'. It needs to be generated and returned by the 'register_pipelines' function.
  • j

    Joel Ramirez

    01/25/2023, 2:33 PM
Does someone know how to fix this?
  • m

    Miguel Angel Ortiz Marin

    01/25/2023, 7:46 PM
Hi! I'm loading a plotly JSONDataSet, but it's not loading a plotly fig; it's loading a Python dictionary. A simple example from the docs below gives an error. Could it be related to the plotly version?
  • j

    Jong Hyeok Lee

    01/26/2023, 5:57 AM
Hello! Has anyone tried to ZIP an entire Kedro pipeline and use it on AWS Glue? Also, would there be a way to do CI/CD with this approach?
  • s

    Sergei Benkovich

    01/26/2023, 11:33 AM
Is it possible to supply the same catalog entry for both inputs and outputs? Or how would you handle a situation where I want to extract new data based on existing data and append the newly extracted data to the existing data? I don't want separate catalog entries for the two datasets.
  • u

    user

    01/26/2023, 12:48 PM
Kedro catalog fails when overwriting a GeoJSON dataset even though the driver is supported. I have the following catalog item in my kedro project:
Copy code
suggested_routes_table@geopandas:
  type: geopandas.GeoJSONDataSet
  filepath: data/04_feature/routes_suggestions_table.geojson
  load_args:
    driver: "GeoJSON"
    mode: "a"
The keyword argument mode: "a" stands for append, meaning that every time the node is run, it should append new rows to the GeoJSON instead of overwriting the file at the path. As stated in <a...
  • s

    Sergei Benkovich

    01/26/2023, 1:20 PM
Is it possible to make the versioned results be saved in the same folder? I produce reports, and I want all the reports from one run to be in the same folder. Currently versioned: true just places each figure in a separate folder named with the timestamp at which it ran, rather than one folder for the whole pipeline run.
  • a

    Andrew Stewart

    01/27/2023, 4:55 AM
    So just throwing this out there - but does anyone happen to have a solid example of using kedro w/ poetry +
    kedro-docker
    ?
  • p

    Paul Mora

    01/27/2023, 8:44 AM
    Hey guys - I am currently trying to save/load pyspark ml objects through the catalog. The documentation states the following: https://kedro.readthedocs.io/en/stable/tools_integration/pyspark.html#use-memorydataset-with-copy-mode-assign-for-non-dataframe-spark-objects and the recommendation to use
    MemoryDataSets
for those non-DataFrame instances. That is all fine and well, though of course not being able to save any transformers becomes quite tedious at some point. Is there any guidance/development on that front?
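(For reference, the pattern the linked docs describe is a catalog entry like the one below; the dataset name is hypothetical. copy_mode: assign keeps the Spark object in memory by reference instead of deep-copying it:)

```yaml
# catalog.yml -- keep a non-DataFrame Spark object (e.g. a fitted
# transformer) in memory without copying it between nodes
trained_transformer:
  type: MemoryDataSet
  copy_mode: assign
```

For actual persistence, one common workaround is a small custom dataset whose _save/_load delegate to the Spark model's own .save()/.load(); whether anything official is planned is a question for the maintainers.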
  • m

    Massinissa Saïdi

    01/27/2023, 11:42 AM
    Hello kedroids! I have an error that I can't understand:
    Copy code
    DataSetError: 
    botocore.session.session.create_client() got multiple values for keyword 
    argument 'aws_access_key_id'.
    DataSet 'dataset' must only contain valid arguments for the 
    constructor of 'kedro.extras.datasets.pandas.csv_dataset.CSVDataSet'.
    I run my code from a
    docker-compose
    with only one container (for now), I write files in s3. I specified the credentials this way:
    Copy code
    aws_credentials:
        aws_access_key_id: XXXXXXX
        aws_secret_access_key: XXXXXXX
    and my dataframe in
    catalog.yml
    this way:
    Copy code
    dataset:
      type: pandas.CSVDataSet
      filepath: ${s3.path}/data/dataset.csv
      credentials: aws_credentials
    docker-compose.yml
    Copy code
version: '3.7'

services:
  kedro:
    build:
      context: .
      args:
        PIP_USERNAME: ${PIP_USERNAME}
        PIP_PASSWORD: ${PIP_PASSWORD}
        PIP_REPO: ${PIP_REPO}
      dockerfile: dockerfile.kedro
      cache_from:
        - ia-churn
    image: ia-churn
    command: kedro run --env prod --pipeline data-processing
    volumes:
      - .:/usr/src/app/
      - ./data/01_raw/:/usr/src/app/data/01_raw
In a
conda
environment everything works. Does someone have an idea, please? More information: I'm using kedro v0.18.4 and Python 3.10.
  • p

    Patrick Deutschmann

    01/27/2023, 1:19 PM
    Hey everyone! I’m new to Kedro, and I first want to thank all the contributors. You’ve genuinely built a fantastic tool! Is it possible to save outputs to multiple data sets? For instance, I’d like to write my feature data both to the local file system and to, say, an Azure blob storage. Thanks 😊
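(One common workaround is a node that returns the same object twice, so it can be bound to two catalog entries pointing at different storage backends; a sketch with hypothetical dataset names:)

```python
def fan_out(features):
    """Return the same object twice so two catalog entries can persist it
    (e.g. a local CSVDataSet and an Azure blob dataset).

    In pipeline.py this might be wired up as (names hypothetical):
        node(fan_out, inputs="features",
             outputs=["features_local", "features_azure"])
    """
    return features, features
```
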
  • m

    Massinissa Saïdi

    01/27/2023, 4:12 PM
Hello! I'm using kedro with SageMaker, following this kedro-tutorial, and I have a question: is it possible to use functions created in nodes inside the
sagemaker_entry_point.py
script? For example:
    Copy code
    ...
    from pipelines.ml_model.model import train_model
    
    ...
    
    def main():
        ....
        regressor = train_model(...)
        ...
    
    if __name__ == "__main__":
        # SageMaker will run this script as the main program
        main()
    Because I have this error:
    ModuleNotFoundError: No module named 'pipelines'
    Thanks for your help 🙂
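(The ModuleNotFoundError usually means the entry-point script can't see the project's src/ layout on sys.path. A hedged sketch of the top of the entry point; the relative location of src/ is an assumption. After this, importing via the full package path, e.g. from <package_name>.pipelines.ml_model.model import train_model, tends to be more reliable than a bare pipelines import:)

```python
import sys
from pathlib import Path

# Assumes the standard Kedro src/<package_name>/ layout sits next to the
# entry-point script; adjust SRC to wherever the source tree actually lives.
SRC = Path("src").resolve()
if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))
```
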
  • a

    Alexandra Lorenzo

    01/27/2023, 5:49 PM
Hello, how can I read specific files (images) based on the filename prefix (for example)? I'm using PartitionedDataSet to read and write images with a specific extra dataset. My folder is organized as follows, with more than 120,000 images:
Department 1
|-> Zone 1
|---> IMG_00001.tif
|---> MSK_00001.tif
I need to read IMG_*****.tif first, then MSK_*****.tif. Is that possible? Thanks for your help
    Copy code
    raw_images:
      type: PartitionedDataSet
      dataset:
        type: flair_ign.extras.datasets.satellite_image.SatelliteImageDataSet
      path: /home/ubuntu/train
      filename_suffix: .tif
      layer: raw
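(Since a PartitionedDataSet hands the node a dict of partition id -> load callable, the prefix filtering and ordering can happen inside the node itself; a sketch with plain dicts, where the dataset types are assumptions and only the prefix logic is the point:)

```python
def partitions_by_prefix(partitions, prefix):
    """Select partitions whose basename starts with `prefix`.

    `partitions` is the dict a PartitionedDataSet passes to a node:
    partition id -> load callable. Sorting keeps pairs aligned
    (IMG_00001 with MSK_00001, and so on).
    """
    return {
        pid: loader
        for pid, loader in sorted(partitions.items())
        if pid.rsplit("/", 1)[-1].startswith(prefix)
    }
```

A node could then iterate partitions_by_prefix(parts, "IMG_") first and partitions_by_prefix(parts, "MSK_") second.
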
  • a

    Andrew Stewart

    01/28/2023, 12:15 AM
    Anyone else happen to be using Athena as inputs for Kedro? Found this: https://github.com/atsangarides/kedroio but wondering if anyone is doing anything different
  • r

    Rob

    01/28/2023, 6:00 PM
    Is there a way to set a
    main.py
instead of using the CLI commands to run all the pipelines? (If so, any docs or examples would be great.) (Using
    kedro==0.17.7
    )
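(In the 0.17.x line, a programmatic run generally goes through KedroSession; a sketch of what main.py could look like, where the package name is hypothetical and the exact create() signature should be checked against the installed version:)

```python
def run_project(package_name, project_path=".", env=None, pipeline_name=None):
    """Programmatic equivalent of `kedro run` -- sketch for Kedro 0.17.x.

    The import is kept inside the function so this file can be read
    without Kedro installed; verify KedroSession.create()'s signature
    against your installed version before relying on it.
    """
    from kedro.framework.session import KedroSession

    with KedroSession.create(
        package_name, project_path=project_path, env=env
    ) as session:
        return session.run(pipeline_name=pipeline_name)
```

Usage would be something like run_project("my_package") from a main() entry point, with "my_package" standing in for the project's actual package name.
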
  • o

    Ofir

    01/28/2023, 6:26 PM
How do Git and Kedro play ball together? We have a classification data science pipeline written in Python and hosted in a GitHub repository. While I get the concept of a Kedro project and having a workspace per data model, I don't get how to sync the code across projects/workspaces/experiments. Should Kedro tasks (and pipelines) be thin wrappers that import my existing Python code, or not? What are the best practices if you already have an existing code base and Git repository with your code? Thanks!
  • o

    Ofir

    01/28/2023, 6:33 PM
I guess what I'm missing is how Kedro is integrated as part of a real-world application, and not just data science in a vacuum. Is there, say, a kedro folder in Git with a per-experiment folder and relative Python imports for the core code? Pointers to a real-world application on GitHub that uses Kedro across different experiments would be useful.
  • s

    Sergei Benkovich

    01/29/2023, 9:12 AM
    in globals.yaml i try to use something like:
    Copy code
    split_folder: "split_1"
    
    folders:
      raw: "{split_folder}/01_raw"
but it doesn't work; I just get a new folder literally called {split_folder}/01_raw. Is there any way to accomplish this? I'm running several versions one after the other, and I want each one in a different folder, but I don't want to have to change the paths for all the subdirectories I defined...
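(As far as the TemplatedConfigLoader docs describe, values from globals.yml are substituted into the other config files via ${...}, but globals.yml itself is not re-interpolated, which would explain the literal {split_folder} folder. One hedged workaround is to do the templating only in the files that consume the value; the keys and paths below are hypothetical:)

```yaml
# conf/base/globals.yml -- plain values only; no ${...} references here
split_folder: split_1

# conf/base/catalog.yml -- TemplatedConfigLoader substitutes ${...} here
my_raw_dataset:
  type: pandas.CSVDataSet
  filepath: data/${split_folder}/01_raw/data.csv
```

Switching split_folder between runs then moves every templated path at once.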