# questions
  • Nikos Kaltsas, 02/15/2023, 12:11 AM
    Hello, does anyone have a guide / example for running Kedro pipelines on Databricks with dbx?
    ❤️ 1
  • dor zazon, 02/15/2023, 11:01 AM
    Hey, I am trying to set up experiment tracking in Kedro. Everything works fine, but Kedro can't save session metadata into the sqlite3 DB. I get the following error every time I run Kedro:
  • dor zazon, 02/15/2023, 11:02 AM
    The session_store.db is created, but it is locked. I have tried deleting the DB and running again multiple times, but the issue remains.
  • Vassilis Kalofolias, 02/15/2023, 2:42 PM
    Hello, I have a quick question: what is the use case for dataset.confirm()? The documentation is not clear, and it is not implemented in any dataset except IncrementalDataSet.
  • Alexander Johns, 02/15/2023, 6:19 PM
    Hey team, I'm trying to implement a very simple custom dataset that loops through a directory, reads in the specific CSVs that match a string pattern as pandas DataFrames, performs basic cleaning operations on the individual DataFrames, and concatenates them together. The class definition is located at:
    src/<my_project>/extras
    ├── __init__.py
    └── datasets
        ├── __init__.py
        └── <my_custom_dataset>.py
    Catalog entry:
    raw_custom_dataset:
      type: <my_project>.extras.datasets.<my_custom_dataset>.<MyCustomDataSet>
      filepath: 01_raw/folder/*
    When I run the node, I keep getting the following error:
    An exception occurred when parsing config for DataSet 'raw_custom_dataset':
    Class '<my_project>.extras.datasets.<my_custom_dataset>.<MyCustomDataSet>' not found or one of
    its dependencies has not been installed.
    Kedro version: 0.18.3
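As the error text itself says, the class was "not found or one of its dependencies has not been installed", which usually means the import failed, e.g. the project package isn't installed in the active environment (pip install -e src/ on Kedro 0.18.x) or a module/class name is misspelled. Separately, the load logic described above can be sketched as a plain function first and then wired into the custom dataset's load method; this is an illustrative sketch assuming pandas, with a made-up helper name:

```python
from pathlib import Path

import pandas as pd


def load_matching_csvs(folder: str, pattern: str = "*.csv") -> pd.DataFrame:
    """Read every CSV under `folder` whose name matches `pattern`,
    clean each one, and concatenate the results."""
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        frame = pd.read_csv(path)
        # Per-file cleaning would go here (rename columns, drop rows, ...).
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```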
  • Matthias Roels, 02/15/2023, 8:23 PM
    I want to create a new Kedro project for ML and I am not sure how to properly structure it. I want to have a default pipeline consisting of a feat and a modelling pipeline. Both the feat and modelling pipelines will consist of several sub-pipelines, and I want to make sure that nested pipeline structure is somehow reflected in my project structure. I was thinking about nested dirs in the pipelines folder, e.g.
    pipelines/
    └── feat/
        ├── __init__.py
        ├── pipelines.py   # contains all sub-pipelines in this folder, e.g. feat_sales
        ├── feat_sales/
        │   ├── __init__.py
        │   ├── nodes.py
        │   └── pipelines.py
        └── …
    Would this be the right approach? And if not, what is the recommended way to structure this? Do we use modular pipelines or regular pipelines?
  • Alex Ferrero, 02/16/2023, 10:48 AM
    Hey team, is there any way I can write to a Delta table using the catalog, making an upsert like in SQL? I have seen in Kedro's code that the only supported modes are append, overwrite, error, errorifexists and ignore.
  • Vassilis Kalofolias, 02/16/2023, 11:06 AM
    Hello, I am trying to override a bool parameter using the CLI (running from bash):
    kedro run --params round_occupancy:False
    However, the False is read as a string. Is there a way to pass a boolean instead? Note that the original param is correctly read from the YAML file as a bool.
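Since the CLI value arrives as the string "False", one workaround is to coerce it inside the node. A minimal stdlib sketch; the helper and node names are illustrative, not a Kedro API:

```python
def to_bool(value) -> bool:
    """Coerce a CLI-supplied parameter to bool ('False', 'false', '0' -> False)."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() not in ("false", "0", "no", "")


# Inside a node, accept either the YAML bool or the CLI string form:
def occupancy_node(data, round_occupancy):
    if to_bool(round_occupancy):
        data = round(data)
    return data
```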
  • Keith Edmonds, 02/16/2023, 10:53 PM
    Does Kedro interface with sklearn's Pipeline at all? https://scikit-learn.org/stable/modules/compose.html If there is an ML model built with sklearn's Pipeline and we want to do the data engineering in Kedro, is there a way to look at the whole pipeline in Kedro?
  • Sebastian Pehle, 02/17/2023, 9:36 AM
    I am working on Windows and want to store my project on a network folder. However, when I want to create a pipeline I get an error about incorrect paths ("source path must be relative to..."). This stems from a variation in pathlib.Path: without .resolve() it gives me the drive-letter path (X:/abc); with .resolve() it gives me the network path (//server.xy/a/b/c/abc). Manually removing .resolve() from all Kedro source files solves the problem. Does someone have a better solution?
  • Solomon Yu, 02/17/2023, 3:34 PM
    Edit: documenting a solution to my own question. I'm trying to load a multi-sheet ExcelDataSet through the Catalog. I'm trying to load all sheets this way:
    my_excel_file:
      type: pandas.ExcelDataSet
      filepath: some-excel-file.xlsx
      load_args:
        sheet_name: None
    and I get
    Worksheet named 'None' not found
    Is there a way to load all sheets through the catalog? Yes. Edit: not fully documented in Kedro, but in case someone comes across this, remember to use the YAML syntax for None, which is null, ~, or an empty value. Thanks in advance!
    🌟 2
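For reference, the corrected catalog entry with the YAML null literal (filename as in the post above):

```yaml
my_excel_file:
  type: pandas.ExcelDataSet
  filepath: some-excel-file.xlsx
  load_args:
    sheet_name: null   # YAML null -> Python None; pandas then returns a dict of all sheets
```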
  • Chris Santiago, 02/17/2023, 6:26 PM
    Hi, new to Kedro. Why is there a pyproject.toml file in the root project directory and then a separate setup.py in the src directory? I'm trying to understand their separate roles. I'd like to introduce Kedro to my team at work. We use a custom cookiecutter to set up all of our projects so that they're pip-installable across various platforms. Our current setup uses only pyproject.toml, and we've removed the last remnants of setup.py and setup.cfg. Specifically, I'm trying to understand how I could structure a custom starter, incorporating our existing cookiecutter, that would allow for editable installs with extras, but I don't want to disturb any existing Kedro functionality. How does the Kedro CLI use the src/setup.py file, if at all? Same question for the pyproject.toml in the root folder.
  • Ricardo Araújo, 02/18/2023, 3:39 PM
    I feel this might be a basic question, but I can't quite make it work. In a Kedro pipeline (Pipe1) there are two defined pipelines (say pipeA and pipeB), where pipeB is a remapping of the inputs and outputs of pipeA. For organization purposes, I don't want to spin pipeB out into an individual Pipe2. However, in another pipeline (Pipe3) I want to re-use pipeB, but not pipeA. Is there a way to do this?
  • Alexis Eutrope, 02/18/2023, 8:22 PM
    Hi, I have a question (and very likely what I'm trying to do is a Kedro anti-pattern). Basically I'd like to have a node pipeline with a diamond shape: EntryNode --> [IntermediateNode X for X in list] --> OutputNode. Doing this requires each intermediate node to have runtime (in code, not in static catalog.yml files) generation of datasets. I don't want to use them to store any data; those datasets would just be dummy ones in order to keep the dependency/ordering of nodes. Any ideas on how I could deal with that? (Ideally within the create_pipeline file.) Thank you
  • Dustin, 02/20/2023, 3:29 AM
    hi team, I had this issue and thought it would be good to share while looking for advice. I intended to set a "quoting" parameter for saving CSV values in catalog.yml (image 2). In the normal to_csv() function you would use quoting=csv.QUOTE_NONNUMERIC as a parameter, but this won't work in catalog.yml as it doesn't know the 'csv' module. One way is to manually set the desired integer (image 3), but I found that the value actually changed from 3 to 2 (image 1; 3 used to stand for csv.QUOTE_NONNUMERIC but now it is 2) in the latest 'csv' version. Is there any way we could fetch this dynamically (like how quoting=csv.QUOTE_NONNUMERIC works in a normal to_csv()) in the catalog?
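Since catalog.yml can only hold plain values, one option is to look the constant up once and paste the integer into save_args. The quoting constants are plain ints defined by Python's own csv module (this is standard-library behaviour, not Kedro-specific), so a quick check shows the exact values:

```python
import csv

# Print each quoting constant and its integer value, which is what
# would go into catalog.yml's save_args: quoting entry.
for name in ("QUOTE_MINIMAL", "QUOTE_ALL", "QUOTE_NONNUMERIC", "QUOTE_NONE"):
    print(name, "=", getattr(csv, name))
# QUOTE_NONNUMERIC = 2
```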
  • Juan Luis, 02/20/2023, 10:36 AM
    I'm trying to run kedro new in non-interactive ways so it's compatible with Jupyter shell commands (!kedro new ...). I see two ways:
    • yes "Project Name" | kedro new --starter=xxx: works, but it's UNIX-only (I don't think this will work on Windows), assumes there is only one question, and looks a bit arcane.
    • vim kedro.yaml ... && kedro new --starter=xxx --config=kedro.yaml: works, but I'm creating a file that I will only use once, plus it's not very easy to discover what structure the file should have (one has to navigate to the source code of the starter in question, locate the prompts.yml, and mimic those keys).
    I see that this has been unchanged since basically "forever", but I'm wondering what folks' opinions are on having a way to pass these configs to the CLI, something like kedro new --starter=xxx --project_name=yyy
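For what it's worth, the config-file route is less arcane than it looks once the keys are known; a minimal config for the default starter might be (keys mirror its prompts.yml, and other starters may ask for different ones, so treat these names as illustrative):

```yaml
# kedro.yaml, used as: kedro new --starter=xxx --config=kedro.yaml
project_name: My Project
repo_name: my-project
python_package: my_project
output_dir: .
```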
  • Juan Luis, 02/20/2023, 11:53 AM
    also, a totally unrelated question: our docs say "Kedro offers a command (kedro jupyter notebook)", but actually this depends on the starter that got used. For example, projects created with standalone-datacatalog do not have it. Is this a docs issue (we should amend those to explain how to get that command working regardless of the starter used) or a starter issue (all starters should have kedro jupyter notebook)?
  • Lan Bui, 02/20/2023, 2:08 PM
    hi friends! Is there any way to load a Kedro project from a project directory? I recently lost my motherboard but salvaged the drive my project was on. When I reinstalled Kedro, the project is no longer recognized even though all the files are there.
  • Massinissa Saïdi, 02/20/2023, 5:54 PM
    Hello kedroids! Do you know how to pass a boolean parameter in the CLI? kedro run --params key:false or kedro run --params key:False returns the string 'false' or 'False'. I know I can set the parameter to 0 or '' to get the false condition, but is there a better way? thx 🙂
  • Laura Oñate, 02/21/2023, 2:25 AM
    Hi, quick question: approximately how many users are using Kedro?
  • Robertqs, 02/21/2023, 4:43 AM
    Hi guys, I’m facing a strange issue on Windows where the kernels in my JupyterLab instance keep disconnecting. It would normally work for a while after restarting JupyterLab, but the problem comes back afterwards. It doesn't seem to be a resource issue, as this happens when working on a light notebook. Wondering if anyone has encountered a similar issue? Thanks in advance.
    ✅ 1
  • Jan, 02/21/2023, 10:26 AM
    Hi! Has anyone yet created a script / function to delete old experiments systematically? If I were to create one to delete the old folders, how can I remove them from the session_store.db (sqlite)?
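Whatever the deletion criterion ends up being, a safe first step is to inspect which tables the session store actually contains before issuing any DELETE; a stdlib sketch (the table and column names in your session_store.db may differ, which is exactly why listing them first helps):

```python
import sqlite3


def list_tables(db_path: str) -> list:
    """Return the table names in a SQLite file, e.g. Kedro's session_store.db."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    return [name for (name,) in rows]


# Once the table and its session-id/timestamp columns are known, old runs
# can be dropped with a plain DELETE, e.g. (hypothetical names):
#   conn.execute("DELETE FROM <table> WHERE <timestamp_col> < ?", (cutoff,))
```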
  • Olivier Ho, 02/21/2023, 10:48 AM
    hello! Is there a way to autoincrement micropackage version?
  • Armen Paronikyan, 02/21/2023, 10:56 AM
    Hi guys. I would like to know if there is a way to have experiment tracking deployed on a separate server, so that several Kedro applications can send their data there. Thanks in advance.
  • Nicolas Oulianov, 02/21/2023, 7:55 PM
    Hey, is there any plan to make an interactive Kedro-Viz, where you could plug and unplug data connectors? A bit like the Blender 3D or Unreal Engine scripting system.
  • datajoely, 02/22/2023, 8:00 AM
    Sorry about that - not sure why Stackoverflow kedro tag just posted all of that.
  • Francisco Alejandro Leal Tovar, 02/22/2023, 1:34 PM
    Hello everybody! Has anybody worked with Snowpark in Kedro?
  • Solomon Yu, 02/22/2023, 2:18 PM
    Hiya, trying to figure out params for data processing pipelines. I'd like to set parameters for the catalog config so that catalog.load() can load a dataset with load_args dtype set to a dtypes_dict_var, like:
    my_dataset:
      type: pandas.CSVDataSet
      filepath: path-to-my-file.csv
      load_args:
        parse_dates: ['col_3']
        dtype: dtypes_dict_var
    so that catalog.yml won't be too many lines long. I'd like this dtype dict to live within conf/base/parameters/my_pipeline.yml, as:
    dtypes_dict_var: {
      "col_1": int,
      "col_2": str,
      "col_3": DateTime<'Y-m-d'>, # assumes YAML API syntax will be converted to a datetime object
    }
    Another question here is how to pass a datetime object type to load_args:dtype. I'd like this dtype dict to affect only loading my_dataset, and not be used as a global var if possible. A separate case could be that I'd like to load the same dataset with different dtypes in different pipelines, which could utilise TemplatedConfigLoader. Passing in certain parameters doesn't seem very straightforward tbh :/ Thanks in advance!
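On the "shared dtype dict templated into the catalog" part, Kedro 0.18's TemplatedConfigLoader can substitute values from a globals file (set CONFIG_LOADER_CLASS = TemplatedConfigLoader and CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"} in settings.py). A sketch with illustrative file and key names; note that pandas expects datetime columns via parse_dates rather than dtype, so col_3 stays where it is:

```yaml
# conf/base/globals.yml
dtypes_dict_var:
  col_1: int64
  col_2: object

# conf/base/catalog.yml -- ${...} is substituted by TemplatedConfigLoader
my_dataset:
  type: pandas.CSVDataSet
  filepath: path-to-my-file.csv
  load_args:
    parse_dates: ['col_3']   # datetime columns go here, not in dtype
    dtype: ${dtypes_dict_var}
```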
  • Ian Whalen, 02/22/2023, 2:32 PM
    Not to necro an old thread, but does OmegaConf help with this? Specifically: defining a list of constants in settings.py and looping over it in the Jinja-esque style to define catalog entries. Couldn't immediately tell from the docs, though I haven't had much time to work with the new loader. I am excited too, of course 🙂
  • Shiv Pratap Singh, 02/22/2023, 3:04 PM
    Hi everyone, I am facing an issue while saving a pickle dataset to an on-prem S3 (s3a). Attached are the catalog entry and the error. Any ideas 🙂?