# questions
• **Bernardo Branco** (05/15/2023, 7:36 PM)
Hey everyone, I'm building tests for a Kedro PySpark pipeline and I would like to pass a specific Spark configuration needed for the tests. I have tried various things but nothing works. What is the best way to pass the Spark configuration typically found in `spark.yml` into tests? Thank you in advance!
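A minimal sketch of one approach: a pytest fixture that builds the test `SparkSession` from the project's `spark.yml`. It assumes `conf/base/spark.yml` holds flat `spark.*` key/value options; adjust the path and fixture scope to your project.

```python
# conftest.py (sketch) -- build a SparkSession from conf/base/spark.yml
from pathlib import Path

import pytest
import yaml
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark_session():
    spark_options = yaml.safe_load(Path("conf/base/spark.yml").read_text())
    builder = SparkSession.builder.appName("tests")
    for key, value in spark_options.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    yield spark
    spark.stop()
```

Tests that need Spark can then simply take `spark_session` as an argument.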
• **Juan Luis** (05/16/2023, 7:53 AM)
hi folks, are custom omegaconf resolvers supposed to work for the catalog? I'm trying to define one but the raw `${name:value}` strings are passed to the dataset. I can elaborate more if needed.
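For reference, a sketch of registering a resolver globally so it exists when the config is parsed; `name` is a hypothetical resolver used as `${name:value}`. Whether the catalog loader picks it up depends on the Kedro version, so treat this as something to experiment with.

```python
# settings.py (sketch) -- register the resolver before config loading happens
from omegaconf import OmegaConf

if not OmegaConf.has_resolver("name"):
    OmegaConf.register_new_resolver("name", lambda value: f"resolved-{value}")
```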
• **Dotun O** (05/16/2023, 8:18 PM)
Hey team, can we set a way for the pipeline not to fail if a catalog entry does not exist? Is there a way to set a default `None` value if the catalog does not return an entry, and would this be set in the `hooks.py` file?
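One possible sketch, assuming the optional entry names are known up front: an `after_catalog_created` hook that injects a `MemoryDataSet` preloaded with `None` for any expected-but-missing entry. The entry name below is hypothetical.

```python
# hooks.py (sketch) -- "maybe_missing_input" is a hypothetical dataset name
from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog, MemoryDataSet

OPTIONAL_ENTRIES = ["maybe_missing_input"]


class DefaultNoneHooks:
    @hook_impl
    def after_catalog_created(self, catalog: DataCatalog) -> None:
        for name in OPTIONAL_ENTRIES:
            if name not in catalog.list():
                # Nodes reading this entry now receive None instead of failing.
                catalog.add(name, MemoryDataSet(data=None))
```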
• **Afaque Ahmad** (05/17/2023, 3:41 AM)
Hi Kedro Team, I'm using Kedro `0.16.6` and `load_context` to get `params` and `"credentials*", "credentials*/**"`. We're upgrading Kedro to `0.18.8` and it seems `load_context` is no longer accessible. How can I replicate the same functionality in Kedro 0.18.8?
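A sketch of the 0.18.x equivalent: create a `KedroSession` and take `params` and credentials from the context it loads (run from inside the project directory).

```python
# Sketch: replacement for the old load_context() in Kedro 0.18.x
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

project_path = Path.cwd()
bootstrap_project(project_path)

with KedroSession.create(project_path=project_path) as session:
    context = session.load_context()
    params = context.params
    credentials = context.config_loader.get("credentials*", "credentials*/**")
```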
• **Afaque Ahmad** (05/18/2023, 11:35 AM)
Hi Kedro Team, I'm using a `spark.SparkDataSet` to load and save datasets to a Delta lake. I need to save incremental numeric versions of the data, e.g. 1, 2, ..., as opposed to the current timestamp. Is there a way to do that in the current implementation, and to specify the version number while loading?
    👍 1
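Not an answer to writing custom version numbers, but worth noting: Delta itself assigns incremental numeric versions (0, 1, 2, ...) on every write, and `SparkDataSet` forwards `load_args` to the Spark reader, so time travel to a given version may work like this sketch (the path and version number are hypothetical):

```python
# Sketch: read a specific Delta version via Spark's versionAsOf option
from kedro.extras.datasets.spark import SparkDataSet

dataset = SparkDataSet(
    filepath="data/08_reporting/my_table",
    file_format="delta",
    load_args={"versionAsOf": 2},  # Delta time travel to version 2
)
df = dataset.load()
```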
• **Andrej Zachar** (05/18/2023, 3:59 PM)
Hello Kedro Team, I am currently working on a pipeline that trains a model, and I'm using versioning for that model. In the subsequent steps of the pipeline, I would like to use a specific version of the model to make predictions. However, it seems that Kedro node inputs don't support the use of versions. Below is a snippet of the problematic code:
```python
node(
    predict,
    inputs=["classifier_flaml:<version_ideally_from_params>", "X_src"],
    outputs=["y_src_pred", "y_src_pred_proba"],
),
```
    Can you provide guidance on how to accomplish this task? Thank you.
    ✅ 1
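For context, a sketch of what is possible today: node inputs cannot carry a version, but a load version can be pinned per run from the CLI (`kedro run --load-version "classifier_flaml:<timestamp>"` on 0.18.x), or in code via `kedro.io.Version`. The filepath and timestamp below are hypothetical.

```python
# Sketch: load a pinned version of a versioned dataset outside the node graph
from kedro.extras.datasets.pickle import PickleDataSet
from kedro.io import Version

dataset = PickleDataSet(
    filepath="data/06_models/classifier_flaml.pkl",
    version=Version(load="2023-05-18T15.59.00.000Z", save=None),
)
model = dataset.load()
```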
• **Jose Nuñez** (05/18/2023, 11:31 PM)
Hi everyone! Quick question: in one of my nodes I have a function `f` that takes a dataframe as input, does some stuff, and outputs a Python `dict`. Is there any way to save that dict in the data catalog? As a workaround I was saving it as a pandas CSV and later transforming it back to a dict, but I'm tired of doing that. Thanks in advance 😄
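A sketch of the usual fix: persist the dict with a JSON (or pickle) dataset instead of round-tripping through CSV. The filepath is hypothetical; the equivalent `catalog.yml` entry would use `type: json.JSONDataSet`.

```python
# Sketch: save/load a plain dict without the CSV detour
from kedro.extras.datasets.json import JSONDataSet

dataset = JSONDataSet(filepath="data/03_primary/my_dict.json")
dataset.save({"a": 1, "b": 2})
loaded = dataset.load()  # -> {'a': 1, 'b': 2}
```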
• **Muhammad Ghazalli** (05/19/2023, 2:33 AM)
Hi Kedro Team, I'm currently deploying Kedro on Kubernetes, scheduling it via Kubernetes CronJobs, and running it with `kedro run -p <pipeline_name>`. I'm facing an issue: when there's an error inside the pipeline (data not available or similar, which is fine), it continuously retries. I want to add an exit code to my Kedro pipeline so that if there's an error it exits immediately. Where do I put it? I can't find it in the docs. Thank you.
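Worth checking first: `kedro run` already exits with a non-zero status when the pipeline fails, so the endless retries are likely driven by the CronJob's `restartPolicy`/`backoffLimit`. That said, a hook can force an explicit immediate exit; a sketch:

```python
# hooks.py (sketch) -- exit with a non-zero code as soon as the pipeline errors
import sys

from kedro.framework.hooks import hook_impl


class ExitOnErrorHooks:
    @hook_impl
    def on_pipeline_error(self, error: Exception) -> None:
        print(f"Pipeline failed: {error}", file=sys.stderr)
        sys.exit(1)
```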
• **Matthias Roels** (05/19/2023, 6:59 AM)
What's the difference between the pandas generic dataset and, say, the pandas CSV dataset classes? From what I can tell, they offer the same functionality for reading CSV files. Is one a legacy version that was supposed to be replaced by the other?
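A sketch of the overlap: both of these read a CSV with pandas, but `GenericDataSet` dispatches dynamically to `pd.read_<file_format>`, so it also covers formats that have no dedicated dataset class; as far as I can tell it is not a legacy duplicate of `CSVDataSet`. The path is hypothetical.

```python
# Sketch: two ways to read the same CSV
from kedro.extras.datasets.pandas import CSVDataSet, GenericDataSet

csv_ds = CSVDataSet(filepath="data/01_raw/data.csv")
generic_ds = GenericDataSet(filepath="data/01_raw/data.csv", file_format="csv")
```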
• **Luis Cano** (05/19/2023, 3:29 PM)
Hello everyone! Quick question: what is the correct way of defining an optional input in a Kedro pipeline? Is it possible? The function takes some DataFrame inputs as optional, but I would also want that feature in the pipeline so I don't have to edit it every time one input is not available. Thanks!
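A sketch of one workaround, since Kedro nodes have no optional inputs: build the node's input list conditionally and let the function keep its Python default. `combine`, the dataset names, and the flag are hypothetical.

```python
# Sketch: conditionally wire an optional input
from kedro.pipeline import Pipeline, node, pipeline


def combine(main_df, extra_df=None):
    return main_df if extra_df is None else main_df.join(extra_df)


def create_pipeline(has_extra: bool = False, **kwargs) -> Pipeline:
    inputs = ["main_df", "extra_df"] if has_extra else ["main_df"]
    return pipeline([node(combine, inputs=inputs, outputs="combined_df")])
```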
• **Sneha Kumari** (05/19/2023, 6:11 PM)
Hello everyone, I am following the documentation to use kedro-mlflow with my pipeline registry, and it gives me the following error when running `kedro mlflow ui`. Any inputs are helpful. Thanks!
```
/opt/anaconda3/envs/frontline/lib/python3.8/site-packages/kedro_mlflow/framework/cli/cli.py:161 in ui

   158     ) as session:
   159
   160         context = session.load_context()
 ❱ 161         host = host or context.mlflow.ui.host
   162         port = port or context.mlflow.ui.port
   163
   164         if context.mlflow.server.mlflow_tracking_uri.startswith("http"):

AttributeError: 'KedroContext' object has no attribute 'mlflow'
```

Python: 3.8.16, Kedro: 0.18.6, kedro-mlflow: 0.11.8
• **noam** (05/21/2023, 2:02 PM)
Hi Kedro community! My team and I are trying to create an optimal setup for running experiments in parallel. Concerningly, it appears that if we change the contents of a parameters file (i.e. `conf/local/parameters.yml`) during a run, the results of the run may be affected. For example, let's say I set `hyper_tune: False` in `parameters.yml` and run `kedro run` in the terminal. If I change `parameters.yml` to `hyper_tune: True` (for example, while setting up the parameters for my next experiment) before the "training" node begins executing, it appears that Kedro will then read `hyper_tune: True`. In this example, that would mean Kedro executes hyperparameter tuning despite being instructed not to at the beginning of the run. Am I missing something? Is the answer as simple as passing all parameters to the pipeline once as a whole (i.e. using a `before_pipeline_run` hook) rather than to each node?
• **fmfreeze** (05/22/2023, 12:51 PM)
I hope this is a simple question and I am just missing a basic configuration. When I write a simple pipeline like:
```python
from kedro.pipeline import Pipeline, node, pipeline


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline([
        node(func=do_stuff, inputs=[], outputs='MyMemDS'),
        node(func=do_more_stuff, inputs=['MyMemDS'], outputs='SecondMemDS')
    ])
```
I thought my `conf/base/catalog.yml` needs the entries:
```yaml
MyMemDS:
  type: MemoryDataSet
SecondMemDS:
  type: MemoryDataSet
```
But when I run the pipeline (which works, also with kedro-viz) it does not use the `catalog.yml` entries at all. The output of my first node is an empty `{}` dictionary, and if I rename or delete the entries in `catalog.yml` it "works" like before: the first node still returns an empty dictionary. Do I need to register the catalog anywhere? I simply want to access the object returned by my `do_stuff()` function. What am I missing?
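Two things seem to be going on, stated here as assumptions: datasets not declared in `catalog.yml` default to `MemoryDataSet` anyway, so those entries change nothing; and the empty `{}` is simply whatever `do_stuff()` returns. Memory data is also released once the run finishes. A sketch for grabbing the output in code; with no catalog entry for the terminal dataset, it comes back from `session.run()`:

```python
# Sketch: run the pipeline in code and capture unpersisted terminal outputs
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

bootstrap_project(Path.cwd())
with KedroSession.create(project_path=Path.cwd()) as session:
    outputs = session.run()  # e.g. {"SecondMemDS": ...}
```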
• **Juan Luis** (05/23/2023, 6:00 AM)
hi folks, I'm noticing a difference between `ConfigLoader` and `OmegaConfigLoader`. While following the standalone-datacatalog starter, I notice that `ConfigLoader("conf").get("catalog.yml")` works, but `OmegaConfigLoader("conf").get("catalog.yml")` returns `None`. On the other hand, `OmegaConfigLoader("conf").get("catalog")` seems to work (notice no `.yml` extension), and `OmegaConfigLoader("conf")["catalog"]` works consistently for both config loaders. Is this intentional? Compare for example https://github.com/kedro-org/kedro/blob/41f03d9/tests/config/test_config.py#L116 with https://github.com/kedro-org/kedro/blob/41f03d9/tests/config/test_omegaconf_config.py#L149
• **Richard Bownes** (05/23/2023, 8:05 AM)
If I have an established project and I want to integrate MLflow into it, what's the most straightforward pathway?
• **Afaque Ahmad** (05/23/2023, 9:12 AM)
Hi Kedro folks, I have two hooks, `PipelineHooks` and `MLFlowHooks`, and both implement `before_pipeline_run`. I need the `before_pipeline_run` defined in `PipelineHooks` to run before the one in `MLFlowHooks`. I've specified this order below in `settings.py`, but it doesn't work:

```python
HOOKS = (
    PipelineHooks(),
    MLFlowHooks(),
)
```

Is there any way to enforce an order of execution?
    👀 1
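A sketch of what should help: pluggy, which Kedro uses under the hood, calls hook implementations in LIFO order of registration, so reversing the tuple makes `PipelineHooks.before_pipeline_run` fire first. Worth verifying on your Kedro version.

```python
# settings.py (sketch) -- LIFO: last registered runs first
HOOKS = (
    MLFlowHooks(),    # registered first -> runs last
    PipelineHooks(),  # registered last  -> runs first
)
```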
• **Debanjan Banerjee** (05/23/2023, 2:43 PM)
Hi Team, I'm on Kedro 0.18.8. I see that on a fresh installation it gives me this; any solutions?
• **Guilherme Parreira** (05/24/2023, 12:15 PM)
Hi guys! I am trying to load Kedro functionality in a Jupyter notebook via `%load_ext kedro.ipython`, but it gives me the following error:

```
RuntimeError: Missing required keys ['project_version'] from 'pyproject.toml'.
```

Kedro was working fine for the last two weeks and I didn't update `kedro`. In `requirements` I have `kedro~=0.18.6`; in `Pipfile.lock` I have `kedro==0.18.4`. In `pyproject.toml` I have:

```toml
[tool.kedro]
package_name = "cashflow_ml"
project_name = "cashflow-ml"
kedro_init_version = "0.18.6"

[tool.isort]
profile = "black"

[tool.pytest.ini_options]
addopts = """
--cov-report term-missing \
--cov src/cashflow_ml -ra"""

[tool.coverage.report]
fail_under = 0
show_missing = true
exclude_lines = ["pragma: no cover", "raise NotImplementedError"]
```

I tried changing `kedro_init_version` to `0.18.4`, but I still get the same error. Does anyone have a clue?
• **Guilherme Parreira** (05/24/2023, 1:00 PM)
It worked, bro! I don't know why it happened. I installed the `prophet` package last night, but it shouldn't modify my `pyproject.toml`. Thank you so much, you saved my day.
    🙌🏼 1
• **Andreas_Kokolantonakis** (05/25/2023, 1:23 PM)
Hello everyone! Does anyone have an example of using `partitionBy` when saving Parquet files via the Kedro catalog? Thank you very much in advance.
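A sketch with Spark, assuming a Spark DataFrame: `SparkDataSet` passes `save_args` to `DataFrameWriter.save`, which accepts `partitionBy`. The path and columns are hypothetical; in `catalog.yml` the same options would sit under `save_args`.

```python
# Sketch: write partitioned parquet via save_args
from kedro.extras.datasets.spark import SparkDataSet

dataset = SparkDataSet(
    filepath="data/07_model_output/sales",
    file_format="parquet",
    save_args={"mode": "overwrite", "partitionBy": ["year", "month"]},
)
```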
• **Hugo Evers** (05/25/2023, 2:10 PM)
Would it be a good idea to add a "concatenated pandas pipeline" option to a pipeline, which lets you run it through a pandas `.pipe` chain instead of the traditional pipeline construct with separate in-memory I/O, when for example a run flag is supplied? My use case is as follows: there is a long text-preprocessing pipeline we use, which looks kind of like this:
```python
return pipeline(
        [
            node(
                func=rename_columns,
                inputs="pretraining_set",
                outputs="renamed_df",
                name="rename_columns",
            ),
            node(
                func=truncate_description,
                inputs="renamed_df",
                outputs="truncated_df",
                name="truncate_description",
            ),
            node(
                func=drop_duplicates,
                inputs="truncated_df",
                outputs="deduped_df",
                name="drop_duplicates",
            ),
            node(
                func=pad_zeros,
                inputs="deduped_df",
                outputs="padded_df",
                name="pad_zeros",
            ),
            node(
                func=filter_0000,
                inputs="padded_df",
                outputs="filtered_df",
                name="filter_0000",
            ),
            node(
                func=clean_description,
                inputs="filtered_df",
                outputs="cleaned_df",
                name="clean_description",
            ),
            node(
                func=concat_title_description,
                inputs="cleaned_df",
                outputs="concatenated_df",
                name="concat_title_description",
            ),
        ]
    )
```
However, on AWS Batch these nodes run in separate containers. I currently use the cloudpickle dataset to facilitate this, but it is actually not necessary when I use something like Dask. I could instead run this pipeline like this:
```python
return (
        df.pipe(rename_columns)
        .pipe(truncate_description)
        .pipe(drop_duplicates)
        .pipe(pad_zeros)
        .pipe(filter_0000)
        .pipe(clean_description)
        .pipe(concat_title_description)
    )
```
The aforementioned pipeline has tags and filtering in a modular pipeline depending on pre-training, tuning, which language, etc. The flattened pipeline would be nice to use in a case like `kedro run runner=... concat_pipeline=true`, or something like that. Is this idea worth exploring? It is really not essential and I can work around it, but the ability to have pipelines that can "fold" like this is quite appealing.
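For what it's worth, a sketch of the "folded" variant outside Kedro, assuming every node is a single-input/single-output pandas step as in the snippet above:

```python
# Sketch: compose the node functions into one pandas .pipe chain
from functools import reduce

STEPS = [
    rename_columns,
    truncate_description,
    drop_duplicates,
    pad_zeros,
    filter_0000,
    clean_description,
    concat_title_description,
]


def run_folded(df):
    # Equivalent to chaining df.pipe(step) for each step in order.
    return reduce(lambda acc, step: acc.pipe(step), STEPS, df)
```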
• **Hadeel Mustafa** (05/25/2023, 4:25 PM)
Hey everyone! Has anyone used `redshift-spark` in Kedro before? I'd appreciate the help if someone could show me an example of how this can be done, specifically the driver used for Redshift. Thanks in advance!
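A sketch of the generic JDBC route; the URL format and driver class (`com.amazon.redshift.jdbc42.Driver`) are assumptions based on the Amazon Redshift JDBC driver, and the driver jar must be on the Spark classpath (e.g. via `spark.jars`):

```python
# Sketch: read a Redshift table through Spark JDBC
from kedro.extras.datasets.spark import SparkJDBCDataSet

dataset = SparkJDBCDataSet(
    url="jdbc:redshift://my-cluster.example.com:5439/dev",  # hypothetical
    table="public.my_table",
    load_args={"properties": {"driver": "com.amazon.redshift.jdbc42.Driver"}},
)
```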
• **Higor Carmanini** (05/25/2023, 10:31 PM)
I have an issue with Kedro and Spark datasets. I am using a `PartitionedDataSet` to read many CSVs into Spark DataFrames. I just found an issue where, apparently, Spark appends the column position to the column name (as read from the header) to create the actual final name; see the example in the image. As this is sometimes done for deduplication, I investigated whether something similar was happening, and sure enough there is another dataset in this same `PartitionedDataSet` that reads another column of the same name. This could "explain" Spark's funky behavior of thinking it is a duplicate; of course, though, these are two separate DataFrames. Has anyone stumbled upon this issue before? I can't find any references online. Thank you! EDIT: Solved! It was due to Spark's default case insensitivity.
    ✅ 1
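The fix described above, as a one-liner (it could equally go into `spark.yml` as `spark.sql.caseSensitive: true`):

```python
# Sketch: make Spark treat column names as case-sensitive
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.caseSensitive", "true")
```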
• **Sidharth Jahagirdar** (05/25/2023, 11:06 PM)
Hey team! Can someone please share the documentation for kedro glass?
• **Rebecca Solcia** (05/26/2023, 8:13 AM)
    Good morning! Has anybody ever tried to access Databricks tables from a local Kedro project? I would need help on this topic! Thank you 🙂
    🧱 1
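One possible sketch using the `databricks-sql-connector` package to pull a table into pandas from a local machine; the hostname, HTTP path, token, and table name are all placeholders:

```python
# Sketch: query a Databricks table locally via the SQL connector
import pandas as pd
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123",
    access_token="dapi...",
) as conn:
    df = pd.read_sql("SELECT * FROM my_catalog.my_schema.my_table", conn)
```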
• **fmfreeze** (05/26/2023, 12:24 PM)
Hi Kedronistas. When I define an `AbstractDataSet`, kedro-viz does not display the Dataset Type and the File Path property in the details section for that dataset. How can I make them show up?
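A sketch under the assumption that kedro-viz populates the details panel from the dataset's `_describe()` output: return the filepath there in your custom dataset.

```python
# Sketch: a minimal custom dataset exposing its filepath to kedro-viz
from pathlib import Path
from typing import Any, Dict

from kedro.io import AbstractDataSet


class MyTextDataSet(AbstractDataSet):
    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def _load(self) -> str:
        return self._filepath.read_text()

    def _save(self, data: str) -> None:
        self._filepath.write_text(data)

    def _describe(self) -> Dict[str, Any]:
        # The details shown in kedro-viz come from this dict (assumption).
        return {"filepath": str(self._filepath)}
```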
• **fmfreeze** (05/26/2023, 12:47 PM)
And another interesting one: is it possible for a `node` to have a (dynamic) `parameter` as output? E.g. I have multiple "normal" parameters defined which serve as input to a `process_params` node. That node should, depending on the normal parameter inputs, output a single parameter which might serve as input to other nodes. Currently, by simply outputting that "parameter" value, it is by default a `MemoryDataSet`.
    ✅ 1
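For reference, a sketch of the pattern described above, which is indeed the idiomatic shape: the derived value is just a regular (memory) dataset output that downstream nodes consume like any other input. All names here are hypothetical.

```python
# Sketch: a node that derives a "parameter" for downstream nodes
from kedro.pipeline import node, pipeline


def process_params(alpha: float, beta: float) -> float:
    return alpha * beta


def train(derived_param: float):
    ...


def create_pipeline(**kwargs):
    return pipeline([
        node(process_params, ["params:alpha", "params:beta"], "derived_param"),
        node(train, ["derived_param"], None),
    ])
```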
• **Artur Dobrogowski** (05/30/2023, 8:15 AM)
Hi, has anyone found issues using `OmegaConfigLoader` from the new Kedro version? For me, when I enabled it and used env templating in a file, the config validator started raising issues for correct lists in YAML. The error looks like this:

```
ValidationError: 1 validation error for KedroMlflowConfig
tracking -> disable_tracking -> pipelines
  value is not a valid list (type=type_error.list)
```

While the config looks like this:

```yaml
tracking:
  disable_tracking:
    pipelines: []
```
• **Artur Dobrogowski** (05/30/2023, 8:24 AM)
Also a related question: any ideas how to start debugging this? I'm not very familiar with debugging in Kedro. https://docs.kedro.org/en/stable/development/debugging.html is not very helpful, since the bug does not occur in a pipeline or a node but in config loading.
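A sketch for poking at config loading outside a run: build the loader by hand and inspect what it resolves. The `mlflow` pattern is an assumption mirroring what kedro-mlflow registers; adjust `conf_source`/`env` to your project.

```python
# Sketch: reproduce config loading in isolation
from kedro.config import OmegaConfigLoader

loader = OmegaConfigLoader(
    conf_source="conf",
    env="local",
    config_patterns={"mlflow": ["mlflow*", "mlflow*/**", "**/mlflow*"]},
)
print(loader["mlflow"])  # what kedro-mlflow's validator would receive
```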
• **Florian d** (05/30/2023, 1:48 PM)
Does anyone know if there is a reason why we could not pass the context to the `before_pipeline_run` hook? In some cases it would be good to have access to the loaded config at that point.
    ✅ 1
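A sketch of the usual workaround, assuming `after_context_created` (available since Kedro 0.18.1) fires before `before_pipeline_run`: stash the context on the hook instance and reuse it.

```python
# hooks.py (sketch) -- capture the context for later hooks
from kedro.framework.hooks import hook_impl


class ContextAwareHooks:
    def __init__(self):
        self._context = None

    @hook_impl
    def after_context_created(self, context) -> None:
        self._context = context

    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog) -> None:
        config_loader = self._context.config_loader  # loaded config is at hand
```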