# questions
  • Leslie Wu (06/21/2023, 5:55 PM)
    Hi everyone, any ideas for getting `kedro viz` to work within Amazon SageMaker Studio? I am in the terminal of a Studio instance.
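    For anyone landing here: a sketch of one approach, assuming Studio's built-in jupyter-server-proxy is available (the port and proxy URL shape below are illustrative, not confirmed):
        # from the Studio terminal: bind to all interfaces, don't try to open a browser
        kedro viz --host=0.0.0.0 --port=4141 --no-browser
        # then open the Studio proxy path in your own browser (URL shape is an assumption):
        # https://<your-studio-domain>/jupyter/default/proxy/4141/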
  • Jonah Blumstein (06/21/2023, 10:07 PM)
    Does anyone know how to override the opt-in usage-analytics prompt when running `%load_ext kedro.ipython` for the first time in a notebook? Basically this, but in a notebook environment: https://github.com/kedro-org/kedro/issues/1640
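    For reference, kedro-telemetry reads consent from a `.telemetry` file in the project root, so creating one before loading the extension should skip the prompt:
        # .telemetry (in the project root)
        consent: false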
  • Mate Scharnitzky (06/22/2023, 2:05 PM)
    Hi team, we're currently using `kedro==0.18.3`, which pins `pytest~=6.2`; that conflicts with `pandera`, a new dependency we want to introduce. `kedro==0.18.5` already has `pytest~=7.2`, so we're not far from resolving this conflict. On the other hand, to upgrade to a higher Kedro version we would need to change our custom `JinjaTemplatedConfigLoader`, which inherits from `AbstractConfigLoader`, since both `0.18.4` and `0.18.5` introduced changes to configuration management, the latter specifically around `OmegaConf`. Also, `0.18.6` fixes some regressions in `0.18.5`. Questions, given the above context:
    • Which Kedro version would you suggest we upgrade to? It seems we need to go to at least `0.18.6`, but maybe we can aim all the way for `0.18.10`?
    • Do you have a migration guide for moving from a custom config loader to OmegaConf, bearing in mind that we also need to use `multi-runner`? Thank you! @Kasper Janehag @Jaskaran Singh Sidana
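    For reference, once on `kedro>=0.18.5` the switch to the built-in OmegaConf-based loader is a one-line change in settings (your custom Jinja templating logic would still need porting separately):
        # src/<your_package>/settings.py
        from kedro.config import OmegaConfigLoader

        CONFIG_LOADER_CLASS = OmegaConfigLoader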
  • Lucas Hattori (06/22/2023, 3:10 PM)
    Hi team, I'm starting to use `kedro-mlflow` for the first time in a project. Would that also be an appropriate topic for questions here? 😅 If so: my Kedro project has a lot of parameters, and many of them are not crucial to log in MLflow experiments. How can I easily select which parameters get logged? I have an idea of how to do it if I were building the MLflow hooks from scratch, but I'd love to leverage `kedro-mlflow` for simplicity.
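    One possible fallback while exploring `kedro-mlflow`'s own options: a small custom hook that logs only an allow-list of parameters. A sketch only; the hook name and the `ALLOWED` keys are made up, and this bypasses kedro-mlflow's parameter logging entirely:
        # src/<your_package>/hooks.py — hypothetical selective parameter logging
        import mlflow
        from kedro.framework.hooks import hook_impl

        ALLOWED = {"model.init.k", "model.init.loss"}  # hypothetical parameter names

        def _flatten(d, prefix=""):
            # yields ("model.init.k", 3)-style pairs from a nested parameters dict
            for key, value in d.items():
                name = f"{prefix}{key}"
                if isinstance(value, dict):
                    yield from _flatten(value, f"{name}.")
                else:
                    yield name, value

        class SelectiveParamsHook:
            @hook_impl
            def before_pipeline_run(self, run_params, pipeline, catalog):
                for name, value in _flatten(catalog.load("parameters")):
                    if name in ALLOWED:
                        mlflow.log_param(name, value)

        # register it in settings.py: HOOKS = (SelectiveParamsHook(),)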
  • Camilo López (06/23/2023, 1:56 AM)
    Hi team, I'm using the new `ManagedTableDataSet` with Databricks Unity Catalog and I haven't found a way to store tables in an external location (ABFS on Azure). With pure Spark you can store an external table via `df.write.mode(mode).option("path", table_path).saveAsTable(f"{catalog_name}.{schema_name}.{table_name}")`, where `table_path` is the path to the external location, e.g. `abfss://container@storage_account.dfs.core.windows.net/raw`. Is there a way to pass this path to `ManagedTableDataSet` when saving data? Or should I go and create a `CustomManagedTableDataSet` with this capability?
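    Until there's a built-in option, a bare-bones custom dataset wrapping the pure-Spark snippet above might look like this. A sketch only: the class name and constructor arguments are made up, and credentials/error handling are omitted:
        from kedro.io import AbstractDataSet
        from pyspark.sql import DataFrame, SparkSession

        class ExternalTableDataSet(AbstractDataSet):
            """Hypothetical dataset that writes a Spark DataFrame as an external table."""

            def __init__(self, table: str, path: str, write_mode: str = "overwrite"):
                self._table = table          # e.g. "catalog.schema.table"
                self._path = path            # e.g. an abfss:// external location
                self._write_mode = write_mode

            def _load(self) -> DataFrame:
                return SparkSession.builder.getOrCreate().read.table(self._table)

            def _save(self, data: DataFrame) -> None:
                data.write.mode(self._write_mode).option("path", self._path).saveAsTable(self._table)

            def _describe(self) -> dict:
                return {"table": self._table, "path": self._path}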
  • Sivasubramanian.S (06/23/2023, 4:02 AM)
    Hi team, I would like to expose some of my Kedro nodes and pipelines as a FastAPI service. Is there any documentation to refer to?
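    There isn't much official documentation on this, but a minimal sketch of wrapping a pipeline run in a FastAPI endpoint could look like the following (the project path and endpoint shape are assumptions to adapt):
        from pathlib import Path

        from fastapi import FastAPI
        from kedro.framework.session import KedroSession
        from kedro.framework.startup import bootstrap_project

        app = FastAPI()
        PROJECT_PATH = Path(__file__).resolve().parents[1]  # assumption: adjust to your layout

        @app.post("/run/{pipeline_name}")
        def run_pipeline(pipeline_name: str) -> dict:
            # each request starts a fresh Kedro session and runs the named pipeline
            bootstrap_project(PROJECT_PATH)
            with KedroSession.create(project_path=PROJECT_PATH) as session:
                session.run(pipeline_name=pipeline_name)
            return {"status": "completed", "pipeline": pipeline_name}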
  • Marc Gris (06/23/2023, 7:58 AM)
    DEPENDENCIES ISOLATION. Hi everyone, let's assume that my `data_processing_node` and my `model_training_node` have conflicting dependencies. How would you handle such an (unfortunately common) situation? I know that in MLflow it is possible to have task-specific venvs. Does Kedro offer such a possibility? If not, how could one circumvent the issue? 🙂 Many thanks in advance, M.
  • Artur Dobrogowski (06/23/2023, 1:19 PM)
    I'm trying to understand OmegaConfigLoader (https://docs.kedro.org/en/latest/kedro.config.OmegaConfigLoader.html). I know I can use it to interpolate environment variables like `${oc.env:SOME_VAR}`, but how do I use it to interpolate parameters defined in `params.yml`? Let's say I have a parameter `some_var: 42` and I want to fall back to it when the environment variable is unset. Is `${oc.env:SOME_VAR, ${some_var}}` correct?
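    For what it's worth, OmegaConf's `oc.env` resolver does accept a default as a second argument, so the proposed syntax is roughly the right shape. An untested sketch; note that some Kedro versions restrict `oc.env` to credentials files, so check your version's docs before relying on it in parameters:
        # parameters — sketch of an env-var lookup with a parameter fallback
        some_var: 42
        effective_var: ${oc.env:SOME_VAR, ${some_var}}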
  • Panos P (06/23/2023, 4:44 PM)
    Hello folks, I have a Kedro project with a lot of parameters and catalog entries. When I run `kedro run` with a different environment I get log messages like `kedro.config.config - INFO - Config from path "/conf/dev" will override the following existing top-level config keys`. These messages appear for about 30 minutes before the pipeline even runs. Do you have any ideas or recommendations for speeding this up?
  • Hoàng Nguyễn (06/26/2023, 5:07 PM)
    Hello, please help me: can I use kedro-fast-api now?
  • Andreas_Kokolantonakis (06/27/2023, 8:06 AM)
    Hi everyone, I would like to pass a date parameter from the command line when executing `kedro run`, so the catalog paths can point to the specified date. What is the best way to do so (e.g. like args in plain Python)?
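    For reference, runtime values can at least be passed with `--params` (colon-separated in 0.18.x):
        kedro run --params "date:2023-06-27"
    Inside the run this is then available to nodes as `params:date`; wiring it into catalog *paths* usually takes a templated or custom config loader (e.g. globals), so treat that part as project-specific.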
  • Zemeio (06/27/2023, 8:53 AM)
    Hey guys, I want to generate different pipelines based on the values of my parameters, so I could have, say, 3 or 15 pipelines depending on a few parameter values. Is that possible? I see that in the pipeline registry I don't have a Kedro context to work with, so maybe I should use a config loader?
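    One way to get at parameters inside the registry is indeed to instantiate a config loader directly. A rough sketch; the `variant` pipeline module and the `n_variants` key are made up:
        # src/<your_package>/pipeline_registry.py — hypothetical parameter-driven registry
        from typing import Dict

        from kedro.config import OmegaConfigLoader
        from kedro.pipeline import Pipeline, pipeline

        from my_package.pipelines import variant  # hypothetical modular pipeline

        def register_pipelines() -> Dict[str, Pipeline]:
            params = OmegaConfigLoader(conf_source="conf")["parameters"]
            pipelines = {
                f"variant_{i}": pipeline(variant.create_pipeline(), namespace=f"variant_{i}")
                for i in range(params["n_variants"])  # hypothetical key
            }
            pipelines["__default__"] = sum(pipelines.values())
            return pipelines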
  • Hugo Evers (06/27/2023, 2:01 PM)
    Hi everyone, I'm trying to pass a dictionary of keyword arguments to a function in a Kedro node, but it doesn't seem to work; instead I have to use a lambda to pass the arguments as separate inputs. For example, I would like a node that looks like this (knowing that best practice is to move `sample_size` to a config):
        node(
            func=train_test_split,
            inputs={"df": "input", "sample_size": 50},
            ...
        ),
    However, this doesn't work and I get an error referring to a separator. I noticed that a similar syntax is allowed in modular pipelines. Is that on purpose? What does work is:
        node(
            func=lambda df: train_test_split(df, sample_size=50),
            inputs="input",
            ...
        )
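    For the record: the values in a node's `inputs` dict must be dataset or parameter *names*, not literal values, which is why the bare `50` trips the name parsing. Moving the value into parameters gives roughly this (output names are illustrative):
        # conf/base/parameters.yml would contain:  sample_size: 50
        node(
            func=train_test_split,
            inputs={"df": "input", "sample_size": "params:sample_size"},
            outputs=["train", "test"],
        ),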
  • Alina Glukhonemykh (06/27/2023, 4:33 PM)
    Hey all, hope you are well! I'm trying to set up experiment tracking and facing issues with MetricsDataSet. Here is the error I get:
        DataSetError: Save path '.../data/08_reporting/init/metrics.json/2023-06-27T16.17.18.857Z/metrics.json' for MetricsDataSet(filepath=.../data/08_reporting/init/metrics.json, protocol=file,
        save_args={'indent': 2}, version=Version(load=None, save='2023-06-27T16.17.18.857Z')) must not exist if versioning is enabled.
    Here is how I define the dataset in the catalog:
        metrics:
          type: tracking.MetricsDataSet
          filepath: data/08_reporting/init/metrics.json
  • Marc Gris (06/28/2023, 11:13 AM)
    Hi everyone, given:
        node(pre_process,
             inputs=['dataset', 'params:pre_process'],
             outputs="pre_processed_dataset")
    Kedro will pass `'params:pre_process'` as a dict to `pre_process`, which results in a rather "opaque" function signature: `def pre_process(df: pd.DataFrame, params: dict): ...` Is there a "Kedro way" of unpacking this dict, so as to have a more "transparent" signature with individual parameters specified? Thx, M
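    One Kedro-native option: nested parameters can be addressed individually with dot notation, which keeps the signature explicit. A sketch with made-up field names:
        node(
            pre_process,
            inputs=[
                "dataset",
                "params:pre_process.threshold",  # hypothetical nested keys
                "params:pre_process.method",
            ],
            outputs="pre_processed_dataset",
        )

        # matching, fully explicit signature:
        # def pre_process(df: pd.DataFrame, threshold: float, method: str) -> pd.DataFrame: ...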
  • Sebastian Cardona Lozano (06/28/2023, 9:28 PM)
    Hi all. Maybe this is a naive question, but how can I change the name of my Kedro project? I understand that some Kedro files use the project name to execute the pipeline. Thanks!
  • Hugo Evers (06/29/2023, 8:23 AM)
    Hi all, the other day I was making a custom dataset for the Hugging Face AudioFolder dataset, which takes a folder as an argument. I gave it the parameter `data_dir` as input instead of `filepath`, and it took me roughly an hour of debugging to figure out why loading the dataset was suddenly dependent on the current working directory and just wouldn't load if I gave it a relative path (data/01_raw/...) instead of workspace/project_name/data/01_raw/…. The issue was that `filepath` has a (buried) custom resolver in the AbstractDataSet base class. So would it be a good idea to add to the custom-dataset docs that `filepath` has that behaviour? And maybe we could add an example of how to make a FolderDataset, since all the current datasets in kedro-datasets point to specific files, but I'd wager there are folks out there who want to read an entire folder's worth of data.
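    In the spirit of the suggestion, a docs example for a folder-based dataset could be as small as this. Illustrative only: read-only, local paths only, no fsspec handling:
        from pathlib import Path
        from typing import Any, Dict, List

        from kedro.io import AbstractDataSet

        class FolderDataSet(AbstractDataSet):
            """Illustrative dataset that lists every file in a directory."""

            def __init__(self, path: str):
                # deliberately not named `filepath`, to sidestep the path-resolution
                # behaviour discussed above; an absolute path is expected
                self._path = Path(path)

            def _load(self) -> List[Path]:
                return sorted(p for p in self._path.iterdir() if p.is_file())

            def _save(self, data: Any) -> None:
                raise NotImplementedError("read-only example")

            def _describe(self) -> Dict[str, Any]:
                return {"path": str(self._path)}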
  • Balazs Konig (06/29/2023, 11:22 AM)
    Hi team 🦜 Hopefully a quick one: what's the best way to save catalog entries to parent directories? I have the structure below, and I want to save to `folder2/data`. When I try relative paths, they seem to get appended to `folder1/projects/project1/<relative_path>` (as in, the dots are added to the path as well). How can I achieve this?
        folder1
          projects
            project1
              conf
              data
              src
        folder2
          data
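    One straightforward workaround is an absolute filepath, since catalog paths outside the project root are allowed. A sketch with made-up names:
        # conf/base/catalog.yml
        shared_output:
          type: pandas.ParquetDataSet
          filepath: /home/user/folder2/data/shared_output.parquet  # hypothetical absolute path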
  • Harry Vargas Rodríguez (06/29/2023, 1:50 PM)
    Hello everyone. I'm trying to load a model I created using Kedro (an sklearn object), but I noticed this artifact can't be loaded outside my Kedro project. When I try `model = pickle.load(open('models/model.pkl', 'rb'))` it fails, and the error says I don't have the module I created in project/src. This is what my catalog looks like:
        best_model:
          type: pickle.PickleDataSet
          filepath: models/model.pkl
          layer: models
    It works just fine after I load Kedro using `%load_ext kedro.ipython`. Thanks in advance for your help.
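    The usual cause: pickle stores the module path of the model's class, so the project's source must be importable wherever you unpickle. A sketch of the workaround outside Kedro (the path is an assumption to adapt):
        import pickle
        import sys

        # make the project's source importable so pickle can resolve the custom module
        sys.path.append("/path/to/project/src")  # assumption: adjust to your layout

        with open("models/model.pkl", "rb") as f:
            model = pickle.load(f)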
  • Ahmed Alawami (06/30/2023, 7:49 AM)
    Hi all. I need to specify a `date_parser` in the catalog. Is there a way to specify a lambda function in the YAML file?
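    YAML can't carry a lambda, so the usual route is to stick to pandas-native options in `load_args`. A sketch; the column name is made up, and `date_format` requires pandas>=2.0:
        my_dataset:
          type: pandas.CSVDataSet
          filepath: data/01_raw/my_dataset.csv
          load_args:
            parse_dates: ["created_at"]   # hypothetical column
            date_format: "%Y-%m-%d"       # pandas>=2.0 replacement for date_parser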
  • Markus Sagen (06/30/2023, 12:32 PM)
    Hi Kedro community 👋 We have started to use Kedro for our projects at my company and want to use Weights & Biases as the experiment logger. If I wanted to create a custom experiment-tracker plugin/extension, is there a guide on how to get started writing your own extensions or plugins?
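    Not a full guide, but the core of a tracking plugin is a hooks class exposed via the `kedro.hooks` entry point. A minimal sketch; the W&B project name and module layout are assumptions:
        # my_kedro_wandb/plugin.py — hypothetical plugin module
        import wandb
        from kedro.framework.hooks import hook_impl

        class WandbHooks:
            @hook_impl
            def before_pipeline_run(self, run_params):
                # start a W&B run and record Kedro's run parameters
                wandb.init(project="my-kedro-project", config=run_params)  # hypothetical project

            @hook_impl
            def after_pipeline_run(self, run_params):
                wandb.finish()

        hooks = WandbHooks()

        # and in the plugin's pyproject.toml:
        # [project.entry-points."kedro.hooks"]
        # wandb_hooks = "my_kedro_wandb.plugin:hooks"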
  • Markus Sagen (06/30/2023, 2:00 PM)
    Is there a way in Kedro to define or register objects, such as a logger, that all pipelines and nodes can access, similar to how nodes get datasets from the catalog file?
  • Emilio Gagliardi (06/30/2023, 11:36 PM)
    Hi everyone, what kind of dataset do I create if I'm scraping data from web pages or grabbing data from RSS feeds? I have a small project where I need to grab data from a few websites regularly (mostly Microsoft notices for various products/services) and store the text in a MongoDB Atlas database I have set up. I looked through the documentation, but the only reference I found was for an HTTP(S) API call. Any guidance greatly appreciated 🙂
  • Markus Sagen (07/01/2023, 8:12 AM)
    Hi again! It seems the Kedro commands `install` and `test` listed in the docs here are deprecated. Is there a preferred place to report issues or contribute fixes to the docs? https://docs.kedro.org/en/stable/development/set_up_vscode.html#setting-up-tasks
  • Choon Ho Loi (07/03/2023, 2:34 PM)
    I would like to use Kedro on EMR. Other than this article, https://kedro.org/blog/how-to-deploy-kedro-pipelines-on-amazon-emr, I can't find many details. Does anyone have a Git repo to share? I'd appreciate it.
  • Hugo Evers (07/03/2023, 3:46 PM)
    Hi all, I'm hitting a weird bug with Kedro-Viz: I have a set of nested modular pipelines to keep my training/test and finetuning pipelines completely DRY across different languages, and to that end I mapped the train and test splits throughout my pipelines so that I can namespace them at the last moment. But when I visualize, I see two unconnected artifacts, Test and Train.
  • Emilio Gagliardi (07/03/2023, 6:39 PM)
    A quick clarification on registering pipelines. When I install the spaceflights demo, the pipeline registry file contains the following:
        def register_pipelines() -> Dict[str, Pipeline]:
            """Register the project's pipelines.

            Returns:
                A mapping from pipeline names to ``Pipeline`` objects.
            """
            pipelines = find_pipelines()
            pipelines["__default__"] = sum(pipelines.values())
            return pipelines
    However, in the spaceflights tutorial videos I'm watching, the host doesn't use the code above; instead they add the following:
        data_processing_pipeline = dp.create_pipeline()
        return {
            "__default__": data_processing_pipeline,
            "dp": data_processing_pipeline,
        }
    So I'm unclear about what I'm supposed to do for my own project: do I just use `sum(pipelines.values())`, or do I manually add pipelines as in the second block? Thanks kindly.
  • Hugo Evers (07/04/2023, 12:54 PM)
    Hi all, when developing modular pipelines the Kedro-Viz tool is quite indispensable; without it, it's really hard to see whether inputs, outputs, and parameters are connected properly. However, the most obvious workflow for developing or refactoring a pipeline in an established project, going by the documentation, requires several steps that could be streamlined.
    The issues:
    1. To have more control over the interactions between the pipelines you want to work on, you can adjust the pipeline_registry to return only those pipelines, which renders the other pipelines unusable (and could lead to bugs down the road). This can be partly solved by filtering the pipelines in the kedro viz CLI command.
    2. For every change you want to visualise, you need to: (a) save the file, (b) stop the current kedro viz, (c) run kedro viz again, (d) switch to the browser window to view the pipeline (quite annoying if you have only one monitor).
    3. Lastly, this does not make for easy debugging or inspection of the Python objects in the pipelines/nodes.
    A halfway solution: using `run_viz` in a Jupyter notebook is great and solves some of this. I personally combine it with nbdev, which lets me convert the notebook to a Python file and then call `run_viz` on the entire project. This suffers from issue 1 even more, but issues 2 and 3 are drastically reduced, mostly because I just save/commit, rerun the notebook, and get the viz inline.
    A better solution: the `run_viz` command almost begs for the ability to call `run_viz(pipeline)`, where pipeline is an actual pipeline object (though it would also be nice to pass the name of a pipeline to filter on, like the CLI command, w.r.t. issue 1). That way one wouldn't need nbdev (a slightly controversial tool) and could develop pipelines more easily, without any adjustment to the original project. Since Kedro-Viz can already filter, I can imagine such changes being possible. Also, debugging Kedro pipelines from the VS Code notebook cell debugger is actually quite nice (I'd argue a lot nicer than using the debug configs). Has anyone faced similar issues, or thought of a different solution?
  • Emilio Gagliardi (07/04/2023, 5:38 PM)
    I have another basic question. I'm learning how to productionize ML apps and have worked through some tutorials, but nothing real. In the tutorials I've seen, when developers want to make their ML model available for inference, they use a framework like FastAPI or Flask so that the consumer can pass data to an endpoint and get back a prediction. What I don't quite understand yet is that with Kedro everything is encapsulated in pipelines, and if I run the project, the default pipeline runs, which could be the data ingestion and the model training. How do we handle inference with Kedro? Do I make an inference pipeline separate from the other pipelines? Do I use FastAPI to create endpoints? In the spaceflights example the purpose is supposedly to generate predictions, but I don't see where inference with the trained model is addressed. Any wisdom is greatly appreciated.
  • Marc Gris (07/05/2023, 10:20 AM)
    Hi everyone, a super-ultra-duper-dummy question. Assuming that in conf/base/parameters.yml I have:
        model:
          init:
            k: 3
            loss: warp
            no_embeddings: 50
            learning_schedule: adagrad
            rho: 0.95
            epsilon: 1.e-6
            random_state: ${random_state}
    How can I "update" a single specific field "locally"? I first tried, in conf/local/parameters.yml:
        model:
          init:
            no_embeddings: 100
    But this completely overwrites the model section and, of course, breaks everything. Granted, I could `cp conf/base/parameters.yml conf/local/parameters.yml` and then update `no_embeddings`, but that ends up being very "noisy" and doesn't really "highlight" the specifics of the local config… Is there a way to do such a local / "surgical" overwrite? Thx 🙂