# questions
  • s

    Simon Wolf

    03/24/2023, 9:27 AM
    Hi, I am using Kedro with Jupyter notebooks in VS Code. I load the kedro.ipython extension and it works fine, but for the catalog or session variables Pylance reports a "catalog" is not defined error. How can I prevent Pylance from giving me this warning?
  • t

    Tomás Rojas

    03/24/2023, 1:33 PM
    Hi, does anyone know how to skip one pipeline when running `kedro run`? I have a model training pipeline that I want to skip, since I already have a working model.
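A sketch of one common approach, assuming a standard `pipeline_registry.py`: build `__default__` from every registered pipeline except the ones to skip, so a plain `kedro run` leaves training out (running a single named pipeline with `kedro run --pipeline=<name>` is the other option). The helper `default_without` and the pipeline name `"training"` are hypothetical; the helper relies on Kedro `Pipeline` objects supporting `sum()`, as the project template's `sum(pipelines.values())` does.

```python
from typing import Dict, Iterable, TypeVar

P = TypeVar("P")


def default_without(pipelines: Dict[str, P], skip: Iterable[str] = ()) -> P:
    """Combine all registered pipelines except those named in `skip`.

    Hypothetical helper. sum() starts from 0, which works for Kedro
    Pipeline objects because 0 + pipeline returns the pipeline itself.
    """
    excluded = set(skip) | {"__default__"}
    return sum(p for name, p in pipelines.items() if name not in excluded)


# Possible use inside register_pipelines() (assumed project layout):
#     pipelines = find_pipelines()
#     pipelines["__default__"] = default_without(pipelines, skip={"training"})
#     return pipelines
```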
  • j

    Julien Witty

    03/24/2023, 4:00 PM
    Hey, I hope you are doing well. Our team is evaluating Kedro as a framework. Does anyone have experience integrating Kedro with the Hugging Face library? I was wondering how you handle models after training. My first reflex was to implement a custom dataset for this use case, though I'm not sure how much effort that is. Alternatively, I could pass a string with the model path to the next node (not very clean).
  • t

    Tomás Rojas

    03/26/2023, 9:45 PM
    Hi team, how can I completely remove mlflow from a project safely?
  • m

    Maxime Steinmetz

    03/27/2023, 1:06 PM
    Does Kedro natively support parameters versioning? I’d like to know which parameters were used in a run for reproducibility
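Independently of any built-in support, one low-tech way to record which parameters a run used is to write a timestamped snapshot of them to disk, for example from a hook that has access to the loaded parameters. The function below is a hypothetical, stdlib-only sketch; the directory layout and file naming are assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def snapshot_parameters(params: Dict[str, Any], out_dir: Path) -> Path:
    """Write `params` to a timestamped JSON file and return the file path.

    The short content hash in the file name makes runs with identical
    parameters easy to spot. File layout and naming are assumptions.
    """
    payload = json.dumps(params, sort_keys=True, indent=2)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:8]
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"params_{stamp}_{digest}.json"
    path.write_text(payload, encoding="utf-8")
    return path
```

Because the JSON is written with sorted keys, two runs with the same parameters produce the same hash suffix regardless of key order.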
  • s

    Sanjeev

    03/27/2023, 4:09 PM
    Hi team, has anyone worked on invoking a Kedro pipeline asynchronously from a Flask API?
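A framework-agnostic sketch of the usual pattern: the API handler submits the run to a background worker and immediately returns a job id that the client can poll. `run_pipeline` stands in for whatever actually starts the run (for instance a `KedroSession` created inside the worker); it and the `JobQueue` class are hypothetical, and for long or heavy runs a dedicated task queue may be a better fit than threads.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional


class JobQueue:
    """Run pipeline jobs in background threads and track them by id."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, run_pipeline: Callable[..., Any], **kwargs: Any) -> str:
        # Returns immediately; the run continues in a worker thread.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(run_pipeline, **kwargs)
        return job_id

    def status(self, job_id: str) -> str:
        future = self._jobs[job_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() is not None else "finished"

    def wait(self, job_id: str, timeout: Optional[float] = None) -> Any:
        """Block until the job ends; re-raises the pipeline's exception, if any."""
        return self._jobs[job_id].result(timeout=timeout)


# In a Flask view (assumed), roughly:
#     job_id = jobs.submit(run_pipeline, pipeline_name="__default__")
#     return {"job_id": job_id}, 202
```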
  • d

    Dotun O

    03/27/2023, 4:58 PM
    Hi team, is there a command to see the list of nodes that did not run (after a pipeline fails)? For now, my error output includes a suggested run with --from-nodes "", but I would like to access the actual list. Thanks cc @datajoely
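If you know the pipeline's dependencies and which nodes completed, the nodes that never ran are a set difference, and a topological sort puts them in execution order. This is a hypothetical stdlib sketch (Python 3.9+ for `graphlib`), not a Kedro command; the node names are made up.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set


def unrun_nodes(deps: Dict[str, Set[str]], completed: Iterable[str]) -> List[str]:
    """Return the nodes not in `completed`, in a valid execution order.

    `deps` maps each node name to the names of the nodes it depends on.
    """
    done = set(completed)
    order = TopologicalSorter(deps).static_order()
    return [name for name in order if name not in done]


deps = {
    "clean": {"load"},
    "features": {"clean"},
    "train": {"features"},
    "report": {"clean"},
}
# e.g. the run failed in "features" after "load" and "clean" succeeded
remaining = unrun_nodes(deps, completed=["load", "clean"])
```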
  • a

    Alfonso Licir

    03/27/2023, 10:17 PM
    Hi team, how have you been? I have been trying to use the Snowflake dataset in kedro-datasets 1.2.0, but I get a dependency error. I already ran pip install "kedro-datasets[snowflake]", but I get the message: WARNING: kedro-datasets 1.2.0 does not provide the extra 'snowflake'
  • a

    Alfonso Licir

    03/27/2023, 10:17 PM
    What
  • a

    Alfonso Licir

    03/27/2023, 10:17 PM
    What I should review?
  • s

    Sergei Benkovich

    03/28/2023, 8:29 AM
    We are discussing Kedro integration in the team and several questions arose; I'd appreciate any guidance :)
    • Given we have a flow but only want to run the parts where the configuration/params changed, is that possible? The aim is to avoid running all processes. I know I can run selected pipelines, but that's not what I'm looking for.
    • MLflow / W&B integration?
      ◦ I see this package; is it the way to go, or is there another native way?
    • Data versioning / model versioning: is it done with MLflow, or do other options exist?
      ◦ How mature is experiment tracking in Kedro, and what is planned for the future?
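For the first question, one hypothetical approach is to compare the current parameters with those of the previous run and keep only the nodes that read a changed parameter; their downstream could then be covered with `kedro run --from-nodes`. The sketch only does the comparison. `node_params`, mapping node names to the parameter keys they use, is an assumed input you would build from your own pipeline, and only top-level keys are compared.

```python
from typing import Any, Dict, List, Set

_MISSING = object()  # distinguishes "key absent" from "value is None"


def changed_keys(old: Dict[str, Any], new: Dict[str, Any]) -> Set[str]:
    """Top-level parameter keys that were added, removed or modified."""
    return {
        key
        for key in old.keys() | new.keys()
        if old.get(key, _MISSING) != new.get(key, _MISSING)
    }


def nodes_to_rerun(
    node_params: Dict[str, Set[str]], old: Dict[str, Any], new: Dict[str, Any]
) -> List[str]:
    """Names of nodes that read at least one changed parameter, sorted."""
    changed = changed_keys(old, new)
    return sorted(name for name, keys in node_params.items() if keys & changed)
```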
  • a

    Alexandre Ouellet

    03/28/2023, 2:09 PM
    Hey there! Has anyone ever tried training a YOLO model with Kedro? I struggle a bit with it because YOLO requires a path to its dataset folder, as it handles all the file opening through a PyTorch DataLoader. Is there a way in Kedro to handle a "folder" as a dataset and leave it as a folder?
  • z

    Ziren Lin

    03/28/2023, 3:44 PM
    Hi team, do we support custom input from parameters into the SQL query/file? I tried the following code, but Kedro couldn't read the parameters and insert them into the SQL query.
    #globals.yml
    order_number: 'abc'
    #catalog.yml
    sql:
      type: pandas.SQLQueryDataset
      sql: "SELECT * FROM table WHERE column = ${order_number}"
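Two things seem worth checking here. In Kedro 0.18, `${...}` placeholders in `catalog.yml` are only filled from `globals.yml` when the project uses `TemplatedConfigLoader` (configured in `settings.py`). Even then, the substitution is plain text, so the query becomes `WHERE column = abc` with the value unquoted, which the database reads as a column name. The sketch below uses `string.Template` only to mimic that textual substitution (Kedro's templating is its own code, not this class) and, as an alternative, a hypothetical node that binds the value as a query parameter; the SQLite table `orders` is made up.

```python
import sqlite3
from string import Template
from typing import Any, List, Tuple

# 1) Textual substitution, similar in effect to config templating (illustration only).
naive_sql = Template("SELECT * FROM table WHERE column = ${order_number}").substitute(
    order_number="abc"
)
# naive_sql == "SELECT * FROM table WHERE column = abc"  (abc is not quoted)


# 2) A node that receives the value (e.g. from params:order_number) and binds it.
def fetch_orders(conn: sqlite3.Connection, order_number: str) -> List[Tuple[Any, ...]]:
    # "?" is sqlite3's placeholder; the driver handles quoting/escaping of the value.
    cursor = conn.execute(
        "SELECT order_number, amount FROM orders WHERE order_number = ?",
        (order_number,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_number TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [("abc", 10.0), ("xyz", 5.0)])
    print(fetch_orders(conn, "abc"))  # [('abc', 10.0)]
```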
  • f

    Filip Panovski

    03/29/2023, 8:46 AM
    Hi everyone. I have an issue with transcoded datasets in Kedro 0.18.4 that I don't quite understand:
    ValueError: The following datasets are used with transcoding, but were referenced without the separator: typed_invoices
    Please specify a transcoding option or rename the datasets.
    Details within thread.
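The message comes from a consistency check on node inputs and outputs: once a dataset name is used with the `@` transcoding separator anywhere in the (combined) pipeline, for example `typed_invoices@pandas`, it should not also appear as a bare `typed_invoices` in another node. The function below is a hypothetical, simplified re-implementation of that rule for finding the offending names, not Kedro's own code; the node inputs and outputs are made up.

```python
from itertools import chain
from typing import Iterable, Set


def bare_transcoded_refs(nodes_io: Iterable[Iterable[str]]) -> Set[str]:
    """Names used both as 'name@format' and as plain 'name'.

    `nodes_io` holds one iterable of input and output names per node.
    """
    names = set(chain.from_iterable(nodes_io))
    return {
        name.split("@", 1)[0]
        for name in names
        if "@" in name and name.split("@", 1)[0] in names
    }


nodes_io = [
    ["raw_invoices", "typed_invoices@pandas"],  # node writing with pandas
    ["typed_invoices@spark", "features"],       # node reading with spark
    ["typed_invoices", "report"],               # bare reference: the problem
]
```

Renaming the last reference to `typed_invoices@pandas` or `typed_invoices@spark` would satisfy the rule.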
  • c

    Christianne Rio Ortega

    03/29/2023, 10:02 AM
    Hello there! Is there a limitation on naming conventions in the 'src' folder structure? I was planning to organize some of my nodes by order of execution/flow, e.g. src/engine/nodes/ containing 000-raw, 100-cleanse and 200-refine. Can Kedro parse ###-AAA as a prefix?
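One constraint applies regardless of Kedro: if these folders are meant to be Python packages or modules you import, their names must be valid Python identifiers. Identifiers cannot start with a digit or contain hyphens, so `000-raw` cannot appear in an `import` statement. A quick check with `str.isidentifier()`; the alternative names are only suggestions.

```python
import keyword


def is_importable_name(name: str) -> bool:
    """True if `name` can be used as a package/module name in an import statement."""
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["000-raw", "100_cleanse", "raw_000", "s100_cleanse"]:
    print(candidate, is_importable_name(candidate))
# 000-raw False
# 100_cleanse False
# raw_000 True
# s100_cleanse True
```

A letter prefix such as `s100_cleanse` keeps the numeric ordering in a directory listing while remaining importable.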
  • a

    Ana Man

    03/29/2023, 10:31 AM
    Hi everyone, are there updated docs (for Kedro 0.18.x) on building Kedro pipelines with PySpark?
  • s

    Sj

    03/29/2023, 2:04 PM
    I am using Kedro Viz 6.0.0 and I notice that some of the functions and metrics are grayed out. They also do not show up in the graph visualization. The missing metrics however can be seen in the experiment tracking tab.
  • z

    Zoran

    03/29/2023, 6:10 PM
    Hi there, is this possible, or am I doing something wrong?
  • m

    Miguel Angel Ortiz Marin

    03/29/2023, 8:10 PM
    Hi team! I'm wondering about pinning the load version of a specific dataset using a conf.yml file. The docs include a reference to this, but the linked example doesn't show how to achieve it:
  • m

    Miguel Angel Ortiz Marin

    03/29/2023, 8:12 PM
  • j

    Juan Luis

    03/30/2023, 8:40 AM
    Hi folks, I created a custom dataset to see if I could understand the documentation and how it works, but I feel I'm doing some unconventional things and would like some advice:
  • n

    Nok Lam Chan

    03/30/2023, 9:09 AM
    https://docs.kedro.org/en/latest/kedro.datasets.html I can’t find the documentation for the Snowflake dataset; is something going wrong here?
  • i

    Iñigo Hidalgo

    03/30/2023, 9:58 AM
    Hello! I'm looking into ways to add data validation to our pipelines at runtime and came across this really cool example project using Great Expectations by (I assume) @Erwin https://github.com/erwinpaillacan/kedro-great-expectations-example It seems like a good way forward: it uses hooks to run the validations if the dataset has validations mapped to it in the config. But I was wondering whether anybody has done it differently, by treating the Great Expectations outputs as Kedro datasets themselves. I ask because we have all our blob connectors implemented as custom Kedro datasets, and the easiest way for us to save these validations would be to treat them as outputs of Kedro nodes. I'm not interested in the HTML report output, only in the JSON outputs, as we would then want to send alerts based on those.
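A sketch of the "validations as node outputs" idea: the node returns a JSON-serialisable summary, which a JSON dataset in the catalog (your own blob-backed one, or e.g. `json.JSONDataSet`) could persist and downstream alerting could read. The checks are plain Python stand-ins, not the Great Expectations API, and the column names and rules are hypothetical.

```python
from typing import Any, Dict, List


def validate_invoices(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Run simple checks and return a JSON-serialisable result.

    Stand-in for a Great Expectations run; the rules here are hypothetical.
    """
    missing_ids = sum(1 for row in rows if row.get("invoice_id") is None)
    negative = sum(1 for row in rows if (row.get("amount") or 0) < 0)
    results = [
        {"check": "invoice_id_not_null", "success": missing_ids == 0,
         "unexpected_count": missing_ids},
        {"check": "amount_non_negative", "success": negative == 0,
         "unexpected_count": negative},
    ]
    return {"success": all(r["success"] for r in results), "results": results}


# Hypothetical pipeline wiring: the summary becomes a regular catalog output.
#     node(validate_invoices, inputs="invoices", outputs="invoices_validation")
```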
  • a

    Andreas Zeitler

    03/30/2023, 11:47 AM
    Hi guys! I'm currently having trouble with the execution order of catalog.save after a node has run and the after_node_run hook, which I use to log results to MLflow. Node:
    from kedro.pipeline import node

    node(
        modeloutput.predict,
        inputs=["estimator", "modelinput_x_" + name],
        outputs="modeloutput_" + name,
        tags=["output", "output_" + name] + tags,
        name="predict_" + name,
    )
    The output is configured in the data catalog. Hook (after_node_run):
    if node.name == "predict_dach_testsplit_test":
        # This is the output of the node:
        y_pred_prob_comb_test = catalog.load("modeloutput_dach_testsplit_test")
        [..]
    In the logs, it seems like Kedro tries to load the data in the hook before it was written by the catalog. Is this possible, and is it meant to behave like that? It could be fixed by using outputs['modeloutput_dach_testsplit_test'] instead of catalog.load, but in my understanding that should not be necessary. Thanks in advance!
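As far as I can tell from Kedro 0.18's runners, `after_node_run` fires right after the node function returns and before its outputs are saved to the catalog, which would explain the failed `catalog.load`. Reading from the `outputs` argument avoids the round trip; if the persisted file itself is needed, the `after_dataset_saved` hook runs after the save. In the sketch the class is hypothetical, `log_metric` stands in for an MLflow call, and in a real project the method would be decorated with `@hook_impl` and the class registered in `settings.py`.

```python
from typing import Any, Callable, Dict


class PredictionLoggingHooks:
    """Hypothetical hook class; decorate the method with kedro's @hook_impl in a project."""

    TARGET_NODE = "predict_dach_testsplit_test"
    TARGET_OUTPUT = "modeloutput_dach_testsplit_test"

    def __init__(self, log_metric: Callable[[str, Any], None]) -> None:
        self._log_metric = log_metric  # e.g. a thin wrapper around MLflow (assumption)

    def after_node_run(self, node: Any, outputs: Dict[str, Any]) -> None:
        if node.name != self.TARGET_NODE:
            return
        # Use the in-memory output; it may not have been saved to the catalog yet.
        y_pred_prob_comb_test = outputs[self.TARGET_OUTPUT]
        self._log_metric("n_predictions", len(y_pred_prob_comb_test))
```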
  • p

    Priyanka Patil

    03/30/2023, 12:08 PM
    Hi team!! I’m trying to convert a fairly large spark dataframe to pandas. This is a super expensive operation, so I’m trying to convert chunks of spark dataframe to pandas dataframes. Is there a kedro dataset that allows us to save multiple pandas dataframes to csv part files? thanks!
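One option that fits this is `PartitionedDataSet` with a CSV dataset underneath: a node returns a dictionary mapping partition names to DataFrames, and each entry is written as a separate file. The helper below builds that dictionary from any iterable of chunks; how the chunks are produced from Spark is left open, and the helper name, the key format and the catalog entry in the comment are assumptions.

```python
from typing import Dict, Iterable, TypeVar

T = TypeVar("T")


def to_partitions(chunks: Iterable[T], prefix: str = "part") -> Dict[str, T]:
    """Map each chunk to a zero-padded partition id, e.g. 'part-00000'.

    With PartitionedDataSet, each key becomes one file name (plus filename_suffix).
    """
    return {f"{prefix}-{i:05d}": chunk for i, chunk in enumerate(chunks)}


# Assumed catalog.yml entry:
#   pandas_parts:
#     type: PartitionedDataSet
#     path: data/03_primary/pandas_parts
#     dataset: pandas.CSVDataSet
#     filename_suffix: ".csv"
```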
  • n

    Nikola Shahpazov

    03/30/2023, 12:28 PM
    Hi guys, Quick question. The standard tutorial for deploying kedro nodes to aws step functions uses aws_ecr. Is there a way to use a docker_hub image?
  • a

    Andrej Zachar

    03/30/2023, 6:10 PM
    Hey everyone, I'd like to know if I can pass a "context" to my Kedro node function, which would allow me to access the names of input variables. For instance, if I'm generating a PDF report, I'd like to be able to identify the classifier and input versions I used. Is this possible? Thank you!
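Nodes receive only the loaded data, not the catalog names, but one workaround is to bind the names into the function when building the node. The sketch uses a small wrapper (rather than `functools.partial`) so the wrapped function keeps a simple `(*args)` signature; `build_report`, `with_input_names` and the dataset names are hypothetical. Input versions are a separate question and are not covered here.

```python
from typing import Any, Callable, Dict, Sequence


def build_report(classifier: Any, features: Any, *, input_names: Sequence[str]) -> Dict[str, Any]:
    """Hypothetical report node that also records which datasets fed it."""
    return {
        "inputs": list(input_names),
        "classifier": type(classifier).__name__,
        "n_rows": len(features),
    }


def with_input_names(func: Callable[..., Any], input_names: Sequence[str]) -> Callable[..., Any]:
    """Return a wrapper that passes the dataset names to `func` as a keyword."""
    names = list(input_names)

    def wrapper(*args: Any) -> Any:
        return func(*args, input_names=names)

    # Keep a readable function name for logs.
    wrapper.__name__ = func.__name__
    return wrapper


report_inputs = ["classifier_v2", "test_features"]
report_node_func = with_input_names(build_report, report_inputs)
# Hypothetical node definition, with one list as the single source of truth:
#     node(report_node_func, inputs=report_inputs, outputs="pdf_report_data")
```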
  • m

    Massinissa Saïdi

    03/31/2023, 12:29 PM
    Hello, is it possible to save a sklearn Pipeline object with pickle? I get this error:
    DataSetError: <class 'sklearn.pipeline.Pipeline'> was not serialised due to: Can't pickle local object 'fit_best_model.<locals>.<lambda>'
    I simply return a dictionary for a partitioned pickle dataset, like this:
    return {'model_' + parameters['model']: pipeline}
    and I define the dataset in catalog.yml like this:
    models_partionned:
      type: PartitionedDataSet
      path: data/06_models/${date}/${target}/
      filename_suffix: ".pkl"
      dataset:
        type: pickle.PickleDataSet
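The error is raised by Python's `pickle` itself: it stores functions by reference to an importable name, so a lambda defined inside `fit_best_model` (for instance one used inside a pipeline step) cannot be serialised. Moving the lambda to a module-level function usually fixes it; another avenue worth checking is a pickle backend such as `cloudpickle` via the dataset's `backend` argument, if your Kedro version supports it. A minimal stdlib reproduction; the function names are hypothetical.

```python
import pickle
from typing import Any, Callable, Dict


def _double(x: float) -> float:
    # Module-level function: pickle can find it again by its qualified name.
    return 2 * x


def fit_with_lambda() -> Dict[str, Callable[[float], float]]:
    scale = lambda x: 2 * x  # local object, cannot be pickled by reference
    return {"scale": scale}


def fit_with_function() -> Dict[str, Callable[[float], float]]:
    return {"scale": _double}


def is_picklable(obj: Any) -> bool:
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        # Local objects raise AttributeError or PicklingError depending on Python version.
        return False
    return True
```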
  • o

    Olivier Ho

    03/31/2023, 1:14 PM
    Hello, two small questions:
    • Is there any reason why `PartitionedDataSet` returns a dictionary of callables that enables lazy loading, while `IncrementalDataSet`, which inherits from `PartitionedDataSet`, returns a dictionary of the content?
    • How does `IncrementalDataSet` work if you use it as a node input? I do not see the call to `confirm`, so I don't understand when the checkpoint is created.
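On the first point: with `PartitionedDataSet` as a node input, the node receives a dictionary of partition id to load function and calls each function only when it needs that partition. A typical consuming node looks like the hypothetical sketch below. (For the `IncrementalDataSet` checkpoint, Kedro nodes accept a `confirms=` argument naming the dataset to confirm; how your version behaves without it is best checked in its docs.)

```python
from typing import Any, Callable, Dict, List


def collect_partitions(partitions: Dict[str, Callable[[], Any]]) -> List[Any]:
    """Load partitions lazily, in sorted id order, and collect them.

    Each value is a zero-argument callable, as PartitionedDataSet provides;
    a partition is only read when its function is called.
    """
    loaded = []
    for partition_id in sorted(partitions):
        load_partition = partitions[partition_id]
        loaded.append(load_partition())
    return loaded
```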
  • s

    Sebastian Cardona Lozano

    04/01/2023, 2:31 PM
    Hi everyone. Has anyone used Kedro with TensorFlow? If so, how was your experience? Or is it better to use another framework like TFX? Thanks!