# questions
  • user (01/04/2023, 5:28 PM)
    mlflow cannot fetch model from model registry I have registered a model to the mlflow model registry. When I call the load_model function to fetch the model from the registry and make a prediction, mlflow cannot find the model at the artifact path I provided and returns the following error: `model_name = "sample-ann-1" version = 1 loaded_model = mlflow.pyfunc.load_model("models:/{}/{}".format(model_name, version))` "mlflow.exceptions.MlflowException: The following failures occurred while downloading one or more artifacts from...
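    A common cause of this error is that the loading process cannot resolve the artifact store behind the registry entry, for example because the tracking URI is not set where `load_model` runs. A minimal sketch, with the server address as a placeholder and the mlflow calls commented out:

    ```python
    # Sketch: make the tracking server explicit before resolving a registry URI.
    model_name = "sample-ann-1"
    version = 1
    model_uri = f"models:/{model_name}/{version}"

    # import mlflow
    # mlflow.set_tracking_uri("http://localhost:5000")  # must point at the server where the model was registered
    # loaded_model = mlflow.pyfunc.load_model(model_uri)
    ```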
  • user (01/04/2023, 5:28 PM)
    service XXX was unable to place a task because no container instance met all of its requirements. instance XXX is already using a port required by your task The service crm was unable to place a task because no container instance met all of its requirements. The closest matching container instance e45856e4821149XXXXXXXXX is already using a port required by your task. Is there any way to resolve this? I am currently trying to run 4 task definitions. I have referred to the AWS documents below but am not sure which solution would be ideal for the current issue. How do I do dynamic port mapping? Registered ports: ["22","4000","2376","2375","51678","51679"]...
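    The dynamic port mapping the question asks about is configured in the task definition by setting `hostPort` to 0 under bridge network mode, so ECS assigns an ephemeral host port instead of contending for a fixed one. A hedged fragment; the container name and port are illustrative:

    ```json
    {
      "networkMode": "bridge",
      "containerDefinitions": [
        {
          "name": "crm-web",
          "portMappings": [
            {
              "containerPort": 4000,
              "hostPort": 0
            }
          ]
        }
      ]
    }
    ```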
  • user (01/04/2023, 5:28 PM)
    Deployment to AWS ECS using GitHub Actions is failing I have written a GitHub Actions workflow YAML file by following this guide. The workflow file is added below: name: Deploy to Staging Amazon ECS on: push: branches: - staging env: ECR_REPOSITORY: api-staging-jruby/api ECS_CLUSTER: api_staging J_RUBY_ECS_SERVICE: web-staging J_RUBY_ECS_TASK_DEFINITION:...
  • dor zazon (01/05/2023, 8:29 AM)
    hey, I am trying to find a solution for running the same pipeline with different parameters. I have a preprocessing pipeline that I want to run over 5 different datasets, and if I add more datasets to the catalog in the future, the pipeline should process them as well. How can I create a template pipeline and set the preprocessing pipeline to run over a list of dataset names from the catalog?
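    One way to approach this, sketched under the assumption of kedro's modular-pipelines API, is to keep a single template pipeline and re-map its inputs and outputs once per dataset. The dataset names and mapping helper below are hypothetical; only the pure-Python mapping is shown live, with the kedro calls in comments:

    ```python
    # Hypothetical list of catalog entries; could also be read from the catalog.
    DATASETS = ["dataset_a", "dataset_b", "dataset_c"]

    def preprocess_io(ds: str) -> dict:
        """Rename the template pipeline's generic I/O to dataset-specific names."""
        return {
            "inputs": {"raw": ds},
            "outputs": {"preprocessed": f"{ds}_preprocessed"},
        }

    mappings = {ds: preprocess_io(ds) for ds in DATASETS}

    # With kedro installed, each mapping would parameterise a namespaced copy of
    # the template, e.g. pipeline(template, namespace=ds, **preprocess_io(ds)),
    # and the per-dataset pipelines would be combined with `+`. Adding a dataset
    # to the list then extends the run without touching the template.
    ```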
  • Rafael Gildin (01/05/2023, 5:52 PM)
    Hey guys, quick question 🙂 Is there any way to change the traceback in kedro 0.18.4 so that it helps the debugging process? Problem: when accessing a non-existent column of a pandas dataframe, kedro 0.17.7 shows me the exact error (`df['d']`) and raises the KeyError, but 0.18.4 raises the same KeyError without saying where it occurred. These pictures illustrate it:
  • Danhua Yan (01/06/2023, 5:03 PM)
    Hello! Is it possible to give a versioned dataset a name instead of using a timestamp? I’m using datasets like `pandas.ParquetDataSet`, `spark.SparkDataSet`, and `pickle.PickleDataSet`, and using YAML configs to save:

    ```yaml
    dataset:
      type: pandas.ParquetDataSet
      filepath: some_path
      versioned: true
    ```
  • Jaakko (01/06/2023, 6:05 PM)
    Say I have a dataset that contains data for many different entities and I need to create a separate model for each entity with say sklearn. So the number of models I will be creating is dynamic. What should I take into account when building pipelines to create multiple models? Should I for example use a pickle dataset to store a dictionary or list of models or something like that?
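    One pragmatic option, along the lines the question already suggests, is to have the node return a single dict of fitted models keyed by entity and persist it with a pickle-backed dataset (a `PartitionedDataSet` of pickles would instead give one file per entity). A minimal sketch with stand-in models rather than real sklearn estimators:

    ```python
    import os
    import pickle
    import tempfile

    # Stand-ins for per-entity sklearn models; in a kedro node this dict would be
    # the node's return value, persisted by e.g. pickle.PickleDataSet.
    models = {f"entity_{i}": {"coef": 0.5 * i} for i in range(3)}

    # Round-trip the whole dict through pickle, as the dataset would do.
    path = os.path.join(tempfile.mkdtemp(), "models.pkl")
    with open(path, "wb") as f:
        pickle.dump(models, f)

    with open(path, "rb") as f:
        restored = pickle.load(f)
    ```

    The dict approach keeps one catalog entry regardless of how many entities exist, which matches the dynamic number of models.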
  • Brandon Meek (01/07/2023, 3:03 AM)
    Hey everyone, how can I access the current KedroSession from a node? My use case is I want to pass an unspecified number of datasets into a pipeline, the current solution I've come up with is to pass the dataset names as a list and then use the session to get the DataCatalog and read the datasets from there.
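    A pattern sometimes used for this is to capture the catalog in a hook rather than reach for the session from inside a node. The class and helper names below are hypothetical, and the kedro-specific decorator and registration are shown only in comments; a stand-in catalog object demonstrates the flow:

    ```python
    class CatalogCapture:
        """Stores the DataCatalog when kedro creates it, for lookup inside nodes."""
        catalog = None

        # With kedro installed this method would be decorated with @hook_impl
        # (from kedro.framework.hooks import hook_impl) and the class listed in
        # the HOOKS tuple in settings.py.
        def after_catalog_created(self, catalog, **kwargs):
            CatalogCapture.catalog = catalog

    def load_named_datasets(names):
        """Node helper: load an unspecified number of datasets given by name."""
        return {name: CatalogCapture.catalog.load(name) for name in names}

    # Demonstration with a stand-in catalog instead of a real DataCatalog:
    class _FakeCatalog:
        def load(self, name):
            return f"data:{name}"

    CatalogCapture().after_catalog_created(_FakeCatalog())
    loaded = load_named_datasets(["a", "b"])
    ```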
  • Sergei Benkovich (01/09/2023, 1:41 PM)
    hey, I have an issue using credentials in my project. I get “ValueError: Failed to format pattern ‘${dev_s3}’: no config value found, no default provided”. credentials.yml (in the local folder): as I understand it, it should take the AWS_ACCESS_KEY_ID from the AWS CLI. I also tried specifying it explicitly, and it didn’t help.

    ```yaml
    dev_s3:
      aws_access_key_id: AWS_ACCESS_KEY_ID
      aws_secret_access_key: AWS_SECRET_ACCESS_KEY
    ```
    catalog.yml:
    ```yaml
    observations:
      type: pandas.CSVDataSet
      filepath: "${s3.raw_observations_path}/commercial/observations/observations.csv"
      credentials: "${dev_s3}"
    ```
    main.py:
    ```python
    def run():
        runner = SequentialRunner()

        project_path = Path(__file__).parent.parent.parent
        conf_path = f"{project_path}/{settings.CONF_SOURCE}"
        conf_loader = CONFIG_LOADER_CLASS(
            conf_source=conf_path, env="local", globals_pattern="globals*"
        )

        parameters = conf_loader.get("parameters*", "parameters*/**")
        credentials = conf_loader.get("credentials*", "credentials*/**")
        catalog = conf_loader.get("catalog*", "catalog*/**")

        data_catalog = DATA_CATALOG_CLASS(
            data_sets={
                "observations": CSVDataSet.from_config(
                    "observations", catalog["observations"]
                ),
            },
            feed_dict={"params": parameters},
        )

        result = runner.run(data_extraction.create_pipeline(), data_catalog)
        return result
    ```
    settings.py:
    ```python
    CONF_SOURCE = "conf"

    # Class that manages how configuration is loaded.
    from kedro.config import TemplatedConfigLoader
    CONFIG_LOADER_CLASS = TemplatedConfigLoader
    CONFIG_LOADER_ARGS = {
        "globals_pattern": "*globals.yml",
    }

    # Class that manages the Data Catalog.
    from kedro.io import DataCatalog
    DATA_CATALOG_CLASS = DataCatalog
    ```
    can’t get over this error… would appreciate any help :)
  • Seth (01/09/2023, 2:12 PM)
    Hi everyone! I’d like to read existing partitions and write new partitions to the same PartitionedDataSet in a single Pipeline. However, with a single DataCatalog entry this creates a CircularDependencyError. What is the proper way to handle such situations in Kedro? I can create identical Catalog entries with different names, however it doesn’t feel like the correct solution for this problem.
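    For what it's worth, the workaround the question already hints at (two catalog entries pointing at the same path, one only read and one only written) is a common way to break the cycle. A hedged fragment; the entry names, path, and underlying dataset are illustrative:

    ```yaml
    # Two entries over the same folder: nodes read `shards_in` and write
    # `shards_out`, so no node depends on its own output.
    shards_in:
      type: PartitionedDataSet
      path: data/02_intermediate/shards
      dataset: pandas.CSVDataSet

    shards_out:
      type: PartitionedDataSet
      path: data/02_intermediate/shards
      dataset: pandas.CSVDataSet
    ```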
  • Deepyaman Datta (01/09/2023, 2:55 PM)
    Not exactly a support question, but for people who use/have considered using `PartitionedDataSet`... Let's say I have a catalog entry like:
    ```yaml
    my_pds:
      type: PartitionedDataSet
      path: data/01_raw/subjects
      dataset:
        type: my_project.io.MyCustomDataSet
    ```
    And data like:
    ```
    data/01_raw/subjects/C001/scans/0.png
    data/01_raw/subjects/C001/scans/1.png
    data/01_raw/subjects/C001/scans/2.png
    data/01_raw/subjects/C001/test_results.csv
    data/01_raw/subjects/C001/notes.png
    data/01_raw/subjects/C002/scans/0.png
    data/01_raw/subjects/C002/scans/1.png
    data/01_raw/subjects/C002/scans/2.png
    data/01_raw/subjects/C002/test_results.csv
    data/01_raw/subjects/C002/notes.png
    data/01_raw/subjects/T001/scans/0.png
    data/01_raw/subjects/T001/scans/1.png
    data/01_raw/subjects/T001/scans/2.png
    data/01_raw/subjects/T001/test_results.csv
    data/01_raw/subjects/T001/notes.png
    ```
    What do you think the resulting partitions would be?
  • Damian Fiłonowicz (01/10/2023, 9:43 AM)
    Hey, I have quite a large model ( > 0.5 GB) that is retrained very rarely and is located on ADLS (abfss). I would like to download it during the pipeline once, save it locally on the machine, and reuse it ➡️ WITHOUT ⬅️ redownloading from the cloud during other pipeline runs. Unfortunately, as far as I know, and we have tested, it's not possible to achieve it with CachedDataSet. Is there any way I can save some time on this operation?
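    One assumption-laden sketch of "download once, reuse locally": guard the transfer behind a local-path check, in a small helper or a custom dataset's `_load`. The helper name is hypothetical and the remote fetch is stubbed; in a real setup it would be an fsspec copy from abfss:

    ```python
    import os
    import tempfile

    def fetch_once(download, local_path):
        """Run `download` only if the local copy is missing; later runs reuse it.

        `download` stands in for the real transfer, e.g. an fsspec copy from ADLS.
        """
        if not os.path.exists(local_path):
            download(local_path)
        return local_path

    # Demonstration with a counting stub instead of a real ADLS download:
    calls = {"n": 0}

    def fake_download(path):
        calls["n"] += 1
        with open(path, "wb") as f:
            f.write(b"model-bytes")

    target = os.path.join(tempfile.mkdtemp(), "model.bin")
    fetch_once(fake_download, target)  # first run: downloads
    fetch_once(fake_download, target)  # second run: reuses the local copy
    ```

    Unlike CachedDataSet, whose cache lives only for a single run, the guard here keys off a file that survives between pipeline runs on the same machine.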
  • dor zazon (01/10/2023, 9:56 AM)
    hey, I am using the latest version of Kedro (0.18.4). I am trying to use the session.load_context() function and it returns:
  • dor zazon (01/10/2023, 9:56 AM)
    RecursionError: maximum recursion depth exceeded while calling a Python object
  • dor zazon (01/10/2023, 9:56 AM)
    something is wrong with the after_context_created hook inside the load_context() function
  • Matthias Roels (01/10/2023, 2:10 PM)
    Quick question: we are updating from kedro 0.17.7 to 0.18.4 and we noticed that the `get_current_session` function was removed. Is there a particular reason why? And is it possible to get the same functionality differently?
  • Anderson Luiz Souza (01/10/2023, 3:12 PM)
    Hi people!! I have a question about how to save data to an Azure storage account. I am following the documentation (https://kedro.readthedocs.io/en/stable/data/data_catalog.html#example-16-loads-a-model-saved-as-a-pickle-from-azure-blob-storage); the authentication is working well, but the data is not being saved anywhere. Has anyone ever faced a similar problem? I am not sure if I am setting up the data catalog properly.
  • Pedro Abreu (01/10/2023, 6:12 PM)
    Hey, we’re running kedro on a Databricks job and when hitting an exception we see: 1. the log of the job shows HTML-like traceback messages; 2. the exception doesn’t seem to be propagated appropriately to Databricks, since the job shows Status=Succeeded. This seems to be the problem raised in https://github.com/Textualize/rich/issues/2455 and initially fixed with https://github.com/kedro-org/kedro/pull/1769. However, the fix doesn’t seem to be working anymore. Could it be that a change in `rich` or Databricks has invalidated the previous approach? We’re using kedro 0.18.3.
  • Sidharth Jahagirdar (01/10/2023, 8:53 PM)
    Hey team, I am trying to view the kedro viz for my pipeline. Kedro and the code are on a different cluster. How do I view the viz from my local machine? I am able to get the JSON; is there anything I can do to visualize it?
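    Assuming the remote cluster is reachable over SSH, one common approach is to bind kedro-viz to all interfaces on the cluster and forward the port to the local machine. Host, port, and hostname below are illustrative:

    ```shell
    # On the cluster: serve the viz on a known port
    kedro viz --host 0.0.0.0 --port 4141

    # On the local machine: forward that port, then open http://localhost:4141
    ssh -L 4141:localhost:4141 user@cluster-host
    ```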
  • Dustin (01/10/2023, 10:01 PM)
    Happy 2023 team! This must be a dumb question, but when running `kedro viz` an error message "No such command 'viz'" shows, and `kedro -h` doesn't list a 'viz' option either. I followed a template project, pandas-iris.
  • Ricardo Araújo (01/10/2023, 11:52 PM)
    Hello everyone! A bit new to Kedro, but making progress! I wonder what the best practice is for the following scenario: I have a pipeline (not a kedro pipeline yet, just the generic concept) that processes a dataset. I want to be able to run the same pipeline on different pre-specified datasets (all already in the catalog). These datasets are very different and require dataset-specific wrangling and filtering. However, the later parts of the pipeline (modeling, evaluation) are essentially the same, since the data processing transforms the datasets into a common format, except for a few parameters (say, the number of epochs for training a neural net differs per dataset).
  • Sergei Benkovich (01/11/2023, 7:14 AM)
    having issues with credentials, I get the following error: “‘str’ object is not a mapping. DataSet ‘observations’ must only contain arguments valid for the constructor of ‘kedro.extras.datasets.pandas.csv_dataset.CSVDataSet’.” when I try to use
    ```yaml
    observations:
      type: pandas.CSVDataSet
      filepath: "${s3.raw_observations_path}/observations.csv"
      credentials: dev_s3
    ```
  • Lorenzo Castellino (01/11/2023, 8:21 AM)
    Hello everyone! I'm facing an issue regarding Kedro-Viz experiment tracking. I want to report the results of a PCA decomposition in a Plotly graph. I prototyped the node that draws the plot in a notebook with some styling (mainly a centered title and a square aspect ratio), with the results displayed in the first image. Happy with the results, I ported everything into a node that outputs the figure to a `plotly.JSONDataSet`. The pipeline runs fine, and the plot is saved to disk and displayed in the experiment tracking section, but the styling applied in the `fig.update_layout()` call seems to be skipped, as you can see from the second image. Everything else is displayed as desired (including the menu and hover data). Any clue what the issue could be here? This is the code in the node that outputs it:
    ```python
    import plotly.graph_objects as go
    from numpy.typing import NDArray


    def plot_loadings(loadings: NDArray) -> go.Figure:
        fig = go.Figure(layout_yaxis_range=[-1, 1], layout_xaxis_range=[-1, 1])

        fig.add_traces(
            go.Scattergl(
                x=loadings[:, 0],
                y=loadings[:, 1],
                mode="markers",
                hovertext=[f"Var{i+1}" for i in range(loadings.shape[0])],
            )
        )

        x_buttons = []
        y_buttons = []

        for i in range(loadings.shape[1]):
            x_buttons.append(
                dict(
                    method="update",
                    label=f"PC{i + 1}",
                    args=[
                        {"x": [loadings[:, i]]},
                    ],
                )
            )

            y_buttons.append(
                dict(
                    method="update",
                    label=f"PC{i + 1}",
                    args=[
                        {"y": [loadings[:, i]]},
                    ],
                )
            )

        fig.update_layout(
            updatemenus=[
                dict(buttons=x_buttons, direction="up", x=0.5, y=-0.1, active=0),
                dict(
                    buttons=y_buttons,
                    direction="down",
                    x=-0.01,
                    y=0.5,
                    active=(1 if loadings.shape[1] > 1 else 0),
                ),
            ]
        )

        fig.update_layout(
            {
                "title": {"text": "Loadings Plot", "x": 0.5},
                "width": 1000,
                "height": 1000,
            }
        )

        return fig
    ```
  • Jorge sendino (01/11/2023, 4:59 PM)
    Hello team! I’m facing some issues versioning SparkDataSets in Azure Databricks. Saving works fine, but loading throws a `VersionNotFoundError: Did not find any versions for SparkDataSet`. My kedro version is 0.18.3. Any idea how to solve it?
  • Lorenzo Castellino (01/11/2023, 5:32 PM)
    How do you handle nested objects in a `tracking.JSONDataSet`? I thought about (and tried) flattening the dictionary in the node. It works, but is there another way I'm missing to achieve a more pleasing result?
  • Patrick H. (01/11/2023, 5:43 PM)
    Hi dear kedro community, I have a question regarding the data catalog and the data itself. So far I can only declare data file-wise in the catalog.yml file. But is it possible to declare, let's say, a folder (in my case full of images with the same extension)? It feels out of place to write a function to compress the images into one file without putting it in the pipeline, since I must read the folder. I thank you in advance for the response.
  • Matthias Roels (01/11/2023, 7:17 PM)
    I noticed there is a new `OmegaConfLoader` class available in the main branch of kedro. When is it expected to be included in a release (v0.18.5 maybe)?
  • user (01/11/2023, 11:48 PM)
    Kedro (Python) DeprecationWarning: `np.bool8` When I try to create a new kedro project or run an existing one, I get the following deprecation warning (see also the screenshot below). As far as I understand, the warning is negligible; however, as I am trying to set up a clean project, I would like to resolve it. From the warning I gather that it stems from the plotly package, which apparently uses the old np.bool8 over the new np.bool_ WARNING D\Code\Python\kedro tutorial\.venv\lib\site packages\plotly\express\imshow utils.py24:...
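    Until plotly picks up the rename, the warning can be silenced narrowly without hiding other deprecations. A sketch; the warning at the bottom is simulated here rather than raised by plotly:

    ```python
    import warnings

    def suppress_bool8_warning():
        """Ignore only the np.bool8 alias warning, leaving other warnings visible."""
        warnings.filterwarnings(
            "ignore", message=r".*np\.bool8.*", category=DeprecationWarning
        )

    # Demonstration: record warnings and check the filtered one is swallowed.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        suppress_bool8_warning()
        warnings.warn("`np.bool8` is a deprecated alias", DeprecationWarning)  # simulated
    ```

    In a kedro project the filter could run early, e.g. at the top of settings.py, before plotly is imported.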
  • Javier Hernandez (01/12/2023, 1:51 PM)
    Hello, good afternoon. I was adding a catalog item for a database table with timeseries data, and I wanted a dynamic parameter so that when I run the code it retrieves the data for a certain day. I ran into this post: https://github.com/kedro-org/kedro/issues/1089 But I am failing to understand the reasoning behind it. I would like to understand what the approach in kedro would be. Loading the whole table from SQL and then running the transformations in code would be quite inefficient in production.
  • Walber Moreira (01/12/2023, 2:21 PM)
    Hi, guys! We’re planning to migrate from kedro 0.17.7 to the latest release. Is there any planned date for the 0.19.0 release?