# questions
  • user (01/04/2023, 5:28 PM)
    mlflow cannot fetch model from model registry I have registered a model to the mlflow model registry. When I call the load_model function to fetch the model from the registry and make a prediction, mlflow cannot find the model at the artifact path I provided and returns the following error: `model_name = "sample-ann-1" version = 1 loaded_model = mlflow.pyfunc.load_model("models:/{}/{}".format(model_name, version))` "mlflow.exceptions.MlflowException: The following failures occurred while downloading one or more artifacts from...
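    A common cause of this error is that the loading process cannot resolve the artifact store behind the registry entry, for example because the tracking URI is not set where `load_model` runs. A minimal sketch, with the server address as a placeholder and the mlflow calls commented out:

    ```python
    # Sketch: make the tracking server explicit before resolving a registry URI.
    model_name = "sample-ann-1"
    version = 1
    model_uri = f"models:/{model_name}/{version}"

    # import mlflow
    # mlflow.set_tracking_uri("http://localhost:5000")  # must point at the server where the model was registered
    # loaded_model = mlflow.pyfunc.load_model(model_uri)
    ```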
  • user (01/04/2023, 5:28 PM)
    service XXX was unable to place a task because no container instance met all of its requirements. instance XXX is already using a port required by your task The service crm was unable to place a task because no container instance met all of its requirements. The closest matching container instance e45856e4821149XXXXXXXXX is already using a port required by your task. Is there any way to resolve this? I am currently trying to run 4 task definitions. I have referred to the AWS documents below but am not sure which solution would be ideal for the current issue. How do I do dynamic port mapping? Registered ports: ["22","4000","2376","2375","51678","51679"]...
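    The dynamic port mapping the question asks about is configured in the task definition by setting `hostPort` to 0 under bridge network mode, so ECS assigns an ephemeral host port instead of contending for a fixed one. A hedged fragment; the container name and port are illustrative:

    ```json
    {
      "networkMode": "bridge",
      "containerDefinitions": [
        {
          "name": "crm-web",
          "portMappings": [
            {
              "containerPort": 4000,
              "hostPort": 0
            }
          ]
        }
      ]
    }
    ```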
  • user (01/04/2023, 5:28 PM)
    Deployment to AWS ECS using GitHub Actions is failing I have written a GitHub Actions workflow YAML file by following this guide. The workflow file is added below: name: Deploy to Staging Amazon ECS on: push: branches: - staging env: ECR_REPOSITORY: api-staging-jruby/api ECS_CLUSTER: api_staging J_RUBY_ECS_SERVICE: web-staging J_RUBY_ECS_TASK_DEFINITION:...
  • dor zazon (01/05/2023, 8:29 AM)
    hey, I am trying to find a solution for running the same pipeline with different parameters. I have a preprocessing pipeline that I want to run over 5 different datasets, and if I add more datasets to the catalog in the future, the pipeline should process them as well. How can I create a template pipeline and set the preprocessing pipeline to run over a list of dataset names from the catalog?
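    One way to approach this, sketched under the assumption of kedro's modular-pipelines API, is to keep a single template pipeline and re-map its inputs and outputs once per dataset. The dataset names and mapping helper below are hypothetical; only the pure-Python mapping is shown live, with the kedro calls in comments:

    ```python
    # Hypothetical list of catalog entries; could also be read from the catalog.
    DATASETS = ["dataset_a", "dataset_b", "dataset_c"]

    def preprocess_io(ds: str) -> dict:
        """Rename the template pipeline's generic I/O to dataset-specific names."""
        return {
            "inputs": {"raw": ds},
            "outputs": {"preprocessed": f"{ds}_preprocessed"},
        }

    mappings = {ds: preprocess_io(ds) for ds in DATASETS}

    # With kedro installed, each mapping would parameterise a namespaced copy of
    # the template, e.g. pipeline(template, namespace=ds, **preprocess_io(ds)),
    # and the per-dataset pipelines would be combined with `+`. Adding a dataset
    # to the list then extends the run without touching the template.
    ```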
  • Rafael Gildin (01/05/2023, 5:52 PM)
    Hey guys, quick question 🙂 Is there any way to change the traceback in kedro 0.18.4 so that it helps the debugging process? Problem: when accessing a non-existent column of a pandas dataframe, kedro 0.17.7 shows me the exact error (`df['d']`) and raises the KeyError, but 0.18.4 raises the same KeyError without saying where it occurred. These pictures illustrate it:
  • Danhua Yan (01/06/2023, 5:03 PM)
    Hello! Is it possible to give a versioned dataset a name instead of using a timestamp? I’m using datasets like `pandas.ParquetDataSet`, `spark.SparkDataSet`, and `pickle.PickleDataSet`, and using YAML configs to save:

    ```yaml
    dataset:
      type: pandas.ParquetDataSet
      filepath: some_path
      versioned: true
    ```
  • Jaakko (01/06/2023, 6:05 PM)
    Say I have a dataset that contains data for many different entities and I need to create a separate model for each entity with say sklearn. So the number of models I will be creating is dynamic. What should I take into account when building pipelines to create multiple models? Should I for example use a pickle dataset to store a dictionary or list of models or something like that?
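    One pragmatic option, along the lines the question already suggests, is to have the node return a single dict of fitted models keyed by entity and persist it with a pickle-backed dataset (a `PartitionedDataSet` of pickles would instead give one file per entity). A minimal sketch with stand-in models rather than real sklearn estimators:

    ```python
    import os
    import pickle
    import tempfile

    # Stand-ins for per-entity sklearn models; in a kedro node this dict would be
    # the node's return value, persisted by e.g. pickle.PickleDataSet.
    models = {f"entity_{i}": {"coef": 0.5 * i} for i in range(3)}

    # Round-trip the whole dict through pickle, as the dataset would do.
    path = os.path.join(tempfile.mkdtemp(), "models.pkl")
    with open(path, "wb") as f:
        pickle.dump(models, f)

    with open(path, "rb") as f:
        restored = pickle.load(f)
    ```

    The dict approach keeps one catalog entry regardless of how many entities exist, which matches the dynamic number of models.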
  • Brandon Meek (01/07/2023, 3:03 AM)
    Hey everyone, how can I access the current KedroSession from a node? My use case is I want to pass an unspecified number of datasets into a pipeline, the current solution I've come up with is to pass the dataset names as a list and then use the session to get the DataCatalog and read the datasets from there.
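    A pattern sometimes used for this is to capture the catalog in a hook rather than reach for the session from inside a node. The class and helper names below are hypothetical, and the kedro-specific decorator and registration are shown only in comments; a stand-in catalog object demonstrates the flow:

    ```python
    class CatalogCapture:
        """Stores the DataCatalog when kedro creates it, for lookup inside nodes."""
        catalog = None

        # With kedro installed this method would be decorated with @hook_impl
        # (from kedro.framework.hooks import hook_impl) and the class listed in
        # the HOOKS tuple in settings.py.
        def after_catalog_created(self, catalog, **kwargs):
            CatalogCapture.catalog = catalog

    def load_named_datasets(names):
        """Node helper: load an unspecified number of datasets given by name."""
        return {name: CatalogCapture.catalog.load(name) for name in names}

    # Demonstration with a stand-in catalog instead of a real DataCatalog:
    class _FakeCatalog:
        def load(self, name):
            return f"data:{name}"

    CatalogCapture().after_catalog_created(_FakeCatalog())
    loaded = load_named_datasets(["a", "b"])
    ```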
  • Sergei Benkovich (01/09/2023, 1:41 PM)
    hey, I have an issue using credentials in my project. I get “ValueError: Failed to format pattern ‘${dev_s3}’: no config value found, no default provided”. credentials.yml (in the local folder): as I understand it, it should take the AWS_ACCESS_KEY_ID from the AWS CLI. I also tried specifying it explicitly, and it didn’t help.

    ```yaml
    dev_s3:
      aws_access_key_id: AWS_ACCESS_KEY_ID
      aws_secret_access_key: AWS_SECRET_ACCESS_KEY
    ```
    catalog.yml:
    ```yaml
    observations:
      type: pandas.CSVDataSet
      filepath: "${s3.raw_observations_path}/commercial/observations/observations.csv"
      credentials: "${dev_s3}"
    ```
    main.py:
    ```python
    def run():
        runner = SequentialRunner()

        project_path = Path(__file__).parent.parent.parent
        conf_path = f"{project_path}/{settings.CONF_SOURCE}"
        conf_loader = CONFIG_LOADER_CLASS(
            conf_source=conf_path, env="local", globals_pattern="globals*"
        )

        parameters = conf_loader.get("parameters*", "parameters*/**")
        credentials = conf_loader.get("credentials*", "credentials*/**")
        catalog = conf_loader.get("catalog*", "catalog*/**")

        data_catalog = DATA_CATALOG_CLASS(
            data_sets={
                "observations": CSVDataSet.from_config(
                    "observations", catalog["observations"]
                ),
            },
            feed_dict={"params": parameters},
        )

        result = runner.run(data_extraction.create_pipeline(), data_catalog)
        return result
    ```
    settings.py:
    ```python
    CONF_SOURCE = "conf"

    # Class that manages how configuration is loaded.
    from kedro.config import TemplatedConfigLoader
    CONFIG_LOADER_CLASS = TemplatedConfigLoader
    CONFIG_LOADER_ARGS = {
        "globals_pattern": "*globals.yml",
    }

    # Class that manages the Data Catalog.
    from kedro.io import DataCatalog
    DATA_CATALOG_CLASS = DataCatalog
    ```
    can’t get over this error… would appreciate any help :)
  • Seth (01/09/2023, 2:12 PM)
    Hi everyone! I’d like to read existing partitions and write new partitions to the same PartitionedDataSet in a single Pipeline. However, with a single DataCatalog entry this creates a CircularDependencyError. What is the proper way to handle such situations in Kedro? I can create identical Catalog entries with different names, however it doesn’t feel like the correct solution for this problem.
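    For what it's worth, the workaround the question already hints at (two catalog entries pointing at the same path, one only read and one only written) is a common way to break the cycle. A hedged fragment; the entry names, path, and underlying dataset are illustrative:

    ```yaml
    # Two entries over the same folder: nodes read `shards_in` and write
    # `shards_out`, so no node depends on its own output.
    shards_in:
      type: PartitionedDataSet
      path: data/02_intermediate/shards
      dataset: pandas.CSVDataSet

    shards_out:
      type: PartitionedDataSet
      path: data/02_intermediate/shards
      dataset: pandas.CSVDataSet
    ```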
  • Deepyaman Datta (01/09/2023, 2:55 PM)
    Not exactly a support question, but for people who use/have considered using `PartitionedDataSet`... Let's say I have a catalog entry like:
    ```yaml
    my_pds:
      type: PartitionedDataSet
      path: data/01_raw/subjects
      dataset:
        type: my_project.io.MyCustomDataSet
    ```
    And data like:
    ```
    data/01_raw/subjects/C001/scans/0.png
    data/01_raw/subjects/C001/scans/1.png
    data/01_raw/subjects/C001/scans/2.png
    data/01_raw/subjects/C001/test_results.csv
    data/01_raw/subjects/C001/notes.png
    data/01_raw/subjects/C002/scans/0.png
    data/01_raw/subjects/C002/scans/1.png
    data/01_raw/subjects/C002/scans/2.png
    data/01_raw/subjects/C002/test_results.csv
    data/01_raw/subjects/C002/notes.png
    data/01_raw/subjects/T001/scans/0.png
    data/01_raw/subjects/T001/scans/1.png
    data/01_raw/subjects/T001/scans/2.png
    data/01_raw/subjects/T001/test_results.csv
    data/01_raw/subjects/T001/notes.png
    ```
    What do you think the resulting partitions would be?
  • Damian Fiłonowicz (01/10/2023, 9:43 AM)
    Hey, I have quite a large model ( > 0.5 GB) that is retrained very rarely and is located on ADLS (abfss). I would like to download it during the pipeline once, save it locally on the machine, and reuse it ➡️ WITHOUT ⬅️ redownloading from the cloud during other pipeline runs. Unfortunately, as far as I know, and we have tested, it's not possible to achieve it with CachedDataSet. Is there any way I can save some time on this operation?
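    One assumption-laden sketch of "download once, reuse locally": guard the transfer behind a local-path check, in a small helper or a custom dataset's `_load`. The helper name is hypothetical and the remote fetch is stubbed; in a real setup it would be an fsspec copy from abfss:

    ```python
    import os
    import tempfile

    def fetch_once(download, local_path):
        """Run `download` only if the local copy is missing; later runs reuse it.

        `download` stands in for the real transfer, e.g. an fsspec copy from ADLS.
        """
        if not os.path.exists(local_path):
            download(local_path)
        return local_path

    # Demonstration with a counting stub instead of a real ADLS download:
    calls = {"n": 0}

    def fake_download(path):
        calls["n"] += 1
        with open(path, "wb") as f:
            f.write(b"model-bytes")

    target = os.path.join(tempfile.mkdtemp(), "model.bin")
    fetch_once(fake_download, target)  # first run: downloads
    fetch_once(fake_download, target)  # second run: reuses the local copy
    ```

    Unlike CachedDataSet, whose cache lives only for a single run, the guard here keys off a file that survives between pipeline runs on the same machine.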
  • dor zazon (01/10/2023, 9:56 AM)
    hey, I am using the latest version of Kedro (0.18.4). I am trying to use the session.load_context() function and it returns:
  • dor zazon (01/10/2023, 9:56 AM)
    RecursionError: maximum recursion depth exceeded while calling a Python object
  • dor zazon (01/10/2023, 9:56 AM)
    something is wrong with the after_context_created hook inside the load_context() function
  • Matthias Roels (01/10/2023, 2:10 PM)
    Quick question: we are updating from kedro 0.17.7 to 0.18.4 and we noticed that the `get_current_session` function was removed. Is there a particular reason why? And is it possible to get the same functionality differently?
  • Anderson Luiz Souza (01/10/2023, 3:12 PM)
    Hi people!! I have a question about how to save data to an Azure storage account. I am following the documentation (https://kedro.readthedocs.io/en/stable/data/data_catalog.html#example-16-loads-a-model-saved-as-a-pickle-from-azure-blob-storage); the authentication is working well, but the data is not being saved anywhere. Has anyone ever faced a similar problem? I am not sure if I am setting up the data catalog properly.
  • Pedro Abreu (01/10/2023, 6:12 PM)
    Hey, we’re running kedro on a Databricks job and when hitting an exception we see: 1. the log of the job shows HTML-like traceback messages; 2. the exception doesn’t seem to be propagated appropriately to Databricks, since the job shows Status=Succeeded. This seems to be the problem raised in https://github.com/Textualize/rich/issues/2455 and initially fixed with https://github.com/kedro-org/kedro/pull/1769. However, the fix doesn’t seem to be working anymore. Could it be that a change in `rich` or Databricks has invalidated the previous approach? We’re using kedro 0.18.3.
  • Sidharth Jahagirdar (01/10/2023, 8:53 PM)
    Hey team, I am trying to view the kedro viz for my pipeline. Kedro and the code are on a different cluster. How do I view the viz from my local machine? I am able to get the JSON; is there anything I can do to visualize it?
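    Assuming the remote cluster is reachable over SSH, one common approach is to bind kedro-viz to all interfaces on the cluster and forward the port to the local machine. Host, port, and hostname below are illustrative:

    ```shell
    # On the cluster: serve the viz on a known port
    kedro viz --host 0.0.0.0 --port 4141

    # On the local machine: forward that port, then open http://localhost:4141
    ssh -L 4141:localhost:4141 user@cluster-host
    ```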
  • Dustin (01/10/2023, 10:01 PM)
    Happy 2023 team! This must be a dumb question, but when running `kedro viz` an error message "No such command 'viz'" shows, and `kedro -h` doesn't list a 'viz' option either. I followed a template project, pandas-iris.
  • Ricardo Araújo (01/10/2023, 11:52 PM)
    Hello everyone! A bit new to Kedro, but making progress! I wonder what the best practice is for the following scenario: I have a pipeline (not a kedro pipeline yet, just the generic concept) that processes a dataset. I want to be able to run the same pipeline on different pre-specified datasets (all already in the catalog). These datasets are very different and require dataset-specific wrangling and filtering. However, the later parts of the pipeline (modeling, evaluation) are essentially the same, since the data processing transforms the datasets into a common format, except for a few parameters (say, the number of epochs for training a neural net differs per dataset).
  • Sergei Benkovich (01/11/2023, 7:14 AM)
    having issues with credentials, I get the following error: “‘str’ object is not a mapping. DataSet ‘observations’ must only contain arguments valid for the constructor of ‘kedro.extras.datasets.pandas.csv_dataset.CSVDataSet’.” when I try to use
    ```yaml
    observations:
      type: pandas.CSVDataSet
      filepath: "${s3.raw_observations_path}/observations.csv"
      credentials: dev_s3
    ```
  • Lorenzo Castellino (01/11/2023, 8:21 AM)
    Hello everyone! I'm facing an issue regarding Kedro-Viz experiment tracking. I want to report the results of a PCA decomposition in a Plotly graph. I prototyped the node that draws the plot in a notebook with some styling (mainly a centered title and a square aspect ratio), with the results displayed in the first image. Happy with the results, I ported everything into a node that outputs the figure to a `plotly.JSONDataSet`. The pipeline runs fine, and the plot is saved to disk and displayed in the experiment tracking section, but the styling applied in the `fig.update_layout()` call seems to be skipped, as you can see from the second image. Everything else is displayed as desired (including the menu and hover data). Any clue what the issue could be here? This is the code in the node that outputs it:
    ```python
    import plotly.graph_objects as go
    from numpy.typing import NDArray


    def plot_loadings(loadings: NDArray) -> go.Figure:
        fig = go.Figure(layout_yaxis_range=[-1, 1], layout_xaxis_range=[-1, 1])

        fig.add_traces(
            go.Scattergl(
                x=loadings[:, 0],
                y=loadings[:, 1],
                mode="markers",
                hovertext=[f"Var{i+1}" for i in range(loadings.shape[0])],
            )
        )

        x_buttons = []
        y_buttons = []

        for i in range(loadings.shape[1]):
            x_buttons.append(
                dict(
                    method="update",
                    label=f"PC{i + 1}",
                    args=[
                        {"x": [loadings[:, i]]},
                    ],
                )
            )

            y_buttons.append(
                dict(
                    method="update",
                    label=f"PC{i + 1}",
                    args=[
                        {"y": [loadings[:, i]]},
                    ],
                )
            )

        fig.update_layout(
            updatemenus=[
                dict(buttons=x_buttons, direction="up", x=0.5, y=-0.1, active=0),
                dict(
                    buttons=y_buttons,
                    direction="down",
                    x=-0.01,
                    y=0.5,
                    active=(1 if loadings.shape[1] > 1 else 0),
                ),
            ]
        )

        fig.update_layout(
            {
                "title": {"text": "Loadings Plot", "x": 0.5},
                "width": 1000,
                "height": 1000,
            }
        )

        return fig
    ```
  • Jorge sendino (01/11/2023, 4:59 PM)
    Hello team! I’m facing some issues versioning SparkDataSets in Azure Databricks. Saving works fine, but loading throws a `VersionNotFoundError: Did not find any versions for SparkDataSet`. My kedro version is 0.18.3. Any idea how to solve it?
  • Lorenzo Castellino (01/11/2023, 5:32 PM)
    How do you handle nested objects in a `tracking.JSONDataSet`? I thought about (and tried) flattening the dictionary in the node. It works, but is there another way I'm missing to achieve a more pleasing result?
  • Patrick H. (01/11/2023, 5:43 PM)
    Hi dear kedro community, I have a question regarding the data catalog and the data itself. So far I can only declare data file-wise in the catalog.yml file. But is it possible to declare, let's say, a folder (in my case full of images with the same extension)? It feels out of place to write a function to compress the images into one file without putting it in the pipeline, since I must read the folder. I thank you in advance for the response.
  • Matthias Roels (01/11/2023, 7:17 PM)
    I noticed there is a new `OmegaConfLoader` class available in the main branch of kedro. When is it expected to be included in a release (v0.18.5 maybe)?
  • user (01/11/2023, 11:48 PM)
    Kedro (Python) DeprecationWarning: `np.bool8` When I try to create a new kedro project or run an existing one, I get the following deprecation warning (see also the screenshot below). As far as I understand, the warning is negligible; however, as I am trying to set up a clean project, I would like to resolve it. From the warning I gather that it stems from the plotly package, which apparently uses the old np.bool8 over the new np.bool_ WARNING D\Code\Python\kedro tutorial\.venv\lib\site packages\plotly\express\imshow utils.py24:...
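    Until plotly picks up the rename, the warning can be silenced narrowly without hiding other deprecations. A sketch; the warning at the bottom is simulated here rather than raised by plotly:

    ```python
    import warnings

    def suppress_bool8_warning():
        """Ignore only the np.bool8 alias warning, leaving other warnings visible."""
        warnings.filterwarnings(
            "ignore", message=r".*np\.bool8.*", category=DeprecationWarning
        )

    # Demonstration: record warnings and check the filtered one is swallowed.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        suppress_bool8_warning()
        warnings.warn("`np.bool8` is a deprecated alias", DeprecationWarning)  # simulated
    ```

    In a kedro project the filter could run early, e.g. at the top of settings.py, before plotly is imported.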
  • Javier Hernandez (01/12/2023, 1:51 PM)
    Hello, good afternoon. I was adding a catalog item for a database table with timeseries data, and I wanted a dynamic parameter so that when I run the code it retrieves the data for a certain day. I ran into this post: https://github.com/kedro-org/kedro/issues/1089 But I am failing to understand the reasoning behind it. I would like to understand what the approach in kedro would be. Loading the whole table from SQL and then running the transformations in code would be quite inefficient in production.
  • Walber Moreira (01/12/2023, 2:21 PM)
    Hi, guys! We’re planning to migrate from kedro 0.17.7 to the latest release. Is there any planned date for the 0.19.0 release?