Mohammed Samir
01/29/2023, 11:06 AM
kedro run --env env_name
The pipelines' nodes are interleaved in running order, meaning that it runs as below:
pipeline 1 --> Node 1
pipeline 2 --> Node 1
pipeline 2 --> Node 2
pipeline 3 --> Node 1
pipeline 1 --> Node 2
pipeline 3 --> Node 2
(Note: the node order within each pipeline is correct, but Kedro runs a node from each pipeline in turn.)
However, I want them to run in the order below:
pipeline 1 --> Node 1
pipeline 1 --> Node 2
pipeline 2 --> Node 1
pipeline 2 --> Node 2
pipeline 3 --> Node 1
pipeline 3 --> Node 2
I have the following config in pipeline_registry:
return {"__default__": pipeline1 + pipeline2 + pipeline3 + pipeline4 + pipeline5}

Rob
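(Editor's note, background not from the thread: when pipelines are summed, Kedro merges every node into a single DAG and may execute any valid topological ordering of it, so nodes from independent pipelines interleave. The stdlib graphlib module shows why any interleaving is legal:)

```python
from graphlib import TopologicalSorter

# Toy version of the situation above: each pipeline's Node 2 depends
# only on its own Node 1, so the three pipelines are independent.
graph = {
    "p1.node2": {"p1.node1"},
    "p2.node2": {"p2.node1"},
    "p3.node2": {"p3.node1"},
}

order = list(TopologicalSorter(graph).static_order())
print(order)
# Every order that keeps node1 before node2 within a pipeline is a
# valid execution order, so a runner is free to interleave pipelines.
```

To force the strict sequence shown above, run each pipeline to completion in its own invocation (kedro run --pipeline pipeline1, then pipeline2, and so on), or add dataset dependencies so that pipeline 2's first node consumes an output of pipeline 1.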
01/29/2023, 6:21 PM
Regarding the spark.yml file in the configuration folder, when running the code from a Databricks cluster (using a workflow job, so my run.py is in DBFS): is it required to specify the Spark master URL?
Or is there an alternative that omits spark.yml and lets Databricks manage my configuration? (I mean, omitting the manual setting of the master URL.)
Thanks in advance!

Sergei Benkovich
01/29/2023, 8:01 PM

Antoine Bon
01/30/2023, 9:00 AM
I am trying to use the load_version functionality with a catalog that is built programmatically with a hook, but I fail to do so. From my understanding of the code this is not possible, so I raised the following ticket: https://github.com/kedro-org/kedro/issues/2233
Unless someone knows of a way to do so?

Massinissa Saïdi
01/30/2023, 4:17 PM

Massinissa Saïdi
01/30/2023, 5:34 PM
How can I get the values passed with --params in code with KedroSession?
I have something like this:
def get_session() -> Optional[MyKedroSession]:
    bootstrap_project(Path.cwd())
    try:
        session = MyKedroSession.create()
    except RuntimeError as exc:
        _log.info(f"Session doesn't exist, creating a new one. Raise: {exc}")
        package_name = str(Path(__file__).resolve().parent.name)
        session = MyKedroSession.create(package_name)
    return session
def get_parameters():
    context = get_session().load_context()
    return context.params
But get_parameters gives the parameters set in YAML and not the ones updated with --params? Thx!

Andrew Stewart
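(Editor's note: the CLI passes --params into the session it creates via KedroSession.create(extra_params=...); a second session created inside get_session() never sees those overrides, which would explain the behaviour above. The merge the context then performs is essentially a dictionary update of the YAML parameters, sketched here with plain dicts; the parameter names are made up for illustration:)

```python
# Parameters as loaded from conf/**/parameters.yml (hypothetical values).
yaml_params = {"model": {"max_depth": 5}, "test_size": 0.2}

# Overrides as the CLI would forward them, e.g.
# kedro run --params test_size:0.3  ->  extra_params={"test_size": 0.3}
extra_params = {"test_size": 0.3}

# A shallow top-level update is enough for this sketch.
params = {**yaml_params, **extra_params}
print(params)  # {'model': {'max_depth': 5}, 'test_size': 0.3}
```

So rather than creating a fresh session inside get_parameters, thread the CLI values through explicitly, e.g. MyKedroSession.create(extra_params=...) at the single place the session is built.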
01/30/2023, 9:59 PM
## from https://kedro.readthedocs.io/en/stable/kedro_project_setup/session.html
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from pathlib import Path

bootstrap_project(Path.cwd())
with KedroSession.create() as session:
    session.run()
vs
## from https://kedro.readthedocs.io/en/stable/tutorial/package_a_project.html
from kedro_tutorial.__main__ import main

main(["--pipeline", "__default__"])  # or simply main() if you don't want to provide any arguments

Alexandra Lorenzo
01/31/2023, 4:48 PM
I get the error "create_client() got multiple values for keyword argument 'aws_access_key_id'."
credentials.yml:
dev_s3:
  client_kwargs:
    aws_access_key_id: AWS_ACCESS_KEY_ID
    aws_secret_access_key: AWS_SECRET_ACCESS_KEY
catalog.yml:
raw_images:
  type: PartitionedDataSet
  dataset:
    type: flair_one.extras.datasets.satellite_image.SatelliteImageDataSet
  credentials: dev_s3
  path: s3://ignchallenge/train
  filename_suffix: .tif
  layer: raw
kedro = 0.17.7
s3fs = 0.4.2
Does anyone have an idea? Thanks in advance

João Areias
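(Editor's guess, not confirmed in the thread: with s3fs, the filesystem already forwards the credentials to boto3's create_client() as aws_access_key_id/aws_secret_access_key, so repeating them inside client_kwargs makes the same argument arrive twice, which matches the error. Using s3fs's own top-level argument names may avoid the clash:)

```yaml
# credentials.yml — sketch; `key`/`secret` are the s3fs parameter names
dev_s3:
  key: AWS_ACCESS_KEY_ID
  secret: AWS_SECRET_ACCESS_KEY
```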
01/31/2023, 5:01 PM
Is kedro jupyter convert being deprecated? And is there going to be an easy way of turning notebooks into nodes and pipelines following this decision in kedro 0.19?

Elias
01/31/2023, 5:54 PM

Olivia Lihn
01/31/2023, 7:28 PM

Andrew Stewart
02/01/2023, 1:35 AM

Sebastian Cardona Lozano
02/01/2023, 4:43 AM
1. We plan to use the kedro-mlflow plugin to achieve what we want. Here are the questions: once we have the mlflow artifact, can we still use the kedro-docker plugin to create the image, or do we have to create the Docker image from scratch? On the other hand, can we still use the other plugins to export the pipeline to Airflow or Vertex Pipelines?
2. On that basis, we started to wonder whether it is better to use mlflow for tracking and model registry, taking advantage of the Kedro plugins, rather than the Vertex AI APIs. I would like to know your opinion on this, or recommendations on how to combine both worlds.
Thanks in advance.
#C03RKP2LW64 #C03RKPCLYGY

Anirudh Dahiya
02/01/2023, 1:14 PM
Exception: Java gateway process exited before sending its port number
Has anyone faced this error before?

Massinissa Saïdi
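(Editor's note, a common cause rather than a diagnosis: this PySpark error usually means no usable Java installation is visible to the process that launches the JVM. Checking the environment is a cheap first step; the JAVA_HOME path below is only an example:)

```shell
java -version                                         # should print a Java runtime
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64   # example path, adjust to your system
export PATH="$JAVA_HOME/bin:$PATH"
```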
02/02/2023, 9:59 AM
How can I get the tag in code? (kedro run --tag NAME)

Larissa Siqueira
02/02/2023, 2:28 PM

Artur Dobrogowski
02/02/2023, 3:58 PM

datajoely
02/02/2023, 3:58 PM

Filip Panovski
02/02/2023, 5:01 PM
I have a dask.yml in my conf/base which contains the following (my real config is much larger, but this gets the point across):
dask_cloudprovider:
  region: eu-central-1
  instance_type: t3.xlarge
  n_workers: 36
and a dask.yml in another environment, e.g. conf/low, with the following:
dask_cloudprovider:
  instance_type: t3.small
  n_workers: 8
which I activate using kedro run --env=low.
Now, I would have expected the config_loader (TemplatedConfigLoader) to contain something like {'dask_cloudprovider': {'region': 'eu-central-1', 'instance_type': 't3.small', 'n_workers': 8}}.
However, it overrides the entire entry, resulting in the config_loader containing {'dask_cloudprovider': {'instance_type': 't3.small', 'n_workers': 8}}.
Is there any way to get what I was expecting out of the box? I don't really want to copy my entire configuration N times for each environment, especially since only a few of the keys change. Is the intended use case for environments different from what I'm trying to use it for (say, only for top-level entries)?

WEN XIN (Jessie 文馨)
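(Editor's sketch: as described above, the environment file replaces a matching top-level key wholesale rather than merging into it. Getting the expected result would take a recursive merge of the two mappings, something like:)

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Merge override into base recursively, replacing only leaf values."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

base = {"dask_cloudprovider": {"region": "eu-central-1",
                               "instance_type": "t3.xlarge",
                               "n_workers": 36}}
low = {"dask_cloudprovider": {"instance_type": "t3.small", "n_workers": 8}}

print(deep_merge(base, low))
# {'dask_cloudprovider': {'region': 'eu-central-1',
#                         'instance_type': 't3.small', 'n_workers': 8}}
```

A custom config loader could apply such a merge per environment; whether that is a good idea depends on how explicit you want environment overrides to be.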
02/03/2023, 4:47 AM
How to submit a spark job to EMR through Livy for a kedro project?

Evžen Šírek
02/03/2023, 10:01 AM
Has anyone used the fastparquet engine with the ParquetDataSet?
There is a possibility to specify the engine in the catalog entry:
dataset:
  type: pandas.ParquetDataSet
  filepath: data/dataset.parquet
  load_args:
    engine: fastparquet
  save_args:
    engine: fastparquet
However, when I do that, I get a DataSetError with "I/O operation on closed file" when Kedro tries to save the dataset.
When I manually save the data with pandas and engine=fastparquet (which is what Kedro should do according to the docs), it works well.
Is this expected? Thanks! :))
Environment:
python==3.10.4, pandas==1.5.1, kedro==0.18.4, fastparquet==2023.1.0

Massinissa Saïdi
02/03/2023, 10:45 AM

Veenu Yadav
02/03/2023, 1:18 PM
I get "Given configuration path either does not exist or is not a valid directory: /usr/local/airflow/conf/base" while deploying a Kedro pipeline on Apache Airflow with Astronomer. Any clues?

Veenu Yadav
02/03/2023, 1:20 PM
/usr/local/airflow/conf/base is not even present in the webserver container.

Sergei Benkovich
02/03/2023, 3:29 PM

Rafał Nowak
02/05/2023, 6:54 PM
I am looking for a json.JSONDataSet supporting gzip compression, so the filepath would be *.json.gz.
I haven't found such a backend in kedro.datasets.
Has anyone already implemented such a dataset?

Sergei Benkovich
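(Editor's sketch of such a dataset, not an existing kedro backend: the compression itself only needs the stdlib gzip module; to plug into Kedro, this class would subclass AbstractDataSet and rename load/save to _load/_save, plus add a _describe. The class name is made up:)

```python
import gzip
import json
from pathlib import Path


class GzipJSONDataSet:
    """Minimal *.json.gz storage (sketch, not wired into Kedro yet)."""

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def load(self) -> dict:
        # Mode "rt" decompresses and decodes to text in one step.
        with gzip.open(self._filepath, "rt", encoding="utf-8") as f:
            return json.load(f)

    def save(self, data: dict) -> None:
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        with gzip.open(self._filepath, "wt", encoding="utf-8") as f:
            json.dump(data, f)
```

It may also be worth checking whether the stock json.JSONDataSet's fs_args can request compression from fsspec before writing a custom class — I have not verified that path.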
02/05/2023, 8:05 PM
I get ModuleNotFoundError: No module named 'pipelines'.
Any suggestions on how to handle it?

Ankar Yadav
02/06/2023, 12:19 PM
When I specify sep in save_args, it gives me an error:
prm_customer:
  type: pandas.CSVDataSet
  filepath: ${base_path}/${folders.prm}/
  save_args:
    index: False
    sep: "|"
Any idea how to fix this?
I am using kedro 0.18.1

Yanni
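(Editor's observation, a guess rather than a confirmed fix: sep is a valid pandas to_csv save argument, but the filepath above ends in a trailing slash, so it names a directory rather than a .csv file; pointing it at a file may be the real fix. The filename below is hypothetical:)

```yaml
prm_customer:
  type: pandas.CSVDataSet
  # hypothetical filename added; the folder template is unchanged
  filepath: ${base_path}/${folders.prm}/prm_customer.csv
  save_args:
    index: False
    sep: "|"
```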
02/06/2023, 1:59 PM

Debanjan Banerjee
02/06/2023, 2:03 PM