charles
04/20/2023, 12:35 PM
Is it possible to template the catalog with a value from the local/parameters.yml file?
catalog:
parsed_documents:  # Just one document for now.
  type: json.JSONDataSet
  filepath: s3://mybucket/${env}/myjson.json
local/parameters.yml file entry: env: "main"
In kedro ipython, when trying to load it, I am getting:
DataSetError: Failed while loading data from data set JSONDataSet(filepath=mybucket/${env}/myjson.json, protocol=s3, save_args={'indent': 2}).
mybucket/${env}/myjson.json
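A likely explanation, with the usual caveats: in Kedro 0.18.x the ${...} placeholders in catalog.yml are resolved by TemplatedConfigLoader from globals files, not from parameters.yml, so env is never substituted. A minimal sketch of the wiring, assuming the default file layout, is to declare the loader in settings.py and move env into a globals file:

# settings.py
from kedro.config import TemplatedConfigLoader

CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
    # picks up conf/base/globals.yml and conf/local/globals.yml
    "globals_pattern": "*globals.yml",
}

# conf/local/globals.yml
env: "main"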
Leo Cunha
04/20/2023, 12:56 PM
cli.py ?
Merel
04/20/2023, 3:17 PM
pyspark 3.4.0 was released on the 13th of April and has broken our pyspark-iris starter.
I've written up my findings so far in an issue: https://github.com/kedro-org/kedro-starters/issues/123 but it could be that I've been approaching this all wrong. I've now reached the point where I could really use some help figuring out what is going on 🙏
Beltra909
04/21/2023, 7:06 AM
DataSetError: Failed while loading data from data set
ParquetDataSet(filepath=<my file_path>,
load_args={'engine': pyarrow}, protocol=s3, save_args={'engine': pyarrow}).
AioSession.__init__() got an unexpected keyword argument 'target_options'. I have tried with different versions of fsspec, s3fs, kedro and python and I get the same issue. Here is what I am using currently: Python 3.10.10, Kedro 0.18.7, s3fs 2023.3.0, fsspec 2023.3.0, aiobotocore 2.4.2, pandas 1.5.3. pip check does not show any broken requirements. Has anyone experienced this problem before? Extensive googling didn't show any results...
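For what it's worth, one way to check whether Kedro is involved at all is to read the same file with pandas directly, since that goes through the same fsspec/s3fs stack (a sketch; the bucket and key below are placeholders):

import pandas as pd

# exercises the same fsspec/s3fs/aiobotocore machinery as Kedro's ParquetDataSet
df = pd.read_parquet("s3://my-bucket/path/to/file.parquet", engine="pyarrow")

If this raises the same 'target_options' error, the s3fs/aiobotocore pairing is the likelier culprit than Kedro itself.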
Si Yan
04/21/2023, 8:11 PM
Rob
04/22/2023, 6:09 PM
I'm trying to pick the storage path dynamically via storage_type; this is how my globals YAML looks:
storage_mode: "local"
storage:
  local: "data/"
  gcp: "<gs://my-bucket/data/>"
data:
  {% if storage_mode == 'local' %}
  storage_type: ${storage.local}
  {% elif storage_mode == 'gcp' %}
  storage_type: ${storage.gcp}
  {% endif %}
  player_tags: ${storage_type}/01_player_tags
  raw_battlelogs: ${storage_type}/02_raw_battlelogs
  raw_metadata: ${storage_type}/03_raw_metadata
  enriched_data: ${storage_type}/04_enriched_data
  curated_data: ${storage_type}/05_curated_data
  viz_data: ${storage_type}/06_viz_data
  feature_store: ${storage_type}/07_feature_store
  model_registry: ${storage_type}/08_model_registry
I'm not familiar with this type of syntax, and I'm getting a ScannerError.
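An aside on the likely cause: unless the config loader renders Jinja2 before parsing (the Kedro docs support this only through TemplatedConfigLoader and discourage it), the {% if %} blocks reach the YAML parser verbatim, which is exactly what a ScannerError suggests. One way to sidestep Jinja entirely, sketched under the assumption that the storage mode can be chosen per run, is to lean on Kedro configuration environments:

# conf/base/globals.yml
storage_type: "data/"

# conf/gcp/globals.yml  (hypothetical extra environment)
storage_type: "gs://my-bucket/data/"

Then kedro run uses the local paths, kedro run --env gcp the GCS ones, and the data: block keeps its ${storage_type}/... entries unchanged.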
Jason
04/24/2023, 1:33 PM
dataset1
|--01_raw
|--02_intermediate
|--03_primary
|--...
dataset2
|--01_raw
|--02_intermediate
|--03_primary
|--...
Giulio Morina
04/25/2023, 10:51 AM
Balazs Konig
04/25/2023, 4:49 PM
Claire BAUDIER
04/26/2023, 8:47 AM
I would like to pass parameters to a node via "params", but using a file different from the default parameters.yml file. Here is what I have in mind, based on one of the documentation examples:
from pathlib import Path

from kedro.config import ConfigLoader
from kedro.framework.project import settings

project_path = Path.cwd()  # root of the Kedro project
conf_path = str(project_path / settings.CONF_SOURCE)
conf_loader = ConfigLoader(conf_source=conf_path, env="local")
params = conf_loader.get("other_parameters_file.yml")
# in node definition
def increase_volume(volume, step):
  return volume + step
# in pipeline definition
node(
  func=increase_volume,
  inputs=["input_volume", "params:step_size"],
  outputs="output_volume",
)
And the parameter step_size would be in other_parameters_file.yml. My question is whether it is feasible to do that with Kedro? If so, how should it be done?
Thanks a lot for your help!
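One way this is commonly handled, sketched assuming a Kedro 0.18.x release where the config loader accepts config_patterns: register the extra file as a parameters source in settings.py, so params:step_size resolves in nodes without any manual ConfigLoader call:

# settings.py
CONFIG_LOADER_ARGS = {
    "config_patterns": {
        "parameters": [
            "parameters*",
            "parameters*/**",
            "**/parameters*",
            "other_parameters_file*",  # the extra file from the question
        ],
    }
}

Alternatively, any file whose name already matches parameters* (say, parameters_extra.yml) is merged by default, which avoids touching settings.py at all.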
Iñigo Hidalgo
04/26/2023, 3:16 PM
simple_conn_pt_model_filter_predict:
    date_column: date
    window_length: 0d
    gap: 0d
    check_groups: null
    continue_if_missing: true
I am trying to edit the parameter gap through kedro run --pipeline ... --params=..., but it seems I need to overwrite the whole dictionary.
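A possible way out, assuming a Kedro release where --params supports dot notation for nested keys (documented for later 0.18.x versions): target the single nested value so the rest of the dictionary keeps its values from parameters.yml (the pipeline name below is a placeholder):

kedro run --pipeline my_pipeline --params "simple_conn_pt_model_filter_predict.gap:1d"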
Juan Diego
04/26/2023, 3:42 PM
Is there a way to get the version of a kedro package? It would be useful for raising an error when it doesn't match the one expected by a launcher.
Agnaldo Luiz
04/27/2023, 12:04 PM
#credentials.yml
win_user: 'user01'
#catalog.yml
data:
    type: pandas.ExcelDataSet
    filepath: C:\Users\${win_user}\data.xlsx
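In case it helps: as far as I know, ${...} templating in catalog.yml is filled from globals by TemplatedConfigLoader, and credentials.yml entries are not exposed to it (credentials are only injected through a dataset's credentials key). A sketch of the same setup with win_user moved into globals:

# conf/base/globals.yml
win_user: "user01"

# settings.py
from kedro.config import TemplatedConfigLoader

CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}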
Rishabh Kasat
04/27/2023, 2:08 PM
kedro.framework.cli.utils.KedroCliError: No module named 'pyspark_llap'
Run with --verbose to see the full exception
Error: No module named 'pyspark_llap'
Season Yang
04/27/2023, 4:03 PM
I'm hitting a dependency conflict around ipython and would love to get help from the team. Under the same release 0.18.7 for both kedro and kedro-starters with Python 3.8, kedro requires ipython~=8.1 (https://github.com/kedro-org/kedro/blob/main/test_requirements.txt#L22) while the kedro-starters pyspark starter restricts ipython>=7.31.1, <8.0 (https://github.com/kedro-org/kedro-starters/blob/main/pyspark/%7B%7B%20cookiecutter.repo_name%20%7D%7D/src/requirements.txt#L3).
Would really appreciate any help on this! Thank you in advance!
Kelsey Sorrels
04/27/2023, 10:56 PM
Jo Stichbury
04/28/2023, 4:11 PM
Darshan
04/29/2023, 5:55 AM
Rob
04/29/2023, 10:01 PM
How do I set up credentials in catalog.yml for a parquet of type spark.SparkDataSet?
I'm trying to use the service-account .json file from Google Cloud, but I don't know how to define it in the catalog.
Thanks in advance 🙂
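A sketch of one way this can look, assuming the GCS connector is on the Spark classpath: with spark.SparkDataSet the service-account key is usually wired through Spark's own configuration rather than the catalog, so the catalog entry stays plain (names and paths below are placeholders):

# conf/base/catalog.yml
my_parquet:
  type: spark.SparkDataSet
  filepath: gs://my-bucket/data/my_parquet
  file_format: parquet

# conf/base/spark.yml
spark.hadoop.google.cloud.auth.service.account.enable: true
spark.hadoop.google.cloud.auth.service.account.json.keyfile: /path/to/service-account.json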
Darshan
04/30/2023, 6:51 AM
companies:
  type: pandas.CSVDataSet
  filepath: s3://<your-bucket>/companies.csv
This is a sample provided by Kedro in the AWS Step Functions deployment docs; it might be useful.
Sebastian Cardona Lozano
05/01/2023, 5:03 PM
Vandana Malik
05/02/2023, 9:34 AM
My hooks are not running. settings.py-
HOOKS = (ProjectHooks(), DataValidationHook())
CONTEXT_CLASS = ProjectContext
context.py-
class ProjectContext(KedroContext):
    """Project context.
    Users can override the remaining methods from the parent class here,
    or create new ones (e.g. as required by plugins)
    """
    hooks = ProjectHooks()
    def __init__(
        self,
        package_name: str,
        project_path: Union[Path, str],
        env: str = None,
        extra_params: Dict[str, Any] = None,
    ):
        """Init class."""
        super().__init__(package_name, project_path, env, extra_params)
        self.hooks = DataValidationHook()
        self._spark_session = None
        self._experiment_tracker = None
        self._setup_env_variables()
        self._init_common_env_vars()
        self.init_spark_session()
Can you guide me on where to look or what to modify in order to check why the hooks are not running?
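A hedged observation on the snippet above: in Kedro 0.18.x hooks only fire if the session's plugin manager knows about them, i.e. via the HOOKS tuple in settings.py or a plugin entry point; assigning self.hooks inside KedroContext has no effect. A minimal sketch, assuming the hooks live in a hooks.py module at the package root (the import path is illustrative):

# settings.py
from my_package.hooks import ProjectHooks, DataValidationHook

HOOKS = (ProjectHooks(), DataValidationHook())

If DataValidationHook still stays silent, it is worth checking that its methods carry @hook_impl and are named exactly after the hook specs (e.g. before_node_run).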
Jordan
05/02/2023, 11:16 AM
I can load this dataset in a notebook after %load_ext kedro.ipython.
However, in a standalone file, when I create the catalog as follows:
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

project_path = Path(".").resolve()
metadata = bootstrap_project(project_path)
with KedroSession.create(metadata.package_name, project_path) as session:
    context = session.load_context()
    catalog = context.catalog
data = catalog.load("my_metrics")
I get the following error:
DataSetError: Loading not supported for 'MetricsDataSet'
If this is true, why does it load in a notebook?
Adrien
05/02/2023, 11:34 AM<http://com.google.cloud.ai|com.google.cloud.ai>.platform.common.errors.AiPlatformException: code=RESOURCE_EXHAUSTED, message=The following quota metrics exceed quota limits: <http://aiplatform.googleapis.com/custom_model_training_cpus|aiplatform.googleapis.com/custom_model_training_cpus>, cause=null; Failed to create custom job for the task. Task: Project number: 496232377396, Job id: 1189445081858310144, Task id: 6444159035313750016, Task name: preprocess-shuttles-node, Task state: DRIVER_SUCCEEDED, Execution name: projects/496232377396/locations/europe-west1/metadataStores/default/executions/14295685814278275726; Failed to create external task or refresh its state. Task:Project number: 496232377396, Job id: 1189445081858310144, Task id: 6444159035313750016, Task name: preprocess-shuttles-node, Task state: DRIVER_SUCCEEDED, Execution name: projects/496232377396/locations/europe-west1/metadataStores/default/executions/14295685814278275726; Failed to handle the pipeline task. Task: Project number: 496232377396, Job id: 1189445081858310144, Task id: 6444159035313750016, Task name: preprocess-shuttles-node, Task state: DRIVER_SUCCEEDED, Execution name: projects/496232377396/locations/europe-west1/metadataStores/default/executions/14295685814278275726
I check the quotas specified but it's not the problem because it's set to 1 and I specify 0.2 cpus for each node (kedro vertexai starter guide). I think it come from gcp but i know know witch configuration to update.
Someone has an explaination / face the same bug ? I'm on this issue for days and i can't find the solution...Thaiza
05/02/2023, 11:54 AM
Afaque Ahmad
05/02/2023, 11:59 AM
I'm migrating a Kedro project from v0.16.x to 0.18.7. Is there a checklist of steps that I can follow for a smooth migration?
fmfreeze
05/02/2023, 5:22 PM
Flavien
05/03/2023, 10:26 AM
I'm setting up a kedro project on Databricks (and have good hope of convincing my team to go for kedro). The documentation is very well written, thanks for that. Scrolling through the messages in Slack, I did not find a way to directly use the spark object, the SparkSession provided in Databricks notebooks. Is there any way to do so?
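For what it's worth, a sketch of the usual pattern: on Databricks, SparkSession.builder.getOrCreate() returns the notebook's already-running session (the same object as spark) instead of building a new one, so a project hook can simply pick it up (the class name is illustrative):

from kedro.framework.hooks import hook_impl
from pyspark.sql import SparkSession

class SparkHooks:
    @hook_impl
    def after_context_created(self, context) -> None:
        # on Databricks this reuses the session behind the notebook's `spark`
        spark = SparkSession.builder.getOrCreate()

Registered through HOOKS in settings.py, every node then shares the Databricks session.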
Vandana Malik
05/03/2023, 10:37 AM
import os
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from kedro.runner import SequentialRunner
from hooks import ControlTableHooks
if __name__ == "__main__":
    bootstrap_project(os.path.abspath(os.environ.get("PROJECT_PATH")))
    os.chdir(os.environ.get("PROJECT_PATH"))
    with KedroSession.create(env=os.environ.get("kedro_environment")) as session:
        runner = SequentialRunner()
        context = session.load_context()
        pipeline = context.pipelines[os.environ.get("pipeline_name")]
        catalog = context.catalog
        runner.run(pipeline, catalog)
        result_dict = {"message": "Success"}
Any help?
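A hedged guess at why hooks stay silent in this script: calling runner.run(pipeline, catalog) directly bypasses the session's hook manager, so registered hooks never fire; letting the session drive the run triggers them. A minimal sketch of the same script:

import os
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

if __name__ == "__main__":
    bootstrap_project(os.path.abspath(os.environ.get("PROJECT_PATH")))
    os.chdir(os.environ.get("PROJECT_PATH"))
    with KedroSession.create(env=os.environ.get("kedro_environment")) as session:
        # session.run wires in the hook manager, unlike a bare runner.run()
        session.run(pipeline_name=os.environ.get("pipeline_name"))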
Pavan Naidu
05/03/2023, 10:10 PM