Yury Fedotov
05/14/2025, 5:46 AM

Mattis
05/15/2025, 8:11 AM
docker build --progress=plain --build-arg BASE_IMAGE=python:3.10.16-slim -t ABCDE.azurecr.io/kedro:latest .
And submit the job like this:
kedro azureml run -p de -s FGHZUI --aml-env kedro_env

Adrien Paul
05/15/2025, 10:09 AM
Every time I run the kedro-azureml plugin CLI, it takes around 1 minute and 10 seconds to start up.
Has anyone else experienced this slow startup behavior?
Thanks in advance!

Zubin Roy
05/15/2025, 1:05 PM

Jonathan Dekermanjian
05/15/2025, 7:36 PM

Michał Gozdera
05/19/2025, 10:48 AM
In spaceflights-pandas we have info_file_handler defined, which logs into the info.log file, but when a DatasetError is raised (for example, the dataset CSV file is missing), it is not logged in info.log (the traceback and error are visible only in the console).
How can I make it be logged in the log file as well? I can always define a hook like this:
class ErrorCatchLoggingHook:
    @hook_impl
    def on_pipeline_error(self, error: Exception):
        logger.exception(f"Pipeline failed due to an error: {str(error)}")
but then the error log in the console is duplicated.
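A minimal sketch of one way to avoid the duplicate console output in a hook like the one above (logger name and format are illustrative): give the error logger its own file handler and disable propagation, so the root logger's console handlers, which already print the traceback, don't see the record a second time.

```python
import logging

# Dedicated logger with only a file handler. propagate=False stops records
# from bubbling up to the root logger's console handlers, so the error is
# written to info.log without being repeated on the console.
error_logger = logging.getLogger("pipeline_errors")  # illustrative name
error_logger.setLevel(logging.INFO)
error_logger.propagate = False
file_handler = logging.FileHandler("info.log")
file_handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(message)s")
)
error_logger.addHandler(file_handler)

# In on_pipeline_error you would then call:
#     error_logger.exception(f"Pipeline failed due to an error: {error}")
try:
    raise ValueError("dataset csv file is missing")
except ValueError as error:
    error_logger.exception(f"Pipeline failed due to an error: {error}")
```

The same effect can probably also be reached declaratively, by defining a dedicated logger in logging.yml that lists only info_file_handler and sets propagate: false.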
juanc
05/19/2025, 1:32 PM
Is there a way to use pandas.read_html in the data catalog YAML, or any other input method I'm missing?
Thank you all.
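As far as I know there is no built-in HTML dataset in kedro-datasets, so the usual route is a small custom dataset wrapping pandas.read_html. A rough sketch (the class name and catalog type path are hypothetical; in a real project it would subclass kedro.io.AbstractDataset):

```python
from typing import Optional

import pandas as pd

# Hypothetical custom dataset wrapping pandas.read_html. In a real project,
# subclass kedro.io.AbstractDataset and reference it from the catalog as
# e.g. `type: my_project.datasets.HTMLTableDataset`.
class HTMLTableDataset:
    def __init__(self, filepath: str, table_index: int = 0,
                 load_args: Optional[dict] = None):
        self._filepath = filepath
        # read_html returns one DataFrame per <table> found on the page
        self._table_index = table_index
        self._load_args = load_args or {}

    def _load(self) -> pd.DataFrame:
        tables = pd.read_html(self._filepath, **self._load_args)
        return tables[self._table_index]

    def _save(self, data: pd.DataFrame) -> None:
        raise NotImplementedError("read-only dataset")

    def _describe(self) -> dict:
        return {"filepath": self._filepath, "table_index": self._table_index}
```

The catalog entry would then point type: at the class's import path and pass filepath / table_index / load_args as arguments.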
Jamal Sealiti
05/20/2025, 11:54 AM

Fazil Topal
05/20/2025, 8:14 PM
node(myfunc, inputs=dict(x="ds1", y=dict(subv="test1", subc="test2")))
Basically I would expect Kedro to pass the resolved dict into the node as a dict. Right now it's not possible, as Kedro complains and wants the values as strings. Not really sure how to get around that.
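Node inputs must be dataset or parameter names (plain strings), so one workaround — just a sketch, with myfunc taken from the snippet above and the wrapper being hypothetical — is to accept the leaves as separate inputs and rebuild the nested dict inside a thin wrapper:

```python
def myfunc(x, y):
    # stand-in for the real node function from the thread
    return {"x": x, "y": y}

# Wrapper that rebuilds the nested dict from flat inputs, so every node
# input stays a plain dataset name that Kedro can resolve.
def myfunc_wrapper(x, subv, subc):
    return myfunc(x, y={"subv": subv, "subc": subc})

# In the pipeline, each value then maps to its own catalog entry:
# node(myfunc_wrapper, inputs={"x": "ds1", "subv": "test1", "subc": "test2"}, outputs="out")
```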
Armand Masseau
05/21/2025, 9:32 AM

Armand Masseau
05/21/2025, 2:04 PM

Adrien Paul
05/21/2025, 2:16 PM

Jonghyun Yun
05/22/2025, 2:30 PM
I need the resolved path of a versioned dataset (data/01_raw/company/cars.csv/<version>/cars.csv) so that it could pick up the correct datasets to process. Is there a way to know which <version> is being used by Kedro?
Richard Asselin
05/22/2025, 2:42 PM
Running kedro viz from within my virtual env picks up the kedro-viz from my main Python and not the one in the virtual env (i.e., I have v11.0.0 in my main Python but v11.0.1 in the virtual env, and it runs the 11.0.0 version).
Is it just something I'm doing incorrectly? Is that the expected behaviour?
Thanks!

coder xu
05/28/2025, 12:22 AM
DatasetError: Failed while loading data from dataset ParquetDataset(filepath=kedro/model_input_table.parquet, load_args={}, protocol=s3, save_args={}).
Expected checksum PqKP+A== did not match calculated checksum: eqRztQ==

coder xu
05/28/2025, 12:23 AM
model_input_table:
  type: pandas.ParquetDataset
  filepath: s3://kedro/model_input_table.parquet
  # type: pandas.CSVDataset
  # filepath: s3://kedro/model_input_table.csv
and CSV files load fine.

Jamal Sealiti
05/28/2025, 11:32 AM

Jamal Sealiti
05/30/2025, 10:24 AM

Jamal Sealiti
05/30/2025, 12:19 PM

Yury Fedotov
05/30/2025, 2:27 PM

Trọng Đạt Bùi
06/02/2025, 10:06 AM

Ankit K
06/02/2025, 3:19 PM
I have a pipeline (using the kedro-vertexai plugin, version 0.10.0) where I need to track each pipeline run in a BigQuery table. We use a table_suffix (typically a date or unique run/session ID) to uniquely identify data and outputs for each pipeline run, ensuring that results from different runs do not overwrite each other and can be traced back to a specific execution.
The challenge is that the Kedro session_id or KEDRO_CONFIG_RUN_ID is not available at config load time, so early config logic (like setting a table_suffix) uses a date or placeholder value. This can cause inconsistencies, especially if nodes run on different days or the pipeline is resumed. (Currently the pipeline takes ~2.5 days to run.)
We tried generating the table_suffix using the current date at config load time, but this led to issues: if a node runs on a different day or the pipeline is resumed, a new table_suffix is generated, causing inconsistencies and making it hard to track a single pipeline run.
We also experimented with different Kedro hooks (such as before_pipeline_run and before_node_run) to set or propagate the run/session ID, but still faced challenges ensuring the value is available everywhere, including during config loading.
What is the best practice in Kedro (with Vertex AI integration) for generating and propagating a unique run/session ID that is available everywhere (including config loading and all nodes), so that all tracking and table suffixes are consistent for a given run?
Should this be set as an environment variable before Kedro starts, or is there a recommended hook or config loader pattern for this?
Any advice or examples would be appreciated!
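One pattern that sidesteps the config-load-time problem — this is an assumption about the setup, not a kedro-vertexai feature — is to mint the run ID once in the entrypoint, export it as an environment variable before Kedro starts, and read it everywhere via OmegaConf's built-in oc.env resolver (e.g. table_suffix: ${oc.env:PIPELINE_RUN_ID} in the config):

```python
import os
import uuid

# Entry-point sketch (the variable name PIPELINE_RUN_ID is illustrative):
# set the ID once, before any Kedro session or config loader is created.
os.environ.setdefault("PIPELINE_RUN_ID", uuid.uuid4().hex[:12])

# Everything that runs afterwards -- config loading via ${oc.env:...},
# hooks, and every node -- sees the same value, even across days.
run_id = os.environ["PIPELINE_RUN_ID"]
table_suffix = f"run_{run_id}"
```

For resumed runs, the orchestrator would need to re-export the same value, which is usually easier than threading a session_id through hooks into config loading.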
Arnout Verboven
06/03/2025, 11:00 AM
With multiple environments (local and prod), is it possible to know during pipeline creation which environment is being run? Or how should I do this using proper Kedro patterns? E.g. I want to do something like:
def create_pipeline(env: str = "local") -> Pipeline:
    if env == "prod":
        return create_pipeline_prod()
    else:
        return create_pipeline_local()
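Pipeline creation doesn't receive the run environment, so a common workaround — a sketch, with stub builders standing in for the real pipelines — is to drive the choice from the KEDRO_ENV environment variable, which Kedro also honours when selecting the config environment, so KEDRO_ENV=prod kedro run keeps config and pipeline selection in sync:

```python
import os

# Stand-ins for the real pipeline builders named in the thread
def create_pipeline_prod():
    return "prod pipeline"

def create_pipeline_local():
    return "local pipeline"

def create_pipeline(**kwargs):
    # KEDRO_ENV also selects the config environment, so setting it once
    # in the shell drives both config and pipeline structure.
    env = os.environ.get("KEDRO_ENV", "local")
    return create_pipeline_prod() if env == "prod" else create_pipeline_local()
```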
Abhishek Bhatia
06/10/2025, 5:38 AM

Malek Bouzidi
06/10/2025, 12:34 PM

Sharan Arora
06/10/2025, 5:53 PM

Sharan Arora
06/11/2025, 1:35 AM

Jonghyun Yun
06/11/2025, 9:46 PM

Trọng Đạt Bùi
06/12/2025, 6:41 AM

Mattis
06/16/2025, 12:55 PM