Armand Masseau
05/21/2025, 2:04 PM

Adrien Paul
05/21/2025, 2:16 PM

Jonghyun Yun
05/22/2025, 2:30 PM
data/01_raw/company/cars.csv/<version>/cars.csv) so that it could pick up the correct datasets to process. Is there a way to know which <version> is being used by Kedro?
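A minimal sketch of one way to check this interactively (assuming a versioned dataset named cars in the catalog; the exact accessors vary between Kedro versions, and _get_dataset is a semi-private method):
```python
# Sketch: inspecting which version Kedro resolves for a versioned dataset.
# "cars" is an assumed dataset name; method names follow recent Kedro releases.
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

bootstrap_project(Path.cwd())
with KedroSession.create() as session:
    catalog = session.load_context().catalog
    dataset = catalog._get_dataset("cars")     # AbstractVersionedDataset instance
    print(dataset.resolve_load_version())      # the <version> folder Kedro will load from
```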
05/22/2025, 2:42 PMkedro-viz
from my main Python and not the one in the virtual env (i.e., I have v11.0.0 in my main Python, but v11.0.1 in my virtual env, and running kedro viz
from within the virtual env is picking the 11.0.0 version).
Is it just something I'm doing incorrectly? Is that the expected behaviour?
Thanks!
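One quick way to check which installation the interpreter inside the venv actually resolves (a debugging sketch, not a fix):
```python
# Sketch: run from inside the activated virtual env to see which kedro-viz
# installation is being resolved.
from importlib.metadata import version

import kedro_viz

print(kedro_viz.__file__)       # should point into the venv's site-packages
print(version("kedro-viz"))     # expected 11.0.1 if the venv copy is used
```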
coder xu
05/28/2025, 12:22 AM
DatasetError: Failed while loading data from dataset ParquetDataset(filepath=kedro/model_input_table.parquet, load_args={}, protocol=s3, save_args={}).
Expected checksum PqKP+A== did not match calculated checksum: eqRztQ==
coder xu
05/28/2025, 12:23 AM
```yaml
model_input_table:
  type: pandas.ParquetDataset
  filepath: s3://kedro/model_input_table.parquet
  # type: pandas.CSVDataset
  # filepath: s3://kedro/model_input_table.csv
```
and CSV files are fine.
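A small sketch to isolate whether the checksum error comes from the storage layer rather than the Kedro wrapper (same bucket/key as the catalog entry, using whatever AWS credentials the environment already has; requires s3fs and pyarrow):
```python
# Sketch: load the same object directly with pandas via s3fs/pyarrow to see
# whether the checksum mismatch reproduces outside Kedro.
import pandas as pd

df = pd.read_parquet("s3://kedro/model_input_table.parquet")
print(df.shape)
```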
Jamal Sealiti
05/28/2025, 11:32 AM

Jamal Sealiti
05/30/2025, 10:24 AM

Jamal Sealiti
05/30/2025, 12:19 PM

Yury Fedotov
05/30/2025, 2:27 PM

Trọng Đạt Bùi
06/02/2025, 10:06 AM

Ankit K
06/02/2025, 3:19 PM
We have a setup (kedro-vertexai plugin, version 0.10.0) where I need to track each pipeline run in a BigQuery table. We use a table_suffix (typically a date or unique run/session ID) to uniquely identify data and outputs for each pipeline run, ensuring that results from different runs do not overwrite each other and can be traced back to a specific execution.
The challenge is that the Kedro session_id or KEDRO_CONFIG_RUN_ID is not available at config load time, so early config logic (like setting a table_suffix) uses a date or placeholder value. This can cause inconsistencies, especially if nodes run on different days or the pipeline is resumed. (Currently the pipeline takes ~2.5 days to run.)
We tried generating the table_suffix using the current date at config load time, but this led to issues: if a node runs on a different day or the pipeline is resumed, a new table_suffix is generated, causing inconsistencies and making it hard to track a single pipeline run.
We also experimented with different Kedro hooks (such as before_pipeline_run and before_node_run) to set or propagate the run/session ID, but still faced challenges ensuring the value is available everywhere, including during config loading.
What is the best practice in Kedro (with Vertex AI integration) for generating and propagating a unique run/session ID that is available everywhere (including config loading and all nodes), so that all tracking and table suffixes are consistent for a given run?
Should this be set as an environment variable before Kedro starts, or is there a recommended hook or config loader pattern for this?
Any advice or examples would be appreciated!
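One pattern sometimes used (a sketch, assuming the ID is exported as an environment variable such as KEDRO_RUN_ID at job-submission time, so config loading and every node container see the same value) is to expose it to OmegaConfigLoader through a custom resolver:
```python
# settings.py -- sketch, not the official kedro-vertexai mechanism.
# KEDRO_RUN_ID is an assumed variable name; set it once when submitting the job.
import os
import uuid

from kedro.config import OmegaConfigLoader

# Fallback only; in practice export KEDRO_RUN_ID from the orchestrator so all
# node containers share the same value.
_RUN_ID = os.environ.setdefault("KEDRO_RUN_ID", uuid.uuid4().hex[:8])

CONFIG_LOADER_CLASS = OmegaConfigLoader
CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        # parameters.yml can then use:  table_suffix: "${run_id:}"
        "run_id": lambda: _RUN_ID,
    },
}
```
The in-process fallback only helps for single-process runs; on Vertex AI each node runs in its own container, so the value has to come from the environment set when the run is submitted.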
Arnout Verboven
06/03/2025, 11:00 AM
With two environments (local and prod), is it possible to know during pipeline creation which environment is run? Or how should I do this using proper Kedro patterns? E.g. I want to do something like:
```python
def create_pipeline(env: str = "local") -> Pipeline:
    if env == "prod":
        return create_pipeline_prod()
    else:
        return create_pipeline_local()
```
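A sketch of one workaround, assuming the environment is supplied via the KEDRO_ENV environment variable (register_pipelines() itself is not given the environment by Kedro, and kedro run --env prod alone does not set this variable); create_pipeline_local/create_pipeline_prod and my_project are placeholders:
```python
# pipeline_registry.py -- sketch, not an official Kedro pattern.
import os

from kedro.pipeline import Pipeline

# Placeholder imports for the two pipeline factories mentioned above.
from my_project.pipelines import create_pipeline_local, create_pipeline_prod


def register_pipelines() -> dict[str, Pipeline]:
    # Only populated if the caller exports KEDRO_ENV before running Kedro.
    env = os.environ.get("KEDRO_ENV", "local")
    default = create_pipeline_prod() if env == "prod" else create_pipeline_local()
    return {"__default__": default}
```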
Abhishek Bhatia
06/10/2025, 5:38 AM

Malek Bouzidi
06/10/2025, 12:34 PM

Sharan Arora
06/10/2025, 5:53 PM

Sharan Arora
06/11/2025, 1:35 AM

Jonghyun Yun
06/11/2025, 9:46 PM

Trọng Đạt Bùi
06/12/2025, 6:41 AM

Mattis
06/16/2025, 12:55 PM

Wejdan Bagais
06/17/2025, 4:52 PM

Sharan Arora
06/18/2025, 7:53 PM
```yaml
postgresql_connection:
  host: "${oc.env:POSTGRESQL_HOST}"
  username: "${oc.env:POSTGRESQL_USER}"
  password: "${oc.env:POSTGRESQL_PASSWORD}"
  port: "${oc.env:POSTGRESQL_PORT}"
```
Each of these values is stored in a .env file in the same local folder. However, when I do kedro run, postgresql_connection isn't recognized and the actual values provided in the .env file aren't detected and passed on to credentials.yml. I want this to be dynamic and based on user input. Any idea how to resolve this?
Additionally, what is the process for getting Kedro to read credentials.yml as well? It seems that on kedro run it only cares about catalog.yml. Is it just a matter of linking credentials in the catalog? I tried, but then it reads the dynamic string literally.
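A sketch of one way to make the .env values visible before config is resolved (assumes python-dotenv is installed and the file lives at conf/local/.env; the path is an assumption):
```python
# settings.py -- sketch: load conf/local/.env so ${oc.env:...} can resolve the
# variables when the OmegaConfigLoader reads credentials.yml.
from pathlib import Path

from dotenv import load_dotenv

# Assumed location of the .env file; adjust to wherever it actually lives.
load_dotenv(Path(__file__).resolve().parents[2] / "conf" / "local" / ".env")
```
Credentials are then attached per dataset in catalog.yml with a `credentials: postgresql_connection` key rather than being interpolated into the catalog entry directly.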
Rachid Cherqaoui
06/20/2025, 11:21 AM
/doc_20250620*_delta.csv
But I noticed that YAML interprets * as an anchor, and it doesn't seem to behave like a wildcard here.
How can I configure a dataset in catalog.yml to use a wildcard when loading files from an SFTP path (e.g. to only fetch files starting with a certain prefix and ending with _delta.csv)? Is there native support for this kind of pattern in Kedro's SFTPDataSet, or do I need to implement a custom dataset?
Any guidance or examples would be super appreciated! 🙏
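For reference, a rough sketch of the custom-dataset route (class name, host handling and credential passing are all illustrative; it relies on fsspec's paramiko-based "sftp" filesystem and is read-only):
```python
# Sketch: a minimal read-only dataset that globs an SFTP folder via fsspec and
# concatenates the matching CSVs. All names here are illustrative; requires
# paramiko for fsspec's "sftp" protocol. Kedro's built-in SFTPDataSet does not glob.
from typing import Any, Optional

import fsspec
import pandas as pd
from kedro.io import AbstractDataset


class SFTPGlobCSVDataset(AbstractDataset[None, pd.DataFrame]):
    def __init__(self, host: str, pattern: str, credentials: Optional[dict] = None):
        self._host = host
        self._pattern = pattern            # e.g. "outbox/doc_20250620*_delta.csv"
        self._credentials = credentials or {}

    def _load(self) -> pd.DataFrame:
        fs = fsspec.filesystem("sftp", host=self._host, **self._credentials)
        frames = [pd.read_csv(fs.open(path)) for path in fs.glob(self._pattern)]
        return pd.concat(frames, ignore_index=True)

    def _save(self, data: pd.DataFrame) -> None:
        raise NotImplementedError("This sketch is read-only.")

    def _describe(self) -> dict[str, Any]:
        return {"host": self._host, "pattern": self._pattern}
```
In catalog.yml it would then be referenced by its import path, with host, pattern and credentials passed as constructor arguments.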
Rachid Cherqaoui
06/23/2025, 7:34 AM
I'm trying to load a file from an SFTP server with CSVDataset. Here's the relevant entry from my `catalog.yml`:
```yaml
cool_dataset:
  type: pandas.CSVDataSet
  filepath: sftp://my-sftp-server/outbox/DW_Extracts/my_file.csv
  load_args: {}
  save_args:
    index: False
```
When I run:
```python
df = catalog.load("cool_dataset")
```
I get the following error:
It seems like Kedro/pandas is trying to use urllib to open the SFTP URL, which doesn't support the sftp:// protocol natively.
Has anyone successfully used Kedro to load files from SFTP? If so, could you share your config/setup?
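A debugging sketch to check, outside Kedro, whether fsspec can open the path at all (host and credentials are placeholders; fsspec's sftp protocol needs paramiko installed):
```python
# Sketch: bypass Kedro and open the SFTP path directly with fsspec to isolate
# whether the problem is the dataset wrapper or the SFTP/fsspec layer.
import fsspec
import pandas as pd

with fsspec.open(
    "sftp://my-sftp-server/outbox/DW_Extracts/my_file.csv",
    username="user",        # placeholder
    password="secret",      # placeholder
) as f:
    df = pd.read_csv(f)

print(df.head())
```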
Adrien Paul
06/23/2025, 5:02 PM

Nathan W.
06/25/2025, 7:32 AM
I'd like to store an API key in .env or credentials.yml and then use it in my node parameters to make API requests. Are there any simple solutions I missed (without putting it in parameters.yml and then risking pushing my key into production...)?
Thanks a lot in advance for your response, have a nice day!
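A sketch of one pattern (hook, key and dataset names are illustrative; class names follow recent Kedro versions) that keeps the key in conf/local/credentials.yml, which is gitignored by default, and exposes it to nodes as a catalog entry rather than a parameter:
```python
# hooks.py -- sketch, not an official Kedro recipe.
from kedro.framework.hooks import hook_impl
from kedro.io import MemoryDataset


class ApiKeyHook:
    @hook_impl
    def after_context_created(self, context):
        # credentials.yml is assumed to contain:  my_api: {key: "..."}
        self._api_key = context.config_loader["credentials"]["my_api"]["key"]

    @hook_impl
    def after_catalog_created(self, catalog):
        # Any node can now declare "api_key" as an input instead of a parameter.
        catalog.add("api_key", MemoryDataset(self._api_key))
```
The hook still has to be registered in settings.py, e.g. HOOKS = (ApiKeyHook(),).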
Fazil Topal
06/25/2025, 8:24 AM

Jamal Sealiti
06/26/2025, 10:14 AM

Rachid Cherqaoui
06/27/2025, 2:20 PM

Pradeep Ramayanam
06/27/2025, 5:34 PM