Jamal Sealiti
05/28/2025, 11:32 AM
Jamal Sealiti
05/30/2025, 10:24 AM
Jamal Sealiti
05/30/2025, 12:19 PM
Yury Fedotov
05/30/2025, 2:27 PM
Trọng Đạt Bùi
06/02/2025, 10:06 AM
Ankit K
06/02/2025, 3:19 PM
I'm running Kedro pipelines on Vertex AI (kedro-vertexai plugin, version 0.10.0), where I need to track each pipeline run in a BigQuery table. We use a table_suffix (typically a date or a unique run/session ID) to uniquely identify the data and outputs of each pipeline run, ensuring that results from different runs do not overwrite each other and can be traced back to a specific execution.
The challenge is that the Kedro session_id (or KEDRO_CONFIG_RUN_ID) is not available at config load time, so early config logic (like setting a table_suffix) falls back to a date or placeholder value. This causes inconsistencies, especially if nodes run on different days or the pipeline is resumed. (The pipeline currently takes ~2.5 days to run.)
We tried generating the table_suffix using the current date at config load time, but this led to issues: if a node runs on a different day or the pipeline is resumed, a new table_suffix is generated, causing inconsistencies and making it hard to track a single pipeline run.
We also experimented with different Kedro hooks (such as before_pipeline_run and before_node_run) to set or propagate the run/session ID, but still faced challenges ensuring the value is available everywhere, including during config loading.
What is the best practice in Kedro (with Vertex AI integration) for generating and propagating a unique run/session ID that is available everywhere (including config loading and all nodes), so that all tracking and table suffixes are consistent for a given run?
Should this be set as an environment variable before Kedro starts, or is there a recommended hook or config loader pattern for this?
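For reference, the direction we've been circling is to generate the ID outside Kedro and read it through a custom resolver, so config loading and every node see the same value. A rough sketch, not a confirmed best practice (it assumes a KEDRO_RUN_ID environment variable is exported by whatever launches kedro run and is also present in each node's container; the variable and resolver names are just placeholders):
# settings.py -- a minimal sketch
import os
import uuid

from kedro.config import OmegaConfigLoader

# fallback for ad-hoc local runs, computed once per interpreter
_DEFAULT_RUN_ID = f"local-{uuid.uuid4().hex[:8]}"

CONFIG_LOADER_CLASS = OmegaConfigLoader
CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        # usable anywhere in the config as "${run_id:}", e.g. to build table_suffix
        "run_id": lambda: os.environ.get("KEDRO_RUN_ID", _DEFAULT_RUN_ID),
    },
}
That would at least decouple the suffix from the day a node happens to execute on, but I'd still like to know the recommended pattern.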
Any advice or examples would be appreciated!
Arnout Verboven
06/03/2025, 11:00 AM
With multiple Kedro environments (local and prod), is it possible to know during pipeline creation which environment is being run? Or how should I do this using proper Kedro patterns? E.g. I want to do something like:
def create_pipeline(env: str = "local") -> Pipeline:
    if env == "prod":
        return create_pipeline_prod()
    else:
        return create_pipeline_local()
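One workaround I'm considering is to read the environment from an environment variable inside register_pipelines(), since create_pipeline() never receives it directly. A sketch (assuming the environment is selected by exporting KEDRO_ENV, which Kedro honours as the default for --env; the import path here is made up):
# pipeline_registry.py -- a sketch, relying on KEDRO_ENV instead of --env
import os

from kedro.pipeline import Pipeline

# hypothetical module path for the two factories above
from my_project.pipelines import create_pipeline_local, create_pipeline_prod


def register_pipelines() -> dict[str, Pipeline]:
    env = os.environ.get("KEDRO_ENV", "local")
    pipeline = create_pipeline_prod() if env == "prod" else create_pipeline_local()
    return {"__default__": pipeline}
But I'd prefer a proper Kedro pattern if one exists.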
Abhishek Bhatia
06/10/2025, 5:38 AM
Malek Bouzidi
06/10/2025, 12:34 PM
Sharan Arora
06/10/2025, 5:53 PM
Sharan Arora
06/11/2025, 1:35 AM
Jonghyun Yun
06/11/2025, 9:46 PM
Trọng Đạt Bùi
06/12/2025, 6:41 AM
Mattis
06/16/2025, 12:55 PM
Wejdan Bagais
06/17/2025, 4:52 PM
Sharan Arora
06/18/2025, 7:53 PM
postgresql_connection:
  host: "${oc.env:POSTGRESQL_HOST}"
  username: "${oc.env:POSTGRESQL_USER}"
  password: "${oc.env:POSTGRESQL_PASSWORD}"
  port: "${oc.env:POSTGRESQL_PORT}"
Each of these values is stored in a .env file in the same local folder. However, when I do kedro run, postgresql_connection isn't recognized and the actual values from the .env file aren't picked up and passed on to credentials.yml, which I want to be dynamic and based on user input. Any idea how to resolve this?
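One thing I suspect is that ${oc.env:...} only reads real environment variables, so the .env file has to be loaded into os.environ before the config loader resolves credentials. A sketch of what I mean (assumes the python-dotenv package is installed):
# settings.py -- a sketch: load the .env into os.environ before config resolution
from dotenv import load_dotenv

# looks for a .env file from the current directory upwards;
# pass an explicit path if it lives in conf/local/.env
load_dotenv()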
Additionally, what is the process for getting Kedro to read credentials.yml as well? It seems that on kedro run it only cares about catalog.yml. Is it just a matter of linking credentials in the catalog? I tried that, but then it reads the dynamic string literally.
Rachid Cherqaoui
06/20/2025, 11:21 AM
I want to load only the files matching a pattern like /doc_20250620*_delta.csv. But I noticed that YAML interprets * as an anchor, and it doesn't seem to behave like a wildcard here.
How can I configure a dataset in catalog.yml to use a wildcard when loading files from an SFTP path (e.g. to only fetch files starting with a certain prefix and ending with _delta.csv)? Is there native support for this kind of pattern in Kedro's SFTPDataSet, or do I need to implement a custom dataset?
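The only route I've thought of so far is to skip the glob entirely and use a PartitionedDataset over the SFTP folder, filtering on suffix and prefix. A rough, untested sketch (assumes kedro-datasets plus paramiko so fsspec can open sftp:// paths; host and credentials are placeholders):
# list the folder over SFTP, keep only *_delta.csv, then filter the date prefix
from kedro_datasets.partitions import PartitionedDataset

delta_files = PartitionedDataset(
    path="sftp://my-sftp-server/outbox/",                  # placeholder host/folder
    dataset={"type": "pandas.CSVDataset"},
    filename_suffix="_delta.csv",                          # built-in suffix filter
    credentials={"host": "my-sftp-server", "username": "user", "password": "***"},  # placeholders; fsspec's SFTP filesystem needs the host here too
)

partitions = delta_files.load()  # {partition_id: callable returning a DataFrame}
wanted = {pid: load() for pid, load in partitions.items() if "doc_20250620" in pid}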
Any guidance or examples would be super appreciated! 🙏
Rachid Cherqaoui
06/23/2025, 7:34 AM
I'm trying to load a file over SFTP using a pandas CSVDataset. Here's the relevant entry from my `catalog.yml`:
cool_dataset:
  type: pandas.CSVDataSet
  filepath: sftp://my-sftp-server/outbox/DW_Extracts/my_file.csv
  load_args: {}
  save_args:
    index: False
When I run:
df = catalog.load("cool_dataset")
I get the following error:
It seems like Kedro/pandas is trying to use urllib to open the SFTP URL, which doesn't support the sftp:// protocol natively.
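For what it's worth, before wiring it into the catalog I was planning to check whether pandas/fsspec can reach the server at all, since sftp:// support comes from fsspec and needs the paramiko package installed. A sketch with placeholder host and credentials:
# quick check outside Kedro: pandas forwards storage_options to fsspec/paramiko
import pandas as pd

df = pd.read_csv(
    "sftp://my-sftp-server/outbox/DW_Extracts/my_file.csv",
    storage_options={"username": "user", "password": "***"},  # placeholders
)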
Has anyone successfully used Kedro to load files from SFTP? If so, could you share your config/setup?
Adrien Paul
06/23/2025, 5:02 PM
Nathan W.
06/25/2025, 7:32 AM
I'd like to store an API key in .env or credentials.yml and then use it in my node parameters to make API requests. Are there any simple solutions I've missed (without putting it in parameters.yml and then risking pushing my key to production...)?
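One idea I'm not sure about: a small hook that reads the key from conf/local/credentials.yml (which is git-ignored) and exposes it to nodes as a catalog entry, so it never goes near parameters.yml. A sketch (the hook and key names are made up; MemoryDataset is the Kedro 0.19 spelling):
# hooks.py -- a sketch; register with HOOKS = (ApiKeyHook(),) in settings.py
from kedro.framework.hooks import hook_impl
from kedro.io import MemoryDataset  # MemoryDataSet on Kedro 0.18


class ApiKeyHook:
    @hook_impl
    def after_context_created(self, context):
        # credentials.yml is loaded by the config loader but never versioned or logged
        self._credentials = context.config_loader["credentials"]

    @hook_impl
    def after_catalog_created(self, catalog):
        # nodes can now declare "api_key" as a regular input
        catalog.add("api_key", MemoryDataset(self._credentials["my_api"]["key"]))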
Thanks a lot in advance for your response, have a nice day!
Fazil Topal
06/25/2025, 8:24 AM
Jamal Sealiti
06/26/2025, 10:14 AM
Rachid Cherqaoui
06/27/2025, 2:20 PM
Pradeep Ramayanam
06/27/2025, 5:34 PM
Rachid Cherqaoui
06/30/2025, 9:11 AM
I have a .txt file generated by a Kedro pipeline that I created, and I'd like to send it to a folder on a remote server via SFTP.
After several attempts, I found it quite tricky to handle this cleanly within Kedro, especially while keeping things consistent with its data catalog and hooks system.
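To be concrete, the most catalog-friendly thing I've been sketching is a text dataset pointed at an sftp:// path, so the pipeline just saves the file through the catalog. This is untested; the host, folder and credentials are placeholders, and fsspec needs the paramiko package for SFTP:
# a sketch, not a verified setup
from kedro_datasets.text import TextDataset

report = TextDataset(
    filepath="sftp://remote-server/outbox/reports/report.txt",  # placeholder target
    credentials={"host": "remote-server", "username": "user", "password": "***"},  # placeholders; fsspec's SFTP filesystem wants the host here too
)
report.save("contents of the .txt produced by the pipeline")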
Would anyone be able to help or share best practices on how to achieve this with Kedro?
Thanks in advance for your support!
Jamal Sealiti
06/30/2025, 11:29 AM
olufemi george
07/02/2025, 4:52 PM
minmin
07/03/2025, 12:51 PM
These mlflow metric entries in my catalog work fine:
model_1.mae:
  type: kedro_mlflow.io.metrics.MlflowMetricDataset
model_2.mae:
  type: kedro_mlflow.io.metrics.MlflowMetricDataset
If however I try and template the name in the catalog, it fails:
"{model_name}.mae":
  type: kedro_mlflow.io.metrics.MlflowMetricDataset
I get the error message:
DatasetError: Failed while saving data to dataset MlflowMetricDataset(run_id=...).
Invalid value null for parameter 'name' supplied: Metric name cannot be None. A key name must be provided.
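One guess I haven't verified: MlflowMetricDataset accepts an explicit key argument (the metric name), so maybe the factory entry needs key: "{model_name}.mae" spelled out instead of relying on the dataset name. In Python terms:
# untested guess: with an explicit key the metric name is no longer None
from kedro_mlflow.io.metrics import MlflowMetricDataset

mae = MlflowMetricDataset(key="model_1.mae")  # key = the metric name logged to mlflow
mae.save(0.123)  # inside an active mlflow run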
Do I just have to avoid templating in the catalog when it comes to mlflow-related entries?
Adrien Paul
07/04/2025, 8:42 AM
julie tverfjell
07/04/2025, 10:20 AM