# ingestion
r
Hi All! Is there a way to configure a custom “cluster” name for Airflow metadata ingestion? I’ve successfully set up Airflow to send metadata and lineage information following the docs here with the default config https://datahubproject.io/docs/metadata-ingestion#setting-up-airflow-to-use-datahub-as-lineage-backend, but now it looks like the “cluster” part of the URN defaults to “prod”, and we would like to configure this to be unique per Airflow environment
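(For reference, the URNs emitted for an Airflow DAG take roughly this shape, with the cluster as the last component of the dataFlow URN; the DAG and task names below are illustrative:)
```
urn:li:dataFlow:(airflow,example_dag,prod)
urn:li:dataJob:(urn:li:dataFlow:(airflow,example_dag,prod),example_task)
```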
h
Sort of, but no 😅 So when the dataflow URN is created in the lineage backend, it's done by a function that takes the cluster as an input, and the default value is "prod". The lineage backend doesn't actually pass the value though, so the default is used. This should be a simple thing to fix, though.
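A minimal sketch of the pattern being described (simplified, not the exact DataHub source; names are illustrative): the URN builder takes a cluster argument that defaults to "prod", and the lineage backend calls it without passing one, so every flow lands in "prod".
```python
# Sketch of the builder pattern described above (simplified, not the exact
# DataHub implementation). The cluster parameter has a "prod" default.
def make_data_flow_urn(orchestrator: str, flow_id: str, cluster: str = "prod") -> str:
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"

# The lineage backend builds the flow URN without passing a cluster,
# so the default "prod" is always used:
flow_urn = make_data_flow_urn("airflow", "example_dag")
# -> "urn:li:dataFlow:(airflow,example_dag,prod)"
```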
r
Hi Fredrik! Thank you for the reply and for commenting on the matter. I had the same impression looking at the source code of the Airflow DataHub provider (https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub_provider/_lineage_core.py#L61); I'll probably request making this configurable in the feature-requests channel
h
yup, exactly ☝️
b
Thanks for the question, Donatas - what Fredrik mentioned is spot on. This is something we don't want hardcoded, but it seems it hasn't been prioritized yet. Is this a blocker for you?
r
Hi John, thanks for the reply! I don't think this is a blocker (assuming there are no exactly similar DAGs between our Airflow instances), and we are in the process of enabling Airflow lineage either way. Really happy, though, that this has been noticed and will be improved going forward
b
Absolutely. cc @little-megabyte-1074 for visibility + tracking 🙂
thanks ewe 1
l
@ripe-furniture-93265 thanks so much for your feedback! I’m going to cross-post this in #feature-requests
🙌 1