# ingestion
Hey team, just have a question. We are ingesting metadata from Snowflake, Tableau, and DBT. When we extract the metadata, dataset names come in different casing conventions: some UPPER, some lower, and some mixed case (because of user input). This affects the lineage graph. Is there any way to convert them to consistent casing?
I'm thinking of using a transformer to do it, but that doesn't seem like a good way. DBT doesn't transform source.
Hi Steve, this issue has cropped up a few times, we should solve it in a consistent way. We'll respond in a day or two. Just a heads up, the core team is at an offsite for the next couple of days, so responses will be a little slower. 🙏
Hey @modern-artist-55754 just giving you an update here, we are planning to preserve case in all cases. However, you mentioned that there are cases where different systems are storing the names in different cases due to manual input. Could you explain more how that happens?
@mammoth-bear-12532 so we are ingesting metadata from Tableau, Snowflake and DBT. The default behaviour for Snowflake ingestion is to have all the datasets (fully qualified name db.schema.table) in lower case (if the table name is not case sensitive). With DBT, the ingestion takes the exact case from catalog.json and manifest.json. Tableau seems to have everything in uppercase. What we find is that the lineage is not shown correctly because the URNs are different, i.e.
urn:li:dataset:(urn:li:dataPlatform:snowflake,DB.SCHEMA.TABLE,PROD), urn:li:dataset:(urn:li:dataPlatform:snowflake,DB.SCHEMA.table,PROD) and urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.table,PROD)
I believe there was a PR to make db.schema case insensitive, but the table name is still case sensitive. On top of that, the front end now shows 3 different datasets. Could we have an option in the yml config to specify which case we want for the URN? That would be more flexible.
@mammoth-bear-12532 Do you have any suggestions for our case? Right now I have a transformer that converts the URNs of the DatasetSnapshotClass and UpstreamLineageClass to lower case.
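For anyone following along, here is a minimal sketch of the lower-casing step such a transformer might apply to each dataset URN. It is plain Python with no DataHub imports; the helper name lowercase_dataset_urn is hypothetical, and wiring it into a transformer that rewrites DatasetSnapshotClass and UpstreamLineageClass is not shown. It assumes URNs of the form urn:li:dataset:(platform,name,env) and lower-cases only the name part, so it would also flatten names that are genuinely case sensitive in Snowflake.

```python
import re

# Matches urn:li:dataset:(<platform urn>,<dataset name>,<env>).
# The platform and env parts are assumed to contain no commas;
# everything between them is treated as the dataset name.
_DATASET_URN = re.compile(
    r"^urn:li:dataset:\((?P<platform>[^,]+),(?P<name>.+),(?P<env>[^,]+)\)$"
)


def lowercase_dataset_urn(urn: str) -> str:
    """Return the URN with only the dataset name lower-cased.

    Hypothetical helper for illustration. URNs that do not look like
    dataset URNs are returned unchanged.
    """
    match = _DATASET_URN.match(urn)
    if match is None:
        return urn
    return "urn:li:dataset:({},{},{})".format(
        match["platform"], match["name"].lower(), match["env"]
    )


if __name__ == "__main__":
    # The three casing variants described in the thread, written with
    # DataHub's dataPlatform prefix.
    variants = [
        "urn:li:dataset:(urn:li:dataPlatform:snowflake,DB.SCHEMA.TABLE,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:snowflake,DB.SCHEMA.table,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.table,PROD)",
    ]
    # After normalization all three should collapse to a single URN.
    print({lowercase_dataset_urn(u) for u in variants})
```

The platform and environment parts are left as they are, so PROD stays upper case. The same function would need to be applied to both the dataset's own URN and every upstream URN in the lineage, otherwise the edges still point at the old casing.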