Has anyone developed a way to catalog *`joins`* in...
# advice-metadata-modeling
s
Has anyone developed a way to catalog `joins` in DataHub? In our enterprise environment, we have specific joins that have been created by teams and they want to catalog these “approved” join queries in the catalog solution. These are not DBMS technologies that can use any kind of query analyzer technology, and although not an analyst myself, my understanding is that for certain database technologies, such as Teradata, having a catalog of optimized joins is important to ensure partitions are being utilized correctly and resources are being used efficiently. Has anyone modeled something like this out using either custom or OOTB features in DataHub?
p
maybe catalog the database views that contain those joins?
g
There are also primary/foreign keys that you can leverage to surface some of this. Join + filter might need modification of the UI
Or leverage example queries to highlight "official" joins
s
Yeah, our business partners have made it clear that PK and FK alone are not sufficient. @gorgeous-dinner-4055 are you able to direct me to any documentation about manually populating example queries? What I've found so far seems to indicate it can only be populated through sources like Snowflake, BigQuery, etc. Or are you stating this is custom functionality we’d have to code for?
cc @rhythmic-agent-54913 @nice-window-93693 @brave-pencil-21289 @dry-zoo-35797 @narrow-waitress-35309
@steep-school-85974
@gorgeous-dinner-4055 any insight as to whether we can manually populate the Queries tab?
a
@gorgeous-dinner-4055 would it be a bad idea to extend EditableSchemaMetadata to allow for “logical” foreign key constraints being added via UI?
g
I'm not a maintainer of DataHub, just an advocate and implementer at our company, so take my advice with a grain of salt 😊 Re foreign key constraints being added via UI: What's the end goal? If it's just to surface info to users, it's likely ok. But imo, DataHub is the wrong place to do stuff like that because your underlying platform doesn't have a way to discover these constraints or to validate if the relations are right.
Re queries tab, we push metadata using the Python SDK; that's the only option afaik outside of editing the UI to enable setting of queries in the queries tab
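For anyone wondering what pushing an "approved" join through the Python SDK might look like, here is a minimal sketch, not a confirmed recipe. It assumes a recent `acryl-datahub` release that exposes the query entity aspects (`QueryPropertiesClass`, `QuerySubjectsClass`); check those class names against your installed version. The `ApprovedJoin` record, the helper names, the `teradata` platform default, and the actor URN are all hypothetical and only there for illustration.

```python
import time
from dataclasses import dataclass
from typing import List


@dataclass
class ApprovedJoin:
    """Hypothetical record describing one team-approved join query."""
    query_id: str      # stable identifier chosen by us
    name: str          # display name for the query
    sql: str           # the approved join statement
    tables: List[str]  # tables the join touches, e.g. "sales_db.orders"


def dataset_urn(platform: str, table: str, env: str = "PROD") -> str:
    """Build a dataset URN in DataHub's standard format."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{table},{env})"


def query_urn(query_id: str) -> str:
    """Build a URN for a query entity."""
    return f"urn:li:query:{query_id}"


def emit_approved_join(
    join: ApprovedJoin,
    gms_server: str,
    platform: str = "teradata",               # assumption: adjust to your platform
    actor: str = "urn:li:corpuser:datahub",   # assumption: any valid user URN
) -> None:
    """Push one approved join as a query entity linked to its tables."""
    # Imports are deferred so the helpers above work without the SDK installed.
    # Assumption: these classes exist in your acryl-datahub version.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        QueryLanguageClass,
        QueryPropertiesClass,
        QuerySourceClass,
        QueryStatementClass,
        QuerySubjectClass,
        QuerySubjectsClass,
    )

    stamp = AuditStampClass(time=int(time.time() * 1000), actor=actor)

    # The SQL text and display name of the approved join.
    properties = QueryPropertiesClass(
        statement=QueryStatementClass(value=join.sql, language=QueryLanguageClass.SQL),
        source=QuerySourceClass.MANUAL,
        created=stamp,
        lastModified=stamp,
        name=join.name,
    )

    # Link the query to every dataset it joins.
    subjects = QuerySubjectsClass(
        subjects=[
            QuerySubjectClass(entity=dataset_urn(platform, table))
            for table in join.tables
        ]
    )

    emitter = DatahubRestEmitter(gms_server=gms_server)
    for aspect in (properties, subjects):
        emitter.emit(
            MetadataChangeProposalWrapper(
                entityUrn=query_urn(join.query_id), aspect=aspect
            )
        )
```

Usage would be something like `emit_approved_join(ApprovedJoin("orders-customers-v1", "Orders to customers", "SELECT ...", ["sales_db.orders", "sales_db.customers"]), gms_server="http://localhost:8080")`, where the server URL and table names are placeholders. Keeping the URN helpers separate from the emit step makes it easy to check what will be written before anything hits the server.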
a
Hmm I think not having the platform validate would be ok, although calling them foreign key constraints on the UI would be problematic