Has anyone developed a way to catalog `joins` in DataHub? #advice-metadata-modeling

swift-nail-32514

09/15/2022, 2:47 PM

Has anyone developed a way to catalog joins in DataHub? In our enterprise environment, we have specific joins that have been created by teams and they want to catalog these “approved” join queries in the catalog solution. These are not DBMS technologies that can use any kind of query analyzer technology, and although not an analyst myself, my understanding is that for certain database technologies, such as Teradata, having a catalog of optimized joins is important to ensure partitions are being utilized correctly and resources are being used efficiently. Has anyone modeled something like this out using either custom or OOTB features in DataHub?

proud-table-38689

09/15/2022, 5:23 PM

maybe catalog the database views that contain those joins?

gorgeous-dinner-4055

09/15/2022, 6:39 PM

There's also primary/foreign keys that you can leverage to surface some of this. Join + filter might need modification of the UI
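For illustration, a minimal sketch of what a foreign key entry might look like. It uses only the standard library and builds plain strings and a dict rather than calling the DataHub SDK. The field names (`name`, `sourceFields`, `foreignFields`, `foreignDataset`) are assumed to mirror DataHub's foreign key constraint model, and the table and column names are hypothetical.

```python
# Sketch only: builds URN strings and a dict by hand, no DataHub SDK calls.
# Table names, column names and the constraint name are hypothetical.

def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN in the usual DataHub string form."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def schema_field_urn(dataset: str, field_path: str) -> str:
    """Build a schema field URN that points at one column of a dataset."""
    return f"urn:li:schemaField:({dataset},{field_path})"


def foreign_key(name, source_dataset, source_cols, foreign_dataset, foreign_cols):
    """Describe a join key as a dict shaped like a foreign key constraint (assumed shape)."""
    return {
        "name": name,
        "sourceFields": [schema_field_urn(source_dataset, c) for c in source_cols],
        "foreignFields": [schema_field_urn(foreign_dataset, c) for c in foreign_cols],
        "foreignDataset": foreign_dataset,
    }


orders = dataset_urn("teradata", "sales.orders")
customers = dataset_urn("teradata", "sales.customers")
fk = foreign_key("orders_customer_fk", orders, ["customer_id"], customers, ["id"])
print(fk)
```

This only captures which columns line up. Filters or other join conditions have no place in this shape, which is the limitation mentioned above.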

gorgeous-dinner-4055

09/15/2022, 6:39 PM

Or leverage example queries to highlight "official" joins

swift-nail-32514

09/15/2022, 7:23 PM

Yeah, our business partners have made it clear that PK and FK only is not sufficient. @gorgeous-dinner-4055 are you able to direct me to any documentation about manually populating example queries? What I've found so far seems to indicate it can only be populated through sources like Snowflake, BigQuery, etc. Or are you stating this is custom functionality we’d have to code for?

swift-nail-32514

09/19/2022, 2:49 PM

cc @rhythmic-agent-54913 @nice-window-93693 @brave-pencil-21289 @dry-zoo-35797 @narrow-waitress-35309

swift-nail-32514

09/19/2022, 2:53 PM

@steep-school-85974

swift-nail-32514

09/22/2022, 1:07 PM

@gorgeous-dinner-4055 any insight as to whether we can manually populate the Queries tab?

aloof-ram-72401

09/23/2022, 4:03 PM

@gorgeous-dinner-4055 would it be a bad idea to extend EditableSchemaMetadata to allow for “logical” foreign key constraints being added via UI?

gorgeous-dinner-4055

09/23/2022, 4:46 PM

I'm not a maintainer of datahub, just an advocate and implementer at our company, so take my advice with a grain of salt 😊 Re foreign key constraints being added via ui: What's the end goal? If it's just to surface info to users, it's likely ok. But imo, datahub is the wrong place to do stuff like that because your underlying platform doesn't have a way to discover these constraints or to validate if the relations are right.

gorgeous-dinner-4055

09/23/2022, 4:47 PM

Re queries tab, we push metadata using the Python SDK, that's the only option afaik outside of editing the UI to enable setting of queries in the queries tab
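A rough sketch of the kind of payload such a push might carry, assuming the Queries tab is fed from the `topSqlQueries` field of a dataset usage statistics aspect (an assumption, not something confirmed in this thread). It builds a plain dict with the standard library only. The helper name, dataset URN and SQL are hypothetical, and actually sending it would still need the SDK's emitter.

```python
import json
import time

# Sketch only: assembles a dict, it does not talk to DataHub.
# The aspect name and field names are assumptions about the usage statistics model.


def approved_queries_payload(dataset_urn, queries, now_ms=None):
    """Bundle hand-curated SQL strings into a usage-statistics-shaped payload."""
    ts = now_ms if now_ms is not None else int(time.time() * 1000)
    return {
        "entityUrn": dataset_urn,
        "aspectName": "datasetUsageStatistics",
        "aspect": {
            "timestampMillis": ts,
            "topSqlQueries": list(queries),
        },
    }


# Hypothetical "approved" join for a Teradata table.
approved_sql = [
    "SELECT o.order_id, c.name "
    "FROM sales.orders o JOIN sales.customers c ON o.customer_id = c.id"
]
payload = approved_queries_payload(
    "urn:li:dataset:(urn:li:dataPlatform:teradata,sales.orders,PROD)",
    approved_sql,
    now_ms=0,
)
print(json.dumps(payload, indent=2))
```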

aloof-ram-72401

09/23/2022, 7:34 PM

Hmm I think not having the platform validate would be ok, although calling them foreign key constraints on the UI would be problematic
