magnificent-ocean-91922
04/27/2023, 7:28 PM
`id` column on the `product` table was the same as the `product_id` on the `purchase` table, and that the `customer_id` column there was the same as `personid` on the `customer` table, etc etc. What that then allowed for was someone could say "I'm trying to join `product` and `customer`" and the app would then determine the shortest path of JOIN clauses to do that, and give them starter SQL. In this example, that would be:
SELECT *
FROM product P
JOIN purchase P2 ON P2.product_id = P.id
JOIN customer C ON C.personid = P2.customer_id
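For anyone picking this up, here is a rough sketch of how that shortest-path lookup could work, assuming the column equivalences are stored as simple tuples. The names `EQUIVALENCES`, `build_graph`, `shortest_join_path`, and `starter_sql` are all made up for illustration; this is not the original app's code and not a DataHub API. It treats tables as graph nodes, equivalent columns as edges, and uses a breadth-first search to find the fewest JOINs.

```python
from collections import deque

# Hypothetical equivalence records: (table_a, column_a, table_b, column_b).
EQUIVALENCES = [
    ("product", "id", "purchase", "product_id"),
    ("purchase", "customer_id", "customer", "personid"),
]


def build_graph(equivalences):
    """Build an undirected graph: table -> [(neighbor, local_col, neighbor_col)]."""
    graph = {}
    for table_a, col_a, table_b, col_b in equivalences:
        graph.setdefault(table_a, []).append((table_b, col_a, col_b))
        graph.setdefault(table_b, []).append((table_a, col_b, col_a))
    return graph


def shortest_join_path(graph, start, goal):
    """Breadth-first search for the fewest joins from start to goal.

    Returns a list of (from_table, from_col, to_table, to_col) steps,
    an empty list if start == goal, or None if no path exists.
    """
    if start == goal:
        return []
    previous = {start: None}  # table -> join step that first reached it
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for neighbor, col_here, col_there in graph.get(table, []):
            if neighbor in previous:
                continue
            previous[neighbor] = (table, col_here, neighbor, col_there)
            if neighbor == goal:
                # Walk back from the goal to the start, then reverse.
                path = []
                step = previous[goal]
                while step is not None:
                    path.append(step)
                    step = previous[step[0]]
                return path[::-1]
            queue.append(neighbor)
    return None


def starter_sql(path, start):
    """Render the join steps as starter SQL (full table names, no aliases)."""
    lines = ["SELECT *", f"FROM {start}"]
    for from_table, from_col, to_table, to_col in path:
        lines.append(
            f"JOIN {to_table} ON {to_table}.{to_col} = {from_table}.{from_col}"
        )
    return "\n".join(lines)


graph = build_graph(EQUIVALENCES)
path = shortest_join_path(graph, "product", "customer")
print(starter_sql(path, "product"))
```

Because breadth-first search only counts hops, it will happily take a short path through an unrelated table that shares a column, so a real version would probably need a way to weight or exclude edges.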
Sometimes it found inappropriate shorter paths (such as joining source systems that had no real relationship to each other, but happened to contain a column from a common place), but for the most part it was really helpful, especially when it found the complex paths that required many JOINs to get the job done (like, I don't know, joining a user action to the salesperson responsible for their access; that's a weird example and I don't think anyone would want that, but the system could do it). It's kind of like lineage, except it's lateral and not hierarchical. Anyway, it was fun to do. Writing code for DataHub is outside of my wheelhouse at the moment, but if someone out there is looking for a cool little project, this might be interesting.
big-carpet-38439
05/01/2023, 4:11 PM
magnificent-ocean-91922
05/01/2023, 5:12 PM