# random
s
Hello everyone! We are actively building computational data governance based on DataHub. So far, we have ingested about 120k datasets, and we estimate the total will be at least twice that. I am curious to get some insight into the number of datasets, dashboards, etc. that different companies ingest into DataHub or whichever data catalog they prefer. So, how big is your catalog?
g
WOW, 120K! We have ~7K, and we're trying to reduce that number significantly to get a more consistent view of our data.
s
Thanks for the response! We are trying to reduce that number too. But first, we decided to ingest all production OLAP DBs, build lineage, and then prepare cases to motivate the businesses to change their data architecture. We are a holding company with about 40 businesses and a pretty weird data architecture with multiple table replicas. Right now I am looking at ~3,000 tables, each replicated between 1 and 75 times. I have never seen such a monster before, so I'm trying to put it into perspective compared to the rest of the community.
r
What is the reason for having up to 75 replicas of a table?