A complete solution for open data platforms, enterprise data catalogs, data lakes and data management. Open source, mature, fully-featured and production ready.

DataHub

Another question: is there any performance benchmark info for DataHub?

Hi Ray, we haven't really published any benchmarks because a lot depends on the workload, the richness of your metadata, the specific backends, and the resources you are running DataHub with.
There is a perf test harness at <https://github.com/datahub-project/datahub/tree/master/perf-test> that you can use to benchmark it with your setup. Quite a few companies in the community are running it at scale in production, with millions of entities and, in some cases, tens of millions of relationships (edges). I think the Grab team <https://blog.datahubproject.io/humans-of-datahub-harvey-li-ebffdd021bb7|reported> ingesting their entire Hive warehouse (~80K datasets) in ~15 mins, if memory serves me right.