# getting-started
g
Another question - any performance benchmark info about DH?
m
Hi Ray, we haven't really published any benchmarks because a lot depends on the workload, the richness of your metadata, the specific backends, and the resources you are running DataHub with. There is a perf test harness at https://github.com/datahub-project/datahub/tree/master/perf-test that you can use to benchmark it with your setup. Quite a few companies in the community are running it at scale in production with millions of entities and, in some cases, tens of millions of relationships (edges). I think the Grab team reported ingesting their entire Hive warehouse (~80K datasets) in ~15 mins, if memory serves me right.
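For a rough sense of scale, the recalled Grab figure can be turned into an average ingestion rate. This is only back-of-the-envelope arithmetic on the approximate numbers quoted above (~80K datasets, ~15 mins), not a measured benchmark; the function name `throughput_per_second` is made up for illustration and is not part of DataHub.

```python
def throughput_per_second(entity_count: int, elapsed_minutes: float) -> float:
    """Average entities ingested per second over the whole run."""
    if elapsed_minutes <= 0:
        raise ValueError("elapsed_minutes must be positive")
    return entity_count / (elapsed_minutes * 60)


# Approximate figures recalled in the message above (assumed, not measured here).
rate = throughput_per_second(80_000, 15)
print(f"{rate:.1f} datasets/sec")  # prints "88.9 datasets/sec"
```

The same calculation applied to numbers from your own perf-test run would give a figure that reflects your workload and backends.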
g
thx