
DataHub

Hi. I have sort of a deployment question. We have data and datasets in multiple AWS Regions. Currently it is painful to do search & discovery of data & datasets, since we have to log in to each region using our current home-grown approach (I don’t really want to call it a data catalog, since it isn’t one). What I want is to provide a global view. Eventual consistency is fine. So I’m wondering: has anyone tried deploying DataHub (DH) in multiple regions with some kind of sync set up?

Hi Ray, in this case all you need to do is run `datahub ingest` in each region and send the metadata events to a central DataHub instance over HTTP(S). Would that work for your setup?
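A minimal sketch of that setup in Python, assuming the regional metadata lives in AWS Glue (the `glue` source is an assumption; swap in whichever source fits) and that the central DataHub GMS endpoint is reachable from every region. The region list and the `CENTRAL_GMS` URL are hypothetical placeholders. The dictionary is the same structure a YAML recipe passed to `datahub ingest -c` would contain.

```python
from typing import Any, Dict, List

# Hypothetical values: replace with your own regions and central GMS endpoint.
REGIONS: List[str] = ["us-east-1", "eu-west-1", "ap-southeast-2"]
CENTRAL_GMS = "https://datahub-gms.example.com"


def build_recipe(region: str, gms_url: str) -> Dict[str, Any]:
    """Build one ingestion recipe that reads a single region and pushes
    the resulting metadata to the central DataHub over HTTP(S)."""
    return {
        # Assumption: the regional catalog is AWS Glue.
        "source": {"type": "glue", "config": {"aws_region": region}},
        # The REST sink sends metadata to the central GMS endpoint.
        "sink": {"type": "datahub-rest", "config": {"server": gms_url}},
    }


def run_region(region: str, gms_url: str) -> None:
    """Run the pipeline for one region; intended to be invoked by a
    scheduled job running in that region. Requires the acryl-datahub
    package with the relevant source plugin installed."""
    # Imported lazily so build_recipe() can be used without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(build_recipe(region, gms_url))
    pipeline.run()
    pipeline.raise_from_status()
```

Each region only needs outbound access to the central endpoint, and the central instance ends up with the merged, eventually consistent view.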

But sending over HTTPS seems like it would be slow in terms of throughput. I’m new to DH, though, so maybe there’s some batching API where more than one item can be sent per HTTP call? Since Kafka seems to be core to DH persistence (based on my brief experience), what about something like MirrorMaker or Confluent Replicator?
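One Kafka-based alternative to mirroring topics is to point each region’s ingestion at the central deployment’s Kafka rather than its REST endpoint, using DataHub’s `datahub-kafka` sink. A sketch, assuming the central brokers and schema registry are reachable from every region; the hostnames and the `glue` source are hypothetical placeholders.

```python
from typing import Any, Dict

# Hypothetical endpoints of the central DataHub deployment's Kafka stack.
CENTRAL_BOOTSTRAP = "kafka-central.example.com:9092"
CENTRAL_SCHEMA_REGISTRY = "http://schema-registry-central.example.com:8081"


def build_kafka_recipe(region: str) -> Dict[str, Any]:
    """Recipe that reads one region and publishes metadata events to the
    central Kafka cluster instead of calling the REST API."""
    return {
        # Assumption: the regional catalog is AWS Glue.
        "source": {"type": "glue", "config": {"aws_region": region}},
        "sink": {
            "type": "datahub-kafka",
            "config": {
                "connection": {
                    "bootstrap": CENTRAL_BOOTSTRAP,
                    "schema_registry_url": CENTRAL_SCHEMA_REGISTRY,
                }
            },
        },
    }
```

The central deployment then consumes these events from its own topics. Whether this beats the REST sink or MirrorMaker-style replication depends on your network setup and should be measured.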