# all-things-deployment
Hi, I'll be implementing DataHub. My infrastructure will contain various data lakes, each with a different owner. I want everyone to be able to access the metadata from all of these data lakes: each data lake owner should be able to ingest metadata only into their own section and shouldn't be able to change other data lakes' metadata in DataHub, but should still be able to see all the metadata from all data lakes. So I thought of a centralized DataHub, but then I read about Federated Metadata Serving on the DataHub website. I'm trying to grasp this concept and want to know the advantages of implementing it instead of just ingesting the metadata from all the data lakes into one DataHub. I'd also like to know if there is any information on how to implement federated metadata serving. Thank you.
Hi @red-window-75368! Thanks for the question and sorry for not replying to your previous attempt at asking the same question 🙂
I think if your goal is primarily to prevent writers from overwriting metadata in domains they don't own, while still being able to see and explore across all domains, you would be fine implementing a central (single) DataHub strategy.
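For the write-isolation piece, DataHub's access policies can express this: grant each owner group edit privileges scoped to their own Domain, and rely on view access for everyone else. Policies are normally managed through the UI (Settings > Permissions) or the API; the YAML below is just a sketch of the intent of one such policy, and the group and domain URNs are placeholders:
```yaml
# Hypothetical sketch only -- DataHub policies are usually created in the
# UI or via the API; this illustrates the shape of a per-domain policy.
- name: "lake-a-owners-can-edit-lake-a"
  type: METADATA
  state: ACTIVE
  actors:
    groups:
      - "urn:li:corpGroup:lake-a-owners"   # placeholder owner group
  privileges:
    - "EDIT_ENTITY"                        # write access to entities...
  resources:
    filter:
      domain: "urn:li:domain:lake-a"       # ...scoped to this domain only
```
Out of the box, DataHub's default policies generally allow all users to view entity pages, so in practice you'd mainly be adding one scoped edit policy like this per data lake / domain.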
What you could do from an operational perspective is segregate the input pipes coming into the central DataHub by putting Kafka in front of DataHub. This is useful when you don't want remote HTTP connections going from different environments into a single central service, and it also decouples the availability of the metadata stream from the metadata service itself.
Depending on whether your company has a multi-environment deployment of Kafka, you could opt for this topology:
```
[Data Lake A] -> [Kafka local to A] -> (MirrorMaker) -> [Kafka aggregate] -> [DataHub central]
```
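On the ingestion side, each data lake's recipes would then point their sink at the lake-local Kafka instead of the central GMS over HTTP. A minimal sketch of such a recipe, assuming a Glue source and placeholder broker / schema-registry addresses (swap in whatever source type and hostnames you actually use):
```yaml
# Sketch of an ingestion recipe for Data Lake A. The source type and
# all connection addresses below are illustrative placeholders.
source:
  type: glue
  config:
    aws_region: "us-east-1"

sink:
  type: datahub-kafka            # emit metadata change proposals to Kafka
  config:
    connection:
      bootstrap: "kafka-lake-a:9092"                   # lake-local broker
      schema_registry_url: "http://schema-reg-a:8081"  # lake-local registry
```
MirrorMaker then only needs to replicate the DataHub topics (e.g. `MetadataChangeProposal_v1`) from each lake-local cluster into the aggregate cluster that the central DataHub consumes from.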
Happy to hop on a call if you'd like to discuss this pattern further!