I have a question about this section from the readme (DataHub #getting-started)

rapid-sundown-8805

07/07/2021, 1:18 PM

I have a question about this section from [the readme](https://datahubproject.io/docs/architecture/architecture):

Federated Metadata Serving

DataHub comes with a single metadata service (gms) as part of the open source repository. However, it also supports federated metadata services which can be owned and operated by different teams; in fact, that is how LinkedIn runs DataHub internally. The federated services communicate with the central search index and graph using Kafka, to support global search and discovery while still enabling decoupled ownership of metadata. This kind of architecture is very amenable for companies who are implementing data mesh.

Do you have an example architecture for this kind of setup? What is it about having a central metadata repository that goes against data mesh principles? Is it the downstream integrations (mce events etc.)?
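
A rough sketch of the pattern described in the quoted passage may help here: each team owns its own metadata service, and every change is also published to a shared stream that feeds one central search index. This is not DataHub code. The class names (`MetadataBus`, `FederatedMetadataService`, `CentralSearchIndex`), the event shape, and the URNs are all hypothetical, and an in-memory queue stands in for Kafka.

```python
from collections import defaultdict, deque


class MetadataBus:
    """In-memory stand-in for a Kafka topic (hypothetical, illustration only)."""

    def __init__(self):
        self._events = deque()

    def publish(self, event):
        self._events.append(event)

    def drain(self):
        # Yield events in publish order until the queue is empty.
        while self._events:
            yield self._events.popleft()


class FederatedMetadataService:
    """A team-owned service: keeps its own store and emits change events."""

    def __init__(self, owner, bus):
        self.owner = owner
        self.bus = bus
        self.store = {}  # local source of truth for this team

    def upsert(self, urn, description):
        self.store[urn] = description
        self.bus.publish(
            {"owner": self.owner, "urn": urn, "description": description}
        )


class CentralSearchIndex:
    """Consumes events from every team to support global search."""

    def __init__(self):
        self._tokens = defaultdict(set)
        self.owners = {}

    def consume(self, bus):
        for event in bus.drain():
            self.owners[event["urn"]] = event["owner"]
            for token in event["description"].lower().split():
                self._tokens[token].add(event["urn"])

    def search(self, term):
        return sorted(self._tokens.get(term.lower(), set()))


bus = MetadataBus()
sales = FederatedMetadataService("sales", bus)
finance = FederatedMetadataService("finance", bus)
sales.upsert("urn:example:sales.orders", "Daily orders table")
finance.upsert("urn:example:finance.invoices", "Invoices linked to orders")

index = CentralSearchIndex()
index.consume(bus)
results = index.search("orders")
print(results)
```

In this sketch, ownership stays decoupled because each service writes only to its own `store`, while the search for "orders" returns entries from both teams because the index is built from the shared stream.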

shy-airline-27174

07/07/2021, 1:51 PM

Data infrastructure is often diverse, or even spread across different cloud services, at larger companies and at companies that have grown by acquiring other companies. In those cases, federated metadata services bridge the metadata layer without too much infrastructure change. If you have a uniform data infrastructure, a centralized metadata layer can work well.

❤️ 1

🙌 1

rapid-sundown-8805

07/07/2021, 2:01 PM

That was the missing piece for me. I guess it makes sense to start off with a single service, then! Thanks for the answer.

big-carpet-38439

07/07/2021, 2:05 PM

Single service first, yes!

big-carpet-38439

07/07/2021, 2:06 PM

Keep things simple for as long as possible 😛

🙌 1

mammoth-bear-12532

07/07/2021, 2:32 PM

What @shy-airline-27174 and @big-carpet-38439 said :)

❤️ 1

acceptable-whale-38210

12/12/2023, 3:40 AM

I know this is a super old thread, but I basically have the same question, minus the part about why a centralized DataHub is the best first choice. Long story short, I am working with an enterprise, a very large and diverse one, that wants to implement a data mesh. The primary goal is to allow various data platforms throughout the enterprise to have a federated metadata service, with a lot of the features offered by DataHub. What's the best approach to letting each individual organization, among many in the enterprise (can't stress the huge part enough here), deploy and manage that metadata service, while still giving every user in the entire enterprise the ability to search across the metadata of hundreds of data platforms from a single UI?

mammoth-bear-12532

12/12/2023, 7:13 AM

Hey Ben, perhaps a clarifying question: what specifically is the organization gaining from the ability to run separate metadata services?

acceptable-whale-38210

12/12/2023, 3:42 PM

My initial thought is that it allows a specific team to govern their own metadata... but perhaps that line doesn't need to be drawn in this way? I.e., this can be achieved with roles and done directly in the central metadata service? My second thought is that it makes more sense for environments that operate in various regions and for cross-cloud / cross-infrastructure communication. (But this seems more about system architecture than true user experience.) So I guess the answer is: when it makes sense to do so, do it. lol

mammoth-bear-12532

12/12/2023, 8:07 PM

That is exactly why I asked that question 🙂 We have seen repeatedly that most organizations are able to just use one scaled-out DataHub metadata platform that is hosted by one team or hosted by one vendor 😉 while still having roles and visibility rules implemented appropriately.

doge 1
