# getting-started
r
I have a question about this section from [the readme](https://datahubproject.io/docs/architecture/architecture):
Federated Metadata Serving
DataHub comes with a single metadata service (gms) as part of the open source repository. However, it also supports federated metadata services which can be owned and operated by different teams –– in fact that is how LinkedIn runs DataHub internally. The federated services communicate with the central search index and graph using Kafka, to support global search and discovery while still enabling decoupled ownership of metadata. This kind of architecture is very amenable for companies who are implementing data mesh.
Do you have an example architecture for this kind of setup? What is it about having a central metadata repository that goes against data mesh principles? Is it the downstream integrations (mce events etc.)?
s
Data infrastructure is often diverse, or even spread across different cloud services, at larger companies and at companies that have grown by acquiring other companies. In those cases, federated metadata services bridge the metadata layer without too much infrastructure change. If you have a uniform data infrastructure, a centralized metadata layer can work well.
❤️ 1
🙌 1
r
That was the missing piece for me. I guess it makes sense to start off with a single service, then! Thanks for the answer.
b
Single service first, yes!
Keep things simple for as long as possible 😛
🙌 1
m
What @shy-airline-27174 and @big-carpet-38439 said :)
❤️ 1
a
I know this is a super old thread, but I basically have the same question, minus the part about why a centralized DataHub is the best first choice. Long story short, I am working with an enterprise, a very large and diverse enterprise, that wants to implement a data mesh. The primary goal is to allow various data platforms throughout the enterprise to have a federated metadata service, with a lot of the features offered by DataHub. What's the best approach to allowing that metadata service to be deployed and managed by each individual organization, amongst many in the enterprise (can't stress the huge part enough here), all the while giving each user across the entire enterprise the ability to search across hundreds of data platforms' metadata from a single UI?
m
Hey Ben, perhaps a clarification question first: what specifically is the organization gaining with the ability to run separate metadata services?
a
My initial thought is that it allows a specific team to govern their own metadata... but perhaps that line doesn't need to be drawn in this way? i.e. this can be achieved with roles and done directly in the central metadata service? - My second thought is that it makes more sense for environments that operate in various regions and for cross-cloud / cross-infrastructure communication. (But this seems more about system architecture than true user experience.) So I guess the answer is, when it makes sense to do so, do it. lol
m
That is exactly why I asked that question 🙂 - we have seen repeatedly that most organizations are able to just use one scaled out DataHub metadata platform that is hosted by one team or hosted by one vendor 😉 while still having roles and visibility rules implemented appropriately.
doge 1
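For anyone landing on this thread later and still wondering what the federated setup quoted at the top looks like in principle, here is a rough sketch. It is not DataHub code and does not use DataHub's event schema or APIs: the class names (`FederatedMetadataService`, `CentralSearchIndex`, `MetadataChange`) and the URNs are hypothetical, and an in-memory `queue.Queue` stands in for the shared Kafka topic. It only illustrates the idea from the docs: each team owns its own metadata store and publishes changes, while one central index consumes them to give global search.

```python
from collections import defaultdict
from dataclasses import dataclass
from queue import Queue


@dataclass(frozen=True)
class MetadataChange:
    """Hypothetical change event; not DataHub's actual event schema."""
    owner_team: str
    urn: str
    description: str


class FederatedMetadataService:
    """A team-owned service that keeps its own metadata and publishes changes."""

    def __init__(self, team: str, topic: Queue):
        self.team = team
        self.topic = topic   # stand-in for a shared Kafka topic
        self.store = {}      # team-local source of truth: urn -> description

    def upsert(self, urn: str, description: str) -> None:
        self.store[urn] = description
        self.topic.put(MetadataChange(self.team, urn, description))


class CentralSearchIndex:
    """Central consumer that builds one keyword index from every team's events."""

    def __init__(self, topic: Queue):
        self.topic = topic
        self.index = defaultdict(set)   # keyword -> {(team, urn), ...}

    def consume_all(self) -> None:
        # Drain the topic; a real indexer would also drop stale keywords on update.
        while not self.topic.empty():
            event = self.topic.get()
            for word in event.description.lower().split():
                self.index[word].add((event.owner_team, event.urn))

    def search(self, keyword: str):
        return sorted(self.index.get(keyword.lower(), set()))


# Two independently owned services publish to the same topic.
topic = Queue()
sales = FederatedMetadataService("sales", topic)
finance = FederatedMetadataService("finance", topic)
sales.upsert("urn:example:sales.orders", "Customer orders table")
finance.upsert("urn:example:finance.invoices", "Customer invoices table")

# The central index gives one search surface across both teams.
central = CentralSearchIndex(topic)
central.consume_all()
print(central.search("customer"))
```

The point of the sketch is the ownership split: writes stay with each team's service, and only the search index is shared. As the replies above say, a single scaled-out service with roles and visibility rules is usually the simpler starting point.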