Hello I need an advice on how to manage metadata for multi t DataHub #advice-metadata-modeling

most-scientist-56654

11/07/2023, 3:00 PM

Hello, I need advice on how to manage metadata for a multi-tenant environment. For example, we have N GCP projects with similarly named and structured datasets/tables, so we ingest all of them by defining N ingestion sources. However, this introduces a problem: we now have N similar entities in DataHub, so updating a description (or any other piece of metadata) means repeating the change N times. From what I understand, one way to overcome this is to use the `siblings` aspect. However, after looking into the source code and commit history, I realized that it was introduced specifically for associating database entities with `dbt` models. So, is `siblings` the only way to actually group similar entities and share metadata between them? Thanks.
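
To make the repetition problem above concrete, here is a minimal sketch of fanning one description out to the same table in every project. It only builds the dataset URNs, which follow DataHub's `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` layout. The project, dataset and table names are hypothetical, and `emit` is a placeholder callback standing in for whatever client call actually writes the metadata; it is not a DataHub API.

```python
from typing import Callable, Dict, Iterable, List


def bigquery_dataset_urn(project: str, dataset: str, table: str, env: str = "PROD") -> str:
    """Build a DataHub dataset URN for a BigQuery table."""
    return f"urn:li:dataset:(urn:li:dataPlatform:bigquery,{project}.{dataset}.{table},{env})"


def fan_out_description(
    projects: Iterable[str],
    dataset: str,
    table: str,
    description: str,
    emit: Callable[[str, str], None],
) -> List[str]:
    """Apply the same description to the same table in every project.

    `emit` is a hypothetical hook: swap in the real write call of your client.
    """
    urns = [bigquery_dataset_urn(p, dataset, table) for p in projects]
    for urn in urns:
        emit(urn, description)
    return urns


# Usage with an in-memory dict in place of a real DataHub client.
updates: Dict[str, str] = {}


def record_update(urn: str, description: str) -> None:
    updates[urn] = description


projects = ["tenant-a", "tenant-b", "tenant-c"]  # hypothetical project ids
urns = fan_out_description(projects, "sales", "orders", "One row per order.", record_update)
```

This keeps N entities in DataHub and simply automates the repeated update, so it is a workaround for the duplication rather than a way of grouping the entities.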

curved-truck-53235

11/07/2023, 5:12 PM

Ivan, are the datasets identical in all N projects?

most-scientist-56654

11/08/2023, 7:06 AM

> Ivan, are the datasets identical in all N projects?

Hey Igor. Yep, pretty much identical: same layout, names and schemas, but different data.

curved-truck-53235

11/08/2023, 7:47 AM

And why do you want to see all projects in DataHub? Seems like one is enough, imho, provided all the schemas are updated at the same time, of course.

most-scientist-56654

11/08/2023, 8:08 AM

There are some benefits to having all of them: individual table stats, recent queries, profiling. But maybe you're right and we don't need all of them.

curved-truck-53235

11/08/2023, 8:15 AM

For now, don't collect stats, and keep one object to represent the similar databases. But if you do collect them, it looks like you'd need some custom code to gather all stats into one object (for queries, we can use the project name to tell them apart: proj-123234: Query1, proj-1232: Query1, etc.). Maybe it's not the most convenient way, but if the business needs it, we can do it.
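
A rough sketch of what such custom code could look like: it folds per-project stats into a single record and prefixes each query with its project id, as suggested above. The stats layout (`row_count`, `queries`) and the sample values are assumptions for illustration, not DataHub structures, and summing row counts is just one possible way to combine them.

```python
from typing import Any, Dict


def merge_project_stats(per_project: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Combine stats from several projects into one object.

    Queries are labelled "<project>: <query>" so their origin stays visible.
    """
    merged: Dict[str, Any] = {"row_count": 0, "queries": []}
    # Sort by project id so the merged output is deterministic.
    for project, stats in sorted(per_project.items()):
        merged["row_count"] += stats["row_count"]
        for query in stats["queries"]:
            merged["queries"].append(f"{project}: {query}")
    return merged


# Hypothetical per-project stats for the same table.
sample = {
    "proj-123234": {"row_count": 100, "queries": ["Query1"]},
    "proj-1232": {"row_count": 40, "queries": ["Query1", "Query2"]},
}
combined = merge_project_stats(sample)
```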
