# advice-metadata-modeling
m
Hello, I need some advice on how to manage metadata for a multi-tenant environment. For example, we have N GCP projects with similarly named and structured datasets/tables, so we ingest all of them by introducing N ingestion sources. However, this introduces a problem: we now have N similar entities in DataHub, so updating a description (or any part of the metadata in general) requires repeating the change for each of them. From what I understand, one way to overcome this is to use the `siblings` aspect. However, after looking into the source code and commit history, I realized that it was introduced specifically for associating database entities with `dbt` models. So, is `siblings` the only way to actually group similar entities and share metadata between them? Thanks.
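To make the repetition concrete, here is a minimal sketch of fanning one description out to the same table in every project. It only builds dataset URNs in DataHub's `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` form and pairs each with the shared text. The project, dataset and table names are hypothetical, the `PROD` environment is an assumption, and actually writing the description to DataHub is left out.

```python
def dataset_urn(project: str, dataset: str, table: str, env: str = "PROD") -> str:
    """Build a BigQuery dataset URN for one project (env is an assumed default)."""
    name = f"{project}.{dataset}.{table}"
    return f"urn:li:dataset:(urn:li:dataPlatform:bigquery,{name},{env})"


def fan_out_description(projects, dataset, table, description):
    """Pair the same description with the table's URN in every project."""
    return [(dataset_urn(p, dataset, table), description) for p in projects]


# Hypothetical project names; the real list would come from the N ingestion sources.
updates = fan_out_description(
    ["proj-a", "proj-b"], "sales", "orders", "Orders placed by customers."
)
for urn, text in updates:
    print(urn, "->", text)
```

Each pair would then be handed to whatever client or ingestion mechanism is used to update the metadata, so the description is written once and applied N times.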
c
Ivan, are the datasets identical in all N projects?
m
> Ivan, are the datasets identical in all N projects?

Hey Igor. Yep, pretty much identical: same layout, names and schemas, just different data.
c
And why do you want to see all the projects in DataHub? One seems like enough, imho, provided all the schemas are updated at the same time, of course.
m
There are some benefits to having all of them: individual table stats, recent queries, profiling. But maybe you're right and we don't need all of them.
c
For now, we don't collect stats and keep one object for the similar databases. But if you do, it looks like you'd need some custom code to collect all the stats into one object (for queries, we can use the project name to separate them: proj-123234: Query1, proj-1232: Query1, etc.). Maybe it's not the most convenient way, but if the business needs it, it can be done.
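A rough sketch of what that custom code might look like, assuming the per-project stats have already been fetched into plain dictionaries. The `row_count` and `queries` keys are hypothetical, not DataHub field names. Numeric stats are summed, and each query is prefixed with its project name so the entries stay distinguishable.

```python
def merge_project_stats(per_project: dict) -> dict:
    """Collapse per-project stats into one object for the shared dataset."""
    merged = {"row_count": 0, "queries": []}
    for project, stats in per_project.items():
        # Sum the numeric stats across projects.
        merged["row_count"] += stats.get("row_count", 0)
        # Prefix each query with its project, e.g. "proj-1232: Query1".
        merged["queries"].extend(f"{project}: {q}" for q in stats.get("queries", []))
    return merged


merged = merge_project_stats(
    {
        "proj-123234": {"row_count": 10, "queries": ["Query1"]},
        "proj-1232": {"row_count": 5, "queries": ["Query1"]},
    }
)
print(merged)
```

Summing is only one choice. For stats such as profiling results it may make more sense to keep a per-project breakdown inside the single object.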