# advice-metadata-modeling
Greetings Programs! I'm a bit puzzled by the terms 'DataPlatform' and 'DataPlatformInstance'. The documentation states:
Data Platforms are systems or tools that contain Datasets, Dashboards, Charts, and all other kinds of data assets modeled in the metadata graph. Examples of data platforms are redshift, hive, bigquery, looker, tableau, etc.
There does not appear to be any documentation on what a 'DataPlatformInstance' is. Looking at the examples, I'd say a DataPlatform is a technology, and my guess is that a DataPlatformInstance is an actual manifestation of it, like a single Kafka cluster or a specific Tableau account(?).

So I was wondering how to model the data within a large company that has several business units, some of which have their own 'platforms', where in this case a platform is a collection of tools/systems. Each of these platforms has a Kafka cluster, a Hive DB, a Spark cluster, one or more RDBMSs... They are usually referred to by a single name, as in "I extracted this report from the ACME platform yesterday"; the actual technology that was used is irrelevant. Some of these platforms are used only within one business unit, and some are set up to be shared among several.

These platforms do not seem to fit the concept of either 'DataPlatform' or 'DataPlatformInstance'. So how would you model them?
• DataPlatformInstance docs: https://datahubproject.io/docs/platform-instances/
• Looks like 'Domains' might be an option: https://datahubproject.io/docs/domains/
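
To make the question more concrete, here is a rough sketch of one possible mapping, assuming my guess above is right: each technology corresponds to a DataPlatform, each concrete cluster or database to a DataPlatformInstance, and the business-level name (such as ACME) to a Domain. This is plain Python, not the DataHub SDK; the `System` class, the helper functions, and all instance and domain names are hypothetical and only illustrate the grouping.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    technology: str  # assumed to map to a DataPlatform, e.g. "kafka"
    instance: str    # assumed to map to a DataPlatformInstance, e.g. "acme-kafka"
    domain: str      # assumed to map to a Domain, e.g. the "ACME" platform


# Hypothetical inventory: one business-unit platform plus a shared cluster.
SYSTEMS = [
    System("kafka", "acme-kafka", "ACME"),
    System("hive", "acme-hive", "ACME"),
    System("postgres", "acme-orders-db", "ACME"),
    System("kafka", "shared-kafka", "Shared"),
]


def systems_by_domain(systems):
    """Group (technology, instance) pairs under their business-level name."""
    grouped = defaultdict(list)
    for system in systems:
        grouped[system.domain].append((system.technology, system.instance))
    return dict(grouped)


def instances_of(systems, technology):
    """List every concrete instance of one technology, across all domains."""
    return sorted(s.instance for s in systems if s.technology == technology)


if __name__ == "__main__":
    print(systems_by_domain(SYSTEMS))
    print(instances_of(SYSTEMS, "kafka"))
```

With this shape, "the ACME platform" is answered by the domain grouping, while the technology and instance fields still distinguish the two Kafka clusters. Whether Domains are really meant to be used this way is exactly what I am unsure about.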