# advice-metadata-modeling
b
Hi folks, we're looking to get started with DataHub and had some questions about modeling some of our entities that I was hoping to clarify:
• Are some of the DataPlatform entities (e.g. the ones in examples/mce_files/data_platforms.json) ingested by default, or do we need to set up a task on our end to ingest the DataPlatforms that we care about (Hive and Trino to start)? Looking at bootstrap_mce.json, it looks like DataPlatformSnapshot events are being sent, so I suspect the answer is yes?
• We have a container for our Datasets at Stripe: a namespace that we use to capture some common pieces of metadata (like ACLs that apply to all datasets in the namespace). I was wondering if the browsePaths aspect is a reasonable way to model the namespace container. Essentially, if I have a namespace 'foo' with dataset 'bar', I'm wondering if I can add a browsePaths aspect to bar with path "/foo/bar". If we went this route, we'd forward the ACL information to a dedicated aspect at the dataset level, and the path would just be used as a container. Or is it a better option to create an Entity type called 'Namespace'?
m
Hey Piyush:
• Are some of the DataPlatform entities (e.g. the ones in examples/mce_files/data_platforms.json) ingested by default, or do we need to set up a task on our end to ingest the DataPlatforms that we care about (Hive and Trino to start)? Looking at bootstrap_mce.json, it looks like DataPlatformSnapshot events are being sent, so I suspect the answer is yes?
Yes, the default data platforms (which are actually here) are ingested on boot by the service.
• We have a container for our Datasets at Stripe: a namespace that we use to capture some common pieces of metadata (like ACLs that apply to all datasets in the namespace). I was wondering if the browsePaths aspect is a reasonable way to model the namespace container. Essentially, if I have a namespace 'foo' with dataset 'bar', I'm wondering if I can add a browsePaths aspect to bar with path "/foo/bar". If we went this route, we'd forward the ACL information to a dedicated aspect at the dataset level, and the path would just be used as a container. Or is it a better option to create an Entity type called 'Namespace'?
I think you should just emit a `Container` entity for the namespace. You can add a `subType` aspect on it called `Namespace` to get nice rendering. /cc @big-carpet-38439
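To make the suggestion concrete, here is a rough sketch in plain Python of the three pieces of metadata involved: a container entity for namespace 'foo', a sub-type of "Namespace" on it, and a link from dataset 'bar' to that container. It does not use the DataHub client. The aspect names (`containerProperties`, `subTypes`, `container`) reflect my understanding of the DataHub model and should be checked against your version; the dict layout, the helper names, and the MD5-based container ID are assumptions made for illustration, not the exact wire format.

```python
import hashlib
import json


def container_urn(platform: str, namespace: str) -> str:
    # Hypothetical ID scheme: a stable hash of the platform and namespace name.
    key = json.dumps({"platform": platform, "namespace": namespace}, sort_keys=True)
    return "urn:li:container:" + hashlib.md5(key.encode("utf-8")).hexdigest()


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Dataset URNs combine the data platform, the dataset name, and the environment.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def namespace_proposals(platform: str, namespace: str, datasets: list[str]) -> list[dict]:
    """Build illustrative aspect payloads for a namespace container and its datasets."""
    c_urn = container_urn(platform, namespace)
    proposals = [
        # Display name of the container.
        {"entityUrn": c_urn, "aspectName": "containerProperties",
         "aspect": {"name": namespace}},
        # Sub-type label so the UI can render the container as a "Namespace".
        {"entityUrn": c_urn, "aspectName": "subTypes",
         "aspect": {"typeNames": ["Namespace"]}},
    ]
    for ds in datasets:
        # Attach each dataset to its parent container.
        proposals.append({
            "entityUrn": dataset_urn(platform, f"{namespace}.{ds}"),
            "aspectName": "container",
            "aspect": {"container": c_urn},
        })
    return proposals


if __name__ == "__main__":
    for proposal in namespace_proposals("hive", "foo", ["bar"]):
        print(json.dumps(proposal))
```

With this shape, namespace-wide metadata such as ACLs could live in a dedicated aspect on the container rather than being copied onto every dataset, though whether that fits depends on how the ACLs are consumed downstream.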
b
Ah nice, I didn't see that there was a Container entity. Will check that out.