#advice-metadata-modeling
mammoth-bear-12532

04/30/2022, 12:29 AM
2. What is the difference between the PDLs under metadata-models and the GraphQL types under datahub-graphql-core? E.g. the DataProcessInstance entity looks very promising for our needs. However, we noticed there are types only in GraphQL but not in PDL (DataProcessInstanceResult), types only in PDL but not in GraphQL (DataProcessInfo), and types in both places with slight differences (DataProcessInstanceRunResultType and RunResultType). If we need to create a new aspect, how do I configure both places? 🧵
The PDLs under metadata-models form the "base metadata model" for DataHub. They result in auto-generated Avro schemas, support for generic RESTful endpoints, and auto-indexing for graph and search. We are close to merging automatic OpenAPI spec generation from this as well.
The GraphQL types found in datahub-graphql-core form a sort of "business-logic layer" on top of this raw metadata model, to (1) work around modeling mistakes or regrets that we might have in the base layer, and (2) simplify certain common operations that might be too verbose to express using the base model.
Work in the "business-logic" layer is typically manual and involves a lot of hand-written mappers from the GraphQL types to the underlying Pegasus types, with some "transformation logic" embedded.
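To make the idea of a hand-written mapper concrete, here is a minimal sketch in Python. It is not DataHub code: every class, field, and function name below is a hypothetical stand-in, and it only shows the read direction (base-model record to GraphQL-facing type) with a small transformation (a field rename) embedded.

```python
from dataclasses import dataclass
from enum import Enum


class BaseRunResultType(Enum):
    """Hypothetical stand-in for an enum defined in the base (PDL) model."""
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"


@dataclass
class BaseRunResult:
    """Hypothetical stand-in for a record class generated from a PDL schema."""
    type: BaseRunResultType
    native_result_type: str


class GqlRunResultType(Enum):
    """Hypothetical stand-in for the hand-written GraphQL enum."""
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"


@dataclass
class GqlRunResult:
    """Hypothetical GraphQL-facing type; note the field is named differently."""
    result_type: GqlRunResultType
    native_result_type: str


def map_run_result(base: BaseRunResult) -> GqlRunResult:
    """Hand-written mapper: copies fields and applies the rename type -> result_type."""
    return GqlRunResult(
        # Look up the GraphQL enum member by name; raises KeyError if the two
        # enums have drifted apart, which surfaces the mismatch early.
        result_type=GqlRunResultType[base.type.name],
        native_result_type=base.native_result_type,
    )
```

Because each mapper like this is written by hand, the two type systems can diverge over time, which is one way to end up with the slightly different type names mentioned in the question.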
We have plans to support auto-generation of GraphQL types from the metadata model (as a bootstrapping way to reduce the cost of spinning up that first GraphQL resolver), but we expect that as application needs evolve, some manual work will be done in the business-logic layer, and that will make the "business model" drift from the "data model".
We don't exactly have a silver bullet here. Curious what your thoughts are on this subject and what sort of solution you are looking for in this area.
To answer your specific question: to create a new aspect, you start with PDL. To surface it in GQL, you usually create a shadow GQL type by hand and wire up the GraphQL resolver manually so that it can be served by the GQL API. There is also a dynamic GQL mechanism that we built for auto-rendering aspects, which requires no additional code to be written; @green-football-43791 is best positioned to explain it.
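The manual flow described above (aspect first, then a shadow type, then a resolver) can be sketched roughly as follows. This is a toy illustration in Python, not the DataHub API: the aspect name myNewAspect, the URN, the in-memory store, and the resolver registry are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Step 1 (stand-in): the new aspect as it would be stored once defined in the
# base model, keyed by (entity URN, aspect name). All values are hypothetical.
ASPECT_STORE: Dict[Tuple[str, str], dict] = {
    ("urn:li:example:1", "myNewAspect"): {"ownerTeam": "data-platform", "retentionDays": 30},
}


# Step 2: the hand-written "shadow" type exposed through the GraphQL layer.
@dataclass
class MyNewAspectGql:
    owner_team: str
    retention_days: int


# Step 3: the resolver, which fetches the raw aspect and maps it to the shadow type.
def resolve_my_new_aspect(urn: str) -> Optional[MyNewAspectGql]:
    raw = ASPECT_STORE.get((urn, "myNewAspect"))
    if raw is None:
        return None  # the entity has no such aspect
    return MyNewAspectGql(owner_team=raw["ownerTeam"], retention_days=raw["retentionDays"])


# Step 4: manual wiring, so a (type name, field name) pair is served by the resolver.
RESOLVERS: Dict[Tuple[str, str], Callable[[str], object]] = {
    ("Example", "myNewAspect"): resolve_my_new_aspect,
}


def execute(type_name: str, field_name: str, urn: str) -> object:
    """Toy dispatcher standing in for the GraphQL engine's field resolution."""
    return RESOLVERS[(type_name, field_name)](urn)
```

With this wiring in place, execute("Example", "myNewAspect", "urn:li:example:1") returns the shadow type populated from the stored aspect. Steps 2 to 4 are the hand-written part that the dynamic mechanism mentioned above would make unnecessary.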