# advice-metadata-modeling
Hi folks, greetings! We are exploring options to surface dataset timeliness information in DataHub and have a few questions about metadata modeling.

1. What is the general recommendation for going down this path? E.g. should we extend existing entities/aspects, create new ones, or squeeze the data into CustomProperties? (We do need a separate UI with custom rendering.)
2. What is the difference between the PDLs under `metadata-models` and the GraphQL files under `datahub-graphql-core`? E.g. the DataProcessInstance entity looks very promising for our needs. However, we noticed there are types that exist only in GraphQL but not in PDL (DataProcessInstanceResult), types that exist only in PDL but not in GraphQL (DataProcessInfo), and types that exist in both places with slight differences (DataProcessInstanceRunResultType and RunResultType). If we need to create a new aspect, how do I configure both places?
3. Is there any prior art on Airflow integrations, especially on which hooks are available? Ideally, we’d love Airflow to push events on task scheduled, task started, and task completed, so that we can correctly calculate timeliness with respect to the SLA and alert our users. Is that something that has been done successfully in the past? Meanwhile, if Airflow is only able to push an event on task completion, and DataProcessInstanceRunEvent only has one TimeseriesAspectBase, how do I persist several timestamps like `execution_time`, `start_time`, and `finish_time`? Should I create another similar aspect, or extend the existing ones?

CC @mammoth-bear-12532 @dazzling-judge-80093 @bitter-lizard-32293
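For illustration only, the timeliness calculation behind question 3 could be sketched roughly as below. This is a minimal sketch under assumptions: `TaskRunTimestamps`, `queue_delay`, `run_duration`, `is_late`, and the `sla` parameter are hypothetical names, not part of DataHub or Airflow, and `execution_time` is assumed to mean the time the task was scheduled to run. It only shows why all three timestamps need to be persisted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TaskRunTimestamps:
    """Hypothetical container for the three timestamps named in the question."""

    execution_time: datetime  # assumed: when the task was scheduled to run
    start_time: datetime      # when the task actually started
    finish_time: datetime     # when the task completed

    def queue_delay(self) -> timedelta:
        # Time spent waiting between the scheduled time and the actual start.
        return self.start_time - self.execution_time

    def run_duration(self) -> timedelta:
        # Time spent executing the task.
        return self.finish_time - self.start_time

    def is_late(self, sla: timedelta) -> bool:
        # Late if the task finished more than `sla` after its scheduled time.
        return self.finish_time - self.execution_time > sla


# Example run with made-up timestamps.
run = TaskRunTimestamps(
    execution_time=datetime(2022, 1, 1, 0, 0, tzinfo=timezone.utc),
    start_time=datetime(2022, 1, 1, 0, 5, tzinfo=timezone.utc),
    finish_time=datetime(2022, 1, 1, 1, 30, tzinfo=timezone.utc),
)
late_for_one_hour_sla = run.is_late(timedelta(hours=1))
```

With only a single timestamp per event, neither the queue delay nor the SLA check above could be derived, which is the motivation for asking how to persist all three.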