Hi! I see that we have a model for <DeploymentInfo...
# getting-started
p
Hi! I see that we have a model for DeploymentInfo but I can’t find the association to MetadataChangeEvent. Is there no way to define it via Kafka stream?
s
This seems like a legacy model which we don't use right now.
However, if you think it would be a good candidate for an aspect of the dataset entity, you can add it to the DatasetAspect model and rebuild the repo. The MetadataChangeEvent topic schema will then be updated automatically to include that aspect.
p
Thanks for the replies. I’m wondering what the legacy model was replaced with. I’m trying to model an SQL Table, Database and Instance (Deployment) relationship. Any recommendations on how to do this with the standard models? I was thinking something like:
• Table -> Dataset
• Database -> DataPlatform
• Instance -> ?
I am aware that we could add a new aspect, but I was hoping to postpone having to fork DataHub for now.
@steep-airplane-62865 Any further thoughts on this? I appreciate your time; I know you must be busy.
s
Hey @plain-arm-6774, sorry for the late response. Thanks for detailing your use case. Now I understand the question better. You actually want to create separate entities for `table`, `database` and the `instance` and track the relationship between them. Short answer, DataHub doesn't have that support right now.
Database -> DataPlatform
This is not quite correct either. `DataPlatform` is not an entity that uniquely defines a database. You can check `DataPlatformUrn`. It's actually an entity that uniquely identifies a data platform type such as mysql or hdfs, so its purpose is different from what you originally thought.
p
Thanks for the reply! To ensure I understand your response, you are implying we should create a DatasetEntity for each of the data assets: `table`, `database`, `instance` (each with appropriate properties and relationships), correct?
Short answer, DataHub doesn't have that support right now.
Does this mean that the above solution works but is not as ideal as adding explicit data models for each of the data asset types?
b
Actually, internally we represent `<database>.<table>` as a dataset, each identified by a URN of `{platform, name, fabric}` (`fabric` is used to denote different environments, e.g. `prod` vs `staging`). For multiple deployments of a single dataset (e.g. replicas of MySQL tables), we're introducing a new entity called `DatasetInstance`. We do plan to open source those entity models at some point too.
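To make the `{platform, name, fabric}` identity concrete, here is a minimal sketch of how such URNs could be assembled as strings. It assumes the `urn:li:dataPlatform:...` and `urn:li:dataset:(...)` layouts used by open-source DataHub. The helper names `data_platform_urn` and `dataset_urn`, the sample table `sales.orders`, and the upper-casing of the fabric are illustrative assumptions, not part of any DataHub API. Which fabric values are accepted depends on the FabricType enum in your DataHub version.

```python
def data_platform_urn(platform: str) -> str:
    """Identify a platform *type* (e.g. mysql, hdfs), not a specific database."""
    return f"urn:li:dataPlatform:{platform}"


def dataset_urn(platform: str, name: str, fabric: str) -> str:
    """Identify a dataset by (platform, name, fabric).

    `name` is typically "<database>.<table>" for SQL sources.
    Upper-casing the fabric is an assumption made for this sketch.
    """
    return f"urn:li:dataset:({data_platform_urn(platform)},{name},{fabric.upper()})"


# The same table in two environments gets two distinct dataset URNs,
# while both share a single platform URN.
prod_orders = dataset_urn("mysql", "sales.orders", "prod")
dev_orders = dataset_urn("mysql", "sales.orders", "dev")

print(data_platform_urn("mysql"))
print(prod_orders)
print(dev_orders)
```

Note that nothing in this triple distinguishes two deployments (for example, replicas) of the same table within one fabric, which appears to be the gap the `DatasetInstance` entity mentioned above is meant to fill.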
p
Makes sense, thanks for the info!