DataHub #getting-started


mammoth-whale-58647

04/28/2020, 6:13 PM

Hello everyone, I have another modeling question. I have noticed that for most "standard" resource operations, a complex key (e.g. DatasetKey) is used, whereas for the actions (e.g. ingest, backfill), the URN seems to be the preferred identifier. Is there a particular reason for this?

The reason I ask is that I would like to model "sub-entities", where both the root entity and the sub-entity can have aspects. I decided against associations because the sub-resource really cannot exist without the parent, and I didn't want to embed the sub-resource because it is often accessed and modified independently of the parent resource.

Sub-resources generally work quite well in Rest.li, and they do here for the "standard" resource operations, where I can access and combine parent and child keys as I see fit. The actions are a bit different, though. For context, my parent resource has a composite key of 3 fields, and my sub-entity adds a fourth field to that key. Just to reach the actions on the sub-resource, I already need to provide those 3 fields, e.g. /parent/field1,field2,field3/child?action=myaction. But then I also have to provide the full URN of the sub-resource, which repeats those 3 fields and adds a fourth. This seems redundant, and it made me wonder why the ingest and backfill actions aren't simply on the entity resource level, using the same key documents that the GET methods use. Before making such modifications, I wanted to better understand why URNs are sometimes preferred and why the key documents are preferred at other times.
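A minimal sketch of the key layout described above, assuming hypothetical `ParentKey` and `ChildKey` types and the simplified comma-separated path form used in this thread. Real Rest.li complex-key and association-key URLs are encoded differently; this only shows that the child key is the parent key plus one field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParentKey:
    """Hypothetical composite key of the parent resource (3 fields)."""
    field1: str
    field2: str
    field3: str


@dataclass(frozen=True)
class ChildKey:
    """Hypothetical sub-resource key: the full parent key plus a fourth field."""
    parent: ParentKey
    field4: str


def parent_path(key: ParentKey) -> str:
    # Simplified path form from this thread, not Rest.li's actual key encoding.
    return f"/parent/{key.field1},{key.field2},{key.field3}"


def child_path(key: ChildKey) -> str:
    # The sub-resource path is the parent path plus the child segment and field4.
    return f"{parent_path(key.parent)}/child/{key.field4}"


print(child_path(ChildKey(ParentKey("a", "b", "c"), "d")))  # /parent/a,b,c/child/d
```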

mammoth-whale-58647

04/28/2020, 6:18 PM

So to clarify:
GET works fine: /parent/field1,field2,field3/child/field4
INGEST looks odd because of the redundancy: /parent/field1,field2,field3/child?action=ingest, with the action parameter urn:li:child:(field1,field2,field3,field4)
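To make the redundancy concrete, here is a small sketch that builds the action call for the hypothetical `child` sub-resource and reports which key fields appear both in the path and in the URN parameter. The URN shape and path format are taken from the message above; `action_request` and `redundant_fields` are illustrative helpers, not DataHub or Rest.li APIs. Rest.li sends action parameters in the POST body, so the URN is returned separately from the path here.

```python
from typing import List, Tuple


def child_urn(f1: str, f2: str, f3: str, f4: str) -> str:
    # Hypothetical flat URN for the child entity.
    return f"urn:li:child:({f1},{f2},{f3},{f4})"


def action_request(parent_fields: Tuple[str, str, str], f4: str) -> Tuple[str, str]:
    """Return (request path, URN action parameter) for the child action call."""
    path = "/parent/{},{},{}/child?action=ingest".format(*parent_fields)
    return path, child_urn(*parent_fields, f4)


def redundant_fields(path: str, urn: str) -> List[str]:
    """Return the key fields that appear both in the path and in the URN."""
    path_fields = path.split("/")[2].split(",")  # "a,b,c" -> ["a", "b", "c"]
    urn_fields = urn[urn.index("(") + 1 : urn.rindex(")")].split(",")
    return [f for f in path_fields if f in urn_fields]
```

For example, `redundant_fields(*action_request(("a", "b", "c"), "d"))` returns `["a", "b", "c"]`: every parent key field is sent twice.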

bumpy-keyboard-50565

04/28/2020, 9:12 PM

A small correction here: the INGEST method actually takes an entire snapshot (which includes an urn), rather than the urn directly, e.g. https://github.com/linkedin/datahub/blob/master/gms/impl/src/main/java/com/linkedin/metadata/resources/dataset/Datasets.java#L220
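A rough sketch of what "takes an entire snapshot" means for identity: the entity is identified by the urn carried inside the snapshot, not by a separate key argument. `DatasetSnapshot`, `STORE`, and `ingest` here are simplified, hypothetical stand-ins (aspects are plain dicts in an in-memory store), not the actual GMS types linked above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetSnapshot:
    """Hypothetical stand-in for a snapshot record: an urn plus a list of aspects."""
    urn: str
    aspects: List[dict] = field(default_factory=list)


# In-memory stand-in for the metadata store, keyed by urn.
STORE: Dict[str, List[dict]] = {}


def ingest(snapshot: DatasetSnapshot) -> str:
    """Sketch of an ingest action: the entity identity comes from snapshot.urn."""
    STORE.setdefault(snapshot.urn, []).extend(snapshot.aspects)
    return snapshot.urn
```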

bumpy-keyboard-50565

04/28/2020, 9:14 PM

That said, the redundancy between the urn and the resource key is indeed not ideal. We're working with the Rest.li team to eventually remove it so that the resource can take the URN directly, but for now we have to live with this conversion.
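The "conversion" mentioned above amounts to a round trip between a resource key and a URN string. A minimal sketch, assuming a hypothetical, simplified `DatasetKey` with `platform`, `name`, and `origin` string fields (the real key type and its field types may differ) and assuming no field contains a comma:

```python
from dataclasses import dataclass

PREFIX = "urn:li:dataset:("
PLATFORM_PREFIX = "urn:li:dataPlatform:"


@dataclass(frozen=True)
class DatasetKey:
    """Hypothetical, simplified complex resource key."""
    platform: str
    name: str
    origin: str


def to_urn(key: DatasetKey) -> str:
    # Resource key -> URN string.
    return f"{PREFIX}{PLATFORM_PREFIX}{key.platform},{key.name},{key.origin})"


def to_key(urn: str) -> DatasetKey:
    # URN string -> resource key; assumes no commas inside individual fields.
    if not (urn.startswith(PREFIX) and urn.endswith(")")):
        raise ValueError(f"not a dataset urn: {urn}")
    platform_urn, name, origin = urn[len(PREFIX):-1].split(",")
    if not platform_urn.startswith(PLATFORM_PREFIX):
        raise ValueError(f"not a data platform urn: {platform_urn}")
    return DatasetKey(platform_urn[len(PLATFORM_PREFIX):], name, origin)
```

With both directions in place, `to_key(to_urn(key)) == key` holds for any key whose fields avoid commas and parentheses, which is exactly the boilerplate a URN-keyed resource would no longer need.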

mammoth-whale-58647

04/29/2020, 10:54 AM

Yes, I made a mistake with regard to ingest. I was talking more about getSnapshot and backfill, though arguably you could replace the urn field in the snapshot with a key field instead. That being said, I now understand that the URN approach is the preferred identification mechanism, which leads me to rethink my approach to sub-entities. Giving them a URN that only makes sense within the scope of a parent entity seems counterintuitive to the notion of a URN, and including the parent URN in the sub-entity's own URN structure defeats the purpose of it being a sub-entity.
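The two URN shapes being weighed here can be compared side by side: a flat tuple that repeats the parent fields, and a nested form that embeds the parent URN. This sketch uses hypothetical `parent` and `child` entity types; `tuple_parts` is an illustrative parenthesis-aware splitter (an assumption, not a DataHub utility), needed because the nested form contains commas inside the embedded parent URN.

```python
from typing import List


def flat_child_urn(f1: str, f2: str, f3: str, f4: str) -> str:
    # Flat form: the parent's key fields are repeated inline.
    return f"urn:li:child:({f1},{f2},{f3},{f4})"


def nested_child_urn(f1: str, f2: str, f3: str, f4: str) -> str:
    # Nested form: the parent URN is embedded as the first tuple element.
    parent = f"urn:li:parent:({f1},{f2},{f3})"
    return f"urn:li:child:({parent},{f4})"


def tuple_parts(urn: str) -> List[str]:
    """Split a URN's top-level tuple, ignoring commas inside nested parentheses."""
    body = urn[urn.index("(") + 1 : -1]
    parts, depth, current = [], 0, ""
    for ch in body:
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current += ch
    parts.append(current)
    return parts
```

`tuple_parts(nested_child_urn("a", "b", "c", "d"))` yields `["urn:li:parent:(a,b,c)", "d"]`, so the child's identity is spelled out in terms of the parent's, which is the coupling the message above finds at odds with the idea of a sub-entity.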
