aloof-window-16847
09/29/2020, 11:33 PM

bumpy-keyboard-50565
09/30/2020, 12:30 AM

aloof-window-16847
09/30/2020, 6:14 AM

bumpy-keyboard-50565
09/30/2020, 11:05 AM

acceptable-architect-70237
09/30/2020, 2:33 PM
ownership: {'owner': ['owner A']}
After that, can I just ingest this:
ownership: {'owner': ['owner A', 'owner B', 'owner C']}
I thought the MAE would pick up the difference?

aloof-window-16847
09/30/2020, 3:27 PM

bumpy-keyboard-50565
09/30/2020, 4:13 PM

silly-apple-97303
09/30/2020, 6:32 PM

silly-apple-97303
09/30/2020, 6:46 PM

bumpy-keyboard-50565
09/30/2020, 7:09 PM

silly-apple-97303
09/30/2020, 7:41 PM

bumpy-keyboard-50565
09/30/2020, 7:44 PM

silly-apple-97303
09/30/2020, 7:46 PM

acceptable-architect-70237
09/30/2020, 8:09 PM
ownership is stored as a single record row, as shown in the attachment. I think the persistence logic rewrites the whole aspect. Or I might have misunderstood something.

bumpy-keyboard-50565
09/30/2020, 8:17 PM

aloof-window-16847
10/01/2020, 5:19 PM
UpstreamLineageDelta has almost all the elements needed to allow partial updates of the UpstreamLineage aspect.
There is UpstreamLineageResource.deltaUpdate(), which implements the actual update of the lineages list. The REST API gets generated in com.linkedin.dataset.datasets.snapshot.json.
Completing this use case would make a good example to build on.

bumpy-keyboard-50565
10/01/2020, 5:21 PM

aloof-window-16847
10/01/2020, 11:27 PM
UpstreamLineage aspect (currently only add/update is implemented) via REST API.
Something like this does the trick:
curl -s -H 'X-RestLi-Protocol-Version:2.0.0' -XPOST \
  'http://localhost:8080/datasets/($params:(),name:SampleHiveDataset,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Ahive)/upstreamLineage?action=deltaUpdate' \
  -d '{
    "delta": {
      "upstreamsToUpdate": [
        {
          "auditStamp": {
            "actor": "urn:li:corpuser:jdoe",
            "time": 1581407189000
          },
          "type": "VIEW",
          "dataset": "urn:li:dataset:(urn:li:dataPlatform:hdfs,MyNewHdfsDataset,PROD)"
        }
      ]
    }
  }' | jq
It returns the updated UpstreamLineage aspect.
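For scripting the same call outside the shell, here is a minimal Python sketch that rebuilds the request from the curl command above using only the standard library. It assumes the same local endpoint, dataset key format and protocol header as that command; the helper name build_delta_update_request is hypothetical, and the Content-Type header is an added assumption that the curl command does not set.

```python
import json
import urllib.parse
import urllib.request


def build_delta_update_request(base_url, platform, name, origin, upstreams):
    """Hypothetical helper: build (without sending) a POST to the deltaUpdate action."""
    # Dataset key in the same Rest.li 2.0 form as the curl example; the
    # platform URN has to be percent-encoded inside the key.
    platform_urn = urllib.parse.quote(f"urn:li:dataPlatform:{platform}", safe="")
    key = f"($params:(),name:{name},origin:{origin},platform:{platform_urn})"
    url = f"{base_url}/datasets/{key}/upstreamLineage?action=deltaUpdate"
    body = json.dumps({"delta": {"upstreamsToUpdate": upstreams}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-RestLi-Protocol-Version": "2.0.0",
            "Content-Type": "application/json",  # assumption: not set in the curl example
        },
    )


upstream = {
    "auditStamp": {"actor": "urn:li:corpuser:jdoe", "time": 1581407189000},
    "type": "VIEW",
    "dataset": "urn:li:dataset:(urn:li:dataPlatform:hdfs,MyNewHdfsDataset,PROD)",
}
req = build_delta_update_request(
    "http://localhost:8080", "hive", "SampleHiveDataset", "PROD", [upstream]
)
# Sending requires a running server at that address:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The send is left commented out; if the server behaves as in the curl example, the response body would be the updated UpstreamLineage aspect.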
I still have to trace everything that needs to be done to implement this for another aspect: the PDL to define, the resource class, etc.
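The add/update semantics mentioned above can be sketched independently of any particular aspect. This Python sketch assumes that deltaUpdate() treats upstreams as keyed by their dataset URN: an entry in upstreamsToUpdate replaces an existing entry with the same key, new keys are appended, and nothing is removed. That keying is an assumption, not taken from the actual UpstreamLineageResource code, and the owners example (keyed by owner URN) is a hypothetical second aspect used only to show that the helper needs nothing but a key function.

```python
from typing import Any, Callable, Dict, Hashable, List

Entry = Dict[str, Any]


def apply_add_or_update(
    current: List[Entry],
    to_update: List[Entry],
    key: Callable[[Entry], Hashable],
) -> List[Entry]:
    """Add/update-only delta: same key overwrites, new key appends, nothing is removed."""
    merged: Dict[Hashable, Entry] = {key(entry): entry for entry in current}
    for entry in to_update:
        # Overwriting an existing key keeps its original position in the dict
        merged[key(entry)] = entry
    return list(merged.values())


# UpstreamLineage: assumed to be keyed by the upstream dataset URN
current = [{"dataset": "urn:li:dataset:(urn:li:dataPlatform:hdfs,A,PROD)", "type": "COPY"}]
delta = [
    {"dataset": "urn:li:dataset:(urn:li:dataPlatform:hdfs,A,PROD)", "type": "VIEW"},
    {"dataset": "urn:li:dataset:(urn:li:dataPlatform:hdfs,B,PROD)", "type": "VIEW"},
]
upstreams = apply_add_or_update(current, delta, key=lambda u: u["dataset"])

# Hypothetical second aspect: owners keyed by owner URN
owners = apply_add_or_update(
    [{"owner": "urn:li:corpuser:ownerA"}],
    [{"owner": "urn:li:corpuser:ownerB"}, {"owner": "urn:li:corpuser:ownerC"}],
    key=lambda o: o["owner"],
)
```

Removal is deliberately absent, matching the add/update-only scope; supporting it would presumably need a separate list of keys to delete in the delta record.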
Can I use the Kafka API to perform partial updates, or does that require more work?
It would be really great if it worked with Kafka too 🙂

bumpy-keyboard-50565
10/02/2020, 12:29 PM
UpstreamLineageDelta to Delta, you can start emitting MCEs with the delta info. There's one more step: registering the mapping from UpstreamLineageDelta to the corresponding Action rest.li method, which is currently missing in mce-consumer-job.

aloof-window-16847
10/05/2020, 6:18 PM
In MetadataChangeEventsProcessor.java, the method consume() handles only snapshots: it calls processProposedSnapshot() -> BaseRemoteWriterDAO.create().
Likewise, the class RestliRemoteWriterDAO, the actual implementation of the abstract class BaseRemoteWriterDAO (which has only one method, create()), handles only snapshots. So if I use a similar Kafka -> Rest.li DAO, I would need to add another method besides create() to handle deltas.
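The proposal above, together with the missing registration step bumpy-keyboard-50565 described, can be sketched as a Python analogue of the Java classes. Everything here is hypothetical: the event is a simplified dict with a top-level urn, proposedDelta is assumed to sit alongside proposedSnapshot, and delta_update() plus the delta-type-to-Action mapping are illustrative names, not existing DataHub APIs.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class RemoteWriterDAO(ABC):
    """Python stand-in for BaseRemoteWriterDAO, plus a hypothetical delta method."""

    @abstractmethod
    def create(self, urn: str, snapshot: Dict[str, Any]) -> None:
        """Existing path: write a full snapshot."""

    @abstractmethod
    def delta_update(self, urn: str, delta_type: str, delta: Dict[str, Any]) -> None:
        """Hypothetical path: forward a delta to its Rest.li Action method."""


class RecordingWriterDAO(RemoteWriterDAO):
    """Test double that records calls instead of sending them to a server."""

    def __init__(self, actions: Dict[str, str]) -> None:
        # Hypothetical registry: delta type name -> Rest.li Action name
        self.actions = actions
        self.calls: List[Tuple[str, str]] = []

    def create(self, urn: str, snapshot: Dict[str, Any]) -> None:
        self.calls.append(("create", urn))

    def delta_update(self, urn: str, delta_type: str, delta: Dict[str, Any]) -> None:
        action = self.actions.get(delta_type)
        if action is None:
            raise ValueError(f"no Rest.li action registered for {delta_type}")
        self.calls.append((action, urn))


def consume(event: Dict[str, Any], dao: RemoteWriterDAO) -> None:
    """Route a simplified MCE: snapshots go to create(), deltas to delta_update()."""
    if "proposedSnapshot" in event:
        dao.create(event["urn"], event["proposedSnapshot"])
    elif "proposedDelta" in event:
        # Assumed shape: a single-entry union {delta type name: delta payload}
        (delta_type, delta), = event["proposedDelta"].items()
        dao.delta_update(event["urn"], delta_type, delta)
    else:
        raise ValueError("event carries neither a snapshot nor a delta")


dao = RecordingWriterDAO({"UpstreamLineageDelta": "deltaUpdate"})
urn = "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
consume({"urn": urn, "proposedSnapshot": {"aspects": []}}, dao)
consume({"urn": urn, "proposedDelta": {"UpstreamLineageDelta": {"upstreamsToUpdate": []}}}, dao)
print(dao.calls)
```

Keeping the delta-type-to-Action mapping in one registry means supporting a new delta only adds an entry there, while an unregistered delta fails loudly instead of being silently dropped.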
Am I on the right track?

bumpy-keyboard-50565
10/06/2020, 12:16 PM
Action method of your choice. See this for more details: https://linkedin.github.io/rest.li/user_guide/restli_client

aloof-window-16847
11/19/2020, 1:00 PM

bumpy-keyboard-50565
11/19/2020, 6:30 PM