I was playing around with lineage. Now I ended up ...
# troubleshoot
s
I was playing around with lineage. Now I ended up with this graph. I would like to remove the dataset to dataset lineage. I am not sure how to do that. UpstreamLineageClass does not seem to have a removed status. https://github.com/linkedin/datahub/blob/352a0abf8d7e4dd5d5664a8c7cdf3d77bf6f1c51/metadata-ingestion/src/datahub/metadata/schema_classes.py#L3274
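I tried doing:
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DatasetSnapshotClass,
    MetadataChangeEventClass,
    UpstreamLineageClass,
)

# _get_dataset_urn and downstream_urns are defined elsewhere in my script.
lineage_mce = MetadataChangeEventClass(
    proposedSnapshot=DatasetSnapshotClass(
        urn=_get_dataset_urn(downstream_urns),
        aspects=[
            UpstreamLineageClass(
                upstreams=None,
            ),
        ],
    )
)
emitter = DatahubRestEmitter("http://datahub-datahub-gms.apps.svc.cluster.local:8080")
emitter.emit_mce(lineage_mce)
But that did not work.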
b
@green-football-43791 What's the easiest way to remove dataset -> dataset lineage?
s
I am also stuck on this because I accidentally sent one event incorrectly. I tried sending a remove event, but the entity did not get removed:
MetadataChangeEventClass({
    'auditHeader': None,
    'proposedSnapshot': DataJobSnapshotClass({
        'urn': 'urn:li:dataJob:(urn:li:dataFlow:(airflow,replace_cta,prod),cta_global_okr_pct_change_id_lineage)',
        'aspects': [
            DataJobInfoClass({
                'customProperties': {},
                'externalUrl': 'https://DOMAIN_NAME/admin/taskinstance/?flt0_dag_id_equals=replace_cta&flt3_task_id_equals=cta_global_okr_pct_change_id_lineage',
                'name': 'cta_global_okr_pct_change_id_lineage',
                'description': None,
                'type': 'COMMAND',
                'flowUrn': None,
                'status': None
            }),
            DataJobInputOutputClass({
                'inputDatasets': ['urn:li:dataset:(urn:li:dataPlatform:athena,analytics.cta_global_okr,PROD)'],
                'outputDatasets': ['urn:li:dataset:(urn:li:dataPlatform:athena,analytics.cta_global_okr_mom_pct_change,PROD)'],
                'inputDatajobs': []
            }),
            StatusClass({'removed': True})
        ]
    }),
    'proposedDelta': None
})
We really need a delete button on the lineage tab to mark something as removed. Sending events does not work.
s
+1 for this feature.
s
I checked `datahub_datajobindex_v2/_search?pretty=true&size=100` and `"removed" : true` is there for the Airflow task that I had marked as removed. Searching the `datahub_graph_service_v1` Elasticsearch index, it seems the task was not removed there. That should explain why it is still showing up in the graph as well as in the "Task" tab of the Pipeline.
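The comparison described above (entity index says removed, graph index still has the edge) can be sketched as a small script. This is only a rough sketch under assumptions: the helper names `removed_urns` and `stale_graph_edges` are made up for illustration, and the document shapes (`urn` and `removed` in the entity index, `source.urn` and `destination.urn` in the graph index) are assumed rather than confirmed in this thread. It works on already-fetched `_search` responses, so check the field names against what your own indices return.

```python
from typing import Any, Dict, List, Set


def removed_urns(entity_search_response: Dict[str, Any]) -> Set[str]:
    """Collect URNs whose entity-index document has "removed": true."""
    hits = entity_search_response.get("hits", {}).get("hits", [])
    result = set()
    for hit in hits:
        source = hit.get("_source", {})
        # Assumed fields: "urn" and "removed" on each entity document.
        if source.get("removed") is True and "urn" in source:
            result.add(source["urn"])
    return result


def stale_graph_edges(
    graph_search_response: Dict[str, Any], removed: Set[str]
) -> List[Dict[str, Any]]:
    """Return graph-index edges that still reference a soft-deleted URN."""
    stale = []
    for hit in graph_search_response.get("hits", {}).get("hits", []):
        edge = hit.get("_source", {})
        # Assumed fields: "source.urn" and "destination.urn" on each edge.
        src = edge.get("source", {}).get("urn")
        dst = edge.get("destination", {}).get("urn")
        if src in removed or dst in removed:
            stale.append(edge)
    return stale
```

If `stale_graph_edges` returns anything for a task that the entity index marks as removed, that would match the behavior reported here: the soft delete reached the search index but not the graph index.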
b
@green-football-43791 @early-lamp-41924 incorrect delete behavior ^ and also a good feature request
g
Thanks for raising this, Aseem. I created an issue to track the problem: https://github.com/linkedin/datahub/issues/3028
We'll get to this ASAP.
d
I also stumbled across this problem today. Is there a way to remove a single lineage connection from the graph? I ingested lineage through an emitter and now I want to get rid of one connection. I actually deleted the target dataset, but it is still shown in the lineage graph (I guess due to the aspect value in the dataset). I am also not able to delete the aspect on the downstream dataset. Could you please help?
Update: Made it work through a hard delete of the upstream lineage dataset. A soft delete did not cut it for the UI, but with the hard delete it now works 🙂 Still, it feels like a bug that lineage remains after soft deleting the target. What do you think?
b
This does feel like a bug... Soft-deleted entities should be filtered out of the result set, but I think they are not. @green-football-43791 This may come up in the work you and dexter are doing right now
e
I do remember we discussed this issue before. We should filter these nodes out, or at least have a UI indication that the node was deleted.
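For illustration, the filtering proposed above could look roughly like this. It is a minimal sketch and not DataHub's implementation: `LineageEdge`, `filter_soft_deleted`, and the `removed` lookup are hypothetical names, and the sketch assumes the soft-delete status of each URN is already known to the caller.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class LineageEdge:
    """Hypothetical lineage edge: data flows from upstream to downstream."""
    upstream: str
    downstream: str


def filter_soft_deleted(
    edges: Iterable[LineageEdge], removed: Dict[str, bool]
) -> List[LineageEdge]:
    """Drop every edge that touches a soft-deleted entity.

    `removed` maps a URN to its soft-delete status; URNs missing from
    the map are treated as not removed.
    """
    return [
        edge
        for edge in edges
        if not removed.get(edge.upstream, False)
        and not removed.get(edge.downstream, False)
    ]
```

The alternative mentioned above, keeping the node but flagging it as deleted in the UI, would keep the edges and attach the `removed` status to each node instead of filtering.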