# ingestion
g
Hi. Is there any way to remove already-ingested datasets in DataHub? Or is it an append-only ingestion pipeline? Is there a way to mark a particular dataset as 'Deleted'?
g
DataHub primarily uses “soft deletes”: you can mark a given entity as “removed” by emitting a Status aspect with removed=True.
Creating the removal MCE would be something along these lines:
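from datahub.metadata.schema_classes import DatasetSnapshotClass, MetadataChangeEventClass, StatusClass

# urn is the URN string of the dataset to soft-delete
mce = MetadataChangeEventClass(
    proposedSnapshot=DatasetSnapshotClass(
        urn=urn,
        aspects=[StatusClass(removed=True)],
    )
)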
f
@powerful-jelly-19645
b
Since I'm using the file-to-REST ingestion, this is what my file looks like:
[{
    "auditHeader": null,
    "proposedSnapshot": {
        "com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot": {
            "urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)",
            "aspects": [
                {
                    "com.linkedin.pegasus2avro.common.Status": {
                        "removed": true
                    }
                }
            ]
        }
    }
}]
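If several datasets need the same treatment, a file with this shape could be generated rather than written by hand. The following is only a minimal sketch using the Python standard library: the helper names `soft_delete_mce` and `write_soft_delete_file` and the output file name are hypothetical, and the JSON layout is simply copied from the file above rather than taken from any DataHub API.

```python
import json

# Keys copied verbatim from the hand-written file above.
SNAPSHOT_KEY = "com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot"
STATUS_KEY = "com.linkedin.pegasus2avro.common.Status"


def soft_delete_mce(urn: str) -> dict:
    """Build one MCE dict that marks the given dataset URN as removed."""
    return {
        "auditHeader": None,
        "proposedSnapshot": {
            SNAPSHOT_KEY: {
                "urn": urn,
                "aspects": [{STATUS_KEY: {"removed": True}}],
            }
        },
    }


def write_soft_delete_file(urns, path):
    """Write a JSON array with one soft-delete MCE per URN (hypothetical helper)."""
    events = [soft_delete_mce(urn) for urn in urns]
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(events, fh, indent=4)
    return events


if __name__ == "__main__":
    # The output file name is an assumption; point the file source at whatever path you use.
    write_soft_delete_file(
        ["urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)"],
        "soft_deletes.json",
    )
```

The resulting file can then be fed through the same file-to-REST recipe as before.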
Interestingly, setting removed=True removes the dataset from the UI and search results. However, if the dataset is linked as an upstream/downstream of another dataset, it can still be accessed through that link. Should this be considered a bug?
g
Yep, removed=True is essentially a soft delete, which you’d use to indicate that the dataset no longer exists or has been deleted, without completely purging its history from the backend storage layer.
We don’t yet support “hard deletes” but are working on it, since there are use cases for both types of deletes.