Hi team, Is there any way to automatically delete ...
# ingestion
a
Hi team, is there any way to automatically delete outdated datasets? E.g., changes to the source database should be reflected in the dataset.
a
@better-orange-49102 Thanks!
@better-orange-49102 I tried to add stateful ingestion but it did not delete the old dataset. What would you recommend I do?
b
I believe it only works if you activate it from the outset, but I don't use it myself. @incalculable-ocean-74010 can probably advise better.
i
xL is right: stateful ingestion relies on metadata recorded by previous runs that were themselves run with stateful ingestion enabled.
By this I mean that once stateful ingestion is enabled, DataHub records what was ingested in each run, compares it with the previous run, and soft-deletes the differences (metadata that was seen before but is not seen now). From the docs: “Stateful ingestion can be used to automatically soft-delete the tables and views that are seen in a previous run but absent in the current run (they are either deleted or no longer desired).” https://datahubproject.io/docs/metadata-ingestion/docs/dev_guides/stateful/#use-cases-powered-by-stateful-ingestion
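To make the "seen before but not seen now" idea concrete, here is a minimal conceptual sketch in Python. It is not DataHub's actual implementation; the function name and the example URNs are made up for illustration. It also shows why the very first stateful run cannot delete anything: there is no previous state to compare against.

```python
from typing import Set


def find_stale_urns(previous_run: Set[str], current_run: Set[str]) -> Set[str]:
    """Return URNs seen in the previous run but absent from the current run."""
    return previous_run - current_run


# Hypothetical URNs, for illustration only.
previous = {
    "urn:li:dataset:(urn:li:dataPlatform:bigquery,proj.sales.orders,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:bigquery,proj.sales.customers,PROD)",
}
current = {
    "urn:li:dataset:(urn:li:dataPlatform:bigquery,proj.sales.orders,PROD)",
}

# Candidates for soft deletion: only the table that disappeared.
stale = find_stale_urns(previous, current)

# First run with stateful ingestion: no previous state, so nothing is stale.
first_run_stale = find_stale_urns(set(), current)
```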
b
Just curious, does stateful ingestion still work if you have multiple recipes for a data source, each recipe having its own allow/deny table patterns?
i
I think so. Stateful ingestion works at a recipe level. @big-carpet-38439 can confirm.
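As a rough sketch of what "recipe level" could look like in practice, the snippet below builds two recipes as Python dicts, each with its own `pipeline_name` and its own allow pattern. The pipeline names and table patterns are hypothetical, connection and sink settings are omitted, and the exact config keys should be checked against the docs for your source and DataHub version.

```python
from typing import Any, Dict, List


def make_recipe(pipeline_name: str, allow_patterns: List[str]) -> Dict[str, Any]:
    """Build a recipe dict with stateful ingestion enabled (sketch only)."""
    return {
        # Assumption: state is tracked per pipeline_name, so it must be
        # unique per recipe and stable across runs.
        "pipeline_name": pipeline_name,
        "source": {
            "type": "bigquery",
            "config": {
                "table_pattern": {"allow": allow_patterns},
                "stateful_ingestion": {
                    "enabled": True,
                    "remove_stale_metadata": True,
                },
                # Connection details intentionally omitted.
            },
        },
    }


# Hypothetical names and patterns, for illustration only.
sales_recipe = make_recipe("bigquery_sales", ["proj\\.sales\\..*"])
finance_recipe = make_recipe("bigquery_finance", ["proj\\.finance\\..*"])
```

With distinct pipeline names, each recipe would compare its current run only against its own previous run, so tables covered by one recipe should not be treated as stale by the other.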
r
Soft delete also does not work. Is there a way to view datasets that were marked as deleted after an ingestion?
i
You can check that using the datahub-cli by going over the datasets you want and verifying whether the status aspect exists for those datasets with removed=true. Something like:
datahub get --urn "urn:li:dataset:(urn:li:dataPlatform:bigquery,bigquery-public-data.covid19_geotab_mobility_impact.us_border_wait_times,PROD)" -a status
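If you need to check several datasets, a small Python helper can build the same command for each one. This is only a sketch: the helper names are made up, and it just assembles the URN and the argument list for the `datahub get` call shown above without running it.

```python
import shlex
from typing import List


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Assemble a dataset URN in the same shape as the one in the command above."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def status_command(urn: str) -> List[str]:
    """Argument list for fetching the status aspect of one entity."""
    return ["datahub", "get", "--urn", urn, "-a", "status"]


urn = dataset_urn(
    "bigquery",
    "bigquery-public-data.covid19_geotab_mobility_impact.us_border_wait_times",
)
cmd = status_command(urn)

# Print a shell-safe version of the command to copy and paste.
print(shlex.join(cmd))
```

The list in `cmd` can be passed to `subprocess.run` on a machine where the datahub CLI is installed and configured; a dataset that was soft-deleted should show removed=true in its status aspect.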
r
Okay, thanks, I will try.
a
Thanks everyone :)