# troubleshoot
m
Hello, I'm trying to delete some metadata, but the `datahub delete` output is ambiguous, and the data is not gone:
❯ datahub delete --entity_type dataset --platform kafka --hard 
This will permanently delete data from DataHub. Do you want to continue? [y/N]: y
[2022-04-11 10:17:22,059] INFO     {datahub.cli.delete_cli:200} - datahub configured with <http://localhost:8080>
[2022-04-11 10:17:22,182] INFO     {datahub.cli.delete_cli:212} - Filter matched 22 entities. Sample: ['urn:li:dataset:(urn:li:dataPlatform:kafka, 
... (22 urns) 
]
This will delete 22 entities. Are you sure? [y/N]: y
100% (22 of 22) |################################################################################################################################################################################################################| Elapsed Time: 0:00:01 Time:  0:00:01
Took 6.673 seconds to hard delete 0 rows for 22 entities
The GMS debug log shows 22 successful delete actions, but the command output says `0 rows`, and the data is not deleted. What can I do to (a) troubleshoot this further and (b) actually delete the data?
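When scripting around the CLI, one way to catch this kind of mismatch is to parse the summary line and compare the row count with the entity count. This is a minimal sketch that assumes the summary wording quoted in this thread stays stable across CLI versions. The regex and the function names are hypothetical and are not part of DataHub. A non-positive row count does not by itself prove that nothing was deleted. It only shows that the summary disagrees with the entity count and is worth checking against the GMS log.

```python
import re
from typing import Optional, Tuple

# Matches the summary printed at the end of `datahub delete`, e.g.
# "Took 6.673 seconds to hard delete 0 rows for 22 entities".
# Assumption: the wording is copied from this thread and may change between versions.
SUMMARY_RE = re.compile(
    r"Took (?P<secs>[\d.]+) seconds to (?P<mode>hard|soft) delete "
    r"(?P<rows>-?\d+) rows for (?P<entities>\d+) entities"
)


def parse_delete_summary(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (mode, rows, entities) from a delete summary line, or None."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return match["mode"], int(match["rows"]), int(match["entities"])


def looks_suspicious(line: str) -> bool:
    """Flag summaries where entities matched but the reported row count is not positive."""
    parsed = parse_delete_summary(line)
    if parsed is None:
        return False
    _mode, rows, entities = parsed
    return entities > 0 and rows <= 0
```

For example, `looks_suspicious("Took 6.673 seconds to hard delete 0 rows for 22 entities")` returns `True`.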
Maybe of interest: the 22 URNs were ingested in a single ingestion run. When I try `datahub ingest rollback` for that run, I see a similar effect:
This rollback  deleted 0 entities and rolled back 0 aspects
showing first 100 of 0 aspects  reverted by this run
followed by a list of 100 aspects.
e
For debugging, could you try removing `--hard` and see whether it successfully soft-deletes these entities?
m
soft delete output:
Took 7.678 seconds to soft delete -1 rows for 22 entities
Notice the `-1`. The main difference now is that the entities no longer appear in the UI, and in the GraphQL API I can see that `status.removed=true`.
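The `status.removed` flag can also be checked from a script, which makes it easier to confirm which entities are soft-deleted before a hard delete. This is a minimal sketch with several assumptions. It assumes that GMS serves GraphQL at `/api/graphql` on the host the CLI reported (`http://localhost:8080`) and that no auth token is required. It also assumes that `dataset(urn:)` exposes `status { removed }`. Adjust the URL and headers for your deployment. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Assumption: GMS serves GraphQL at /api/graphql on the host the CLI reported.
# Add an Authorization header if your deployment requires one.
GRAPHQL_URL = "http://localhost:8080/api/graphql"

# Requests the Status aspect (soft-delete flag) of a single dataset.
STATUS_QUERY = """
query datasetStatus($urn: String!) {
  dataset(urn: $urn) {
    urn
    status {
      removed
    }
  }
}
"""


def build_status_request(urn: str) -> bytes:
    """Encode the GraphQL request body for one dataset URN."""
    body = {"query": STATUS_QUERY, "variables": {"urn": urn}}
    return json.dumps(body).encode("utf-8")


def fetch_removed_flag(urn: str, url: str = GRAPHQL_URL) -> Optional[bool]:
    """POST the query and return status.removed, or None if no dataset came back."""
    request = urllib.request.Request(
        url,
        data=build_status_request(urn),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    dataset = (payload.get("data") or {}).get("dataset")
    if dataset is None:
        return None
    status = dataset.get("status") or {}
    return bool(status.get("removed", False))
```

Usage, with a hypothetical URN: `fetch_removed_flag("urn:li:dataset:(urn:li:dataPlatform:kafka,example-topic,PROD)")`.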
Of course, now I can no longer hard delete them using `--platform kafka`. Hard deleting them one by one does seem to work, though.
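Because the per-URN hard delete works, one option is to script it for all 22 entities by calling the CLI once per URN. This is a minimal sketch that assumes `datahub delete --urn <urn> --hard` behaves as it did in the one-by-one test above. The helper names and example URNs are hypothetical. By default the script runs dry and only prints the commands.

```python
import shlex
import subprocess
from typing import Iterable, List


def build_hard_delete_commands(urns: Iterable[str]) -> List[List[str]]:
    """Build one `datahub delete --urn <urn> --hard` argument list per URN."""
    return [["datahub", "delete", "--urn", urn, "--hard"] for urn in urns]


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one at a time."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
            continue
        # The CLI asks for confirmation; answer "y" to up to two prompts.
        # The number of prompts per URN is an assumption; unread input is discarded.
        subprocess.run(argv, input="y\ny\n", text=True, check=True)


if __name__ == "__main__":
    # Hypothetical URNs: replace with the actual soft-deleted dataset URNs.
    soft_deleted_urns = [
        "urn:li:dataset:(urn:li:dataPlatform:kafka,example-topic-1,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:kafka,example-topic-2,PROD)",
    ]
    run_commands(build_hard_delete_commands(soft_deleted_urns), dry_run=True)
```

Running each URN as a separate process keeps one failure from hiding the others. Because `check=True` is set, the loop stops at the first non-zero exit.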