# ingestion
c
Hello! I have a question about dataset updates. If I delete a table in a dataset, how do I get it deleted or hidden on the UI side? I know about the “status: removed” transformer, but if I have a scheduled ingestion in Airflow, is there a way to apply every change automatically without manually updating the status or anything else? I mean something like committing code.
g
Ah, are you asking if there is a way to programmatically delete data (rather than curling the delete endpoint manually)?
c
I mean yes, @green-football-43791. While the ingestion pipeline is running, it should check my Redshift, and if a table that currently appears in the UI has been deleted there, delete it.
g
Does the existing delete API work for you in this use case?
datahub delete --urn "<my urn>"
c
I think not.
With your solution you need to know the URN first and then run the command.
For example, I dropped a table from Redshift that I had ingested before, then ran ingestion with my YAML file again, but the table still shows up in the UI.
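To illustrate why the URN has to be known up front, here is a minimal sketch that builds the delete command for a single table. The helper names `dataset_urn` and `delete_command` are hypothetical, and the URN layout is an assumption based on the usual DataHub dataset URN convention, so check it against a URN copied from your own instance before relying on it.

```python
import shlex


def dataset_urn(platform, name, env="PROD"):
    """Build a dataset URN string (assumed layout, verify against your instance)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def delete_command(urn):
    """Return the CLI call from this thread for one URN, shell-quoted."""
    return f"datahub delete --urn {shlex.quote(urn)}"


# Hypothetical table name, for illustration only.
print(delete_command(dataset_urn("redshift", "dev.public.orders")))
```

The point is that each dropped table needs its own URN and its own call, which is the manual step being asked about.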
g
I see. You want to permanently delete it so it cannot be re-ingested?
c
yeah kinda
g
Got it. The status: removed transformer is your best bet for now, then.
c
Let me explain the logic. I have an array like arr = [1,2,3]. I ingest the YAML file and this array appears in the UI, which is good. Now I delete the “3”, so arr = [1,2]. Finally, I ingest the same YAML file again and the “3” no longer appears in the UI.
We don’t have any feature like this, right?
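The behaviour described here amounts to a set difference between what was ingested last time and what the source contains now. A minimal sketch, assuming both lists are available to the job; `find_stale` is a hypothetical helper, not a DataHub feature:

```python
def find_stale(previously_ingested, currently_in_source):
    """Return items that were ingested before but are gone from the source."""
    return sorted(set(previously_ingested) - set(currently_in_source))


previous = [1, 2, 3]   # what the UI shows after the first ingestion
current = [1, 2]       # what the source contains now
print(find_stale(previous, current))
```

Anything returned by `find_stale` is what would need to be marked as removed on the next run.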
g
Ah I see
This is not currently supported
c
I got it, thank you!
g
🙂 For sure. This use case does make sense though, and we'd like to support it in the future!
b
@curved-jordan-15657 Would this solution suffice? When ingesting from Redshift, mark all existing tables as soft-deleted first. Then re-ingest from Redshift. All tables that are still in Redshift are reinstated. This way, the tables that have been deleted from Redshift are gone.
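A rough in-memory sketch of that proposal, to show the resulting state rather than any real connector code. The catalog is modelled as a plain dict with a `removed` flag, and `soft_delete_then_reingest` and `visible` are hypothetical names; nothing here calls DataHub.

```python
def soft_delete_then_reingest(catalog, source_tables):
    """Mark every known table as removed, then reinstate those still in the source.

    catalog: dict mapping table name -> {"removed": bool}
    source_tables: iterable of table names currently present in the source
    Returns a new dict; the input is left untouched.
    """
    # Step 1: soft-delete everything that was ingested before.
    updated = {name: {"removed": True} for name in catalog}
    # Step 2: re-ingest; tables still in the source (and new ones) become visible.
    for name in source_tables:
        updated[name] = {"removed": False}
    return updated


def visible(catalog):
    """Names of tables that would still show up in the UI."""
    return sorted(name for name, state in catalog.items() if not state["removed"])


before = {"orders": {"removed": False}, "users": {"removed": False}}
after = soft_delete_then_reingest(before, ["orders"])
print(visible(after))
```

Tables missing from the source stay soft-deleted after the run, so they drop out of the UI without anyone looking up their URNs.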
c
Hmm, you mean schedule two jobs one after another: first delete all of them and then ingest the latest version? Seems like a bit of extra maintenance.
b
Not necessarily. Ideally the ingestion connector itself would perform the delete operation.
c
So, can I put this status: removed in my YAML file under transformers?
Or is there an example somewhere?
b
No no, this isn't implemented yet
I'm just throwing out a possible solution to see if you think it would work for you
c
Oh I see, yes, this could probably work