Hey, how do you people deal with datasets deleted ...
# ingestion
b
Hey, how do you people deal with datasets that are deleted in the source after they've already been ingested by a previous run? I'm trying to figure out how to automate the process. I was thinking of running a job that compares what is already ingested against what I'm about to ingest and sends MCEs for the diff items with a status aspect where `removed=true`. Curious to know if anyone has had success with this or any other approach.
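For illustration, the compare-and-soft-delete idea described above could be sketched roughly like this. It's only a sketch under assumptions: the function names, the example URNs, and the dict layout are hypothetical stand-ins for real MCEs, and nothing here calls the actual DataHub client.

```python
from typing import Dict, Iterable, List


def find_stale_urns(previously_ingested: Iterable[str],
                    currently_in_source: Iterable[str]) -> List[str]:
    """Return URNs that were ingested earlier but are missing from the current scan."""
    return sorted(set(previously_ingested) - set(currently_in_source))


def build_soft_delete_events(stale_urns: Iterable[str]) -> List[Dict]:
    """Build one placeholder event per stale URN.

    The dict layout is hypothetical; a real job would construct proper MCEs
    carrying a status aspect with removed=true.
    """
    return [
        {"urn": urn, "aspect": "status", "value": {"removed": True}}
        for urn in stale_urns
    ]


# Hypothetical URNs, for illustration only.
previous_run = {
    "urn:li:dataset:(urn:li:dataPlatform:hive,db.orders,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:hive,db.users,PROD)",
}
current_run = {
    "urn:li:dataset:(urn:li:dataPlatform:hive,db.users,PROD)",
}

stale = find_stale_urns(previous_run, current_run)
events = build_soft_delete_events(stale)
```

Here `stale` would hold only the `db.orders` URN, since it is the one that disappeared from the source. The job would still need to get the "previously ingested" set from somewhere, e.g. by persisting it at the end of each run.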
s
Maybe this is helpful: https://github.com/linkedin/datahub/blob/master/docs/features.md Schema history seems to be one of the features "coming soon". This might cover your use case as well.
l
Since we're now maintaining more state about what was ingested in previous runs, we're investigating how to leverage that state to automatically soft-delete (i.e., set `removed=true` on) datasets that are absent in subsequent runs. cc @helpful-optician-78938 @mammoth-bear-12532. Stay tuned for more updates.
Schema history is a slightly different feature, used more for tracking schema-level changes (addition/removal of fields) over time.