A complete solution for open data platforms, enterprise data catalogs, data lakes and data management. Open source, mature, fully-featured and production ready.

DataHub

Hey, how do you all deal with datasets that get deleted in the source after they've already been ingested by a previous run? I'm trying to figure out how to automate the process. I was thinking of running a job that compares what is already ingested with what I'm about to ingest, and sends MCEs for the diff items with a Status aspect where `removed=true`. Curious to know whether anyone has had success with this or any other approach.
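For reference, a rough sketch of the diff idea, not a tested DataHub integration. The function names are hypothetical, and the MCE payload shape (a `DatasetSnapshot` carrying a `Status` aspect) is an assumption based on the JSON MCE file format, so check it against your DataHub version before emitting anything:

```python
from typing import Any, Dict, Iterable, List, Set


def find_stale_urns(previously_ingested: Iterable[str],
                    current_run: Iterable[str]) -> Set[str]:
    """Return URNs that were ingested before but are absent from the current run."""
    return set(previously_ingested) - set(current_run)


def build_soft_delete_mce(urn: str) -> Dict[str, Any]:
    """Build an MCE-shaped dict that marks one dataset as removed.

    Assumption: the key names follow the JSON MCE file format; verify them
    against the schema of the DataHub release you run.
    """
    return {
        "proposedSnapshot": {
            "com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot": {
                "urn": urn,
                "aspects": [
                    {"com.linkedin.pegasus2avro.common.Status": {"removed": True}}
                ],
            }
        }
    }


def soft_delete_mces(previously_ingested: Iterable[str],
                     current_run: Iterable[str]) -> List[Dict[str, Any]]:
    """Build one soft-delete MCE per stale URN, in sorted order for stable output."""
    stale = find_stale_urns(previously_ingested, current_run)
    return [build_soft_delete_mce(urn) for urn in sorted(stale)]
```

The resulting list could be written to a JSON file and ingested with a file source, or sent through whichever emitter you already use. Where the "previously ingested" URN list comes from (a saved file from the last run, a search query, etc.) is left open here.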

Maybe this is helpful: <https://github.com/linkedin/datahub/blob/master/docs/features.md>
Schema history seems to be one of the features "coming soon". This might cover your use case as well.

Since we're now maintaining more state about what was ingested in previous runs, we're investigating how to leverage that state to automatically soft-delete (i.e., set `removed=true`) datasets that are absent in subsequent runs. cc <@U028L1V9BE1> <@UV0M2EB8Q>. Stay tuned for more updates.

Schema history is a slightly different feature, used more for tracking changes at the schema level (addition/removal of fields) over time.