# getting-started
b
I've just started playing around with DataHub. It looks very powerful and might be a good fit for our data catalog needs. I have a question concerning table deletions: we have some RDBMS (mainly Oracle and MS) where whole tables get created and deleted now and then. New tables and new or deleted columns are picked up by DataHub during ingestion, but when a table gets deleted in the database it remains visible in DataHub. I understand that "delete table" is a dedicated command which needs to be sent to the backend, but are there any best practices or suggestions for how we can detect deleted tables during scanning/ingestion and then remove them from the catalog? Maybe some kind of comparison/diff?
l
We are aware of this issue and are planning to address it. In the ingestion pipeline, we are already starting to store some state. We plan to store additional state about previous ingestion runs so that we can do the diff and automatically mark tables which have been dropped. The ETA for this is about a month, so please stay tuned. cc @helpful-optician-78938
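To illustrate the diff idea in the meantime, here is a minimal sketch in plain Python. It is not DataHub's implementation or API: the function name `find_dropped_tables` and the table identifiers are hypothetical, and it assumes you save the set of table names seen in each ingestion run yourself.

```python
from typing import Iterable, Set


def find_dropped_tables(previous_run: Iterable[str], current_run: Iterable[str]) -> Set[str]:
    """Return tables seen in the previous run that are missing from the current one."""
    return set(previous_run) - set(current_run)


# Hypothetical table identifiers saved from two consecutive ingestion runs.
previous_run = {"sales.orders", "sales.customers", "hr.employees"}
current_run = {"sales.orders", "hr.employees", "hr.payroll"}

dropped = find_dropped_tables(previous_run, current_run)
print(sorted(dropped))  # prints ['sales.customers']
```

Each table in `dropped` would then still need to be removed or marked as deleted in the catalog as a separate step, which this sketch does not cover.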
b
Hi @loud-island-88694 @helpful-optician-78938, I’m wondering if there’s any update on this. We are evaluating a potential migration from Amundsen to DataHub. Amundsen supports staleness removal, so it would be a pretty big concern if DataHub doesn’t support this, since we have frequent deletions in our system. Thanks 🙏
l
@helpful-optician-78938 is currently working on this. We should have an update in a week
b
@loud-island-88694 What great news! Looking forward to the update 😃
b
Think this will be very useful for folks 🙂