Hi to all, would it be possible to add an optional...
# feature-requests
s
Hi to all, would it be possible to add an optional expiration date for (everything?). This is my use case: I have several Airflow pipelines, owned by different users. If a user changes something (dag/task/lineage information), they should remember to deprecate the old stuff. With an expiration time instead, if the user forgets to deprecate an object, it will automatically expire, leaving a cleaner catalog.
👍 2
s
Interesting request. Just today one of the devs in my team mentioned that we need to remove dag/task/lineage for things that are removed
👍 1
b
I wonder if we can address this with time-based retention of aspects @early-lamp-41924
As opposed to configuring expiration on a per entity basis we’d have a blanket policy based on entity / aspect type
e
To me, this is a bit more complicated bc it requires deletion of entities, not just old versions. For instance, do we need to go and clean up graph edges as well (or references to this expired entity)?
l
Would it cause confusion if the asset is removed from DataHub but is still live/executing in Airflow?
e
So @stale-jewelry-2440 would you want to expire based on the last update to any of the aspects?
s
yes, I think it should work like that. So if a database table expires, everything that refers to it should be updated, unless there is a task (or something else) that keeps declaring it's still using it in its lineage
e
Got it. Hard deleting expired entities seems dangerous as it may drift the catalog from what’s actually there in the data ecosystem. WDYT about applying soft deletes (or something like an expired tag), where these expired datasets show up at the bottom of the search list and are marked expired?
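To make the soft-delete idea concrete, here is a minimal sketch in plain Python, not DataHub code. `SearchHit`, its `expired` flag, and `rank` are hypothetical names invented for illustration; the only point is that expired entries stay in the results but sort after every live one.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:
    """Hypothetical search result; not a DataHub model."""
    urn: str
    score: float
    expired: bool = False  # soft-delete marker, e.g. set by an "expired" tag


def rank(hits):
    """Order live hits by descending score, then expired hits by descending score."""
    # False sorts before True, so every non-expired hit comes first.
    return sorted(hits, key=lambda h: (h.expired, -h.score))


hits = [
    SearchHit("table_a", 0.9, expired=True),
    SearchHit("table_b", 0.5),
    SearchHit("table_c", 0.7),
]
ranked_urns = [h.urn for h in rank(hits)]
```

Nothing is removed here, so an expired entry can be revived simply by clearing the flag.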
s
yes, this conservative approach also sounds good. Maybe with a manual operation to actually ‘delete’ all the expired stuff
b
++
@stale-jewelry-2440 Would the ability to hard -or- soft delete an entity manually via the UI be of interest?
s
not sure, at least in this context, because if you drop a pipeline you probably have to delete many objects, which is not optimal to do by hand. And more: maybe the user that cancelled the pipeline does not really know whether the associated tables/objects will be of further interest for other scopes/other teams. Instead, with the expiration tool, the pipeline will just stop announcing itself and its lineage. After the deadline, if no other processes announce the tables/objects used, they will expire too, along with the pipeline. To me this seems more efficient and safe
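The "stop announcing, then expire" behaviour described here can be sketched roughly as follows. This is plain Python under stated assumptions, not an existing DataHub feature: `Catalog`, `CatalogEntry`, `announce`, `sweep`, and the 30-day TTL are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; not a DataHub model."""
    urn: str
    last_announced: datetime
    expired: bool = False


@dataclass
class Catalog:
    ttl: timedelta  # assumed blanket deadline; could also vary per entity type
    entries: dict = field(default_factory=dict)

    def announce(self, urn: str, now: datetime) -> None:
        """Called by a pipeline run for itself and for each object in its lineage."""
        entry = self.entries.get(urn)
        if entry is None:
            self.entries[urn] = CatalogEntry(urn, now)
        else:
            entry.last_announced = now
            entry.expired = False  # a fresh announcement revives the entry

    def sweep(self, now: datetime) -> list:
        """Soft-expire every entry that nobody announced within the TTL."""
        newly_expired = []
        for entry in self.entries.values():
            if not entry.expired and now - entry.last_announced > self.ttl:
                entry.expired = True
                newly_expired.append(entry.urn)
        return newly_expired


t0 = datetime(2022, 1, 1)
catalog = Catalog(ttl=timedelta(days=30))
catalog.announce("pipeline_a", t0)
catalog.announce("table_x", t0)                       # announced by pipeline_a
catalog.announce("table_x", t0 + timedelta(days=20))  # still used by another pipeline
expired = catalog.sweep(t0 + timedelta(days=40))
```

In this run only `pipeline_a` passes the deadline; `table_x` survives because another process announced it 20 days before the sweep. Entries are only flagged, never removed, which matches the conservative soft-delete approach discussed above.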
l
hi folks, I’ve added this to our new feature request portal - please head over to upvote/subscribe to updates as we progress https://feature-requests.datahubproject.io/b/Developer-Experience/p/add-an-optional-expiration-date-to-entities