# ingestion
Hi folks, have we considered log compaction of `MetadataChangeLog` events, and replaying the log-compacted `MetadataChangeLog` events, as a mechanism to rebuild/recover search indexes, particularly for time-series metadata that we don't persist in the DataHub MySQL table?
Hey Jyoti, this would work for versioned aspects (where the key is `<entity-urn, aspect>`), but for time-series aspects it might be more natural to set Kafka retention on that topic to whatever your preferred backup window is (e.g. 90 days), and then, as you suggested, just rebuild the indexes if there is a failure. In a more complex environment where Kafka -> S3 ETL facilities exist, it would be more natural to offload the time-series topic to the lake, and rebuild from the lake if needed to recover.
This allows for much longer retention at a lower cost, e.g. you can easily keep 1 yr of logs.
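A rough sketch of the two topic configurations described above, using Kafka's standard `kafka-configs.sh` tool. The topic names and broker address are assumptions here; check them against your actual deployment:

```shell
# Versioned aspects: each record is keyed by <entity-urn, aspect>, so log
# compaction retains the latest record per key indefinitely, which is enough
# to replay and rebuild the index. (Topic name is an assumption.)
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name MetadataChangeLog_Versioned_v1 \
  --alter --add-config cleanup.policy=compact

# Time-series aspects: there is no single "latest" value per key, so use
# time-based retention sized to the backup window instead.
# 90 days = 90 * 24 * 3600 * 1000 ms = 7776000000 ms.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name MetadataChangeLog_Timeseries_v1 \
  --alter --add-config retention.ms=7776000000
```

With compaction, a replay of the versioned topic reconstructs the current state of every aspect; with time-based retention, a replay only covers the configured window, which is why offloading the time-series topic to S3 is attractive for longer horizons.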
Thanks!