# ingestion
Hi folks, have we considered log compaction of `MetadataChangeLog` events, and replaying the log-compacted `MetadataChangeLog` events, as a mechanism to rebuild/recover search indexes, particularly for time-series metadata that we don't persist in the DataHub MySQL table?
Hey Jyoti, this would work for versioned aspects (where the key is `<entity-urn, aspect>`), but for time-series aspects it might be more natural to set Kafka retention on that topic to whatever your preferred backup window is (e.g. 90 days), and then, as you suggested, just rebuild the indexes if there is a failure. In a more complex environment where Kafka -> S3 ETL facilities exist, it would be more natural to offload the time-series topic to the lake, and rebuild from the lake if needed to recover.
This allows for much longer retention at a lower cost, e.g. you can easily keep 1 yr of logs.
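A rough sketch of the two topic configurations described above, using Kafka's standard `kafka-configs.sh` tool. The topic names and broker address are assumptions here; check them against your actual deployment:

```shell
# Versioned aspects: each record is keyed by <entity-urn, aspect>, so log
# compaction retains the latest record per key indefinitely, which is enough
# to replay and rebuild the index. (Topic name is an assumption.)
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name MetadataChangeLog_Versioned_v1 \
  --alter --add-config cleanup.policy=compact

# Time-series aspects: there is no single "latest" value per key, so use
# time-based retention sized to the backup window instead.
# 90 days = 90 * 24 * 3600 * 1000 ms = 7776000000 ms.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name MetadataChangeLog_Timeseries_v1 \
  --alter --add-config retention.ms=7776000000
```

With compaction, a replay of the versioned topic reconstructs the current state of every aspect; with time-based retention, a replay only covers the configured window, which is why offloading the time-series topic to S3 is attractive for longer horizons.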
Thanks!