# all-things-deployment
d
Hey hey ! I'm seeing that we have these topics plus some others in the kafka:
FailedMetadataChangeEvent_v4
FailedMetadataChangeProposal_v1
MetadataAuditEvent_v4
MetadataChangeEvent_v4
MetadataChangeLog_Timeseries_v1
MetadataChangeLog_Versioned_v1
MetadataChangeProposal_v1
I'm planning to define a retention policy for these topics. Does it sound okay if I define a window that gives the records time to get consumed, say 5-10 days? Are there any topics that are used as a long-term source-of-truth data store by any component?
i
Hello Mert, using Kafka's default of 7 days should be enough. We do not use Kafka topics as a long-term source of truth. Typically you want just enough retention to ensure that components reading those topics can re-process them in case of failures. A failure lasting 7+ days is unheard of, and hopefully any failure would be resolved much, much sooner.
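For reference, a 7-day retention window can be applied per topic with the `kafka-configs.sh` tool that ships with Kafka. This is just a sketch: the bootstrap address is a placeholder, and the loop only echoes the commands so you can review them before running for real (drop the `echo` to apply them).

```shell
#!/bin/sh
# Sketch: set a 7-day retention.ms on the DataHub topics.
# Assumes kafka-configs.sh is on PATH; "localhost:9092" is a placeholder
# for your actual broker address.
BOOTSTRAP="localhost:9092"
RETENTION_MS=$((7 * 24 * 60 * 60 * 1000))   # 7 days in milliseconds

for topic in MetadataChangeEvent_v4 MetadataChangeProposal_v1 \
             MetadataChangeLog_Versioned_v1 MetadataChangeLog_Timeseries_v1; do
  # Remove the leading "echo" to actually alter the topic config.
  echo kafka-configs.sh --bootstrap-server "$BOOTSTRAP" \
    --alter --entity-type topics --entity-name "$topic" \
    --add-config "retention.ms=${RETENTION_MS}"
done
```

The same `--add-config` flag works for the `Failed*` topics too if you want a shorter window there.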
If you really want backups, I would suggest connecting something like Kafka Connect or Spark jobs to sink the messages in those topics into something like S3 or AWS Glacier. That said, most metadata (except for run information of jobs or data quality on a given day) should be relatively easy to re-acquire from the source of said metadata.
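The Kafka Connect route mentioned above might look something like the following sketch, using Confluent's S3 sink connector. The bucket name, region, topic selection, and Connect REST address are all assumptions to adapt; the `curl` registration step is left commented out since it needs a running Connect worker.

```shell
#!/bin/sh
# Sketch: back up DataHub Kafka topics to S3 via Kafka Connect.
# Assumes the Confluent S3 sink connector is installed on the Connect
# worker; "datahub-kafka-backup" and "us-east-1" are hypothetical values.
cat > datahub-s3-sink.json <<'EOF'
{
  "name": "datahub-s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "1",
    "topics": "MetadataChangeLog_Versioned_v1,MetadataChangeLog_Timeseries_v1",
    "storage.class": "io.confluent.connect.storage.s3.S3Storage",
    "s3.bucket.name": "datahub-kafka-backup",
    "s3.region": "us-east-1",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1000"
  }
}
EOF

# Register the connector once a Connect worker is running
# (localhost:8083 is the default REST port, adjust as needed):
# curl -X POST -H "Content-Type: application/json" \
#   --data @datahub-s3-sink.json http://localhost:8083/connectors
```

From S3 you could later transition the objects to Glacier with a lifecycle rule rather than writing to Glacier directly.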
d
That is a great explanation. At the current stage I guess we'll tolerate losing run information etc. for a time, and we can definitely consider integrating with Kafka Connect down the road to keep this data in a proper place long-term, such as S3 as you said. Thank you once again, you are the best 🚀
i
Thank you for trying DataHub!