# getting-started
m
Not sure what's the right channel but... In the latest Community Meeting, the super awesome dataset stats feature seems to rely on a time-series-based datastore. It sounds like that data is not persisted in MySQL and is only stored in ES. If there are issues and we need to reindex ES, how is that data preserved? Or would we need to fully re-ingest this data after reindexing?
b
Also @mammoth-bear-12532. Yes, as of today you'd want to do a backfill using Kafka, i.e. re-emit the profiling events (assuming you have long enough retention in Kafka).
m
Is there an easy way to do that?
Also, sounds like bumping up the retention period in Kafka might be worth it.
b
Yeah, so you'd basically run a script we'd provide that would read old offsets and emit new profiling messages.
Simply rewinding consumers won't work, because we will share the same Kafka topic for both time series and snapshot metadata
(at least that's the plan today).
One other alternative we considered was storing events in MySQL as the Source of Truth (consistent with other aspects),
but we decided that the storage duplication overhead may not be worth it.
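For illustration only, here is a minimal Python sketch of the backfill idea: scan old records from the shared topic, keep just the profiling events, and re-emit them. This is not the script mentioned above. `Record`, the `aspect` field, the `"profiling"` label, and `backfill_profiling` are all hypothetical names, and the Kafka consumer and producer are stood in for by a plain iterable and a callback.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for one message read from the shared topic."""
    offset: int
    aspect: str    # assumed marker for the kind of metadata in the message
    payload: dict


def backfill_profiling(
    records: Iterable[Record],
    emit: Callable[[dict], None],
    start_offset: int,
    end_offset: int,
    profiling_aspect: str = "profiling",  # placeholder label, not a real aspect name
) -> int:
    """Re-emit profiling payloads whose offset is in [start_offset, end_offset).

    Snapshot metadata shares the topic, so it is skipped rather than replayed;
    that is why a plain consumer rewind would not be enough.
    Returns the number of messages re-emitted.
    """
    count = 0
    for record in records:
        in_range = start_offset <= record.offset < end_offset
        if in_range and record.aspect == profiling_aspect:
            emit(record.payload)
            count += 1
    return count
```

In a real setup `records` would come from a consumer positioned at the old offsets and `emit` would wrap a producer's send call. The sketch only works while the old messages are still within the topic's retention window.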
m
Hey @millions-engineer-56536 we're working on a more systematic approach to backup-restore, that should address this. Will share the design here when we write it up.
q
Hi … is this available for download … would like to try out the data profiling reports
b
@quick-restaurant-75578 We will most likely be raising the PR for this tomorrow! What source DBs are you interested in profiling?
q
Thanks @big-carpet-38439 - Snowflake, SQL Server, Postgres
@big-carpet-38439 - Hope it is done … please let me know how to download the latest build
m
@big-carpet-38439 wondering if the thoughts on a systematic approach to backup-restore have been shared with the community.