# getting-started
h
We’re thinking about the level of persistence needed, especially for Neo4J, as we might end up hosting it ourselves. In case we need to regenerate the data in Neo4J, what’s the best approach? I guess we can keep a full history in the Kafka topic and reset the cursor manually, but that feels suboptimal from a scaling perspective. Is there a way to tell GMS to resend the MAE messages based on what’s in MySQL? Backups are of course nice, but we see a risk of the ES and Neo4J backups getting out of sync in case of a disaster. Therefore, it would be nice to have a way to repopulate the DBs as a fallback. Or maybe we’re just overthinking this 😅
b
There is a per-URN backfill API available (e.g. https://github.com/linkedin/datahub/blob/master/gms/impl/src/main/java/com/linkedin/metadata/resources/dataset/Datasets.java#L242). That said, we're adding the ability to mass backfill, so the API is subject to change in the near future, which is why it's not well documented yet. @steep-airplane-62865 can shed more light here.
👍 1
h
Wrong thread? ☝️
Thanks @bumpy-keyboard-50565, I like the sound of that backfill possibility!
b
Sorry, my bad. Early morning 😛
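For reference, driving that per-URN backfill action over HTTP could look roughly like the sketch below. This is an assumption-laden example, not the documented API: it assumes GMS is reachable on localhost:8080, that the Rest.li action is named `backfill`, and that the request body just carries the URN — check the `@Action` annotation in Datasets.java for the actual signature before relying on it.

```python
# Hypothetical sketch: ask GMS to re-emit MAE messages for a list of dataset URNs.
# The exact Rest.li action name, path, and body shape depend on the GMS version --
# verify against the @Action annotation in Datasets.java.
import requests

GMS_URL = "http://localhost:8080"  # assumption: GMS running locally


def backfill_dataset(urn: str) -> None:
    """Trigger the (assumed) per-URN backfill action for a single dataset."""
    resp = requests.post(
        f"{GMS_URL}/datasets",
        params={"action": "backfill"},          # Rest.li actions are invoked via ?action=...
        headers={"X-RestLi-Method": "ACTION"},  # standard Rest.li header for action calls
        json={"urn": urn},                      # assumed request body shape
        timeout=30,
    )
    resp.raise_for_status()


if __name__ == "__main__":
    # For a full rebuild you would iterate over every URN stored in MySQL
    # (e.g. exported from the metadata aspect table) instead of a hard-coded list.
    for urn in [
        "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
    ]:
        backfill_dataset(urn)
```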
m
@high-hospital-85984: We ETL the metadata topics to a data lake... so that is always there as a way to "backfill"
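A minimal sketch of that replay path, under a couple of assumptions not stated above: the data-lake archive keeps each MAE record as the raw bytes that were originally on the topic (otherwise you'd have to re-encode them against the schema registry first), and the topic uses the default name `MetadataAuditEvent_v4`.

```python
# Hypothetical sketch: re-publish archived MAE records from a data-lake export back
# onto the MAE topic so the ES/Neo4J indexing jobs re-consume them.
from pathlib import Path

from kafka import KafkaProducer  # pip install kafka-python

TOPIC = "MetadataAuditEvent_v4"          # assumption: default DataHub MAE topic name
ARCHIVE_DIR = Path("/mnt/datalake/mae")  # assumption: one serialized record per file

producer = KafkaProducer(bootstrap_servers="localhost:9092")

for record_file in sorted(ARCHIVE_DIR.glob("*.bin")):
    # Re-produce the raw bytes exactly as they were archived.
    producer.send(TOPIC, value=record_file.read_bytes())

producer.flush()
```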