So far every version I've upgraded to has downtime with a re DataHub #all-things-deployment

refined-energy-76018

06/21/2023, 2:31 AM

So far every version I've upgraded to has had downtime caused by a reindex in the datahub-system-update-job. That is v0.9.3, v0.10.0, v0.10.1, and v0.10.3. Is this expected? https://datahubproject.io/docs/how/updating-datahub/ This page says only v0.10.0 should have caused downtime. Is this issue related to the now-fixed retention bug in the DataHubUpgradeHistory_v1 topic? What confuses me is that when I made changes but kept the DataHub version the same, it didn't trigger a reindex, even when the deployment happened after the previous default DataHubUpgradeHistory_v1 retention period of 7 days had passed.

delightful-ram-75848

07/11/2023, 1:58 AM

@orange-night-91387 might be able to speak to this!

brainy-tent-14503

07/11/2023, 11:10 PM

A reindex occurs any time a difference between the expected and actual index mappings/settings is detected. The system-update logs will show when differences are encountered. Whether there are differences depends on environment variables, helm chart values, and the underlying code itself. How long the reindex takes depends on which indices have differences, how big they are, and the resources available to Elasticsearch. The message in the Kafka topic controls the startup of components (like GMS, mae, mce) and doesn't control whether a reindex is needed; that is based entirely on the expected vs. actual state of the indices. A message deleted from the DataHubUpgradeHistory_v1 topic would be recreated whenever a helm deployment happens, even on a re-install of the same release, because it is produced on any successful run of the system-update process (which may or may not reindex).
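
To make the "expected vs. actual" idea concrete, here is a minimal Python sketch of that kind of comparison. It is not DataHub's code: the function names (diff_index_config, needs_reindex) and the sample mappings are made up for illustration, and the real system-update job may treat some differences differently than this simple "any difference means reindex" rule.

```python
from typing import Any


def diff_index_config(
    expected: dict[str, Any], actual: dict[str, Any], path: str = ""
) -> list[str]:
    """Return dotted paths where the expected config differs from the actual one.

    Hypothetical helper for illustration only; not part of DataHub.
    """
    differences: list[str] = []
    for key in sorted(set(expected) | set(actual)):
        child = f"{path}.{key}" if path else key
        if key not in actual:
            differences.append(f"{child}: missing in actual")
        elif key not in expected:
            differences.append(f"{child}: unexpected in actual")
        elif isinstance(expected[key], dict) and isinstance(actual[key], dict):
            # Recurse into nested mapping/settings objects.
            differences.extend(diff_index_config(expected[key], actual[key], child))
        elif expected[key] != actual[key]:
            differences.append(
                f"{child}: expected {expected[key]!r}, got {actual[key]!r}"
            )
    return differences


def needs_reindex(expected: dict[str, Any], actual: dict[str, Any]) -> bool:
    """Assumed rule: any difference at all triggers a reindex."""
    return bool(diff_index_config(expected, actual))


# Made-up example: the expected mapping gained a field the live index lacks.
expected_mapping = {
    "mappings": {
        "properties": {"urn": {"type": "keyword"}, "name": {"type": "text"}}
    }
}
actual_mapping = {"mappings": {"properties": {"urn": {"type": "keyword"}}}}

for line in diff_index_config(expected_mapping, actual_mapping):
    print(line)
print("reindex needed:", needs_reindex(expected_mapping, actual_mapping))
```

Note that nothing in this check looks at the DataHub version or at the Kafka topic, which matches the point above: a redeploy of the same version with unchanged expected mappings/settings finds no differences and so does not reindex.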
