# troubleshoot
Hi folks, we're trying to upgrade our metadata service from 0.8.35.x to 0.8.41. We run the metadata service on Kubernetes (around 10 replicas, as we haven't broken out the MCE and MAE consumers). When we rolled out the 0.8.41 upgrade, we noticed that the canary instance trying to bootstrap on 0.8.41 seems to get stuck in the IngestPoliciesStep. I think it's related to @early-lamp-41924's PR (https://github.com/datahub-project/datahub/pull/4733), but for some reason we're still seeing the bootstrap fail (from what I understand, that PR should allow us to bootstrap). Some details in: 🧵
Some relevant log details:
10:55:13 [main] INFO  c.l.metadata.boot.BootstrapManager - Executing bootstrap step 2/8 with name IngestPoliciesStep...
10:55:13 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep - Ingesting default access policies...
10:55:13 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep - Ingesting default policy with urn urn:li:dataHubPolicy:0
10:55:13 [pool-11-thread-1] ERROR c.l.d.g.a.service.AnalyticsService - Search query failed: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
10:55:13 [pool-11-thread-1] ERROR o.s.s.s.TaskUtils$LoggingErrorHandler - Unexpected error occurred in scheduled task
java.lang.RuntimeException: Search query failed:
	at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:265)
	at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getHighlights(AnalyticsService.java:236)
	at com.linkedin.gms.factory.telemetry.DailyReport.dailyReport(DailyReport.java:76)
...
10:55:13 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep - Ingesting default policy with urn urn:li:dataHubPolicy:1
10:55:13 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep - Skipping ingestion of editable policy with urn urn:li:dataHubPolicy:7
10:55:13 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep - Skipping ingestion of editable policy with urn urn:li:dataHubPolicy:view-entity-page-all
10:55:13 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep - Skipping ingestion of editable policy with urn urn:li:dataHubPolicy:view-dataset-sensitive
10:55:14 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep - Pushing documents to the policy index
I'm wondering if there's something we can do to unblock ourselves here. If I understand correctly, one option is to bring down all existing GMS instances and then deploy only the new version. Is that correct? Any other ideas or suggestions?
Also, if it helps, I can pull the debug logs as well.
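For anyone triaging similar output, here is a rough sketch of how the log excerpt above could be scanned to confirm where the bootstrap stalls. It only assumes the line format seen in the pasted logs; the helper names (`last_bootstrap_step`, `last_main_thread_message`) and the `SAMPLE_LOG` constant are made up for illustration and are not part of DataHub.

```python
import re
from typing import Optional, Tuple

# Matches lines such as:
#   "... BootstrapManager - Executing bootstrap step 2/8 with name IngestPoliciesStep..."
STEP_RE = re.compile(r"Executing bootstrap step (\d+)/(\d+) with name (\w+)")

# A few lines copied from the excerpt above (hypothetical constant name).
SAMPLE_LOG = """\
10:55:13 [main] INFO  c.l.metadata.boot.BootstrapManager - Executing bootstrap step 2/8 with name IngestPoliciesStep...
10:55:13 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep - Ingesting default access policies...
10:55:13 [pool-11-thread-1] ERROR o.s.s.s.TaskUtils$LoggingErrorHandler - Unexpected error occurred in scheduled task
10:55:14 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep - Pushing documents to the policy index
"""


def last_bootstrap_step(log_text: str) -> Optional[Tuple[int, int, str]]:
    """Return (step, total, name) for the last bootstrap step that was started."""
    last = None
    for line in log_text.splitlines():
        match = STEP_RE.search(line)
        if match:
            last = (int(match.group(1)), int(match.group(2)), match.group(3))
    return last


def last_main_thread_message(log_text: str) -> Optional[str]:
    """Return the message of the last line logged by the [main] thread."""
    last = None
    for line in log_text.splitlines():
        if "[main]" in line:
            # Keep only the text after the "logger - " separator.
            last = line.split(" - ", 1)[-1].strip()
    return last


if __name__ == "__main__":
    print("Last step started:", last_bootstrap_step(SAMPLE_LOG))
    print("Last [main] message:", last_main_thread_message(SAMPLE_LOG))
```

Filtering on `[main]` keeps the scheduled-task error from `pool-11-thread-1` out of the picture, since that thread is separate from the one running the bootstrap steps. On the excerpt, the last step started is 2/8 (IngestPoliciesStep) and the last `[main]` message is the push to the policy index.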
b
Chatting with you offline!