# troubleshoot
g
Hello everyone, I'm having problems with my 0.10.0 deployment. Context: Before updating the version, I decided to soft delete datasets, charts, and dashboards. With this, I could delete all entities and force reingestion to ingest new ones. Problem: I'm receiving a lot of exception messages from the BulkListener like the one below:
[I/O dispatcher 2] ERROR c.l.m.s.e.update.BulkListener:44 - Failed to feed bulk request. Number of events: 5 Took time ms: -1 Message: failure in bulk execution:
[1]: index [datasetindex_v2_1678278613797], type [_doc], id [urn...], message [[datasetindex_v2_1678278613797/YrBRraPeT6OLr7JvUNdy6A][[datasetindex_v2_1678278613797][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn...]: document missing]]]
I'm unsure if it is the cause, but I do not see any dataset in the UI. NOTE: Yes, I tried to run the restoreIndices job, but nothing changed.
I fully reset my deployment (uninstalled the chart and deleted the storage), but the restoreIndices job takes very long. More than 4 hours have passed since the job completed, and only 10k entities have loaded. I'm using two replicas of GMS, three of Elasticsearch, and two of Kafka.
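If the restore stays this slow, the restore-indices job can usually be tuned through the helm values. A minimal sketch, assuming the datahub-helm chart's datahubUpgrade block (the key names may differ in your chart version):

# Sketch only: assumes the datahub-helm `datahubUpgrade` values; verify the keys for your chart version.
datahubUpgrade:
  enabled: true
  batchSize: 1000     # rows read from SQL per batch; larger batches can speed up big restores
  batchDelayMs: 100   # pause between batches; lower it if Elasticsearch keeps up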
Weird. Entities can be searched and found, but the metadata counts and entity listings either error out or say no metadata exists. For example:
I can find this glossary
But:
(screenshots attached)
a
Hey @gentle-camera-33498, this is likely due to some search issues on head that might be messing stuff up; @brainy-tent-14503 may be able to help here
g
I tried a lot of things to resolve this, but without success. I'm having problems with my glossary and dataset indices. I can find them in search, but the count and entity listing appear to only get data from Elasticsearch. In my Elasticsearch indexes, no documents were registered for these indices, and GMS is logging a lot of the error messages I presented above.
b
I’ve noticed that restore indices doesn’t work if the entities are marked soft-deleted in SQL. First set the status back to undeleted for those URNs, and then restore indices.
g
But I have datasets where removed=false
But on my homepage:
My indices
b
You have 144 datasets in Elastic. The restore indices should work for the missing ones. Is the count shown being divided by 1k?
the count in your sql output
g
I tried to force the ingestion, but it's still not working. I'll check whether BigQuery ingestion forces the Status aspect to removed=false
b
Also check if there is some lag on your Kafka topics. Are you running standalone consumers?
async ingestion?
g
No, I'm running sync ingestion without standalone consumers
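For context, "standalone consumers" means the MCE/MAE consumers run as their own deployments instead of inside GMS. A minimal sketch of how that is typically switched on in the helm values; the flag name is an assumption from the datahub-helm chart and may differ in your version:

# Sketch only: the flag name assumes the datahub-helm chart; verify it for your chart version.
global:
  datahub_standalone_consumers_enabled: true   # run the MAE/MCE consumers as separate pods instead of inside GMS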
The restore indices job does not restore my glossary metadata either
b
Yeah, I’ve only used it for datasets and platforms. I am not sure what kind of support it might have for terms.
g
But if I navigate directly by URL, I can find them: URL: https://<host>/glossaryTerm/urn:li:glossaryTerm:c7c57cda-696f-42b2-af3b-73b924e05db6/Documentation?is_lineage_mode=false Returns:
Using the restore indices URL isn't working either:
Is there any documentation about how I can check the Kafka consumers? My Prometheus instance is off, so right now I can't extract metrics from my deployment.
b
Kafka would be via the CLI, or if you're using Confluent they have a UI, at least for their managed solution
are these terms in a glossary node?
I see both a glossary node index and a glossary term index. I am wondering if a restore on the node first, then the term, would work
g
Ok, let me try
So, I think it didn't work
I turned on my ingestion pipeline. The GMS logs:
21:48:15.965 [I/O dispatcher 2] INFO  c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 6 Took time ms: -1
21:48:15.966 [pool-10-thread-2] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 1008ms
21:48:15.970 [I/O dispatcher 2] ERROR c.l.m.s.e.update.BulkListener:44 - Failed to feed bulk request. Number of events: 2 Took time ms: -1 Message: failure in bulk execution:
[0]: index [datasetindex_v2_1678308848188], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Abigquery%2Cbi-data-science.bing_ads.ad_performance_report_daily_scd%2CPROD%29], message [[datasetindex_v2_1678308848188/WXDHhwssRXy3_bPyPQ14OQ][[datasetindex_v2_1678308848188][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Abigquery%2Cbi-data-science.bing_ads.ad_performance_report_daily_scd%2CPROD%29]: document missing]]]
My dataset index:
(attachment: snippetjson.json)
The index config:
The chart index has the same routing config but does not appear to have errors when updating the indices.
The chart and dataset indexes have the same settings (except uuid, provided_name, and creation date)
@brainy-tent-14503 Could you tell me what else I should check in my deployment?
hmm...
I'm running out of ideas of what to try... 😓
b
I will need to try to reproduce this locally and drop a breakpoint on the upsert request that is returning the document-not-found error. Typically those are seen on non-critical operations; however, this one seems different, like the restore indices is not generating the right sequence of upserts. Or alternatively, the MAE consumer is not generating the right upserts.
g
Sorry for taking your time @astonishing-answer-96712 and @brainy-tent-14503! Before going to bed, I forced the restoreIndices job again, and the platform magically resolved itself overnight. I'm not sure why, but I'll spend all day today investigating to see if I can replicate it.
What I'm guessing is that the first time I ran the restore job, it was using v0.9.6.1 while the frontend and GMS were on v0.10.0.
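If that kind of version skew is the cause, pinning the restore/upgrade job to the same version as GMS and the frontend should avoid it. A minimal sketch, assuming the datahub-helm global.datahub.version and datahubUpgrade.image keys:

# Sketch only: keep the restore/upgrade job on the same image version as GMS and the frontend.
global:
  datahub:
    version: v0.10.0
datahubUpgrade:
  image:
    repository: acryldata/datahub-upgrade
    tag: v0.10.0   # should match the GMS/frontend version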
a
Can we try setting this and then restore
- name: GRAPH_SERVICE_DIFF_MODE_ENABLED
  value: 'false'
g
Of course! I will try. I expect that with some of these changes, the restore indices job will take less time to finish. (Before upgrading to 0.10.0, restoring all indices took less than 30 minutes; now it is very slow.)
For now, it's working. I'll come back here once I've made the change, OK? I have to take a look at the drift between my chart and the community chart to search for possible causes. Thank you @brainy-tent-14503!
a
Let me see if this is exposed in the community chart… actually, it is not exposed. We need to add this to the community chart. I'd throw it on GMS, MCE, and MAE if they are split, otherwise just GMS
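Until the chart exposes it, one workaround is to pass the variable through each component's extraEnvs list, assuming your chart version supports extraEnvs on the subcharts; set it on datahub-gms, and repeat for the MCE/MAE consumer subcharts if they run standalone:

# Sketch only: assumes the datahub-helm subcharts accept an extraEnvs list.
datahub-gms:
  extraEnvs:
    - name: GRAPH_SERVICE_DIFF_MODE_ENABLED
      value: "false"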
w
Hi everyone! I have a similar issue with
document missing
after restoring indices (migration from 0.9.6 to 0.10.1). I didn't quite understand what needs to be done to solve this problem. Do I need to set this parameter,
enableGraphDiffMode: false
under Values.global.datahub.enableGraphDiffMode?
After changing this parameter and re-running the restore indices job, I still faced the same issue.
Also, my GMS service is actively spamming the following logs:
a
Check your settings: set always-emit-changelog to true and graph-diff-mode to false. Then run the restore index process. Then set
alwaysEmitChangeLog: false
and
enableGraphDiffMode: true
to reduce processing once everything is restored.
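Putting that together, a minimal sketch of the helm values for the two phases, using the global.datahub keys referenced above:

# During the restore: emit full change logs and disable graph diff mode so everything is reprocessed.
global:
  datahub:
    alwaysEmitChangeLog: true
    enableGraphDiffMode: false

# After the restore completes: switch back so only diffs are processed.
global:
  datahub:
    alwaysEmitChangeLog: false
    enableGraphDiffMode: true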