# troubleshoot
h
Hi Everyone, we upgraded DataHub from 0.8.19 to 0.8.32. But when we try to access the Analytics tab, we get the following error -
Kubectl logs for the gms pod -
Copy code
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [<https://elasticsearchurl:443>], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"datahub_usage_event","node":"2yv0XgYiShOra6hP8DB1DA","reason":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.","caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}},"status":400}
		at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
		at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
		... 21 common frames omitted
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
	at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
	at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:603)
	at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:179)
	... 24 common frames omitted
e
Hmn you didn’t run into this error before?
Are you using AWS opensearch?
h
@early-lamp-41924 - Nope! This is the first time encountering it.
We are using Elasticsearch
e
So this means that the elasticsearch setup job did not run correctly
Please follow this process (see the command sketch below):
1. Stop gms from running (kill the container for docker; set numReplicas to 0 for kubernetes)
2. Delete the datahub_usage_event index by curling elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html
3. Rerun the elasticsearch-setup-job (with the correct parameters - i.e. USE_AWS_ELASTICSEARCH should not be set)
4. Then start gms back again
plus1 1
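(A rough command-line sketch of the steps above, assuming a Helm install; the namespace, deployment/job names and the Elasticsearch URL are placeholders and will differ per setup.)
Copy code
# NOTE: namespace, deployment/job names and the ES URL are assumptions - adjust to your install
# 1. Stop gms - scale the deployment down to 0 replicas
kubectl -n datahub scale deployment datahub-datahub-gms --replicas=0

# 2. Delete the datahub_usage_event index
curl -XDELETE "https://elasticsearchurl:443/datahub_usage_event"

# 3. Rerun the elasticsearch-setup-job, e.g. delete the completed job and let helm recreate it
#    (USE_AWS_ELASTICSEARCH should NOT be set for plain Elasticsearch)
kubectl -n datahub delete job datahub-elasticsearch-setup-job
helm upgrade datahub datahub/datahub --values values.yaml

# 4. Start gms back up
kubectl -n datahub scale deployment datahub-datahub-gms --replicas=1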
h
Oh, is USE_AWS_ELASTICSEARCH not supposed to be set?
will try out the above options and update
e
if you aren’t using AWS opensearch, you should either set that to false or not set it!
h
@early-lamp-41924 - for numReplicas, can you share the path in the yaml to set it? Also, are you referring to this (attached screenshot)? I see replicaCount in the deployment.yaml - is this the one you are referring to?
e
Yes!
h
Ok will update in a bit.
@early-lamp-41924 - Getting this error when deploying the pods - Warning Evicted 3m36s kubelet The node was low on resource: ephemeral-storage. Any suggestions on this?
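(A sketch of a few commands that can show what is putting ephemeral-storage pressure on the node; the namespace and node name are placeholders.)
Copy code
# NOTE: namespace and node name are assumptions - adjust to your cluster
# Recent eviction events in the namespace
kubectl -n datahub get events --sort-by='.lastTimestamp' | grep -i evict

# Node conditions - look for DiskPressure
kubectl describe node <node-name> | grep -A8 "Conditions"

# Evicted pods stay around in Failed state; remove them once inspected
kubectl -n datahub delete pods --field-selector=status.phase=Failed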
e
when starting elasticsearch setup job?
h
@early-lamp-41924 - yes. Killed all the pods and redeployed, but was still facing the same issue. So I switched to a different cluster and followed the steps you shared. The application got deployed without the previous ephemeral-storage issue, but I'm still getting the same error when accessing Analytics. Sharing the values.yaml file. (Also, a minor correction to the above: we are using AWS Elasticsearch, though not the OpenSearch engine.)
e
can you post the list of indices in your elasticsearch cluster?
ah
so you are using AWS elasticsearch
please follow the same process with USE_AWS_ELASTICSEARCH=true
on elasticsearch setup job
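(A sketch of rerunning the setup job against AWS Elasticsearch; the job name and the values.yaml keys shown in the comment are assumptions based on the DataHub Helm chart, so check your chart version.)
Copy code
# NOTE: job name and values keys are assumptions - verify against your chart
# Delete the completed job so helm can recreate it with the new env var
kubectl -n datahub delete job datahub-elasticsearch-setup-job

# In values.yaml the env var is typically passed to the setup job, e.g.:
#   elasticsearchSetupJob:
#     extraEnvs:
#       - name: USE_AWS_ELASTICSEARCH
#         value: "true"
helm upgrade datahub datahub/datahub --values values.yaml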
h
Yes, kept it as true.
e
remember to delete indices for datahub_usage_event
so two things
please share the list of indices in the elasticsearch cluster
and the elasticsearch-setup-job logs
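(For reference, both can usually be pulled like this; the host, namespace and job name are placeholders.)
Copy code
# NOTE: host, namespace and job name are assumptions - adjust to your setup
# List indices in a readable form
curl -s "https://elasticsearchurl:443/_cat/indices?v"

# Logs from the setup job's pod
kubectl -n datahub logs job/datahub-elasticsearch-setup-job
# or find the pod by label if the job object is gone
kubectl -n datahub get pods -l job-name=datahub-elasticsearch-setup-job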
h
Copy code
List of Indices -
{".opendistro-ism-managed-index-history-2022.03.23-000103":{"aliases":{}},"mlmodelgroupindex_v2_1650034138207":{"aliases":{"mlmodelgroupindex_v2":{}}},"dataset_datasetprofileaspect_v1_1650034205591":{"aliases":{"dataset_datasetprofileaspect_v1":{}}},".opendistro-ism-managed-index-history-2022.03.21-000101":{"aliases":{}},".opendistro-ism-managed-index-history-2022.04.05-000116":{"aliases":{}},"datajob_datahubingestionrunsummaryaspect_v1":{"aliases":{}},"dataplatforminstanceindex_v2":{"aliases":{}},".opendistro-ism-managed-index-history-2022.04.09-000120":{"aliases":{}},"assertion_assertionruneventaspect_v1":{"aliases":{}},"dataset_operationaspect_v1":{"aliases":{}},"dashboardindex_v2_1650034160616":{"aliases":{"dashboardindex_v2":{}}},".opendistro-ism-managed-index-history-2022.04.01-000112":{"aliases":{}},"dataplatformindex_v2_1650034166557":{"aliases":{"dataplatformindex_v2":{}}},"schemafieldindex_v2_1650034144381":{"aliases":{"schemafieldindex_v2":{}}},"datahubpolicyindex_v2_1650034126926":{"aliases":{"datahubpolicyindex_v2":{}}},"glossarytermindex_v2_1650034155043":{"aliases":{"glossarytermindex_v2":{}}},".opendistro-ism-managed-index-history-2022.03.30-000110":{"aliases":{}},".opendistro-ism-managed-index-history-2022.04.02-000113":{"aliases":{}},".opendistro-ism-managed-index-history-2022.03.24-000104":{"aliases":{}},".opendistro-ism-managed-index-history-2022.04.14-000125":{"aliases":{}},"mlmodeldeploymentindex_v2_1650034166010":{"aliases":{"mlmodeldeploymentindex_v2":{}}},".opendistro-ism-managed-index-history-2022.03.27-000107":{"aliases":{}},".opendistro-ism-managed-index-history-2022.03.22-000102":{"aliases":{}},".opendistro-ism-managed-index-history-2022.03.25-000105":{"aliases":{}},"mlprimarykeyindex_v2_1650034155317":{"aliases":{"mlprimarykeyindex_v2":{}}},".opendistro-ism-config":{"aliases":{}},"datajobindex_v2_1650034138637":{"aliases":{"datajobindex_v2":{}}},"datahubsecretindex_v2":{"aliases":{}},"domainindex_v2":{"aliases":{}},"datahubexecutionrequestindex_v2":{"aliases":{}},".opendistro-ism-managed-index-history-2022.04.06-000117":{"aliases":{}},"datajob_datahubingestioncheckpointaspect_v1":{"aliases":{}},".opendistro-ism-managed-index-history-2022.04.11-000122":{"aliases":{}},"corpuserindex_v2_1650034166883":{"aliases":{"corpuserindex_v2":{}}},"dataflowindex_v2_1650034183058":{"aliases":{"dataflowindex_v2":{}}},"dataprocessindex_v2_1650034137624":{"aliases":{"dataprocessindex_v2":{}}},".opendistro-ism-managed-index-history-2022.04.15-000126":{"aliases":{}},"datahubretentionindex_v2":{"aliases":{}},"system_metadata_service_v1_1650034199383":{"aliases":{"system_metadata_service_v1":{}}},"mlfeaturetableindex_v2_1650034137894":{"aliases":{"mlfeaturetableindex_v2":{}}},".opendistro-ism-managed-index-history-2022.03.31-000111":{"aliases":{}},"dataset_datasetusagestatisticsaspect_v1_1650034206075":{"aliases":{"dataset_datasetusagestatisticsaspect_v1":{}}},"datasetindex_v2_1650034188843":{"aliases":{"datasetindex_v2":{}}},"containerindex_v2":{"aliases":{}},".opendistro-ism-managed-index-history-2022.03.28-000108":{"aliases":{}},".opendistro-ism-managed-index-history-2022.03.26-000106":{"aliases":{}},".opendistro-ism-managed-index-history-2022.04.18-000129":{"aliases":{".opendistro-ism-managed-index-history-write":{}}},".opendistro-ism-managed-index-history-2022.04.03-000114":{"aliases":{}},"notebookindex_v2":{"aliases":{}},"glossarynodeindex_v2_1650034177479":{"aliases":{"glossarynodeindex_v2":{}}},"datahubingestionsourceindex_v2":{"aliases":{}},"corpgroupindex_v2_1650034132332"
:{"aliases":{"corpgroupindex_v2":{}}},"datahub_usage_event":{"aliases":{}},"graph_service_v1":{"aliases":{}},".opendistro-ism-managed-index-history-2022.04.04-000115":{"aliases":{}},"chartindex_v2_1650034194111":{"aliases":{"chartindex_v2":{}}},".opendistro-ism-managed-index-history-2022.04.10-000121":{"aliases":{}},".opendistro-ism-managed-index-history-2022.03.29-000109":{"aliases":{}},".opendistro-ism-managed-index-history-2022.04.12-000123":{"aliases":{}},".opendistro-ism-managed-index-history-2022.03.20-000100":{"aliases":{}},"mlfeatureindex_v2_1650034177781":{"aliases":{"mlfeatureindex_v2":{}}},"tagindex_v2_1650034149765":{"aliases":{"tagindex_v2":{}}},"mlmodelindex_v2_1650034172207":{"aliases":{"mlmodelindex_v2":{}}},".tasks":{"aliases":{}},".kibana":{"aliases":{}},".opendistro-ism-managed-index-history-2022.04.13-000124":{"aliases":{}},".opendistro-ism-managed-index-history-2022.04.17-000128":{"aliases":{}},".opendistro-ism-managed-index-history-2022.04.16-000127":{"aliases":{}},"assertionindex_v2":{"aliases":{}},".opendistro-ism-managed-index-history-2022.04.08-000119":{"aliases":{}},".opendistro-ism-managed-index-history-2022.04.07-000118":{"aliases":{}}}%
Logs for Elasticsearch - unable to get them; it shows: unable to retrieve container logs for docker:
Used this to delete the index - DELETE /datahub_usage_event
e
logs for elasticsearch-setup-job?
Did it actually run?
h
@early-lamp-41924 - It did run, but when I try to get the logs, that is the error I get.
e
You mean it says Completed, right? Without logs, it is hard to say whether the commands inside actually ran.
Are you building the container yourself?
h
Except for the frontend, everything is from the prebuilt images. Yes, the elasticsearch-setup job completed.
Tried to repeat the process (attached the elasticsearch-setup-job logs):
1. Killed all pods
2. Deployed prerequisites
3. Deleted the index using DELETE /datahub_usage_event
4. Deployed DataHub
e
do you see an index that looks like datahub_usage_event_00000?
Sorry, could you also DELETE /_template/datahub_usage_event_index_template
plus1 1
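(A sketch of both deletes plus a quick verification; the host is a placeholder, and the template name comes from the message above.)
Copy code
# NOTE: the Elasticsearch host is an assumption - use your cluster endpoint
# Remove the old index and the legacy index template
curl -XDELETE "https://elasticsearchurl:443/datahub_usage_event"
curl -XDELETE "https://elasticsearchurl:443/_template/datahub_usage_event_index_template"

# After rerunning the setup job, the template and index should be back,
# with keyword mappings for fields like browserId
curl -s "https://elasticsearchurl:443/_template/datahub_usage_event_index_template"
curl -s "https://elasticsearchurl:443/datahub_usage_event/_mapping"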
h
No, I don't see such an index. Also, now all the pods are getting evicted (after multiple deployments) - anything I should have taken care of?
Ok let me delete that as well
@early-lamp-41924 - Could this be the reason for the pods getting evicted? https://datahubspace.slack.com/archives/C029A3M079U/p1645175959985509 Currently stuck with the cluster not responding.
e
Does the cluster have enough resources?
h
This is something we had been using.
e
Hmn this is something hard for us to help with as it is likely not related to DataHub itself. Can you check with your cloud infra team to see why they are getting evicted?
h
Sure, I am already in a conversation with them.
But is there a recommended set of configurations to use for the Kubernetes deployment?
e
Seems like this cluster is used for other purposes as well? Or is DataHub running 58 pods??
h
No this cluster is not used for anything else except Datahub.
e
can you post
kubectl get pods -n <<namespace>>
3 of the above nodes should be enough to support DataHub
👍 1
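(A few commands that can help confirm whether the nodes have enough headroom; the namespace and node name are placeholders, and kubectl top needs metrics-server installed.)
Copy code
# NOTE: namespace and node name are assumptions - adjust to your cluster
kubectl get pods -n datahub -o wide

# Per-node CPU/memory usage (requires metrics-server)
kubectl top nodes

# Requests and limits already allocated on a node
kubectl describe node <node-name> | grep -A12 "Allocated resources"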
h
@early-lamp-41924 - Edited the cluster to have sufficient nodes and then redeployed the application. Everything looks good. Only this shows up. Need to fix this 🙂
e
Oh this only shows up when there is no metadata on the platform
Once you ingest, it should not show up
h
I see !
Let me ingest.
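(For reference, a minimal ingestion sketch with the datahub CLI; the Glue source, region and GMS address are assumptions, so adjust for your setup.)
Copy code
pip install 'acryl-datahub[glue]'

cat > recipe.yml <<'EOF'
source:
  type: glue
  config:
    aws_region: us-east-1   # assumption - use your region
sink:
  type: datahub-rest
  config:
    server: http://datahub-datahub-gms:8080   # assumption - your GMS address
EOF

datahub ingest -c recipe.yml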
@early-lamp-41924 Was able to ingest the data. But when I try to view the dataset I ingested, I am getting an Unauthorized exception.
e
Can you check the list of policies?
h
I don't see any policies 😞
e
There should be one that enables 'view entity page' for all users
Can you login with the admin account?
h
Even under admin account, I see only the settings page
e
Hmn the datahub account?
That should never be the case. We always have one account with admin privs
h
@early-lamp-41924 Looks like I messed up the DB. Fixed it.
@early-lamp-41924 - Is GraphQL visible by default? Would like to disable it for certain users.
e
You mean graphiql?
h
Yes GraphiQL
Copy code
- 15:41:18.241 [Thread-6133] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler:21 - Failed to execute DataFetcher
java.util.concurrent.CompletionException: java.lang.IllegalArgumentException: Failed to update urn:li:tag:DataType=PatientInsuranceData on urn:li:dataset:(urn:li:dataPlatform:glue,da-intelligentmn_qa.270_qa_source_intelligentmn,PROD). urn:li:tag:DataType=PatientInsuranceData does not exist.
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Failed to update urn:li:tag:DataType=PatientInsuranceData on urn:li:dataset:(urn:li:dataPlatform:glue,da-intelligentmn_qa.270_qa_source_intelligentmn,PROD). urn:li:tag:DataType=PatientInsuranceData does not exist.
	at com.linkedin.datahub.graphql.resolvers.mutate.util.LabelUtils.validateInput(LabelUtils.java:287)
	at com.linkedin.datahub.graphql.resolvers.mutate.AddTagResolver.lambda$get$0(AddTagResolver.java:36)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
	... 1 common frames omitted
@early-lamp-41924 - Getting the above error when trying to add tags
e
Is this from the UI? Or are you using the API?
h
From UI
Trying to associate existing tags with the datasets. Able to add new tags.
I don't see these tags in the database though.
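(One way to double-check whether the tag entity actually exists in GMS - a sketch against the entities endpoint; the host/port are placeholders and the urn must be URL-encoded.)
Copy code
# NOTE: the GMS host/port is an assumption - use your GMS service address
# A missing/empty result means the tag was never created, which matches the error above
curl -s "http://datahub-datahub-gms:8080/entities/urn%3Ali%3Atag%3ADataType%3DPatientInsuranceData"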