# troubleshoot
famous-florist-7218:
Hi folks, has anyone run into issues with front-end port forwarding? It frequently drops the connection.
```
E0803 10:23:11.850679   89160 portforward.go:406] an error occurred forwarding 9002 -> 9002: error forwarding port 9002 to pod 39e9085d08f2eee680eec4bb5665835613d931532e5ca0688443fa60a75a9d7f, uid : failed to execute portforward in network namespace "/var/run/netns/cni-9494401e-109c-0e8f-c535-11ae11edfdce": read tcp4 127.0.0.1:56872->127.0.0.1:9002: read: connection reset by peer
E0803 10:23:11.852843   89160 portforward.go:234] lost connection to pod
```
better-orange-49102:
isn't this specific to whichever infra you're using to host the containers?
famous-florist-7218:
FYI, I deploy DataHub on EKS.
Steps to reproduce: after deploying, expose the frontend with kubectl port-forward, open the DataHub UI, and wait a few minutes… then the connection to the pod is lost.
`datahub-datahub-frontend` log:
```
Forwarding from 127.0.0.1:9002 -> 9002
Forwarding from [::1]:9002 -> 9002
Handling connection for 9002
Handling connection for 9002
Handling connection for 9002
Handling connection for 9002
Handling connection for 9002
Handling connection for 9002
Handling connection for 9002
E0803 11:25:30.463145    3717 portforward.go:406] an error occurred forwarding 9002 -> 9002: error forwarding port 9002 to pod e564df2c676121af49a8943d3d42e3fce5a91d19a73cf7f21029f6fbf94fce4f, uid : failed to execute portforward in network namespace "/var/run/netns/cni-9bca40bb-8d26-4026-3513-f86b5fc9cf54": read tcp4 127.0.0.1:56938->127.0.0.1:9002: read: connection reset by peer
E0803 11:25:30.463161    3717 portforward.go:406] an error occurred forwarding 9002 -> 9002: error forwarding port 9002 to pod e564df2c676121af49a8943d3d42e3fce5a91d19a73cf7f21029f6fbf94fce4f, uid : failed to execute portforward in network namespace "/var/run/netns/cni-9bca40bb-8d26-4026-3513-f86b5fc9cf54": read tcp4 127.0.0.1:56940->127.0.0.1:9002: read: connection reset by peer
E0803 11:25:30.463219    3717 portforward.go:406] an error occurred forwarding 9002 -> 9002: error forwarding port 9002 to pod e564df2c676121af49a8943d3d42e3fce5a91d19a73cf7f21029f6fbf94fce4f, uid : failed to execute portforward in network namespace "/var/run/netns/cni-9bca40bb-8d26-4026-3513-f86b5fc9cf54": read tcp4 127.0.0.1:56936->127.0.0.1:9002: read: connection reset by peer
E0803 11:25:30.466160    3717 portforward.go:234] lost connection to pod
```
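For context, the forwarding above corresponds to a plain kubectl port-forward against the frontend service; a minimal sketch (release name and namespace are assumptions, adjust to your deployment):
```
# assumed invocation; service name and namespace may differ in your release
kubectl -n datahub port-forward svc/datahub-datahub-frontend 9002:9002
```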
GMS log:
```
04:24:13.255 [Thread-269] WARN  org.elasticsearch.client.RestClient:65 - request [HEAD <http://datahub-elastic:9200/datahub_usage_event?ignore_throttled=false&ignore_unavailable=false&expand_wildcards=open%2Cclosed&allow_no_indices=false>] returned 1 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
04:24:13.257 [Thread-269] WARN  org.elasticsearch.client.RestClient:65 - request [HEAD <http://datahub-elastic:9200/datahub_usage_event?ignore_throttled=false&ignore_unavailable=false&expand_wildcards=open%2Cclosed&allow_no_indices=false>] returned 1 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
04:24:13.259 [Thread-269] WARN  org.elasticsearch.client.RestClient:65 - request [HEAD <http://datahub-elastic:9200/datahub_usage_event?ignore_throttled=false&ignore_unavailable=false&expand_wildcards=open%2Cclosed&allow_no_indices=false>] returned 1 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
04:24:13.261 [Thread-270] WARN  org.elasticsearch.client.RestClient:65 - request [HEAD <http://datahub-elastic:9200/datahub_usage_event?ignore_throttled=false&ignore_unavailable=false&expand_wildcards=open%2Cclosed&allow_no_indices=false>] returned 1 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
04:24:13.265 [Thread-270] WARN  org.elasticsearch.client.RestClient:65 - request [POST <http://datahub-elastic:9200/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true>] returned 1 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
04:24:55.132 [pool-4-thread-1] WARN  org.elasticsearch.client.RestClient:65 - request [POST <http://datahub-elastic:9200/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true>] returned 1 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
```
incalculable-ocean-74010:
Hello @famous-florist-7218. Is the frontend pod crashing frequently, performing heavy operations, or blocked by downstream requests (i.e. a request to GMS that takes a long time to process)? If so, that might be why. It could also very well be that your EKS deployment is unstable, which is possible if you are using spot VMs. When port-forwarding, Kubernetes expects the underlying pods it connects to to keep connectivity alive using heartbeats. If the frontend pod is blocked for some reason and does not ping the process running the port-forward, the connection will be reset.
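A quick way to check the "is the pod crashing or getting rescheduled" part of this (the label selector below is an assumption; adjust to your chart's labels):
```
# watch for restarts or rescheduling of the frontend pod
kubectl get pods -l app.kubernetes.io/name=datahub-frontend -w
# inspect restart counts and recent events for a specific pod
kubectl describe pod <frontend-pod-name>
```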
famous-florist-7218:
@incalculable-ocean-74010 You’re right. Port-forwarding on Kubernetes is meant for debugging; it won’t stay alive for hours without heartbeats. In my case, I set up an ingress configuration to access the frontend service directly.
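A minimal sketch of such an ingress, assuming the AWS Load Balancer Controller and the service/port names from this thread (the annotations and scheme are assumptions, not the exact config used here):
```
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: datahub-frontend
  annotations:
    alb.ingress.kubernetes.io/scheme: internal   # assumption: internal ALB inside the VPC
    alb.ingress.kubernetes.io/target-type: ip    # route to pod IPs, works with ClusterIP services
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: datahub-datahub-frontend
                port:
                  number: 9002
EOF
```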
incalculable-ocean-74010:
What type of debugging are you doing? If it’s some sort of development, I would suggest doing it locally.
famous-florist-7218:
I’ve checked the pod’s statistics and service logs, and found that the service type of `datahub-datahub-frontend` was `LoadBalancer` (the default value). Because I use an ingress config from our devops-infra team that already has an ALB within our VPC, I had to change the service type to `ClusterIP`.
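For reference, with the DataHub helm chart this override can usually be set at upgrade time; the exact key path depends on your chart version, so treat this as a sketch:
```
# assumed key path; verify against your chart's values.yaml
helm upgrade datahub datahub/datahub \
  -f values.yaml \
  --set datahub-frontend.service.type=ClusterIP
```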
incalculable-ocean-74010:
So it’s fixed then?
famous-florist-7218:
Yup. It works now.
@incalculable-ocean-74010 Do you know why the DataHub UI doesn’t load anything? I’ve set up a BigQuery integration and its ingestion job runs successfully, but the UI shows nothing. I’ve checked the metadata store and found that the BigQuery metadata is loaded.
better-orange-49102:
You could query ES as well and see if the dataset index has indexed the data
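For example, a couple of quick checks against the dataset search index (datasetindex_v2, as it appears later in this thread), assuming ES is reachable on localhost:9200 via port-forward:
```
# document count in the dataset search index
curl -s 'http://localhost:9200/datasetindex_v2/_count?pretty'
# sample a few indexed documents
curl -s 'http://localhost:9200/datasetindex_v2/_search?size=3&pretty'
```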
famous-florist-7218:
Thanks @better-orange-49102 Please find the log below.
```
❯ curl -X GET 'http://localhost:9200/_cat/indices?v'
health status index                                                    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   datajobindex_v2                                          CTKtCBXiQri0zJ4RyPz5MA   1   1          0            0       226b           226b
yellow open   dataset_datasetprofileaspect_v1                          CHCe3MrAS1iLmDR8T5ba7g   1   1          0            0       226b           226b
yellow open   datahubsecretindex_v2                                    ajqMSDukR9y7nS3yC1uoZw   1   1          0            0       226b           226b
yellow open   mlmodelindex_v2                                          qgsvDBIBS2Sapa9XogCQ7Q   1   1          0            0       226b           226b
yellow open   dataflowindex_v2                                         _ART1_MLRyOMZzZBCEuN2w   1   1          0            0       226b           226b
yellow open   mlmodelgroupindex_v2                                     mRUt9jfCSX6EDL3k2NVbuQ   1   1          0            0       226b           226b
yellow open   assertionindex_v2                                        HpcWN0NUQNqkGL43qITaHg   1   1          0            0       226b           226b
yellow open   datahubpolicyindex_v2                                    oLSKHafuREWt5F1fHjngoQ   1   1          5            0     10.8kb         10.8kb
yellow open   corpuserindex_v2                                         Fs6M5PP_T5KatRjc85B0mQ   1   1          0            0       226b           226b
yellow open   dataprocessindex_v2                                      ZIapXodMT2u5zqgDbppi2A   1   1          0            0       226b           226b
yellow open   chartindex_v2                                            owDAa5-jQYmwaayTUU6rFA   1   1          0            0       226b           226b
green  open   .geoip_databases                                         Ak23V0siScOyHyvgfNlaag   1   0         39            0     36.9mb         36.9mb
yellow open   tagindex_v2                                              PoVLS4iwT2qlVgEDHkNocQ   1   1          0            0       226b           226b
yellow open   mlmodeldeploymentindex_v2                                wfUIqOmUQhmrZbDX4TLOkA   1   1          0            0       226b           226b
yellow open   datahubexecutionrequestindex_v2_1659499608176            37-GXTaqRY6TxnVdxao8rw   1   1          2            0     25.5kb         25.5kb
yellow open   datajob_datahubingestioncheckpointaspect_v1              F-AHPM6YS7Co4cvFj_jbBA   1   1          0            0       226b           226b
yellow open   dataplatforminstanceindex_v2                             CuWdhxOzQPuM13F_eTN18Q   1   1          0            0       226b           226b
yellow open   dashboardindex_v2                                        lOtjY4QxTXmXmK5zJANBiQ   1   1          0            0       226b           226b
yellow open   assertion_assertionruneventaspect_v1                     eISPYtlxTta3XQFSlhXNZg   1   1          0            0       226b           226b
yellow open   datasetindex_v2                                          W0vLWGoKR5yxCOZa_VGxVA   1   1          0            0       226b           226b
yellow open   telemetryindex_v2                                        wMZWwzzbSPKsyybqIXh6zA   1   1          0            0       226b           226b
yellow open   mlfeatureindex_v2                                        xgBueZI4QZmjPN84nFVH1g   1   1          0            0       226b           226b
yellow open   dashboard_dashboardusagestatisticsaspect_v1              GhqSfGj-QCy7G6EnEcKApw   1   1          0            0       226b           226b
yellow open   datajob_datahubingestionrunsummaryaspect_v1              0w75SGLKS4S9LDQu-5CJhg   1   1          0            0       226b           226b
yellow open   dataplatformindex_v2                                     DwNhZVMVRACsQN1wCZCyyQ   1   1          0            0       226b           226b
yellow open   datahub_usage_event                                      qoGryfmxRZyGVbqqD3F2hA   1   1         12            0     50.4kb         50.4kb
yellow open   dataprocessinstanceindex_v2                              lSxxVmwvTOG0bcRh3xFD2Q   1   1          0            0       226b           226b
yellow open   glossarynodeindex_v2                                     -5BhKJ3aS9qOIlrOQmnQ9A   1   1          0            0       226b           226b
yellow open   datahubingestionsourceindex_v2                           1sQFE-lmTWW5I0pGsYhf4A   1   1          1            0      5.6kb          5.6kb
yellow open   system_metadata_service_v1_1659499616783                 Eini-RkDTiCLvRDjpGxE9w   1   1          7            1     21.8kb         21.8kb
yellow open   invitetokenindex_v2                                      A3nGt-i7SsOUXtobCw33KA   1   1          0            0       226b           226b
yellow open   datahubretentionindex_v2                                 c9eere35TEKwywhq_SCTOA   1   1          0            0       226b           226b
yellow open   graph_service_v1                                         n8sZTBA1SmiaUqmS7Ckrww   1   1          1            0      5.9kb          5.9kb
yellow open   dataprocessinstance_dataprocessinstanceruneventaspect_v1 Q2-r6lkRRLOW5w8uQpG1WQ   1   1          0            0       226b           226b
yellow open   dataset_operationaspect_v1                               AGbE6zsUQxidpbROqHdgcA   1   1          0            0       226b           226b
yellow open   datahubaccesstokenindex_v2                               1PEEZ_C8R-e2uoYT0uBqxQ   1   1          0            0       226b           226b
yellow open   containerindex_v2                                        dau_xUgpTGiDoYE8Qsovqw   1   1          0            0       226b           226b
green  open   .tasks                                                   R-JSxdgFQlCsqLicd5bjhg   1   0          2            0     13.8kb         13.8kb
yellow open   schemafieldindex_v2                                      7puKiGsgRSas00QwK42EXg   1   1          0            0       226b           226b
yellow open   domainindex_v2                                           rDAFvjgkSFSuaj9UamcYug   1   1          0            0       226b           226b
yellow open   testindex_v2                                             F8tDjqM1T6uvsU7BN4KW_w   1   1          0            0       226b           226b
yellow open   mlfeaturetableindex_v2                                   Gsy9nYIHS8-C5yEFfDyvnQ   1   1          0            0       226b           226b
yellow open   notebookindex_v2                                         iuMC5NTIQVuqr8dPuEo3sQ   1   1          0            0       226b           226b
yellow open   datahubupgradeindex_v2                                   qRpWWS9pQiC8PdHpYfLPXA   1   1          0            0       226b           226b
yellow open   glossarytermindex_v2                                     8ImqL4QbSdiUa8weR9GGGA   1   1          0            0       226b           226b
yellow open   mlprimarykeyindex_v2                                     JMciAnulToi2YlYSESRErQ   1   1          0            0       226b           226b
yellow open   corpgroupindex_v2                                        eniE4JsRQSisF8l0H-113w   1   1          0            0       226b           226b
yellow open   dataset_datasetusagestatisticsaspect_v1                  61fZT0rsTcSsKq7k9GxNNA   1   1          0            0       226b           226b
```
It seems like `metadata_aspect_v2` has not been indexed in ES.
better-orange-49102:
Yup, datasetindex_v2 is empty and needs to be populated. Do the GMS pod logs show anything interesting?
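Tailing GMS is usually something along these lines (the deployment name follows the release naming seen in this thread, so treat it as an assumption):
```
# follow the GMS logs while reloading the UI
kubectl logs -f deploy/datahub-datahub-gms
```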
incalculable-ocean-74010:
You should be able to run the restore-indices job from the datahub upgrade helm chart to force re-indexing in ES.
famous-florist-7218:
@better-orange-49102 it’s so weird 😕
```
11:39:56.432 [pool-11-thread-1] ERROR c.l.d.g.a.service.AnalyticsService:264 - Search query failed: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
11:39:56.436 [pool-11-thread-1] ERROR o.s.s.s.TaskUtils$LoggingErrorHandler:95 - Unexpected error occurred in scheduled task
java.lang.RuntimeException: Search query failed:
	at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:265)
	at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getHighlights(AnalyticsService.java:236)
	at com.linkedin.gms.factory.telemetry.DailyReport.dailyReport(DailyReport.java:76)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84)
	at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
	at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
	at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
	at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
	at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
	at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:260)
	... 15 common frames omitted
	Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [<http://datahub-elastic:9200>], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
Warnings: [[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"datahub_usage_event","node":"hjWpwRt7Tg-iDnfSq_SaCA","reason":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.","caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}},"status":400}
		at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
		at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
		... 19 common frames omitted
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
	at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
	at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:603)
	at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:179)
	... 22 common frames omitted
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
	at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
	... 26 common frames omitted
```
Thanks @incalculable-ocean-74010, I’m trying 😊
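One way to narrow down the fielddata error above (a suggestion, not something from this thread) is to check how browserId is mapped in the usage-event index; with the index template applied correctly it should be a keyword field:
```
# inspect the mapping of the usage-event index and look at the browserId field type
curl -s 'http://localhost:9200/datahub_usage_event/_mapping?pretty'
```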
better-orange-49102:
Haven't seen logs like this before where ES complains about incorrect queries from GMS, @incalculable-ocean-74010
famous-florist-7218:
The restore indices job did the trick.
```
# kubectl create job --from=cronjob/<<release-name>>-datahub-restore-indices-job-template datahub-restore-indices-job
```
Thank you so much, @better-orange-49102 and @incalculable-ocean-74010! 😊
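For anyone following along, the progress of that restore job can be watched with kubectl logs (job name as created above):
```
kubectl logs -f job/datahub-restore-indices-job
```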