Hi team , I have deployed datahub and it seems UI ...
# troubleshoot
p
Hi team, I have deployed DataHub and it seems the UI is not able to hit the GraphQL API. Can someone help figure that out?
b
From the looks of the landing page, GMS has not started properly.
What do the logs for GMS say?
l
GMS is working properly.
The GraphQL API call from the UI is throwing a 404.
Calling the GraphQL API directly on GMS works properly.
g
What's the output of the GMS logs? There's probably an exception or the service is restarting from failing the health check
l
Hi @gorgeous-dinner-4055 GMS is running fine, and the health checks are passing.
12:44:24.271 [qtp544724190-12] INFO  c.l.parseq.TaskDescriptorFactory:44 - No provider found for TaskDescriptor, falling back to DefaultTaskDescriptor
12:44:24.332 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 206ms
12:45:01.246 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 126ms
12:45:26.809 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 117ms
12:46:50.221 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 128ms
12:50:47.513 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 196ms
13:26:34.354 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 119ms
13:26:45.843 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 151ms
13:56:32.100 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 132ms
15:44:01.707 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 116ms
15:45:18.607 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 114ms
15:46:09.892 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 116ms
15:46:10.021 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 114ms
15:46:10.146 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 116ms
15:47:17.765 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 115ms
15:47:20.670 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 114ms
15:48:20.888 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 128ms
15:48:25.749 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 128ms
15:48:25.929 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 124ms
15:48:27.464 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 125ms
15:48:29.660 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 113ms
15:51:17.433 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 -
I can access http://gms:8080/api/graphiql in our deployment
@gorgeous-dinner-4055 @better-orange-49102 any leads will be helpful 🙂
Debugging steps:
1. Looked at the deployment.
2. Checked GMS setup - working fine.
3. Checked frontend setup - working fine, able to log in.
4. Redeployed the stack with the latest on master - didn't solve it. GMS /api/graphql is also exposed via the frontend proxy, and there it returns 404, so it looks like there is some issue in the proxy calling the GMS API.
5. We are running on K8s. I checked the ingress rules and they seem to be fine. I will double-check them 🙂
g
Ohh, I'm not sure about K8s 😞 One thing to check might be to log into the frontend pod and try curling the GMS health endpoint, to make sure your network is set up correctly?
l
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>Not Found</title>
        <style>
            html, body, pre {
                margin: 0;
                padding: 0;
                font-family: Monaco, 'Lucida Console', monospace;
                background: #ECECEC;
            }
            h1 {
                margin: 0;
                background: #AD632A;
                padding: 20px 45px;
                color: #fff;
                text-shadow: 1px 1px 1px rgba(0,0,0,.3);
                border-bottom: 1px solid #9F5805;
                font-size: 28px;
            }
            p#detail {
                margin: 0;
                padding: 15px 45px;
                background: #F6A960;
                border-top: 4px solid #D29052;
                color: #733512;
                text-shadow: 1px 1px 1px rgba(255,255,255,.3);
                font-size: 14px;
                border-bottom: 1px solid #BA7F5B;
            }
        </style>
    </head>
    <body>
        <h1>Not Found</h1>

        <p id="detail">
            For request 'POST /v2/api/graphql'
        </p>

    </body>
</html>
Called from the frontend pod: curl -X POST 'localhost:9002/v2/api/graphql' with the payload
curl localhost:9002/admin - returns GOOD
g
Try curl http://gms:8080/health from your frontend pod. The calls above are hitting the frontend service, not GMS.
l
HTTP/1.1 200 OK
Date: Wed, 23 Mar 2022 16:28:50 GMT
Content-Length: 0
Connection: keep-alive
server: envoy
x-envoy-upstream-service-time: 4
got a 200
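The two checks above (GMS health from inside the frontend pod, and the GraphQL route on the frontend itself) can be sketched in Python as below. This is only a rough sketch: `probe` and `diagnose` are hypothetical helper names, the URLs are the ones quoted in this thread, and the wording of the diagnosis is an assumption rather than anything DataHub reports.

```python
import urllib.error
import urllib.request


def probe(url, method="GET", data=None, headers=None, timeout=5):
    """Return the HTTP status code for url, or None if no HTTP response arrives."""
    req = urllib.request.Request(url, data=data, headers=headers or {}, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 4xx/5xx still proves that something answered on that address.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout.
        return None


def diagnose(gms_health, frontend_graphql):
    """Turn the two status codes into a hint about where to look next."""
    if gms_health is None:
        return "frontend pod cannot reach GMS: check service name, port and network"
    if gms_health != 200:
        return "GMS answered but is not healthy: check the GMS logs"
    if frontend_graphql == 404:
        return "GMS is healthy but the frontend route is 404: inspect the hop between frontend and GMS"
    return "these two probes found no problem"


if __name__ == "__main__":
    # Run from inside the frontend pod; hosts and ports are the ones from the thread.
    gms = probe("http://gms:8080/health")
    frontend = probe(
        "http://localhost:9002/v2/api/graphql",
        method="POST",
        data=b"{}",
        headers={"Content-Type": "application/json"},
    )
    print(gms, frontend, diagnose(gms, frontend))
```

With the results reported here (200 from GMS health, 404 from the frontend route), the sketch points at the path between the frontend and GMS rather than at either service.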
g
Drats, that's where my knowledge ends sadly 😞
l
Ah, thank you @gorgeous-dinner-4055 🙂 let’s try more and maybe someone else can help
This is a very strange issue, seems like the mapping is not working properly or something in frontend didn’t initialise correctly
Posting more config, from the frontend /config endpoint:
{
  "status": "ok",
  "config": {
    "application": "datahub-frontend",
    "appVersion": "1.0",
    "isInternal": false,
    "shouldShowDatasetLineage": true,
    "suggestionConfidenceThreshold": 50,
    "wikiLinks": {
      "appHelp": "https://github.com/datahub-project/datahub",
      "gdprPii": "",
      "tmsSchema": "",
      "gdprTaxonomy": "",
      "staleSearchIndex": "",
      "dht": "",
      "purgePolicies": "",
      "jitAcl": "",
      "metadataCustomRegex": "",
      "exportPolicy": "",
      "metadataHealth": "",
      "purgeKey": "",
      "datasetDecommission": ""
    },
    "tracking": {
      "trackers": {
        "piwik": {
          "piwikSiteId": 0,
          "piwikUrl": ""
        }
      },
      "isEnabled": true
    },
    "isStagingBanner": false,
    "isLiveDataWarning": false,
    "showChangeManagement": false,
    "showPeople": true,
    "changeManagementLink": "",
    "isStaleSearch": true,
    "showAdvancedSearch": true,
    "useNewBrowseDataset": true,
    "showLineageGraph": true,
    "showInstitutionalMemory": true,
    "userEntityProps": {
      "aviUrlPrimary": "",
      "aviUrlFallback": ""
    }
  }
}
Warnings at startup:
12:41:15 [application-akka.actor.default-dispatcher-3] INFO  akka.event.slf4j.Slf4jLogger - Slf4jLogger started
12:41:15 [application-akka.actor.default-dispatcher-3] WARN  akka.util.ManifestInfo - Detected possible incompatible versions on the classpath. Please note that a given Akka version MUST be the same across all modules of Akka that you are using, e.g. if you use [2.5.23] all other modules that are released together MUST be of the same version. Make sure you're using a compatible set of libraries. Possibly conflicting versions [2.5.23, 2.5.11] in libraries [akka-protobuf:2.5.23, akka-actor:2.5.23, akka-slf4j:2.5.11, akka-stream:2.5.23]
12:41:17 [main] WARN  c.l.r.t.h.client.HttpClientFactory - No scheduled executor is provided to HttpClientFactory, using it's own scheduled executor.
12:41:17 [main] WARN  c.l.r.t.h.client.HttpClientFactory - No callback executor is provided to HttpClientFactory, using it's own call back executor.
12:41:17 [main] WARN  c.l.r.t.h.client.HttpClientFactory - No Compression executor is provided to HttpClientFactory, using it's own compression executor.
12:41:17 [main] INFO  c.l.r.t.h.client.HttpClientFactory - The service 'null' has been assigned to the ChannelPoolManager with key 'noSpecifiedNamePrefix 1138266797 ', http.protocolVersion=HTTP_1_1, usePipelineV2=false, requestTimeout=10000ms, streamingTimeout=-1ms
[WARN] [03/23/2022 12:41:17.519] [main] [ManifestInfo(akka://proxyClient)] Detected possible incompatible versions on the classpath. Please note that a given Akka version MUST be the same across all modules of Akka that you are using, e.g. if you use [2.5.23] all other modules that are released together MUST be of the same version. Make sure you're using a compatible set of libraries. Possibly conflicting versions [2.5.23, 2.5.11] in libraries [akka-protobuf:2.5.23, akka-actor:2.5.23, akka-slf4j:2.5.11, akka-stream:2.5.23
o
Does the graphiql endpoint work at all? Or the openapi one? Also what version do you have deployed?
You can click either one from the dropdown in the UI by hovering over the profile picture on the top right
l
The GraphQL endpoints work from the GMS service.
o
I mean from the frontend routing
l
None of them work from the UI (GraphiQL or OpenAPI). The browser shows ERR_INVALID_RESPONSE.
o
Still the same 404?
l
Yes, same 404
I don't see anything in the debug logs:
12:41:17 [main] DEBUG c.l.r.t.h.client.HttpClientFactory - Getting a client with configuration {http.requestTimeout=10000} and SSLContext null
o
Hmm, on a frontend pod can you examine the environment variables DATAHUB_GMS_HOST and DATAHUB_GMS_PORT, and make sure they align with the service name for GMS?
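A small sketch of that check, for anyone following along: it reads the two variables named above and builds the base URL the frontend would be expected to call. `gms_base_url` is a hypothetical helper, and the plain `http://` scheme is an assumption taken from the `http://gms:8080` URLs used earlier in this thread.

```python
import os


def gms_base_url(env=None):
    """Build the GMS base URL from DATAHUB_GMS_HOST and DATAHUB_GMS_PORT.

    Raises ValueError naming whichever variable is unset or empty.
    """
    env = os.environ if env is None else env
    names = ("DATAHUB_GMS_HOST", "DATAHUB_GMS_PORT")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise ValueError("not set: " + ", ".join(missing))
    # Assumption: plain HTTP, as in the URLs quoted in this thread.
    return "http://{}:{}".format(env["DATAHUB_GMS_HOST"], env["DATAHUB_GMS_PORT"])


if __name__ == "__main__":
    # Run inside the frontend pod and compare the result with the GMS service name.
    print(gms_base_url())
```

If the printed URL does not match the GMS service name and port in the cluster, the frontend is pointed at the wrong place; if it does match, as it did here, the problem is further along the path.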
l
let me check again
So it is set correctly, and I can also call config from frontend pod using curl DATAHUB_GMS_HOST:DATAHUB_GMS_PORT/config
o
On the frontend pod, it looks like you hit the endpoint localhost:9002/v2/api/graphql. Did you also try localhost:9002/api/graphiql and localhost:9002/openapi/swagger-ui/index.html?
l
So we were finally able to solve this 🙂 The connection from the frontend proxy to GMS went through an Ambassador endpoint, which stripped all the headers. We were able to use an AWS LB instead, and that solved the issue. @polite-orange-57255 @gifted-kite-59905
Thanks @orange-night-91387 @gorgeous-dinner-4055 for the support 🙂
@polite-orange-57255 @gifted-kite-59905 let's work with the community and see where we can document our learnings from running DataHub on K8s
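One way to spot this kind of header stripping earlier is to put a tiny echo service behind the same route and compare what was sent with what arrived. The sketch below is illustrative only: `EchoHeaders`, `missing_headers`, `check`, the port, and the example URL and header name are all hypothetical, and the thread does not say which headers the frontend actually needs.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeaders(BaseHTTPRequestHandler):
    """Reply to GET with a JSON object of the request headers that arrived."""

    def do_GET(self):
        body = json.dumps({k.lower(): v for k, v in self.headers.items()}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the pod logs quiet


def missing_headers(sent, received):
    """Names from sent that are absent in received (case-insensitive)."""
    received_names = {name.lower() for name in received}
    return sorted(name for name in sent if name.lower() not in received_names)


def check(url, sent):
    """Send the given headers to an EchoHeaders service and list the ones that were lost."""
    req = urllib.request.Request(url, headers=sent)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return missing_headers(sent, json.load(resp))


if __name__ == "__main__":
    # Server side: run this where GMS would sit, behind the proxy under test.
    HTTPServer(("0.0.0.0", 8081), EchoHeaders).serve_forever()
```

From the calling side, something like `check("http://<route-under-test>/", {"X-Probe-Test": "1"})` returns an empty list when the header survives the hop and `["X-Probe-Test"]` when an intermediary drops it.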