# troubleshoot
b
Hi, I’m trying to ingest some metadata from Snowflake, but I’m getting the following exceptions. Could someone help?
```
datahub-gms 2023-05-18 09:42:42,216 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener:44 - Failed to feed bulk request. Number of events: 6 Took time ms: -1 Message: failure in bulk execution:
datahub-gms [1]: index [datahubstepstateindex_v2], type [_doc], id [urn%3Ali%3AdataHubStepState%3Aurn%3Ali%3Acorpuser%3Adatahub-search-results-filters], message [[datahubstepstateindex_v2/VMzXqnXeSnWqzq_cS6B4XA][[datahubstepstateindex_v2][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3AdataHubStepState%3Aurn%3Ali%3Acorpuser%3Adatahub-search-results-filters]: document missing]]]
datahub-gms [4]: index [datahubstepstateindex_v2], type [_doc], id [urn%3Ali%3AdataHubStepState%3Aurn%3Ali%3Acorpuser%3Adatahub-search-results-advanced-search], message [[datahubstepstateindex_v2/VMzXqnXeSnWqzq_cS6B4XA][[datahubstepstateindex_v2][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3AdataHubStepState%3Aurn%3Ali%3Acorpuser%3Adatahub-search-results-advanced-search]: document missing]]]
```
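`document_missing_exception` is what Elasticsearch returns when a bulk `update` action targets an `_id` that is not in the index and the action carries neither `doc_as_upsert` nor an `upsert` document. The ids in these logs are URL-encoded DataHub URNs. The sketch below is a hypothetical illustration, not DataHub's own indexing code: `decode_doc_id` turns such an id back into a readable URN, and `bulk_update_lines` builds the two NDJSON lines of a bulk `update` action with or without `doc_as_upsert`.

```python
import json
from urllib.parse import unquote


def decode_doc_id(doc_id: str) -> str:
    """Turn a URL-encoded index document id back into the URN it encodes."""
    return unquote(doc_id)


def bulk_update_lines(index: str, doc_id: str, doc: dict, upsert: bool) -> str:
    """Build the action line and body line of an Elasticsearch bulk `update`.

    Without doc_as_upsert, updating an id that is not in the index fails
    with document_missing_exception; with it, the document is created.
    """
    action = {"update": {"_index": index, "_id": doc_id}}
    body = {"doc": doc}
    if upsert:
        body["doc_as_upsert"] = True
    # The bulk API expects newline-delimited JSON, including a trailing newline.
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"
```

Decoding the first failing id gives `urn:li:dataHubStepState:urn:li:corpuser:datahub-search-results-filters`, i.e. a `dataHubStepState` record rather than one of the ingested Snowflake entities.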
Version: 0.10.2, Helm chart version: datahub-0.2.164. Note that I had to disable `datahubUpgrade`, since it keeps checking the health of `datahub-gms` and failing.
f
Hey @bland-gigabyte-28270, why don’t you try restoring the Elasticsearch indices? https://datahubproject.io/docs/how/restore-indices/
b
To do the re-indexing, I believe `datahubUpgrade` needs to be deployed, but that isn’t possible:
```
ERROR: Cannot connect to GMSat host datahub-datahub-gms port 8080. Make sure GMS is on the latest version and is running at that host before starting the migration.
java.net.ConnectException: Connection refused (Connection refused)
```
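A `Connection refused` while GMS itself reports healthy can come from a transient startup-ordering problem (for example, a network sidecar that is not ready yet when the job starts) or from a persistent routing problem. One way to tell the two apart is to retry the TCP connection with exponential backoff: a transient cause clears after a few attempts, a persistent one does not. This is a minimal, generic sketch, not part of DataHub; `wait_for_port` and its default attempt count and delays are hypothetical, and only the host `datahub-datahub-gms` and port `8080` come from the error above.

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 5,
                  base_delay: float = 1.0, timeout: float = 3.0) -> bool:
    """Try a TCP connection up to `attempts` times, doubling the pause each time.

    Returns True as soon as one connection succeeds, False if every attempt fails.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Refused, timed out or unresolvable: back off unless this was the last try.
            if attempt == attempts:
                break
            time.sleep(delay)
            delay *= 2
    return False
```

Run from a pod in the same namespace, `wait_for_port("datahub-datahub-gms", 8080)` returns `True` as soon as one connection succeeds; with the defaults it gives up after five attempts and about 15 seconds of sleeping (1 + 2 + 4 + 8).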
I can confirm that `datahub-gms` is healthy.
Also, the “Testing your connection” feature keeps hanging.
f
I’m pretty sure GMS wasn’t reachable. Deploy a BusyBox pod in the same namespace as the DataHub deployment, then try to ping GMS from it.
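`ping` uses ICMP, which a Kubernetes Service’s virtual IP often does not answer even when the backing pods are healthy, so a TCP connection attempt to the service port is usually the more telling check. The sketch below classifies the outcome of one such attempt. `probe_tcp` is a hypothetical helper; the hostname and port are the ones from the thread.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and report what happened.

    Returns "ok", "dns" (name did not resolve), "refused" (connection
    actively rejected), "timeout" (no answer in time) or "error".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        return "dns"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

From a pod with Python in the GMS namespace, `probe_tcp("datahub-datahub-gms", 8080)` separates a name that does not resolve (`"dns"`) from a host that rejects the port (`"refused"`) or silently drops traffic (`"timeout"`).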
b
Seems to me that it’s reachable:
```
bash-5.1$ curl http://datahub-datahub-gms:8080
{"exceptionClass":"com.linkedin.restli.server.RestLiServiceException","stackTrace":"com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]\n\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:202)\n\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:254)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:228)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:215)\n\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:171)\n\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:130)\n\tat com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:106)\n\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n\tat com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat com.linkedin.restli.server.RestliHandlerServlet.service(RestliHandlerServlet.java:21)\n\tat com.linkedin.restli.server.RestliHandlerServlet.handleRequest(RestliHandlerServlet.java:26)\n\tat org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)\n\tat org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1631)\n\tat com.datahub.auth.authentication.filter.AuthenticationFilter.doFilter(AuthenticationFilter.java:98)\n\tat org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)\n\tat org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:516)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\n\tat org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: com.linkedin.restli.server.RoutingException\n\tat com.linkedin.restli.internal.server.RestLiRouter.process(RestLiRouter.java:111)\n\tat com.linkedin.restli.server.BaseRestLiServer.getRoutingResult(BaseRestLiServer.java:181)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:224)\n\t... 69 more\n","status":404}
```
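The 404 above actually confirms reachability: the Rest.li server inside GMS answered, and the `RoutingException` only means no resource is mapped to `/`. A probe that separates “an HTTP response came back” from “nothing answered” makes that explicit. This is a minimal standard-library sketch; `http_probe` is hypothetical, and the `/health` path mentioned below is an assumption based on DataHub’s Docker Compose health check, so verify it against your GMS version.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Opener that ignores *_proxy environment variables, so the probe talks to
# the target directly instead of through a configured HTTP proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_probe(url: str, timeout: float = 5.0) -> Tuple[bool, Optional[int]]:
    """Return (reachable, status_code) for a single GET request.

    reachable is True whenever the server sent back any HTTP status,
    including 4xx/5xx; it is False only if no HTTP response arrived.
    """
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return True, resp.status
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError (it is a subclass):
        # the server answered, just not with a 2xx status.
        return True, err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection or timeout: nothing answered.
        return False, None
```

Given the curl output above, `http_probe("http://datahub-datahub-gms:8080/")` should come back as `(True, 404)`. Probing `/health` instead avoids treating a 404 as the success signal.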
From `datahub-gms`:
```
istio-proxy [2023-05-18T10:21:39.893Z] "GET / HTTP/1.1" 404 - via_upstream - "-" 0 7188 3 3 "-" "curl/7.80.0" "ef4a1cc2-32f2-4fa1-a9e4-2c29a29a62dc" "datahub-datahub-gms:8080" "10.20.203.252:8080" inbound|8080|| 127.0.0.6:58169 10.2
```
The `istio-proxy` is a rule defined by our company; let me see if we can bypass it.
No luck, still the same issue.
OK, I have run the index restore:
```
Cleanup has not been requested.
Skipping Step 1/3: ClearSearchServiceStep...
Cleanup has not been requested.
Skipping Step 2/3: ClearGraphServiceStep...
Executing Step 3/3: SendMAEStep...
Sending MAE from local DB
Found 159 latest aspects in aspects table in 0.01 minutes.
Args are RestoreIndicesArgs(start=0, batchSize=1000, numThreads=1, batchDelayMs=100, aspectName=null, urn=null, urnLike=null)
Reading rows 0 through 1000 from the aspects table started.
Reading rows 0 through 1000 from the aspects table completed.
metrics so far RestoreIndicesResult(ignored=0, rowsMigrated=159, timeSqlQueryMs=89, timeGetRowMs=0, timeUrnMs=95, timeEntityRegistryCheckMs=0, aspectCheckMs=0, createRecordMs=304, sendMessageMs=6395)
Successfully sent MAEs for 159/159 rows (100.00% of total). 0 rows ignored (0.00% of total)
0.13 mins taken. 0.00 est. mins to completion. Total mins est. = 0.13.
Completed Step 3/3: SendMAEStep successfully.
Success! Completed upgrade with id RestoreIndices successfully.
Upgrade RestoreIndices completed with result SUCCEEDED. Exiting...
```
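The log shows the restore job reading the `aspects` table in windows of `batchSize` rows starting at `start`; with `batchSize=1000` and only 159 latest aspects, a single window (reported as rows 0 through 1000) covers everything. The sketch below reproduces just that windowing arithmetic. It is inferred from the log lines, not taken from DataHub’s RestoreIndices implementation, and `batch_windows` is a hypothetical name.

```python
from typing import List, Tuple


def batch_windows(total_rows: int, start: int = 0,
                  batch_size: int = 1000) -> List[Tuple[int, int]]:
    """Split rows [start, total_rows) into half-open windows of batch_size rows.

    The last window may extend past total_rows, like the log line
    "Reading rows 0 through 1000" for a table with only 159 rows.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    windows = []
    offset = start
    while offset < total_rows:
        windows.append((offset, offset + batch_size))
        offset += batch_size
    return windows
```

`batch_windows(159)` returns `[(0, 1000)]`, matching the single read above, while `batch_windows(2500)` returns `[(0, 1000), (1000, 2000), (2000, 3000)]`.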
However, the connection test still fails 😞
```
2023-05-18 11:13:24,748 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener:44 - Failed to feed bulk request. Number of events: 1 Took time ms: -1 Message: failure in bulk execution:
[0]: index [datahubexecutionrequestindex_v2_1684406442764], type [_doc], id [urn%3Ali%3AdataHubExecutionRequest%3Ac4fa2f63-275e-4b27-b0e7-39f5ad72bd37], message [[datahubexecutionrequestindex_v2_1684406442764/pJu3izVZSLSSd_VxYYS_HA][[datahubexecutionrequestindex_v2_1684406442764][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3AdataHubExecutionRequest%3Ac4fa2f63-275e-4b27-b0e7-39f5ad72bd37]: document missing]]]
```
Thanks for the earlier help. After a while I realized it was because `actions` was not properly configured. Now I have the following issue; can someone help?
```
acryl-datahub-actions ~~~~ Execution Summary - RUN_INGEST ~~~~
acryl-datahub-actions Execution finished with errors.
acryl-datahub-actions {'exec_id': '9ad7a7a8-d93f-4201-bfb3-d637294779fb',
acryl-datahub-actions  'infos': ['2023-05-18 12:55:06.316483 INFO: Starting execution for task with name=RUN_INGEST',
acryl-datahub-actions            '2023-05-18 12:55:06.325822 INFO: Caught exception EXECUTING task_id=9ad7a7a8-d93f-4201-bfb3-d637294779fb, name=RUN_INGEST, '
acryl-datahub-actions            'stacktrace=Traceback (most recent call last):\n'
acryl-datahub-actions            '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
acryl-datahub-actions            '    task_event_loop.run_until_complete(task_future)\n'
acryl-datahub-actions            '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
acryl-datahub-actions            '    return future.result()\n'
acryl-datahub-actions            '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 82, in execute\n'
acryl-datahub-actions            '    full_log_file = open(f"{self.config.log_dir}/ingestion-{exec_id}.txt", "w")\n'
acryl-datahub-actions            "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/datahub/logs/ingestion-9ad7a7a8-d93f-4201-bfb3-d637294779fb.txt'\n"],
acryl-datahub-actions  'errors': []}
```
Fixed by using the datahub system user.
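The traceback ends in `open(f"{self.config.log_dir}/ingestion-{exec_id}.txt", "w")`: mode `"w"` creates the file but never its missing parent directories, so the call raises `FileNotFoundError` whenever `/tmp/datahub/logs` does not exist. The sketch below shows the general create-then-open pattern; `open_ingestion_log` is a hypothetical helper, not the acryl executor’s code, and the fix in this thread was the user change rather than a code change.

```python
from pathlib import Path
from typing import TextIO


def open_ingestion_log(log_dir: str, exec_id: str) -> TextIO:
    """Open the per-execution log file, creating log_dir first if it is missing."""
    directory = Path(log_dir)
    # open(..., "w") does not create parent directories, so make them here.
    # If the current user may not create log_dir, this raises PermissionError.
    directory.mkdir(parents=True, exist_ok=True)
    return open(directory / f"ingestion-{exec_id}.txt", "w")
```

If the directory cannot be created by the user the process runs as, `mkdir` fails with `PermissionError` instead, which is consistent with a fix that changes which user runs the actions container.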