few-sunset-43876 (11/21/2022, 3:27 AM):
The logs from datahub-gms:
03:16:05.440 [I/O dispatcher 1] INFO c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 2 Took time ms: -1
03:16:05.663 [ThreadPoolTaskExecutor-1] INFO c.l.m.k.t.DataHubUsageEventTransformer:74 - Invalid event type: SearchAcrossLineageResultsViewEvent
03:16:05.663 [ThreadPoolTaskExecutor-1] WARN c.l.m.k.DataHubUsageEventsProcessor:56 - Failed to apply usage events transform to record: {"type":"SearchAcrossLineageResultsViewEvent","query":"","total":10,"actorUrn":"urn:li:corpuser:datahub","timestamp":1669000565516,"date":"Mon Nov 21 2022 10:16:05 GMT+0700 (Indochina Time)","userAgent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36","browserId":"57f357cc-cdf7-4104-a7fa-30d8eda4f486"}
03:16:06.447 [I/O dispatcher 1] INFO c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 1 Took time ms: -1
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 1"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ThreadPoolTaskScheduler-1"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "kafka-producer-network-thread | producer-1"
and the logs from datahub-frontend-react:
2022-11-21 03:17:04,148 [application-akka.actor.default-dispatcher-13] ERROR application -
! @7pkjpoecp - Internal server error, for (POST) [/api/v2/graphql] ->
play.api.UnexpectedException: Unexpected exception[CompletionException: java.util.concurrent.TimeoutException: Read timeout to datahub-gms/172.18.0.3:8080 after 60000 ms]
at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:340)
at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:263)
at play.core.server.AkkaHttpServer$$anonfun$1.applyOrElse(AkkaHttpServer.scala:443)
at play.core.server.AkkaHttpServer$$anonfun$1.applyOrElse(AkkaHttpServer.scala:441)
at scala.concurrent.Future.$anonfun$recoverWith$1(Future.scala:417)
at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:92)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:92)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:49)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException: Read timeout to datahub-gms/172.18.0.3:8080 after 60000 ms
at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at scala.concurrent.java8.FuturesConvertersImpl$CF.apply(FutureConvertersImpl.scala:21)
at scala.concurrent.java8.FuturesConvertersImpl$CF.apply(FutureConvertersImpl.scala:18)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at scala.concurrent.BatchingExecutor$Batch.processBatch$1(BatchingExecutor.scala:67)
at scala.concurrent.BatchingExecutor$Batch.$anonfun$run$1(BatchingExecutor.scala:82)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:59)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:875)
at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:110)
at scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:107)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:873)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
at scala.concurrent.Promise.complete(Promise.scala:53)
at scala.concurrent.Promise.complete$(Promise.scala:52)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:187)
at scala.concurrent.Promise.failure(Promise.scala:104)
at scala.concurrent.Promise.failure$(Promise.scala:104)
at scala.concurrent.impl.Promise$DefaultPromise.failure(Promise.scala:187)
at play.libs.ws.ahc.StandaloneAhcWSClient$ResponseAsyncCompletionHandler.onThrowable(StandaloneAhcWSClient.java:227)
at play.shaded.ahc.org.asynchttpclient.netty.NettyResponseFuture.abort(NettyResponseFuture.java:278)
at play.shaded.ahc.org.asynchttpclient.netty.request.NettyRequestSender.abort(NettyRequestSender.java:473)
at play.shaded.ahc.org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
at play.shaded.ahc.org.asynchttpclient.netty.timeout.ReadTimeoutTimerTask.run(ReadTimeoutTimerTask.java:56)
at play.shaded.ahc.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:670)
at play.shaded.ahc.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:745)
at play.shaded.ahc.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:473)
at play.shaded.ahc.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException: Read timeout to datahub-gms/172.18.0.3:8080 after 60000 ms
... 7 common frames omitted
The stats of the containers:
CONTAINER ID   NAME                        CPU %     MEM USAGE / LIMIT     MEM %   NET I/O           BLOCK I/O         PIDS
52616fa99479   datahub-frontend-react      0.59%     523.7MiB / 31.26GiB   1.64%   598kB / 619kB     0B / 0B           52
d72c1d91089c   datahub_datahub-actions_1   0.06%     50.66MiB / 31.26GiB   0.16%   295MB / 181MB     5.46MB / 0B       24
805489e4533c   datahub-gms                 748.15%   1.754GiB / 31.26GiB   5.61%   316MB / 3.25MB    0B / 0B           127
69761ab51fcc   schema-registry             0.21%     520.5MiB / 31.26GiB   1.63%   104MB / 99.2MB    6.14MB / 12.3kB   49
34814372e50d   broker                      0.88%     508.4MiB / 31.26GiB   1.59%   957MB / 977MB     13.3MB / 801MB    89
30a6648fdbd5   elasticsearch               0.98%     932.2MiB / 31.26GiB   2.91%   26.5MB / 27.6MB   34.1MB / 178MB    134
bbef225eadba   zookeeper                   0.22%     358MiB / 31.26GiB     1.12%   20MB / 12MB       451kB / 188kB     67
9a83d87163a1   mysql                       0.06%     348MiB / 31.26GiB     1.09%   63.7MB / 301MB    14.9MB / 26.1MB   33
e0d367b11df2   neo4j                       0.59%     1.609GiB / 31.26GiB   5.15%   17.3MB / 926MB    1.47GB / 26.1MB   78
The Java heap size of datahub-gms:
bash-5.1$ java -XX:+PrintFlagsFinal -version | grep HeapSize
size_t ErgoHeapSizeLimit = 0 {product} {default}
size_t HeapSizePerGCThread = 43620760 {product} {default}
size_t InitialHeapSize = 526385152 {product} {ergonomic}
size_t LargePageHeapSizeThreshold = 134217728 {product} {default}
size_t MaxHeapSize = 8392802304 {product} {ergonomic}
uintx NonNMethodCodeHeapSize = 5836300 {pd product} {ergonomic}
uintx NonProfiledCodeHeapSize = 122910970 {pd product} {ergonomic}
uintx ProfiledCodeHeapSize = 122910970 {pd product} {ergonomic}
openjdk version "11.0.17" 2022-10-18
OpenJDK Runtime Environment (build 11.0.17+8-alpine-r3)
OpenJDK 64-Bit Server VM (build 11.0.17+8-alpine-r3, mixed mode)
The datahub-gms container with the free command:
docker exec -it datahub-gms bash
bash-5.1$ free
              total        used        free      shared  buff/cache   available
Mem:       32776400     8052724      417880           0    24305796    24294940
Swap:       4194300        3584     4190716
The application is deployed on GCP; the stats of the VM:
cat /proc/meminfo
MemTotal: 32776400 kB
MemFree: 306556 kB
MemAvailable: 24412316 kB
Buffers: 2212 kB
Cached: 23913504 kB
SwapCached: 124 kB
Active: 15746384 kB
Inactive: 15120120 kB
Active(anon): 5049788 kB
Inactive(anon): 1926800 kB
Active(file): 10696596 kB
Inactive(file): 13193320 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4194300 kB
SwapFree: 4191228 kB
Dirty: 84 kB
Writeback: 0 kB
AnonPages: 6950912 kB
Mapped: 309100 kB
Shmem: 25800 kB
Slab: 885396 kB
SReclaimable: 618596 kB
SUnreclaim: 266800 kB
KernelStack: 18816 kB
PageTables: 30292 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 20582500 kB
Committed_AS: 13568028 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 63820 kB
VmallocChunk: 34359661428 kB
Percpu: 5760 kB
HardwareCorrupted: 0 kB
AnonHugePages: 2617344 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 103232 kB
DirectMap2M: 5136384 kB
DirectMap1G: 30408704 kB
Production on the older version, v0.8.24, didn't have this OOM issue; it started after upgrading to v0.9.2.
I upgraded using the docker-compose.yml of v0.9.2 and the command:
docker-compose down --remove-orphans && docker-compose pull && docker-compose -p datahub up --force-recreate
Is there anything I need to check or adjust (reindexing or something...)? Any help would be appreciated.

few-sunset-43876 (11/21/2022, 4:32 AM):
docker exec -it datahub-gms bash
bash-5.1$ free
              total        used        free      shared  buff/cache   available
Mem:       32775184    12033140     9027096           0    11714948    19968312
Swap:       4194300     1229480     2964820
my VM:
docker exec -it datahub-gms bash
bash-5.1$ free
              total        used        free      shared  buff/cache   available
Mem:       32775184    12033140     9027096           0    11714948    19968312
Swap:       4194300     1229480     2964820
Is it relevant? Do I need to clear the cache memory?
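(On the cache question: the buff/cache column in free is the Linux page cache, which the kernel reclaims automatically under memory pressure, so it doesn't need manual clearing and isn't what triggers a JVM OutOfMemoryError. If you want to drop it anyway for a clean measurement, the generic Linux command, run as root on the VM, is:
sync && echo 3 > /proc/sys/vm/drop_caches
This is a kernel knob, not a DataHub setting.)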

few-sunset-43876 (11/21/2022, 12:37 PM):
CONTAINER ID   NAME          CPU %     MEM USAGE / LIMIT     MEM %   NET I/O          BLOCK I/O   PIDS
805489e4533c   datahub-gms   748.15%   1.754GiB / 31.26GiB   5.61%   316MB / 3.25MB   0B / 0B     127
Thank you so much for your support!

brainy-tent-14503 (11/22/2022, 2:48 AM):
Try the JAVA_OPTS environment variable and the -Xmx / -Xms options. These can be set in the helm chart values by setting this env. I can't guarantee that it'll work, but at least it should be able to use more of the memory you've allocated for the pod.
datahub-gms:
  extraEnvs:
    - name: JAVA_OPTS
      value: -Xmx28g -Xms16g
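If it still OOMs even with the bigger heap, a sketch of a follow-up (these are generic JVM flags, not DataHub-specific, and the dump path is just an example) would be to capture a heap dump at the moment of failure by appending to the same variable:
datahub-gms:
  extraEnvs:
    - name: JAVA_OPTS
      value: -Xmx28g -Xms16g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/gms.hprof
The resulting .hprof file can then be opened in Eclipse MAT or VisualVM to see what is actually filling the heap.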

brainy-tent-14503 (11/22/2022, 2:55 AM):
The default is JAVA_OPTS=-Xms1g -Xmx1g

brainy-tent-14503 (11/22/2022, 2:55 AM):
That would explain the 1.754GiB / 31.26GiB, but maybe with a little off-heap usage
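A quick way to check what the running gms JVM actually got (assuming the default container name) is to read the container's environment instead of starting a new java process:
docker exec datahub-gms env | grep JAVA_OPTS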

few-sunset-43876 (11/22/2022, 3:01 AM):
bash-5.1$ java -XX:+PrintFlagsFinal -version | grep HeapSize
size_t ErgoHeapSizeLimit = 0 {product} {default}
size_t HeapSizePerGCThread = 43620760 {product} {default}
size_t InitialHeapSize = 526385152 {product} {ergonomic}
size_t LargePageHeapSizeThreshold = 134217728 {product} {default}
size_t MaxHeapSize = 8392802304 {product} {ergonomic}
uintx NonNMethodCodeHeapSize = 5836300 {pd product} {ergonomic}
uintx NonProfiledCodeHeapSize = 122910970 {pd product} {ergonomic}
uintx ProfiledCodeHeapSize = 122910970 {pd product} {ergonomic}
openjdk version "11.0.17" 2022-10-18
OpenJDK Runtime Environment (build 11.0.17+8-alpine-r3)
OpenJDK 64-Bit Server VM (build 11.0.17+8-alpine-r3, mixed mode)

few-sunset-43876 (11/22/2022, 4:26 AM):
I changed the ENV in the datahub-gms Dockerfile:
ENV JMX_OPTS=""
ENV JAVA_OPTS="-Xms20g -Xmx20g"
and re-created the containers (in the docker folder, run):
docker-compose down --remove-orphans && docker-compose pull && docker-compose -p datahub up --force-recreate
but the size didn't change; it's still 8g:
[root@datahub-preprod-v2 datahub-gms]# docker exec -it datahub-gms bash
bash-5.1$ java -XX:+PrintFlagsFinal -version | grep HeapSize
size_t ErgoHeapSizeLimit = 0 {product} {default}
size_t HeapSizePerGCThread = 43620760 {product} {default}
size_t InitialHeapSize = 526385152 {product} {ergonomic}
size_t LargePageHeapSizeThreshold = 134217728 {product} {default}
size_t MaxHeapSize = 8392802304 {product} {ergonomic}
uintx NonNMethodCodeHeapSize = 5836300 {pd product} {ergonomic}
uintx NonProfiledCodeHeapSize = 122910970 {pd product} {ergonomic}
uintx ProfiledCodeHeapSize = 122910970 {pd product} {ergonomic}
openjdk version "11.0.17" 2022-10-18
OpenJDK Runtime Environment (build 11.0.17+8-alpine-r3)
OpenJDK 64-Bit Server VM (build 11.0.17+8-alpine-r3, mixed mode)
Is there anywhere else that needs to be modified?
PS: the docker-compose.yml is the default one from v0.9.2:
https://github.com/datahub-project/datahub/blob/master/docker/docker-compose.yml

brainy-tent-14503 (11/22/2022, 4:39 AM):
In docker-compose.yml, under the datahub-gms service, see the environment: key and the following line.
datahub-gms:
  build:
    context: ../
    dockerfile: docker/datahub-gms/Dockerfile
  image: ${DATAHUB_GMS_IMAGE:-linkedin/datahub-gms}:${DATAHUB_VERSION:-head}
  hostname: datahub-gms
  container_name: datahub-gms
  environment:
    - JAVA_OPTS=-Xms20g -Xmx20g
  ports:
    - ${DATAHUB_MAPPED_GMS_PORT:-8080}:8080
  depends_on:
    - elasticsearch-setup
    - kafka-setup
    - mysql
    - neo4j
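Note that compose only applies environment changes when the container is recreated, so after editing the file, something like this (service name taken from the default compose file) should pick it up without restarting the whole stack:
docker-compose -p datahub up -d --force-recreate datahub-gms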

few-sunset-43876 (11/22/2022, 4:52 AM):
I added it, but the heap is still the same:
datahub-gms:
  build:
    context: ../
    dockerfile: docker/datahub-gms/Dockerfile
  image: ${DATAHUB_GMS_IMAGE:-linkedin/datahub-gms}:${DATAHUB_VERSION:-head}
  hostname: datahub-gms
  container_name: datahub-gms
  ports:
    - ${DATAHUB_MAPPED_GMS_PORT:-8080}:8080
  environment:
    - JAVA_OPTS=-Xms20g -Xmx20g
  depends_on:
    - elasticsearch-setup
    - kafka-setup
    - mysql
    - neo4j
docker exec -it datahub-gms bash
bash-5.1$ java -XX:+PrintFlagsFinal -version | grep HeapSize
size_t ErgoHeapSizeLimit = 0 {product} {default}
size_t HeapSizePerGCThread = 43620760 {product} {default}
size_t InitialHeapSize = 526385152 {product} {ergonomic}
size_t LargePageHeapSizeThreshold = 134217728 {product} {default}
size_t MaxHeapSize = 8392802304 {product} {ergonomic}
uintx NonNMethodCodeHeapSize = 5836300 {pd product} {ergonomic}
uintx NonProfiledCodeHeapSize = 122910970 {pd product} {ergonomic}
uintx ProfiledCodeHeapSize = 122910970 {pd product} {ergonomic}

few-sunset-43876 (11/22/2022, 4:57 AM):
04:55:44.028 [pool-12-thread-1] INFO c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 6ms
04:55:44.592 [pool-12-thread-1] INFO c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 27ms
04:55:45.367 [I/O dispatcher 1] INFO c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 2 Took time ms: -1
04:55:46.192 [ThreadPoolTaskExecutor-1] INFO c.l.m.k.t.DataHubUsageEventTransformer:74 - Invalid event type: SearchAcrossLineageResultsViewEvent
04:55:46.193 [ThreadPoolTaskExecutor-1] WARN c.l.m.k.DataHubUsageEventsProcessor:56 - Failed to apply usage events transform to record: {"type":"SearchAcrossLineageResultsViewEvent","query":"","total":10,"actorUrn":"urn:li:corpuser:datahub","timestamp":1669092946065,"date":"Tue Nov 22 2022 11:55:46 GMT+0700 (Indochina Time)","userAgent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36","browserId":"57f357cc-cdf7-4104-a7fa-30d8eda4f486"}
04:55:46.398 [I/O dispatcher 1] INFO c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 1 Took time ms: -1

few-sunset-43876 (11/25/2022, 3:18 AM):
bash-5.1$ java -XX:+PrintFlagsFinal -version | grep HeapSize
size_t ErgoHeapSizeLimit = 0 {product} {default}
size_t HeapSizePerGCThread = 43620760 {product} {default}
size_t InitialHeapSize = 526385152 {product} {ergonomic}
size_t LargePageHeapSizeThreshold = 134217728 {product} {default}
size_t MaxHeapSize = 8392802304 {product} {ergonomic}
uintx NonNMethodCodeHeapSize = 5836300 {pd product} {ergonomic}
uintx NonProfiledCodeHeapSize = 122910970 {pd product} {ergonomic}
uintx ProfiledCodeHeapSize = 122910970 {pd product} {ergonomic}

brainy-tent-14503 (11/25/2022, 3:07 PM):
$ docker container ls
Find the container id running the datahub-gms image.
Next, run this command with the container id:
docker exec <container id> ps -ef
You should see something like the following, and I would expect to see your settings with 20g. This would at least verify that the java instance running gms has the higher memory that we are intending to set.
13 datahub 37:47 java -Xms1g -Xmx1g -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war
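One caveat on the earlier checks: running java -XX:+PrintFlagsFinal -version in the container starts a brand-new JVM, and the java launcher itself never reads JAVA_OPTS (that variable is only expanded by the container's start script). So that command will always report the ergonomic default of roughly 1/4 of host RAM, which is exactly the 8392802304 bytes (~7.8g) it keeps printing on a 31.26GiB machine, regardless of what gms itself is running with; ps -ef shows the real flags. You can also verify what docker injected into the container with:
docker inspect --format '{{.Config.Env}}' datahub-gms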

few-sunset-43876 (11/28/2022, 9:06 AM):
docker exec <container id> ps -ef
Thank you again!