gifted-knife-16120
11/16/2022, 2:44 AM
datahub user? Or we can disable datahub; it does not pass our security requirements.
gifted-knife-16120
11/16/2022, 3:45 AM
green-hamburger-3800
11/16/2022, 8:41 AM
steep-fountain-54482
11/16/2022, 8:48 AM
steep-fountain-54482
11/16/2022, 8:48 AM
steep-fountain-54482
11/16/2022, 8:49 AM
steep-fountain-54482
11/16/2022, 8:49 AM
// `lineages` and `upstreams` are Scala collections; `.asJava` needs
// `import scala.collection.JavaConverters._` in scope
val lineage = new UpstreamLineage()
lineage.setFineGrainedLineages(new FineGrainedLineageArray(lineages.asJava))
lineage.setUpstreams(new UpstreamArray(upstreams.asJava))
green-hamburger-3800
11/16/2022, 9:05 AM
fresh-cricket-75926
11/16/2022, 9:32 AM
billowy-pilot-93812
11/16/2022, 10:22 AM
full-salesclerk-85947
11/16/2022, 11:14 AM
Fetching docker-compose file https://raw.githubusercontent.com/datahub-project/datahub/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml from GitHub
Pulling docker images...
Finished pulling docker images!
[+] Running 7/7
⠿ Network datahub_network Created 4.1s
⠿ Container elasticsearch Created 0.1s
⠿ Container zookeeper Created 0.1s
⠹ Container mysql Creating 0.2s
⠿ Container elasticsearch-setup Created 0.0s
⠿ Container broker Created 0.1s
⠿ Container schema-registry Created 0.0s
⠿ Container kafka-setup Created 0.0s
Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /Users/<user>/.datahub/mysql/init.sql
.............
[+] Running 0/0
⠋ Container mysql Creating 0.0s
Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /Users/<user>/.datahub/mysql/init.sql
..............
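A sketch of a possible workaround for the bind-mount error above, under the assumption that the quickstart compose file bind-mounts `$HOME/.datahub/mysql/init.sql`: Docker refuses to start a container whose bind source file is missing, so creating the directory and a placeholder file first should satisfy the check (the `DATAHUB_HOME` variable is hypothetical, added only to make the path overridable):

```shell
# Assumption: the compose file mounts $HOME/.datahub/mysql/init.sql.
# Docker's "bind source path does not exist" error goes away once the
# file exists, even if it is empty; the real init.sql from the DataHub
# repo can be copied in afterwards.
DATAHUB_HOME="${DATAHUB_HOME:-$HOME/.datahub}"   # hypothetical override
mkdir -p "$DATAHUB_HOME/mysql"
[ -f "$DATAHUB_HOME/mysql/init.sql" ] || touch "$DATAHUB_HOME/mysql/init.sql"
```

After this, re-running the quickstart should get past the mount error, assuming no other paths are missing.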
green-hamburger-3800
11/16/2022, 11:45 AM
microscopic-mechanic-13766
11/16/2022, 12:47 PM
Exception in thread "map-output-dispatcher-3" java.lang.UnsatisfiedLinkError: com.github.luben.zstd.Zstd.setCompressionLevel(JI)I
I am building the jar with the following command:
./gradlew metadata-integration:java:spark-lineage:buildDependents
The source code I started from is the v0.9.1 tag. My modifications don't change the base functionality yet; they just add more log prints.
The submitted application worked with v0.9.2 of the spark-lineage plugin, so I am guessing this error comes from some step I unconsciously skipped in the build process.
(I skipped the tests because I was stuck on another error, and since they were just checking the Docker deployment, I didn't give them much importance. Could that be it?)
mysterious-motorcycle-80650
11/16/2022, 11:00 PM
microscopic-room-90690
11/17/2022, 10:52 AM
ERROR {datahub.ingestion.run.pipeline:112} - failed to write record with workunit s3://path/data-lake-dbt/cdm_dim/users_snapshot9996 with Expecting value: line 1 column 1 (char 0) and info {}
Any help will be appreciated. Thank you!
late-ability-59580
11/17/2022, 1:47 PM
I set s3://bucket/pref/pref/*/* in source.config.path_specs.include.
I understand it expects something like s3://.../*.*, but that won't match the pattern of my files.
Am I missing something?
green-hamburger-3800
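For reference, the s3 source's path_spec format also supports `{table}`-style placeholders, which is one way to handle per-folder layouts where a trailing `*.*` alone doesn't fit. A hypothetical include for the layout above might look like this (bucket and prefix names are placeholders carried over from the question, not verified):

```yaml
source:
  type: s3
  config:
    path_specs:
      # Hypothetical sketch: `{table}` names the dataset folder and
      # `*.*` matches the leaf files, per the path_spec conventions.
      - include: "s3://bucket/pref/pref/{table}/*.*"
```

Whether this matches depends on the actual file layout; the s3 source docs list the supported placeholders.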
11/17/2022, 2:10 PM
microscopic-mechanic-13766
11/17/2022, 3:15 PM
bright-motherboard-35257
11/17/2022, 4:16 PM
ERROR {datahub.entrypoints:206} - Command failed: Password should be set if and only if in LDAP or CUSTOM
Recipe:
source:
  type: hive
  config:
    database: hc_orders
    profiling:
      enabled: true
    host_port: '<redacted>:10000'
    stateful_ingestion:
      enabled: true
    username: <redacted>
    password: <redacted>
    options:
      connect_args:
        auth: KERBEROS
        kerberos_service_name: hive
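The error text suggests the underlying Hive client rejects a password unless the auth mode is LDAP or CUSTOM. As a hedged sketch (not a confirmed fix), the same recipe with the password dropped under KERBEROS auth would look like:

```yaml
source:
  type: hive
  config:
    database: hc_orders
    host_port: '<redacted>:10000'
    # Assumption: with auth: KERBEROS the client forbids a password,
    # so `password` is removed; keep `username` only if your setup
    # actually requires it.
    username: <redacted>
    options:
      connect_args:
        auth: KERBEROS
        kerberos_service_name: hive
```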
ancient-apartment-23316
11/17/2022, 9:26 PM
few-sunset-43876
11/18/2022, 3:20 AM
version: '3.8'
services:
  ...
  neo4j:
    image: neo4j:4.4.9-community
    env_file: neo4j/env/docker.env
    hostname: neo4j
    container_name: neo4j
    ports:
      - ${DATAHUB_MAPPED_NEO4J_HTTP_PORT:-7474}:7474
      - ${DATAHUB_MAPPED_NEO4J_BOLT_PORT:-7687}:7687
    volumes:
      - neo4jdata:/data
  ...
networks:
  default:
    name: datahub_network
volumes:
  esdata:
  neo4jdata:
  zkdata:
  broker:
neo4j/env/docker.env:
NEO4J_AUTH=neo4j/datahub
NEO4J_dbms_default__database=graph.db
NEO4J_dbms_allow__upgrade=true
But the neo4j container could not start due to the message:
Changed password for user 'neo4j'. IMPORTANT: this change will only take effect if performed before the database is started for the first time.
Could anyone help?
Thank you so much in advance!
famous-florist-7218
11/18/2022, 8:49 AM
{{- if and .Values.serviceMonitor.create .Values.global.datahub.monitoring.enablePrometheus -}}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ printf "%s-%s" .Release.Name "datahub-gms" }}
  labels:
    {{- include "datahub-gms.labels" . | nindent 4 }}
  {{- with .Values.serviceMonitor.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
spec:
  endpoints:
    - port: jmx
      relabelings:
        - separator: /
          sourceLabels:
            - namespace
            - pod
          targetLabel: instance
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: datahub-gms
{{- end -}}
happy-baker-8735
11/18/2022, 11:17 AM
polite-egg-47560
11/18/2022, 12:41 PM
swift-farmer-36942
11/18/2022, 5:25 PM
little-breakfast-38102
11/18/2022, 7:46 PM
better-spoon-77762
11/18/2022, 7:51 PM
java.util.concurrent.CompletionException: java.lang.IllegalStateException: Problem loading Database Driver [org.postgresql.Driver]: org.postgresql.Driver
at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1702)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1692)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: java.lang.IllegalStateException: Problem loading Database Driver [org.postgresql.Driver]: org.postgresql.Driver
at io.ebean.datasource.pool.ConnectionPool.initialise(ConnectionPool.java:281)
at io.ebean.datasource.pool.ConnectionPool.<init>(ConnectionPool.java:246)
at io.ebean.datasource.core.Factory.createPool(Factory.java:15)
at io.ebeaninternal.server.core.DefaultContainer.getDataSourceFromConfig(DefaultContainer.java:273)
at io.ebeaninternal.server.core.DefaultContainer.setDataSource(DefaultContainer.java:217)
at io.ebeaninternal.server.core.DefaultContainer.createServer(DefaultContainer.java:103)
at io.ebeaninternal.server.core.DefaultContainer.createServer(DefaultContainer.java:35)
at io.ebean.EbeanServerFactory.createInternal(EbeanServerFactory.java:109)
at io.ebean.EbeanServerFactory.create(EbeanServerFactory.java:70)
at com.linkedin.metadata.entity.ebean.EbeanTenantDaoManager.getTenantDao(EbeanTenantDaoManager.java:29)
at com.linkedin.metadata.entity.EntityService.getEntityDao(EntityService.java:193)
at com.linkedin.metadata.entity.EntityService.getEnvelopedAspects(EntityService.java:1867)
at com.linkedin.metadata.entity.EntityService.getCorrespondingAspects(EntityService.java:403)
at com.linkedin.metadata.entity.EntityService.getLatestEnvelopedAspects(EntityService.java:356)
at com.linkedin.metadata.entity.EntityService.getEntitiesV2(EntityService.java:310)
at com.linkedin.metadata.client.JavaEntityClient.batchGetV2(JavaEntityClient.java:114)
at com.linkedin.datahub.graphql.resolvers.MeResolver.lambda$get$0(MeResolver.java:57)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
... 6 common frames omitted
Caused by: java.lang.ClassNotFoundException: org.postgresql.Driver
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:398)
at io.ebean.datasource.pool.ConnectionPool.initialise(ConnectionPool.java:276)
I wonder if it's related to having both openjdk8 and openjdk11 in the GMS build?
I have verified that the Postgres driver exists within the container.
miniature-eve-21984
11/18/2022, 8:24 PM
straight-mouse-85445
11/21/2022, 3:10 AM
few-sunset-43876
11/21/2022, 3:27 AM
03:16:05.440 [I/O dispatcher 1] INFO c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 2 Took time ms: -1
03:16:05.663 [ThreadPoolTaskExecutor-1] INFO c.l.m.k.t.DataHubUsageEventTransformer:74 - Invalid event type: SearchAcrossLineageResultsViewEvent
03:16:05.663 [ThreadPoolTaskExecutor-1] WARN c.l.m.k.DataHubUsageEventsProcessor:56 - Failed to apply usage events transform to record: {"type":"SearchAcrossLineageResultsViewEvent","query":"","total":10,"actorUrn":"urn:li:corpuser:datahub","timestamp":1669000565516,"date":"Mon Nov 21 2022 10:16:05 GMT+0700 (Indochina Time)","userAgent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36","browserId":"57f357cc-cdf7-4104-a7fa-30d8eda4f486"}
03:16:06.447 [I/O dispatcher 1] INFO c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 1 Took time ms: -1
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 1"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ThreadPoolTaskScheduler-1"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "kafka-producer-network-thread | producer-1"
and the logs from datahub-frontend-react:
2022-11-21 03:17:04,148 [application-akka.actor.default-dispatcher-13] ERROR application -
! @7pkjpoecp - Internal server error, for (POST) [/api/v2/graphql] ->
play.api.UnexpectedException: Unexpected exception[CompletionException: java.util.concurrent.TimeoutException: Read timeout to datahub-gms/172.18.0.3:8080 after 60000 ms]
at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:340)
at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:263)
at play.core.server.AkkaHttpServer$$anonfun$1.applyOrElse(AkkaHttpServer.scala:443)
at play.core.server.AkkaHttpServer$$anonfun$1.applyOrElse(AkkaHttpServer.scala:441)
at scala.concurrent.Future.$anonfun$recoverWith$1(Future.scala:417)
at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:92)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:92)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:49)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException: Read timeout to datahub-gms/172.18.0.3:8080 after 60000 ms
at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at scala.concurrent.java8.FuturesConvertersImpl$CF.apply(FutureConvertersImpl.scala:21)
at scala.concurrent.java8.FuturesConvertersImpl$CF.apply(FutureConvertersImpl.scala:18)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at scala.concurrent.BatchingExecutor$Batch.processBatch$1(BatchingExecutor.scala:67)
at scala.concurrent.BatchingExecutor$Batch.$anonfun$run$1(BatchingExecutor.scala:82)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:59)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:875)
at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:110)
at scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:107)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:873)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
at scala.concurrent.Promise.complete(Promise.scala:53)
at scala.concurrent.Promise.complete$(Promise.scala:52)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:187)
at scala.concurrent.Promise.failure(Promise.scala:104)
at scala.concurrent.Promise.failure$(Promise.scala:104)
at scala.concurrent.impl.Promise$DefaultPromise.failure(Promise.scala:187)
at play.libs.ws.ahc.StandaloneAhcWSClient$ResponseAsyncCompletionHandler.onThrowable(StandaloneAhcWSClient.java:227)
at play.shaded.ahc.org.asynchttpclient.netty.NettyResponseFuture.abort(NettyResponseFuture.java:278)
at play.shaded.ahc.org.asynchttpclient.netty.request.NettyRequestSender.abort(NettyRequestSender.java:473)
at play.shaded.ahc.org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
at play.shaded.ahc.org.asynchttpclient.netty.timeout.ReadTimeoutTimerTask.run(ReadTimeoutTimerTask.java:56)
at play.shaded.ahc.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:670)
at play.shaded.ahc.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:745)
at play.shaded.ahc.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:473)
at play.shaded.ahc.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException: Read timeout to datahub-gms/172.18.0.3:8080 after 60000 ms
... 7 common frames omitted
The stats of the containers:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
52616fa99479 datahub-frontend-react 0.59% 523.7MiB / 31.26GiB 1.64% 598kB / 619kB 0B / 0B 52
d72c1d91089c datahub_datahub-actions_1 0.06% 50.66MiB / 31.26GiB 0.16% 295MB / 181MB 5.46MB / 0B 24
805489e4533c datahub-gms 748.15% 1.754GiB / 31.26GiB 5.61% 316MB / 3.25MB 0B / 0B 127
69761ab51fcc schema-registry 0.21% 520.5MiB / 31.26GiB 1.63% 104MB / 99.2MB 6.14MB / 12.3kB 49
34814372e50d broker 0.88% 508.4MiB / 31.26GiB 1.59% 957MB / 977MB 13.3MB / 801MB 89
30a6648fdbd5 elasticsearch 0.98% 932.2MiB / 31.26GiB 2.91% 26.5MB / 27.6MB 34.1MB / 178MB 134
bbef225eadba zookeeper 0.22% 358MiB / 31.26GiB 1.12% 20MB / 12MB 451kB / 188kB 67
9a83d87163a1 mysql 0.06% 348MiB / 31.26GiB 1.09% 63.7MB / 301MB 14.9MB / 26.1MB 33
e0d367b11df2 neo4j 0.59% 1.609GiB / 31.26GiB 5.15% 17.3MB / 926MB 1.47GB / 26.1MB 78
The Java heap size of datahub-gms:
bash-5.1$ java -XX:+PrintFlagsFinal -version | grep HeapSize
size_t ErgoHeapSizeLimit = 0 {product} {default}
size_t HeapSizePerGCThread = 43620760 {product} {default}
size_t InitialHeapSize = 526385152 {product} {ergonomic}
size_t LargePageHeapSizeThreshold = 134217728 {product} {default}
size_t MaxHeapSize = 8392802304 {product} {ergonomic}
uintx NonNMethodCodeHeapSize = 5836300 {pd product} {ergonomic}
uintx NonProfiledCodeHeapSize = 122910970 {pd product} {ergonomic}
uintx ProfiledCodeHeapSize = 122910970 {pd product} {ergonomic}
openjdk version "11.0.17" 2022-10-18
OpenJDK Runtime Environment (build 11.0.17+8-alpine-r3)
OpenJDK 64-Bit Server VM (build 11.0.17+8-alpine-r3, mixed mode)
Inside the datahub-gms container with the free command:
docker exec -it datahub-gms bash
bash-5.1$ free
total used free shared buff/cache available
Mem: 32776400 8052724 417880 0 24305796 24294940
Swap: 4194300 3584 4190716
The application is deployed in GCP; the stats of the VM:
cat /proc/meminfo
MemTotal: 32776400 kB
MemFree: 306556 kB
MemAvailable: 24412316 kB
Buffers: 2212 kB
Cached: 23913504 kB
SwapCached: 124 kB
Active: 15746384 kB
Inactive: 15120120 kB
Active(anon): 5049788 kB
Inactive(anon): 1926800 kB
Active(file): 10696596 kB
Inactive(file): 13193320 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4194300 kB
SwapFree: 4191228 kB
Dirty: 84 kB
Writeback: 0 kB
AnonPages: 6950912 kB
Mapped: 309100 kB
Shmem: 25800 kB
Slab: 885396 kB
SReclaimable: 618596 kB
SUnreclaim: 266800 kB
KernelStack: 18816 kB
PageTables: 30292 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 20582500 kB
Committed_AS: 13568028 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 63820 kB
VmallocChunk: 34359661428 kB
Percpu: 5760 kB
HardwareCorrupted: 0 kB
AnonHugePages: 2617344 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 103232 kB
DirectMap2M: 5136384 kB
DirectMap1G: 30408704 kB
The production deployment with the older version v0.8.24 didn't have this OOM issue; it started happening after upgrading to v0.9.2.
I upgraded using the docker-compose.yml of v0.9.2 and this command:
docker-compose down --remove-orphans && docker-compose pull && docker-compose -p datahub up --force-recreate
Is there anything I need to check or adjust (reindexing or something...)? Any help would be appreciated.
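One hedged thing worth checking: the PrintFlagsFinal output above shows an ergonomic MaxHeapSize of roughly 8 GB for GMS, so pinning the heap explicitly can help confirm whether the OOM is heap pressure or something else. Assuming the datahub-gms image passes a `JAVA_OPTS`-style variable to the JVM (variable name assumed; verify against the image's entrypoint before relying on it), a compose override might look like:

```yaml
services:
  datahub-gms:
    environment:
      # Assumption: the image's entrypoint forwards JAVA_OPTS to java.
      # Explicit -Xms/-Xmx replaces the ergonomic ~8 GB max seen above.
      - JAVA_OPTS=-Xms1g -Xmx4g
```

If the OOM persists with a fixed heap, the container stats above (GMS at ~748% CPU) suggest profiling GMS under load rather than adjusting memory further.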