# all-things-deployment

    handsome-flag-16272

    02/24/2023, 7:20 PM
    Hi team,
    1. Is datahub-mce-consumer responsible for consuming the events?
    2. To improve throughput, we need to increase the number of Kafka partitions and scale out the datahub-mce-consumer instances. Are all messages sent to one topic, or can we configure different Kafka topics per domain? Does DataHub support scaling consumers out based on the number of to-be-processed messages in the Kafka topic?
    3. If there are no standalone datahub-mce-consumer instances, which component does the job? From the logs it appears to be datahub-gms. If we start mce-consumer, mae-consumer, and datahub-gms together, will one change proposal event be processed by all three components?
    4. Where does datahub-mce-consumer send data: to datahub-gms, or directly to Elasticsearch for index building? It appears to be datahub-gms.
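    The scaling model in question 2 — one topic, more partitions, more consumer replicas — can be sketched roughly as below. This is only an illustration: the topic name assumes the default (`MetadataChangeProposal_v1` in recent versions, overridable via the topic-name env vars), and the broker address and deployment name are placeholders for your own cluster.
    Copy code
    # Sketch, assuming the default MCP topic name; adjust --bootstrap-server
    # and the partition count for your cluster.
    kafka-topics.sh --bootstrap-server localhost:9092 \
      --alter --topic MetadataChangeProposal_v1 --partitions 10

    # With standalone consumers enabled, scale replicas up to at most the
    # partition count (Kubernetes example; deployment name varies by release):
    kubectl scale deployment datahub-mce-consumer --replicas 10
    Note that consumers beyond the partition count sit idle, so the partition change has to come first.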

    numerous-byte-87938

    02/24/2023, 9:44 PM
    🤔 Looking for some insights on ES index reindexing during GMS deployment (our version predates this PR). My current understanding is that GMS uses Spring, and before creating servlets it needs to initialize beans first, such as elasticSearchService and elasticSearchGraphService. One common method of those services is configure(), which rebuilds the indices as needed. What confuses me is that I'm only able to find references to this method in mae-consumer and nowhere else on the GMS side. But we do see this configure() method being called (the reindex step happening) inside our GMS pod deployment, despite the fact that we've separated mae-consumer into another service.

    future-analyst-98466

    02/27/2023, 7:19 AM
    Hi team, are there any sizing guidelines for deploying DataHub on Docker? What parameters does sizing depend on (for example, number of tables, how big the schemas are, etc.)?

    agreeable-belgium-70840

    02/27/2023, 12:00 PM
    So I posted yesterday on #troubleshoot about this, but the issue persists. Sorry for the spam, but I'll need to post once more with some further details. I am trying to upgrade DataHub from v0.9.5 to v0.10.0. I ran all the init jobs and the datahub-upgrade job. Everything gets deployed, but datahub-gms seems to hang without giving any error. Helm times out at some point and the deployment fails. Here are DataHub's logs:
    2023-02-24 14:43:11,561 [ThreadPoolTaskExecutor-1] INFO  o.s.k.l.KafkaMessageListenerContainer:292 - mce-consumer-job-client: partitions revoked: []
    2023-02-24 14:43:11,561 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.AbstractCoordinator:552 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] (Re-)joining group
    2023-02-24 14:43:11,561 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.AbstractCoordinator:552 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] (Re-)joining group
    2023-02-24 14:43:11,592 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.AbstractCoordinator:503 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] Successfully joined group with generation 1837
    2023-02-24 14:43:11,592 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.AbstractCoordinator:503 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] Successfully joined group with generation 1837
    2023-02-24 14:43:11,592 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.ConsumerCoordinator:273 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] Adding newly assigned partitions: 
    2023-02-24 14:43:11,592 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.ConsumerCoordinator:273 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] Adding newly assigned partitions: 
    2023-02-24 14:43:11,592 [ThreadPoolTaskExecutor-1] INFO  o.s.k.l.KafkaMessageListenerContainer:292 - mce-consumer-job-client: partitions assigned: []
    Any recommendation is welcome. Regards

    gifted-diamond-19544

    02/27/2023, 12:17 PM
    Hello all! I have a question: how are the weekly active users on the Analytics tab calculated? I ask because the number shown at the top of the page is not the same as the one shown in the graph for this week.

    wooden-breakfast-17692

    02/27/2023, 3:22 PM
    Hi all, I’m trying to deploy datahub locally with ./gradlew quickstartDebug using neo4j. It looks like neo4j is disabled by default. Is there a way I can use quickstartDebug with neo4j? Thanks!
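    For what it's worth, the graph backend is selected through the graph service implementation setting; a minimal sketch, assuming the quickstartDebug containers actually pick up GRAPH_SERVICE_IMPL from the environment (worth verifying against the docker profile's compose files):
    Copy code
    # Sketch: ask GMS to use neo4j as the graph backend, then start the debug stack.
    export GRAPH_SERVICE_IMPL=neo4j
    ./gradlew quickstartDebug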

    bland-balloon-48379

    02/27/2023, 4:06 PM
    Hey everyone! My team recently tested switching from neo4j to elasticsearch as our graph database backend. We followed the documentation for the switch and things seemed to go pretty smoothly; however, the logs for the restore-indices job show 521 skipped rows during re-indexing. Looking through the logs in more depth, all of the failures have the same format:
    java.lang.IllegalArgumentException: Failed to find entity with name X in EntityRegistry.
    Aggregating these, I got the following counts by entity name: globalSettings: 1, dataHubView: 8, dataHubStepState: 512. I just want to know whether these are all internal DataHub items that don't have a place in ES and can be safely ignored, or whether I need to investigate them further. For context, we are still running datahub v0.9.5. Any insights would be appreciated. Thanks!

    alert-traffic-45034

    02/28/2023, 5:20 AM
    Hi, I hope this is the right place for the question below. We onboard users with Okta SSO login, so a user is created in DataHub only when they log in for the first time. What I would like to do is add users to a specific group via the provided API, even when the user has not been created yet.
    mutation {
      addGroupMembers(input:{
        groupUrn:"<existing-group-Urn>"
        userUrns:[
          "urn:li:corpuser:<new-user-1>"
          "urn:li:corpuser:<new-user-2>"
        ]
      })
    }
    But currently this fails, even though I have the exact string pattern for the new user(s). Is there another good approach to doing this? Thanks in advance.

    cuddly-arm-8412

    02/28/2023, 7:23 AM
    Hi team, I modified the GMS startup port. When I ran the GMS service, it reported the following error: Connection refused: localhost/127.0.0.1:14142. I don't know why the service didn't start successfully.
    task run(type: JavaExec, dependsOn: build) {
        main = "org.eclipse.jetty.runner.Runner"
        systemProperties System.getProperties()
        args = ["--port", 14142, war.archivePath]
        classpath configurations.jetty9
    }
    2023-02-28 15:07:10,822 [R2 Nio Event Loop-1-1] DEBUG c.l.r.t.http.client.AsyncPoolImpl:733 - localhost/127.0.0.1:14142/fb07564 object creation failed
    com.linkedin.r2.RetriableRequestException: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:14142
        at com.linkedin.r2.transport.http.client.common.ChannelPoolLifecycle.onError(ChannelPoolLifecycle.java:142)
        at com.linkedin.r2.transport.http.client.common.ChannelPoolLifecycle.lambda$create$0(ChannelPoolLifecycle.java:97)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
        at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.base/java.lang.Thread.run(Thread.java:834)
    Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:14142
    Caused by: java.net.ConnectException: Connection refused
        at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.base/java.lang.Thread.run(Thread.java:834)
    2023-02-28 15:07:12,721 [ThreadPoolTaskExecutor-1] DEBUG o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:313 - Received: 0 records

    bright-receptionist-94235

    02/28/2023, 7:53 AM
    Hey, we have upgraded datahub-actions to version v0.0.11 (image: docker.taboolasyndication.com/data-apps/acryldata/datahub-actions:v0.0.11), but inside the actions pod the CLI reports DataHub CLI version: 0.9.6.2. Why is that?

    cuddly-arm-8412

    02/28/2023, 11:34 AM
    Hi team, I merged the latest official code. When I debugged the code to fetch dataset information, an error was reported. Does the neo4j image/database need to be upgraded? Caused by: com.google.common.util.concurrent.UncheckedExecutionException: org.neo4j.driver.exceptions.ClientException: The server does not support any of the protocol versions supported by this driver. Ensure that you are using driver and server versions that are compatible with one another.

    gifted-diamond-19544

    02/28/2023, 12:38 PM
    Hello all! We are currently having a problem with our DataHub instance. Basically, we cannot trigger an ingestion from the UI or add new ingestion sources. When we trigger an ingestion, a green popup says the ingestion started, but nothing changes in the UI. In the logs, there is this error message:
    [0]: index [datahubexecutionrequestindex_v2], type [_doc], id [urn%3Ali%3AdataHubExecutionRequest%3Acb0e3a90-c4b5-47de-9f60-88c9301d7866], message [[datahubexecutionrequestindex_v2/hcm4-xQ0T2CYNwTo9WLH4Q][[datahubexecutionrequestindex_v2][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3AdataHubExecutionRequest%3Acb0e3a90-c4b5-47de-9f60-88c9301d7866]: document missing]]]
    what would be the best way to fix this? Thank you!

    busy-mechanic-8014

    02/28/2023, 2:53 PM
    Hello everyone, I'm trying to deploy datahub on my Kubernetes cluster and I'm stuck on a datahub-gms pod error. I've seen that others have encountered the problem before, but the solutions that were proposed did not work for me. One of the many errors:
    Caused by: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'restliEntityClientFactory': Unsatisfied dependency expressed through field 'gmsPort'; nested exception is org.springframework.beans.TypeMismatchException: Failed to convert value of type 'java.lang.String' to required type 'int'; nested exception is java.lang.NumberFormatException: For input string: "<tcp://10.43.117.128:8080>"
    See the complete log file (interesting lines from line 2824) (datahub-gms-...log). I deploy from the helm charts like this (without modifying the values.yaml):
    helm repo add datahub https://helm.datahubproject.io/
    helm install prerequisites datahub/datahub-prerequisites --namespace datahub
    helm install datahub datahub/datahub --namespace datahub
    All components are in the Running state (except acryl-datahub and datahub-gms), and all Jobs are in the Succeeded state with no error logs (especially mysql). I've attached all the Job log files to this message. Can someone help me? Thanks a lot! Don't hesitate to ask me for more information if needed 🙂
    datahub-system-update-job-z25bm.log kafka-setup-job-6bqzz.log elasticsearch-setup-job-smlp2.log mysql-setup-job-vq48j.log
    datahub-gms-84d748899c-nn7c5.log

    agreeable-belgium-70840

    02/28/2023, 3:43 PM
    Hello guys, I have posted about this before but I have some more evidence now, sorry for the spam. I am trying to upgrade from v0.9.5 to v0.10.0. I ran the upgrade job and the elasticsearch, kafka, and mysql init jobs. Everything looks fine, but GMS hangs here:
    2023-02-24 14:43:11,561 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.AbstractCoordinator:552 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] (Re-)joining group
    2023-02-24 14:43:11,561 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.AbstractCoordinator:552 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] (Re-)joining group
    2023-02-24 14:43:11,592 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.AbstractCoordinator:503 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] Successfully joined group with generation 1837
    2023-02-24 14:43:11,592 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.AbstractCoordinator:503 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] Successfully joined group with generation 1837
    2023-02-24 14:43:11,592 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.ConsumerCoordinator:273 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] Adding newly assigned partitions: 
    2023-02-24 14:43:11,592 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.ConsumerCoordinator:273 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] Adding newly assigned partitions: 
    2023-02-24 14:43:11,592 [ThreadPoolTaskExecutor-1] INFO  o.s.k.l.KafkaMessageListenerContainer:292 - mce-consumer-job-client: partitions assigned: []
    At some point helm times out and the deployment fails. What I can observe now is an extra warning:
    2023-02-24 14:42:10,554 [main] WARN  c.l.metadata.entity.EntityService:798 - Unable to produce legacy MAE, entity may not have legacy Snapshot schema.
    java.lang.UnsupportedOperationException: Failed to find Typeref schema associated with Config-based Entity
    	at com.linkedin.metadata.models.ConfigEntitySpec.getAspectTyperefSchema(ConfigEntitySpec.java:80)
    	at com.linkedin.metadata.entity.EntityService.toAspectUnion(EntityService.java:1480)
    	at com.linkedin.metadata.entity.EntityService.buildSnapshot(EntityService.java:1429)
    	at com.linkedin.metadata.entity.EntityService.produceMetadataAuditEvent(EntityService.java:1239)
    	at com.linkedin.metadata.entity.EntityService.sendEventForUpdateAspectResult(EntityService.java:794)
    The full log is attached. I would be grateful for some assistance, as I am stuck at this point. Regards
    logs-from-datahub-gms-in-datahub-gms-566ddf4574-rxmk7.log

    cuddly-arm-8412

    03/01/2023, 7:01 AM
    Hi team, I want to understand the purpose of the index [datahubstepstateindex_v2]. I pulled the latest code to debug locally and hit this error:
    Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://172.21.204.35:9201], URI [/datahubstepstateindex_v2/_count?ignore_throttled=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true], status line [HTTP/1.1 404 Not Found]
    2023-03-01T13:35:55.598+0800 [QUIET] [system.out] {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [datahubstepstateindex_v2]","resource.type":"index_or_alias","resource.id":"datahubstepstateindex_v2","index_uuid":"_na_","index":"datahubstepstateindex_v2"}],"type":"index_not_found_exception","reason":"no such index [datahubstepstateindex_v2]","resource.type":"index_or_alias","resource.id":"datahubstepstateindex_v2","index_uuid":"_na_","index":"datahubstepstateindex_v2"},"status":404}

    better-sunset-65466

    03/01/2023, 10:07 AM
    Hello, what are the minimum/recommended system requirements to install via docker quickstart?

    billowy-jewelry-47039

    03/01/2023, 10:44 AM
    After installing the datahub helm chart I noticed the default datahub user doesn't have administrator privileges. Is there any way to create a user with administrator privileges?
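    One common approach for local accounts is a user.props file consumed by datahub-frontend's JAAS PropertyFileLoginModule; a sketch under assumptions (the path below is the quickstart convention — with helm you would need to mount the file into the frontend container yourself, and admin rights are granted separately via a DataHub Policy in the UI):
    Copy code
    # Sketch: add a local user (username:password, one per line).
    mkdir -p ~/.datahub/plugins/frontend/auth
    cat > ~/.datahub/plugins/frontend/auth/user.props <<'EOF'
    admin2:changeme
    EOF
    # Then, in the UI, grant privileges under Settings -> Permissions -> Policies
    # by adding the user to a policy with the desired platform privileges.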

    rapid-airport-61849

    03/01/2023, 11:13 AM
    How can we add a library to the Docker quickstart image? PipelineInitError: Failed to configure the source (mssql): No module named ‘pyodbc’
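    One common workaround is to extend the actions image with the missing Python module and point the quickstart at the custom image; a sketch under assumptions (the base image tag, the package manager, and which compose service runs your ingestion are all things to verify for your setup):
    Copy code
    # Sketch: build a custom actions image that includes pyodbc.
    cat > Dockerfile.actions <<'EOF'
    FROM acryldata/datahub-actions:head
    # pyodbc needs the system ODBC headers; package manager may differ by base image.
    RUN apt-get update && apt-get install -y unixodbc-dev && pip install pyodbc
    EOF
    docker build -f Dockerfile.actions -t my-datahub-actions:latest .
    # Then point the quickstart compose file's actions service at my-datahub-actions:latest.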

    best-wire-59738

    03/01/2023, 1:11 PM
    Hello team, we are facing a Kafka rebalancing issue. We set KAFKA_LISTENER_CONCURRENCY to 10 and increased partitions to 10 to make consumers work in parallel, since we use the Kafka sink and ingested data takes a long time to show up in neo4j. We confirmed that we run into group rebalancing whenever concurrency is set to anything other than 1. We reduced max.poll.records to 10 (default 500) thinking it might resolve the issue, but it didn't help. Could you please help us? We are currently on datahub version 0.9.2, using the MSK service from AWS.
    19:43:59 [ThreadPoolTaskExecutor-2] INFO  o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-9, groupId=generic-mae-consumer-job-client] Finished assignment for group at generation 5728: {consumer-generic-mae-consumer-job-client-9-fa499c12-3ac0-440b-aa27-711c3b60c14d=Assignment(partitions=[MetadataChangeLog_Timeseries_v1-5, MetadataChangeLog_Timeseries_v1-6, MetadataChangeLog_Timeseries_v1-7, MetadataChangeLog_Timeseries_v1-8, MetadataChangeLog_Timeseries_v1-9, MetadataChangeLog_Versioned_v1-5, MetadataChangeLog_Versioned_v1-6, MetadataChangeLog_Versioned_v1-7, MetadataChangeLog_Versioned_v1-8, MetadataChangeLog_Versioned_v1-9]), consumer-generic-mae-consumer-job-client-10-a8633ab0-830e-4e75-9d1c-593e100e1505=Assignment(partitions=[MetadataChangeLog_Timeseries_v1-0, MetadataChangeLog_Timeseries_v1-1, MetadataChangeLog_Timeseries_v1-2, MetadataChangeLog_Timeseries_v1-3, MetadataChangeLog_Timeseries_v1-4, MetadataChangeLog_Versioned_v1-0, MetadataChangeLog_Versioned_v1-1, MetadataChangeLog_Versioned_v1-2, MetadataChangeLog_Versioned_v1-3, MetadataChangeLog_Versioned_v1-4])}
    19:43:59 [ThreadPoolTaskExecutor-3] INFO  o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-10, groupId=generic-mae-consumer-job-client] Successfully joined group with generation 5728
    19:43:59 [ThreadPoolTaskExecutor-2] INFO  o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-9, groupId=generic-mae-consumer-job-client] Successfully joined group with generation 5728
    19:47:45 [ThreadPoolTaskExecutor-1] WARN  o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-8, groupId=generic-mae-consumer-job-client] Synchronous auto-commit of offsets {MetadataChangeLog_Timeseries_v1-1=OffsetAndMetadata{offset=0, leaderEpoch=null, metadata=''}, MetadataChangeLog_Versioned_v1-4=OffsetAndMetadata{offset=8, leaderEpoch=0, metadata=''}, MetadataChangeLog_Timeseries_v1-0=OffsetAndMetadata{offset=0, leaderEpoch=null, metadata=''}, MetadataChangeLog_Versioned_v1-3=OffsetAndMetadata{offset=1, leaderEpoch=0, metadata=''}, MetadataChangeLog_Timeseries_v1-3=OffsetAndMetadata{offset=0, leaderEpoch=null, metadata=''}, MetadataChangeLog_Versioned_v1-2=OffsetAndMetadata{offset=168196, leaderEpoch=0, metadata=''}, MetadataChangeLog_Timeseries_v1-2=OffsetAndMetadata{offset=3, leaderEpoch=null, metadata=''}, MetadataChangeLog_Versioned_v1-1=OffsetAndMetadata{offset=189482, leaderEpoch=0, metadata=''}, MetadataChangeLog_Versioned_v1-0=OffsetAndMetadata{offset=168795, leaderEpoch=0, metadata=''}, MetadataChangeLog_Timeseries_v1-4=OffsetAndMetadata{offset=0, leaderEpoch=null, metadata=''}} failed: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
    19:47:45 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-8, groupId=generic-mae-consumer-job-client] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
    19:47:45 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-8, groupId=generic-mae-consumer-job-client] Lost previously assigned partitions MetadataChangeLog_Timeseries_v1-1, MetadataChangeLog_Versioned_v1-4, MetadataChangeLog_Timeseries_v1-0, MetadataChangeLog_Versioned_v1-3, MetadataChangeLog_Timeseries_v1-3, MetadataChangeLog_Versioned_v1-2, MetadataChangeLog_Timeseries_v1-2, MetadataChangeLog_Versioned_v1-1, MetadataChangeLog_Versioned_v1-0, MetadataChangeLog_Timeseries_v1-4
    19:47:45 [ThreadPoolTaskExecutor-1] INFO  o.s.k.l.KafkaMessageListenerContainer - generic-mae-consumer-job-client: partitions lost: [MetadataChangeLog_Timeseries_v1-1, MetadataChangeLog_Versioned_v1-4, MetadataChangeLog_Timeseries_v1-0, MetadataChangeLog_Versioned_v1-3, MetadataChangeLog_Timeseries_v1-3, MetadataChangeLog_Versioned_v1-2, MetadataChangeLog_Timeseries_v1-2, MetadataChangeLog_Versioned_v1-1, MetadataChangeLog_Versioned_v1-0, MetadataChangeLog_Timeseries_v1-4]
    19:47:45 [ThreadPoolTaskExecutor-1] INFO  o.s.k.l.KafkaMessageListenerContainer - generic-mae-consumer-job-client: partitions revoked: [MetadataChangeLog_Timeseries_v1-1, MetadataChangeLog_Versioned_v1-4, MetadataChangeLog_Timeseries_v1-0, MetadataChangeLog_Versioned_v1-3, MetadataChangeLog_Timeseries_v1-3, MetadataChangeLog_Versioned_v1-2, MetadataChangeLog_Timeseries_v1-2, MetadataChangeLog_Versioned_v1-1, MetadataChangeLog_Versioned_v1-0, MetadataChangeLog_Timeseries_v1-4]
    19:47:45 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-8, groupId=generic-mae-consumer-job-client] (Re-)joining group
    19:47:45 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-8, groupId=generic-mae-consumer-job-client] Join group failed with org.apache.kafka.common.errors.MemberIdRequiredException: The group member needs to have a valid member id before actually entering a consumer group
    19:47:45 [ThreadPoolTaskExecutor-1] INFO  o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-8, groupId=generic-mae-consumer-job-client] (Re-)joining group
    19:47:47 [ThreadPoolTaskExecutor-2] INFO  o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-9, groupId=generic-mae-consumer-job-client] Attempt to heartbeat failed since group is rebalancing
    19:47:47 [ThreadPoolTaskExecutor-2] INFO  o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-9, groupId=generic-mae-consumer-job-client] Revoke previously assigned partitions MetadataChangeLog_Timeseries_v1-7, MetadataChangeLog_Timeseries_v1-6, MetadataChangeLog_Timeseries_v1-9, MetadataChangeLog_Timeseries_v1-8, MetadataChangeLog_Versioned_v1-9, MetadataChangeLog_Versioned_v1-8, MetadataChangeLog_Versioned_v1-7, MetadataChangeLog_Versioned_v1-6, MetadataChangeLog_Versioned_v1-5, MetadataChangeLog_Timeseries_v1-5
    19:47:47 [ThreadPoolTaskExecutor-2] INFO  o.s.k.l.KafkaMessageListenerContainer - generic-mae-consumer-job-client: partitions revoked: [MetadataChangeLog_Timeseries_v1-7, MetadataChangeLog_Timeseries_v1-6, MetadataChangeLog_Timeseries_v1-9, MetadataChangeLog_Timeseries_v1-8, MetadataChangeLog_Versioned_v1-9, MetadataChangeLog_Versioned_v1-8, MetadataChangeLog_Versioned_v1-7, MetadataChangeLog_Versioned_v1-6, MetadataChangeLog_Versioned_v1-5, MetadataChangeLog_Timeseries_v1-5]
    19:47:47 [ThreadPoolTaskExecutor-2] INFO  o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-9, groupId=generic-mae-consumer-job-client] (Re-)joining group
    19:47:47 [kafka-coordinator-heartbeat-thread | generic-mae-consumer-job-client] INFO  o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-10, groupId=generic-mae-consumer-job-client] Attempt to heartbeat failed since group is rebalancing
    19:47:50 [kafka-coordinator-heartbeat-thread | generic-mae-consumer-job-client] INFO  o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-10, groupId=generic-mae-consumer-job-client] Attempt to heartbeat failed since group is rebalancing
    19:47:56 [kafka-coordinator-heartbeat-thread | generic-mae-consumer-job-client] INFO  o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-10, groupId=generic-mae-consumer-job-client] Attempt to heartbeat failed since group is rebalancing
    19:48:59 [kafka-coordinator-heartbeat-thread | generic-mae-consumer-job-client] INFO  o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-10, groupId=generic-mae-consumer-job-client] Member consumer-generic-mae-consumer-job-client-10-a8633ab0-830e-4e75-9d1c-593e100e1505 sending LeaveGroup request to coordinator b-1.datahubmsk.6wyvlb.c8.kafka.us-west-2.amazonaws.com:9092 (id: 2147483646 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
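    The last log line suggests the poll interval is being exceeded, which matches the rebalance symptoms above. A sketch of the usual mitigation, assuming the documented convention that DataHub maps SPRING_KAFKA_PROPERTIES_* environment variables onto Kafka client properties (worth verifying for v0.9.2):
    Copy code
    # Sketch: give each poll loop more time so slow downstream writes (neo4j)
    # don't get the consumer kicked out of the group.
    export SPRING_KAFKA_PROPERTIES_MAX_POLL_INTERVAL_MS=600000   # Kafka default is 300000
    export SPRING_KAFKA_PROPERTIES_MAX_POLL_RECORDS=10
    In a helm deployment these would go into the consumer's extraEnvs rather than a shell.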

    creamy-machine-95935

    03/01/2023, 9:41 PM
    Hi everyone! Is there any example of how to deploy datahub on kubernetes using Terraform? thanks! 🙌

    witty-toddler-69828

    03/02/2023, 9:40 AM
    Hello everyone, I'm looking to deploy datahub to AWS using ECS and the Opensearch / Elasticsearch service. I'm finding that the GMS container is starting up successfully and running, but it doesn't seem to be listening on port 8080. Checking the GMS logs, I'm getting lots of Elasticsearch errors and wondering if that may be the issue. This is one of them:
    {
      "error": {
        "root_cause": [
          {
            "type": "index_not_found_exception",
            "reason": "no such index [datahubpolicyindex_v2]",
            "resource.type": "index_or_alias",
            "resource.id": "datahubpolicyindex_v2",
            "index_uuid": "_na_",
            "index": "datahubpolicyindex_v2"
          }
        ],
        "type": "index_not_found_exception",
        "reason": "no such index [datahubpolicyindex_v2]",
        "resource.type": "index_or_alias",
        "resource.id": "datahubpolicyindex_v2",
        "index_uuid": "_na_",
        "index": "datahubpolicyindex_v2"
      },
      "status": 404
    }
    I've checked and the index mentioned doesn't exist. I ran the datahub-elasticsearch-setup task, which seems to run successfully, but it doesn't create the index that the GMS task is expecting. The indexes look like:
    health status index                          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    green  open   .kibana_1                      -ePAdnrdRzii8blPjg6iqA   1   0          1            0        5kb            5kb
    yellow open   .opendistro-job-scheduler-lock 4Uf-Vr2jTAi6jm5AAuP_5g   5   1          1            0     12.1kb         12.1kb
    yellow open   datahub_usage_event-000001     8JEBpkYDQX6z5XVZXvYFuQ   5   1          0            0        1kb            1kb
    Does anyone know why the index that GMS is looking for doesn't match what is created? Is it right that the http service wouldn't be available if the index isn't there or should I be looking elsewhere for the issue?

    aloof-dentist-85908

    03/02/2023, 9:58 AM
    Hi, does anyone know when the Confluent schema registry will be removed as a hard dependency? Is there any plan for when this will be released? https://github.com/datahub-project/datahub/pull/6552 @incalculable-ocean-74010 Do you have any news for us? 🙂 Thanks a lot!
    ✅ 1
    i
    a
    • 3
    • 4
  • m

    microscopic-machine-90437

    03/02/2023, 12:06 PM
Hi Team, I'm trying to deploy DataHub using Kubernetes, for which I need to install minikube. When I try to install minikube on my Linux server, I get the below error. Can someone help...!
    ✅ 1
    g
    • 2
    • 1
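The error itself didn't come through in the archive, but for reference, the documented way to install minikube on an x86_64 Linux host (commands from the minikube install docs; the resource flags are a suggestion, not a requirement) is:

```shell
# Download the latest minikube release binary and put it on PATH
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube

# DataHub needs a fairly large cluster; give minikube generous resources
minikube start --memory 8192 --cpus 4
```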
  • s

    silly-angle-91497

    03/02/2023, 5:00 PM
    Hello everyone. When we are using AWS MSK with IAM authentication, how are we supposed to configure this in the prerequisites values.yaml file?
    b
    • 2
    • 2
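One sketch for MSK with IAM auth (the property names come from AWS's aws-msk-iam-auth client library; the exact placement of the overrides in values.yaml depends on your chart version, so verify before applying): point the brokers at the IAM listener port 9098 and pass the IAM client properties through the Kafka configuration overrides:

```yaml
kafka:
  bootstrap:
    # Placeholder broker hostname; use your MSK IAM endpoint on port 9098
    server: "b-1.your-cluster.xxxxxx.kafka.us-east-1.amazonaws.com:9098"

global:
  springKafkaConfigurationOverrides:
    security.protocol: SASL_SSL
    sasl.mechanism: AWS_MSK_IAM
    sasl.jaas.config: software.amazon.msk.auth.iam.IAMLoginModule required;
    sasl.client.callback.handler.class: software.amazon.msk.auth.iam.IAMClientCallbackHandler
```

The pods also need IAM permissions for the relevant kafka-cluster actions, e.g. via an IAM role for service accounts.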
  • f

    famous-fall-59477

    03/02/2023, 6:42 PM
    Hi, when I try to do a
    ./gradlew build
, I get the following yarn-related error:
    > Task :datahub-web-react:yarnGenerate FAILED
    yarn run v1.22.0
    $ graphql-codegen --config codegen.yml
    node:internal/modules/cjs/loader:936
      throw err;
      ^
    
    Error: Cannot find module './_baseClone'
    Require stack:
    - /Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/lodash/clone.js
    - /Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/builder.js
    - /Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/generated/index.js
    - /Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/utils/react/cleanJSXElementLiteralChild.js
    - /Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/react/buildChildren.js
    - /Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/index.js
    - /Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/index.cjs.js
    - /Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-tools/code-file-loader/index.cjs.js
    - /Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-codegen/cli/bin.js
        at Function.Module._resolveFilename (node:internal/modules/cjs/loader:933:15)
        at Function.Module._load (node:internal/modules/cjs/loader:778:27)
        at Module.require (node:internal/modules/cjs/loader:1005:19)
        at require (node:internal/modules/cjs/helpers:94:18)
        at Object.<anonymous> (/Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/lodash/clone.js:1:17)
        at Module._compile (node:internal/modules/cjs/loader:1101:14)
        at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
        at Module.load (node:internal/modules/cjs/loader:981:32)
        at Function.Module._load (node:internal/modules/cjs/loader:822:12)
        at Module.require (node:internal/modules/cjs/loader:1005:19) {
      code: 'MODULE_NOT_FOUND',
      requireStack: [
        '/Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/lodash/clone.js',
        '/Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/builder.js',
        '/Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/generated/index.js',
        '/Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/utils/react/cleanJSXElementLiteralChild.js',
        '/Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/react/buildChildren.js',
        '/Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/index.js',
        '/Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/index.cjs.js',
        '/Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-tools/code-file-loader/index.cjs.js',
        '/Users/subhajoy/Documents/personal_repositories/datahub/datahub-web-react/node_modules/@graphql-codegen/cli/bin.js'
      ]
    }
    error Command failed with exit code 1.
    info Visit <https://yarnpkg.com/en/docs/cli/run> for documentation about this command.
    I am on a Mac (non M1), and
    java --version
    gives:
    openjdk 11.0.18 2023-01-17
    OpenJDK Runtime Environment Homebrew (build 11.0.18+0)
    OpenJDK 64-Bit Server VM Homebrew (build 11.0.18+0, mixed mode)
Any idea why this is happening? I noticed several posts here about roughly the same issue, but I could not find any meaningful resolution yet. Any help would be appreciated, thank you!
    lookaround 3
    a
    e
    i
    • 4
    • 3
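This kind of module-not-found inside lodash/@babel during yarnGenerate usually points at a corrupted or partially installed node_modules rather than a code problem. A common fix (a sketch, assuming yarn is on PATH and you are at the repo root) is to wipe and reinstall the web-react dependencies before rebuilding:

```shell
cd datahub-web-react
rm -rf node_modules
yarn cache clean
yarn install
cd ..
./gradlew :datahub-web-react:build
```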
  • c

    cuddly-arm-8412

    03/03/2023, 5:38 AM
hi team, I downloaded the project locally, and when I run the command python3 -m datahub docker quickstart --quickstart-compose-file /project/github-datahub/docker/docker-compose.yml I get: Error response from daemon: manifest for elasticsearch:7.10.2 not found: manifest unknown: manifest unknown
    b
    o
    • 3
    • 5
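The compose file references the bare image name elasticsearch:7.10.2, which Docker resolves against Docker Hub; that tag may not exist there, while Elastic's own registry does carry it. One workaround (a sketch; verify the exact tag your compose file asks for) is to pull the fully qualified image and retag it locally so the bare name resolves:

```shell
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.10.2
docker tag docker.elastic.co/elasticsearch/elasticsearch:7.10.2 elasticsearch:7.10.2
```

Alternatively, edit the compose file to use the docker.elastic.co/elasticsearch/elasticsearch:7.10.2 name directly.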
  • s

    shy-dog-84302

    03/03/2023, 2:24 PM
Hi! I'm looking for some advice on Stackdriver log integration of DataHub components in GCP. Is there any way I can configure logging for the various components, like the metadata service, front-end, etc.?
    b
    • 2
    • 14
  • r

    rapid-spoon-75609

    03/03/2023, 4:45 PM
Hello! Is there a way to run custom actions from the DataHub helm chart? I see that a container is running when I deploy, but there is no documentation describing how it works:
    datahub-acryl-datahub-actions-58b676f77c-c6pfx
    What is the purpose of this subchart? It’s enabled by default but I can’t find info on how to use it: https://github.com/acryldata/datahub-helm/tree/master/charts/datahub/subcharts/acryl-datahub-actions Thanks!
    s
    • 2
    • 15
  • w

    white-horse-97256

    03/03/2023, 6:22 PM
Hi Team, we are trying to deploy a new DataHub instance using the helm charts approach, and we see there is a schema registry URL in the values.yaml file with type KAFKA:
    schemaregistry:
      url: "<http://prerequisites-cp-schema-registry:8081>"
      type: KAFKA
What does type: KAFKA mean? We are trying to use our own Kafka servers and wanted to check what schema registry URL we should give. How do we provide the creds required to authenticate to our servers?
    a
    b
    • 3
    • 7
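For context: the type field selects which schema registry implementation the Kafka clients use. KAFKA means a Confluent-compatible registry reachable at url, and AWS_GLUE is the other option the chart supports. A sketch for pointing at your own Confluent-compatible registry with basic auth (the override property names are standard Confluent client settings, but their placement under global.springKafkaConfigurationOverrides is an assumption to verify against your chart version):

```yaml
kafka:
  schemaregistry:
    # Placeholder URL; use your registry's HTTPS endpoint
    url: "https://your-schema-registry.example.com:8081"
    type: KAFKA

global:
  springKafkaConfigurationOverrides:
    basic.auth.credentials.source: USER_INFO
    basic.auth.user.info: "registry-user:registry-password"
```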
  • w

    white-horse-97256

    03/03/2023, 8:40 PM
Hi Team, I am also facing an issue connecting to the ES server. Our ES server uses HTTPS with self-signed certificates; how do we configure those in the values.yaml file for ES in the helm charts?
    a
    a
    +2
    • 5
    • 53
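For reference, a sketch of wiring up an HTTPS Elasticsearch with a self-signed certificate: enable SSL via the global elasticsearch settings and point GMS at a truststore containing your CA through environment variables (GMS supports the ELASTICSEARCH_SSL_* variables; the exact values.yaml keys and paths below are assumptions to verify against your chart version):

```yaml
global:
  elasticsearch:
    host: "your-es-host.internal"   # placeholder hostname
    port: "9200"
    useSSL: "true"

datahub-gms:
  extraEnvs:
    - name: ELASTICSEARCH_SSL_TRUSTSTORE_FILE
      value: /mnt/datahub/certs/truststore.jks   # assumed mount path
    - name: ELASTICSEARCH_SSL_TRUSTSTORE_TYPE
      value: JKS
    - name: ELASTICSEARCH_SSL_TRUSTSTORE_PASSWORD
      value: changeit
```

The truststore itself would still need to be mounted into the pod, e.g. from a Kubernetes Secret via extraVolumes/extraVolumeMounts.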