# all-things-deployment
e
Hi all, I'm trying to set up DataHub on AWS EKS using Helm charts, with AWS managed services for storage. I'm having an issue with the
kafkaSetupJob
and am getting the error below:
[kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.common.utils.AppInfoParser - App info kafka.admin.client for adminclient-1 unregistered
[kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.clients.admin.internals.AdminMetadataManager - [AdminClient clientId=adminclient-1] Metadata update failed
[main] ERROR io.confluent.admin.utils.ClusterStatus - Error while getting broker list.
org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1657820045949, tries=1, nextAllowedTryMs=-9223372036854775709) timed out at 9223372036854775807 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
kafkaSetupJob:
  enabled: true
  tolerations:
    - key: "app.lucid.cloud/tolerate-app"
      operator: "Equal"
      value: "generic"
      effect: "NoSchedule"
  image:
    repository: linkedin/datahub-kafka-setup
    tag: "v0.8.40"
  serviceAccountName: datahub
I'm passing the service account name, but it looks like it's ignored. When I looked at the pod spec, it was set to default:
restartPolicy: Never
terminationGracePeriodSeconds: 30
dnsPolicy: ClusterFirst
serviceAccountName: default
serviceAccount: default
nodeName: ip-*
i
Hello Vamshi, KafkaSetupJob connects to AWS via the
global.kafka.bootstrap.server
properties. See the default values.yaml as an example: https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/values.yaml
e
Hi Pedro! Yes, I updated the values in the values.yaml file:
kafka:
    enabled: true
    bootstrap:
      server: "${datahub_kafka_endpoint}"
    zookeeper:
      server: "${datahub_zookeeper_endpoint}"
    partitions: 3
    replicationFactor: 3
    schemaregistry:
      type: AWS_GLUE
      glue:
        region: us-east-1
        registry: "${datahub_glue_registry}"
Created MSK cluster with version
3.2.0
and enabled authentication via IAM
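For context on what IAM authentication asks of a Kafka client: the aws-msk-iam-auth library documents four client properties that a JVM client needs in order to talk to an IAM-enabled MSK listener. The sketch below only renders them as .properties text. Whether the kafka-setup image in this thread bundles that library, and how the Helm chart would pass such settings to the job, is not confirmed anywhere in this conversation; the names MSK_IAM_CLIENT_PROPERTIES and render_properties are made up for illustration.

```python
# Client settings documented by the aws-msk-iam-auth library for IAM access control.
# Assumption: the client JVM has that library on its classpath.
MSK_IAM_CLIENT_PROPERTIES = {
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "AWS_MSK_IAM",
    "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
    "sasl.client.callback.handler.class": "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
}


def render_properties(props):
    """Render a dict as Java-style .properties text, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    print(render_properties(MSK_IAM_CLIENT_PROPERTIES), end="")
```

A client without these settings defaults to plaintext, which an IAM-only listener will not accept.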
i
Bear in mind that DataHub has not been tested with Kafka v3, as far as I’m aware
"${datahub_kafka_endpoint}"
Is this pointing to the non-ssl broker ips?
e
yes it's pointing to a private endpoint
b-2.devdatahub.555j12.c22.kafka.us-east-1.amazonaws.com:9098
I'll try creating a new MSK cluster with v2.6.2.
How do I pass a service account to the
kafkaSetupJob
?
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
	at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
	at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
	at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
	at io.confluent.admin.utils.ClusterStatus.isKafkaReady(ClusterStatus.java:149)
	at io.confluent.admin.utils.cli.KafkaReadyCommand.main(KafkaReadyCommand.java:150)
Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=listNodes, deadlineMs=1657830804456, tries=1, nextAllowedTryMs=-9223372036854775709) timed out at 9223372036854775807 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: The AdminClient thread has exited. Call: listNodes
org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1657830774455, tries=1, nextAllowedTryMs=-9223372036854775709) timed out at 9223372036854775807 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
[kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.common.metrics.Metrics - Metrics scheduler closed
[kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.common.metrics.Metrics - Closing reporter org.apache.kafka.common.metrics.JmxReporter
[kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.common.metrics.Metrics - Metrics reporters closed
[kafka-admin-client-thread | adminclient-1] ERROR org.apache.kafka.common.utils.KafkaThread - Uncaught exception in thread 'kafka-admin-client-thread | adminclient-1':
java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
	at org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
	at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:113)
	at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:447)
	at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:397)
	at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:674)
	at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:576)
	at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:563)
	at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.processRequests(KafkaAdminClient.java:1329)
	at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1260)
	at java.lang.Thread.run(Thread.java:750)
[main] INFO io.confluent.admin.utils.ClusterStatus - Expected 1 brokers but found only 0. Trying to query Kafka for metadata again ...
I created an MSK cluster with v2.6.2 and got the same error (above).
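A note on the OutOfMemoryError in NetworkReceive.readFrom above: this is a symptom commonly seen when a client configured for plaintext connects to a TLS listener (9098 is the IAM port, which uses TLS). Kafka responses start with a 4-byte big-endian size, so a plaintext client that receives TLS bytes reads them as a message size and tries to allocate a buffer that large. The sketch below is illustrative only: the four bytes are an assumed example of a TLS alert record header, not something captured from this cluster, and misread_size is a made-up helper name.

```python
import struct


def misread_size(first_four_bytes):
    """Interpret four bytes as a length-prefixed reader would: a big-endian signed int32."""
    (size,) = struct.unpack(">i", first_four_bytes)
    return size


# Assumed example: TLS alert record header (content type 0x15, version 0x03 0x03)
# followed by the first byte of the record length (0x00).
tls_header = bytes([0x15, 0x03, 0x03, 0x00])
size = misread_size(tls_header)
print(size, "bytes, about", size // (1024 * 1024), "MiB")
```

If that is what happened here, the version of the MSK cluster would not matter, which is consistent with v2.6.2 failing the same way.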
i
I don’t think the helm chart is prepared for service accounts
And I don’t know how service accounts work in K8s
Is your private endpoint ssl enabled? Usually non-ssl brokers are on port 9092
e
Sorry, I'm new to Kafka, please bear with me. I have the authentication type set to
IAM
, and encryption is enabled.
When I changed the endpoint to
"b-2.devdatahub.kpyyyn.c22.kafka.us-east-1.amazonaws.com:9092"
I got the error below:
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1657834223859
[kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.clients.admin.internals.AdminMetadataManager - [AdminClient clientId=adminclient-1] Metadata update failed
org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1657834253864, tries=1, nextAllowedTryMs=1657834253965) timed out at 1657834253865 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
[kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.clients.admin.internals.AdminMetadataManager - [AdminClient clientId=adminclient-1] Metadata update failed
[main] ERROR io.confluent.admin.utils.ClusterStatus - Error while getting broker list.
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Call(callName=listNodes, deadlineMs=1657834283865, tries=1, nextAllowedTryMs=1657834283966) timed out at 1657834283866 after 1 attempt(s)
	at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
	at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
	at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
	at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
	at io.confluent.admin.utils.ClusterStatus.isKafkaReady(ClusterStatus.java:149)
	at io.confluent.admin.utils.cli.KafkaReadyCommand.main(KafkaReadyCommand.java:150)
Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=listNodes, deadlineMs=1657834283865, tries=1, nextAllowedTryMs=1657834283966) timed out at 1657834283866 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes
org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1657834283865, tries=1, nextAllowedTryMs=1657834283966) timed out at 1657834283866 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
[main] INFO io.confluent.admin.utils.ClusterStatus - Expected 1 brokers but found only 0. Trying to query Kafka for metadata again ...
[main] ERROR io.confluent.admin.utils.ClusterStatus - Expected 1 brokers but found only 0. Brokers found [].
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Error while executing topic command : Call(callName=createTopics, deadlineMs=1657834349048, tries=1, nextAllowedTryMs=1657834349150) timed out at 1657834349050 after 1 attempt(s)
[2022-07-14 21:32:29,052] ERROR org.apache.kafka.common.errors.TimeoutException: Call(callName=createTopics, deadlineMs=1657834349048, tries=1, nextAllowedTryMs=1657834349150) timed out at 1657834349050 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: createTopics
 (kafka.admin.TopicCommand$)
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
i
Is the kubernetes cluster in the same VPC as the kafka cluster? Can they communicate with one another?
If you launch a pod in kubernetes and try to ping kafka, are you able to?
Do you get a response?
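One caveat on the ping test: ICMP can be blocked by security groups even when the broker port itself is reachable, so a TCP connection attempt on the broker port is a more direct check. A minimal sketch that could be run from a pod with Python available; can_connect is a made-up helper name, and the hostname in the usage comment is a placeholder, not a real broker.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Usage (placeholder hostname; substitute a real broker endpoint and port):
#   can_connect("broker.example.internal", 9098)
```

A True result only shows that the network path and security groups allow the connection; it says nothing about whether the client's security settings match the listener.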
e
Yes, the Kubernetes cluster and the Kafka cluster are in the same VPC.
The following list provides the numbers of the ports that Amazon MSK uses to communicate with client machines.

To communicate with brokers in plaintext, use port 9092.

To communicate with brokers by using TLS encryption, use port 9094 for access from within AWS and port 9194 for public access.

To communicate with brokers by using SASL/SCRAM, use port 9096 for access from within AWS and port 9196 for public access.

To communicate with brokers in a cluster that is set up to use IAM access control, use port 9098 for access from within AWS and port 9198 for public access.

Apache ZooKeeper nodes use port 2181 by default. To communicate with Apache ZooKeeper by using TLS encryption, use port 2182.
I can't ping Kafka from the pod.
I created an MSK cluster with no authentication and was able to connect to it.
Thank you for looking into it, Pedro!
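The port list quoted above can be kept as a small lookup, which makes it easier to check that the bootstrap endpoint's port matches the cluster's authentication mode (for example, 9092 will not work against an IAM-only cluster). This is a sketch built only from the ports in that quote; the mode names and the msk_port helper are made up for illustration.

```python
# Broker ports from the Amazon MSK list quoted in this thread.
MSK_PORTS = {
    "plaintext": {"private": 9092},
    "tls": {"private": 9094, "public": 9194},
    "sasl_scram": {"private": 9096, "public": 9196},
    "iam": {"private": 9098, "public": 9198},
}


def msk_port(auth, public=False):
    """Return the broker port for an auth mode, for access within AWS or public access."""
    access = "public" if public else "private"
    try:
        return MSK_PORTS[auth][access]
    except KeyError:
        raise ValueError(f"no {access} port listed for auth mode {auth!r}") from None


if __name__ == "__main__":
    print(msk_port("iam"))
    print(msk_port("tls", public=True))
```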