# all-things-deployment
b
@bitter-dog-24903 Let's chat more here!
Please paste the command + the failure you're seeing
b
Thanks John..
So I am deploying DataHub on AWS using EKS
And I want to use AWS RDS for the MySQL database that DataHub requires..
Below is the list of steps I performed:
• Created a cluster using eksctl:
eksctl create cluster \
--name datahub \
--region us-east-1 \
--with-oidc \
--nodes=3
This created a VPC and deployed a Kubernetes cluster in AWS
I created an RDS instance in the same VPC and attached all the security groups that were part of the DataHub VPC
Changed the values.yaml for the prerequisites chart to disable the bundled MySQL:
Copy code
mysql:
  enabled: false
  auth:
    # For better security, add mysql-secrets k8s secret with mysql-root-password, mysql-replication-password and mysql-password
    existingSecret: mysql-secrets
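(For context: pointing the main datahub chart at an external RDS instance typically means editing the global.sql.datasource block in its values.yaml. A sketch with a placeholder endpoint; verify the exact keys against your datahub-helm chart version:)
Copy code
global:
  sql:
    datasource:
      # RDS endpoint is a placeholder; use your instance's hostname
      host: "<<rds-endpoint>>:3306"
      hostForMysqlClient: "<<rds-endpoint>>"
      port: "3306"
      url: "jdbc:mysql://<<rds-endpoint>>:3306/datahub?verifyServerCertificate=false&useSSL=true"
      driver: "com.mysql.jdbc.Driver"
      username: "datahub"
      password:
        secretRef: mysql-secrets
        secretKey: mysql-root-password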
b
Okay makes sense
And did you configure the values to talk to your new RDS?
b
Deployed the prerequisites using that values.yaml file
Also changed the values.yaml file for datahub deployment to point it to the new RDS instance
After that, deployed DataHub using helm install datahub datahub/datahub --values values.yaml --debug
Getting below error:
Error: INSTALLATION FAILED: failed pre-install: timed out waiting for the condition
helm.go:84: [debug] failed pre-install: timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
helm.sh/helm/v3/cmd/helm/install.go:127
github.com/spf13/cobra.(*Command).execute
github.com/spf13/cobra@v1.3.0/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/cobra@v1.3.0/command.go:974
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/cobra@v1.3.0/command.go:902
main.main
helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
runtime/proc.go:255
runtime.goexit
runtime/asm_amd64.s:1581
@kind-dawn-17532 ^
b
Okay got it - and what version of datahub are you deploying? (Which chart version)
b
datahub-0.2.81, datahub-prerequisites-0.0.6
b
ok thank you! cc @early-lamp-41924 @kind-dawn-17532
e
Could you run
Copy code
kubectl get pods -n <<namespace>>
one of the setup jobs must be failing
b
The mysql setup job is failing
Copy code
NAME                                               READY   STATUS                       RESTARTS   AGE
datahub-elasticsearch-setup-job-x98wk              0/1     Completed                    0          52m
datahub-kafka-setup-job-48fzj                      0/1     Completed                    0          52m
datahub-mysql-setup-job-2t92q                      0/1     Error                        0          51m
datahub-mysql-setup-job-5swhg                      0/1     Error                        0          49m
datahub-mysql-setup-job-b8ln6                      0/1     Error                        0          46m
datahub-mysql-setup-job-bdn4d                      0/1     Error                        0          51m
datahub-mysql-setup-job-dk7zp                      0/1     Error                        0          51m
datahub-mysql-setup-job-n7mb6                      0/1     Error                        0          51m
datahub-mysql-setup-job-zb5p9                      0/1     Error                        0          50m
elasticsearch-master-0                             1/1     Running                      0          68m
elasticsearch-master-1                             1/1     Running                      0          68m
elasticsearch-master-2                             1/1     Running                      0          68m
prerequisites-cp-schema-registry-cf79bfccf-6t7nq   2/2     Running                      0          68m
prerequisites-kafka-0                              1/1     Running                      0          68m
prerequisites-neo4j-community-0                    0/1     CreateContainerConfigError   0          68m
prerequisites-zookeeper-0                          1/1     Running                      0          68m
e
Can you post the logs?
b
You mean the kubernetes logs?
Copy code
helm install datahub datahub/datahub --values values.yaml --debug
install.go:178: [debug] Original chart version: ""
install.go:195: [debug] CHART PATH: /Users/ronakshah/Library/Caches/helm/repository/datahub-0.2.81.tgz

client.go:299: [debug] Starting delete for "datahub-elasticsearch-setup-job" Job
client.go:128: [debug] creating 1 resource(s)
client.go:529: [debug] Watching for changes to Job datahub-elasticsearch-setup-job with timeout of 5m0s
client.go:557: [debug] Add/Modify event for datahub-elasticsearch-setup-job: ADDED
client.go:596: [debug] datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for datahub-elasticsearch-setup-job: MODIFIED
client.go:299: [debug] Starting delete for "datahub-kafka-setup-job" Job
client.go:128: [debug] creating 1 resource(s)
client.go:529: [debug] Watching for changes to Job datahub-kafka-setup-job with timeout of 5m0s
client.go:557: [debug] Add/Modify event for datahub-kafka-setup-job: ADDED
client.go:596: [debug] datahub-kafka-setup-job: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for datahub-kafka-setup-job: MODIFIED
client.go:299: [debug] Starting delete for "datahub-mysql-setup-job" Job
client.go:128: [debug] creating 1 resource(s)
client.go:529: [debug] Watching for changes to Job datahub-mysql-setup-job with timeout of 5m0s
client.go:557: [debug] Add/Modify event for datahub-mysql-setup-job: ADDED
client.go:596: [debug] datahub-mysql-setup-job: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for datahub-mysql-setup-job: MODIFIED
client.go:596: [debug] datahub-mysql-setup-job: Jobs active: 1, jobs failed: 1, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for datahub-mysql-setup-job: MODIFIED
client.go:596: [debug] datahub-mysql-setup-job: Jobs active: 1, jobs failed: 2, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for datahub-mysql-setup-job: MODIFIED
client.go:596: [debug] datahub-mysql-setup-job: Jobs active: 1, jobs failed: 3, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for datahub-mysql-setup-job: MODIFIED
client.go:596: [debug] datahub-mysql-setup-job: Jobs active: 1, jobs failed: 4, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for datahub-mysql-setup-job: MODIFIED
client.go:596: [debug] datahub-mysql-setup-job: Jobs active: 1, jobs failed: 5, jobs succeeded: 0
Error: INSTALLATION FAILED: failed pre-install: timed out waiting for the condition
helm.go:84: [debug] failed pre-install: timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
	helm.sh/helm/v3/cmd/helm/install.go:127
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@v1.3.0/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@v1.3.0/command.go:974
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@v1.3.0/command.go:902
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
	runtime/proc.go:255
runtime.goexit
	runtime/asm_amd64.s:1581
b
Failed on pre-install ...
So the mysql-setup configuration must not be correct. You mentioned that the RDS is in the same VPC; is it in the same subnet?
b
Yes, it is in the same subnet as the DataHub cluster
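(A quick way to sanity-check network reachability from inside the cluster is a throwaway MySQL client pod; a sketch, with the RDS endpoint as a placeholder:)
Copy code
kubectl run mysql-client --rm -it --restart=Never --image=mysql:8 -- \
  mysql -h <<rds-endpoint>> -u datahub -p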
e
Yes the kubernetes logs
The logs of the failed mysql-setup pods
b
@early-lamp-41924 Posted the logs from the deployment above ^
e
Those are logs from helm?
You can get it via
Copy code
kubectl logs datahub-mysql-setup-job-2t92q -n <<namespace>>
Copy code
kubectl logs <<pod-name>> -n <<namespace>>
Just copied one of the mysql setup job pod names from above
b
Ohh got it.. getting it. Thanks
Copy code
2022/06/14 17:46:54 Waiting for: tcp://datahub.cluster-cevhj.us-east-1.rds.amazonaws.com:3306
2022/06/14 17:46:54 Connected to tcp://datahub.cluster-cevhj.us-east-1.rds.amazonaws.com:3306
-- create datahub database
CREATE DATABASE IF NOT EXISTS datahub CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
USE datahub;

-- create metadata aspect table
create table if not exists metadata_aspect_v2 (
 urn              varchar(500) not null,
 aspect            varchar(200) not null,
 version            bigint(20) not null,
 metadata           longtext not null,
 systemmetadata        longtext,
 createdon           datetime(6) not null,
 createdby           varchar(255) not null,
 createdfor          varchar(255),
 constraint pk_metadata_aspect_v2 primary key (urn,aspect,version)
);

-- create default records for datahub user if not exists
CREATE TABLE temp_metadata_aspect_v2 LIKE metadata_aspect_v2;
INSERT INTO temp_metadata_aspect_v2 (urn, aspect, version, metadata, createdon, createdby) VALUES(
 'urn:li:corpuser:datahub',
 'corpUserInfo',
 0,
 '{"displayName":"Data Hub","active":true,"fullName":"Data Hub","email":"<mailto:datahub@linkedin.com|datahub@linkedin.com>"}',
 now(),
 'urn:li:corpuser:__datahub_system'
), (
 'urn:li:corpuser:datahub',
 'corpUserEditableInfo',
 0,
 '{"skills":[],"teams":[],"pictureLink":"<https://raw.githubusercontent.com/linkedin/datahub/master/datahub-web-react/src/images/default_avatar.png>"}',
 now(),
 'urn:li:corpuser:__datahub_system'
);
-- only add default records if metadata_aspect is empty
INSERT INTO metadata_aspect_v2
SELECT * FROM temp_metadata_aspect_v2
WHERE NOT EXISTS (SELECT * from metadata_aspect_v2);
DROP TABLE temp_metadata_aspect_v2;

-- create metadata index table
CREATE TABLE IF NOT EXISTS metadata_index (
 `id` BIGINT NOT NULL AUTO_INCREMENT,
 `urn` VARCHAR(200) NOT NULL,
 `aspect` VARCHAR(150) NOT NULL,
 `path` VARCHAR(150) NOT NULL,
 `longVal` BIGINT,
 `stringVal` VARCHAR(200),
 `doubleVal` DOUBLE,
 CONSTRAINT id_pk PRIMARY KEY (id),
 INDEX longIndex (`urn`,`aspect`,`path`,`longVal`),
 INDEX stringIndex (`urn`,`aspect`,`path`,`stringVal`),
 INDEX doubleIndex (`urn`,`aspect`,`path`,`doubleVal`)
);
ERROR 1045 (28000): Access denied for user 'root'@'192.168.60.40' (using password: YES)
2022/06/14 17:46:54 Command exited with error: exit status 1
e
Access denied?
b
Looks like it's able to connect to the database, but does not have root access?
I am setting up the access password using
kubectl create secret generic mysql-secrets --from-literal=mysql-root-password=<<password>>
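(To double-check what password the setup job will actually read, you can decode the secret; a sketch:)
Copy code
# prints the stored mysql-root-password value in plain text
kubectl get secret mysql-secrets -o jsonpath='{.data.mysql-root-password}' | base64 -d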
e
oh one thing
are you using RDS?
b
Yes, I am using RDS
e
there the username should be “admin”
not root
b
is this an RDS thing?
e
at least that is the case for our dbs. can you check?
b
I am having username as datahub in RDS
Should I change it to root?
e
are you setting that anywhere?
we default username to root
you should change it to datahub
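(For reference, that is the global.sql.datasource.username value in the datahub chart's values.yaml; a sketch, assuming the key layout from the chart:)
Copy code
global:
  sql:
    datasource:
      username: "datahub"   # must match the RDS master username; the chart defaults this to root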
b
Ohhh damn.. got it. let me check
Thanks a lot. Looks like the setup job for mysql completed
It is failing on the post-install
e
that is fine
it will succeed eventually
once gms is live
b
Copy code
client.go:557: [debug] Add/Modify event for datahub-datahub-upgrade-job: ADDED
client.go:596: [debug] datahub-datahub-upgrade-job: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition
helm.go:84: [debug] failed post-install: timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
	helm.sh/helm/v3/cmd/helm/install.go:127
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@v1.3.0/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@v1.3.0/command.go:974
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@v1.3.0/command.go:902
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
	runtime/proc.go:255
runtime.goexit
	runtime/asm_amd64.s:1581
e
get pods?
b
Copy code
NAME                                               READY   STATUS                       RESTARTS   AGE
datahub-acryl-datahub-actions-c8868cdf6-tgtj5      1/1     Running                      2          10m
datahub-datahub-frontend-8448f49655-pclfn          1/1     Running                      0          10m
datahub-datahub-gms-87f49d87b-fj4l4                0/1     CreateContainerConfigError   0          10m
datahub-datahub-upgrade-job-qvqjj                  0/1     CreateContainerConfigError   0          10m
datahub-elasticsearch-setup-job-ggnhd              0/1     Completed                    0          11m
datahub-kafka-setup-job-g5vmx                      0/1     Completed                    0          11m
datahub-mysql-setup-job-s2plt                      0/1     Completed                    0          11m
elasticsearch-master-0                             1/1     Running                      0          150m
elasticsearch-master-1                             1/1     Running                      0          150m
elasticsearch-master-2                             1/1     Running                      0          150m
prerequisites-cp-schema-registry-cf79bfccf-6t7nq   2/2     Running                      0          150m
prerequisites-kafka-0                              1/1     Running                      0          150m
prerequisites-neo4j-community-0                    0/1     CreateContainerConfigError   0          150m
prerequisites-zookeeper-0                          1/1     Running                      0          150m
e
hmn
Copy code
CreateContainerConfigError
?
Can you run
Copy code
kubectl describe pod <<pod-name>> -n <<namespace>>
k
Sorry, I got pulled into something.. are we good here, or should I add my team member who did all the helm deployments for us?
b
Copy code
kubectl logs datahub-datahub-upgrade-job-qvqjj
Error from server (BadRequest): container "datahub-upgrade-job" in pod "datahub-datahub-upgrade-job-qvqjj" is waiting to start: CreateContainerConfigError
Hi Atul, Thanks for checking.. The RDS issue got resolved. Looking into the post-install issue now
e
describe pod please
b
Copy code
Name:     datahub-datahub-upgrade-job-qvqjj
Namespace:  default
Priority:   0
Node:     ip-192-168-45-36.ec2.internal/192.168.45.36
Start Time:  Tue, 14 Jun 2022 11:02:14 -0700
Labels:    controller-uid=dac86aae-e63c-436d-9c87-5cbc7aadf9ea
       job-name=datahub-datahub-upgrade-job
Annotations: kubernetes.io/psp: eks.privileged
Status:    Pending
IP:      192.168.55.11
IPs:
 IP:      192.168.55.11
Controlled By: Job/datahub-datahub-upgrade-job
Containers:
 datahub-upgrade-job:
  Container ID:  
  Image:     acryldata/datahub-upgrade:v0.8.31
  Image ID:    
  Port:     <none>
  Host Port:   <none>
  Args:
   -u
   NoCodeDataMigration
   -a
   batchSize=1000
   -a
   batchDelayMs=100
   -a
   dbType=MYSQL
  State:     Waiting
   Reason:    CreateContainerConfigError
  Ready:     False
  Restart Count: 0
  Limits:
   cpu:   500m
   memory: 512Mi
  Requests:
   cpu:   300m
   memory: 256Mi
  Environment:
   ENTITY_REGISTRY_CONFIG_PATH: /datahub/datahub-gms/resources/entity-registry.yml
   DATAHUB_GMS_HOST:       datahub-datahub-gms
   DATAHUB_GMS_PORT:       8080
   DATAHUB_MAE_CONSUMER_HOST:  datahub-datahub-mae-consumer
   DATAHUB_MAE_CONSUMER_PORT:  9091
   EBEAN_DATASOURCE_USERNAME:  datahub
   EBEAN_DATASOURCE_PASSWORD:  <set to the key 'mysql-root-password' in secret 'mysql-secrets'> Optional: false
   EBEAN_DATASOURCE_HOST:    datahub.cluster-cevhjrrouwzn.us-east-1.rds.amazonaws.com:3306
   EBEAN_DATASOURCE_URL:     jdbc:mysql://datahub.cluster-cevhjrrouwzn.us-east-1.rds.amazonaws.com:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
   EBEAN_DATASOURCE_DRIVER:   com.mysql.jdbc.Driver
   KAFKA_BOOTSTRAP_SERVER:    prerequisites-kafka:9092
   KAFKA_SCHEMAREGISTRY_URL:   http://prerequisites-cp-schema-registry:8081
   ELASTICSEARCH_HOST:      elasticsearch-master
   ELASTICSEARCH_PORT:      9200
   GRAPH_SERVICE_IMPL:      neo4j
   NEO4J_HOST:          prerequisites-neo4j-community:7474
   NEO4J_URI:          bolt://prerequisites-neo4j-community
   NEO4J_USERNAME:        neo4j
   NEO4J_PASSWORD:        <set to the key 'neo4j-password' in secret 'neo4j-secrets'> Optional: false
  Mounts:
   /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-54kpm (ro)
Conditions:
 Type       Status
 Initialized    True 
 Ready       False 
 ContainersReady  False 
 PodScheduled   True 
Volumes:
 kube-api-access-54kpm:
  Type:          Projected (a volume that contains injected data from multiple sources)
  TokenExpirationSeconds: 3607
  ConfigMapName:      kube-root-ca.crt
  ConfigMapOptional:    <nil>
  DownwardAPI:       true
QoS Class:          Burstable
Node-Selectors:       <none>
Tolerations:         node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
               node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
 Type   Reason   Age          From        Message
 ----   ------   ----         ----        -------
 Normal  Scheduled 17m          default-scheduler Successfully assigned default/datahub-datahub-upgrade-job-qvqjj to ip-192-168-45-36.ec2.internal
 Normal  Pulling  17m          kubelet      Pulling image "acryldata/datahub-upgrade:v0.8.31"
 Normal  Pulled   17m          kubelet      Successfully pulled image "acryldata/datahub-upgrade:v0.8.31" in 10.539330122s
 Warning Failed   15m (x12 over 17m)  kubelet      Error: secret "neo4j-secrets" not found
 Normal  Pulled   2m39s (x71 over 17m) kubelet      Container image "acryldata/datahub-upgrade:v0.8.31" already present on machine
e
Copy code
Error: secret "neo4j-secrets" not found
Are you trying to run with neo4j or with elasticsearch as graph backend?
b
I am not changing any default configuration for neo4j in values.yaml
Running
kubectl create secret generic neo4j-secrets --from-literal=neo4j-password=datahub
again and trying
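(If you don't need Neo4j as the graph backend, the chart can also use Elasticsearch instead, which avoids needing neo4j-secrets at all; a sketch, assuming the datahub chart's graph_service_impl key:)
Copy code
global:
  # use Elasticsearch instead of Neo4j for the graph service
  graph_service_impl: elasticsearch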
b
@early-lamp-41924 He mentioned he was following the AWS deploy guide steps
b
The config error is now resolved..
Only the upgrade job is failing now
e
As I mentioned above, that should succeed once gms is live
it will retry until it succeeds
b
So just wait for some time and rerun the upgrade?
e
Kubernetes
will automatically do it
Wait it out
b
Ohh got it.. thanks a lot Dexter 🙏
b
Working now? Thanks everyone.
b
It started working now 😀 Thank you..
While using AWS MSK as a dependency and trying to upgrade the DataHub deployment, it is failing with the below error:
Copy code
[main] INFO io.confluent.admin.utils.ClusterStatus - Expected 1 brokers but found only 0. Trying to query Kafka for metadata again ...
[main] ERROR io.confluent.admin.utils.ClusterStatus - Error while getting broker list.
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Call(callName=listNodes, deadlineMs=1655262435166, tries=1, nextAllowedTryMs=-9223372036854775709) timed out at 9223372036854775807 after 1 attempt(s)
	at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
	at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
	at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
	at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
	at io.confluent.admin.utils.ClusterStatus.isKafkaReady(ClusterStatus.java:149)
	at io.confluent.admin.utils.cli.KafkaReadyCommand.main(KafkaReadyCommand.java:150)
Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=listNodes, deadlineMs=1655262435166, tries=1, nextAllowedTryMs=-9223372036854775709) timed out at 9223372036854775807 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: The AdminClient thread has exited.
[main] INFO io.confluent.admin.utils.ClusterStatus - Expected 1 brokers but found only 0. Trying to query Kafka for metadata again ...
[main] ERROR io.confluent.admin.utils.ClusterStatus - Error while getting broker list.
Copy code
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Error while executing topic command : Call(callName=createTopics, deadlineMs=1655262505480, tries=1, nextAllowedTryMs=-9223372036854775709) timed out at 9223372036854775807 after 1 attempt(s)
[2022-06-15 03:07:26,372] ERROR org.apache.kafka.common.errors.TimeoutException: Call(callName=createTopics, deadlineMs=1655262505480, tries=1, nextAllowedTryMs=-9223372036854775709) timed out at 9223372036854775807 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: The AdminClient thread has exited. Call: createTopics
 (kafka.admin.TopicCommand$)
[2022-06-15 03:07:26,463] ERROR Uncaught exception in thread 'kafka-admin-client-thread | adminclient-1': (org.apache.kafka.common.utils.KafkaThread)
java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
	at org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
	at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:113)
	at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:452)
	at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:402)
	at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:674)
	at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:576)
	at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:561)
	at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.processRequests(KafkaAdminClient.java:1333)
	at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1264)
	at java.lang.Thread.run(Thread.java:750)
Kafka nodes have 1000 GB of storage each
@early-lamp-41924 Any suggestion on this ^
nvm, It worked...
b
Glad to hear it Ronak.
c
Hi @bitter-dog-24903 How did you solve it? I have allocated only 100 GB for Kafka nodes.
b
Hi Siva, for me the issue was resolved by setting PlainText as the authentication type and using the plaintext endpoint in the values.yaml file to connect to the MSK (Kafka) cluster.
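(Concretely, that means pointing the chart's Kafka bootstrap at the MSK plaintext listener; a sketch with placeholder broker hostnames. MSK plaintext listeners are on port 9092, TLS listeners on 9094:)
Copy code
global:
  kafka:
    bootstrap:
      # comma-separated MSK plaintext broker endpoints
      server: "<<broker-1>>:9092,<<broker-2>>:9092"
The heap-space error above is a classic symptom of a plaintext client hitting a TLS listener: the client misreads the TLS handshake bytes as a huge message length and tries to allocate it, so it is usually not about broker storage size.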
Hi @early-lamp-41924 @big-carpet-38439, I have deployed the https://github.com/datahub-project/datahub/blob/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml file on AWS ECS. When I try to log in to the DataHub UI, I am getting the error:
Failed to log in! SyntaxError: Unexpected token < in JSON at position 1
Is there any other compose file that I should be deploying, or any change to this file? The GMS container in AWS ECS is exiting with the error:
Command exited with error: signal: killed
f
Hi guys, I'm facing the same issue 😕
Copy code
datahub-elastic-6cc45b5c4f-w2b6g                                 1/1     Running            0                2d8h
datahub-elasticsearch-setup-job--1-bb5n9                         0/1     Error              0                17m
datahub-elasticsearch-setup-job--1-fqlvq                         0/1     Error              0                21m
datahub-elasticsearch-setup-job--1-gxgdd                         0/1     Error              0                20m
datahub-elasticsearch-setup-job--1-jjl4w                         0/1     Error              0                21m
datahub-elasticsearch-setup-job--1-qd89w                         0/1     Error              0                12m
datahub-elasticsearch-setup-job--1-rtpgj                         0/1     Error              0                21m
datahub-elasticsearch-setup-job--1-wsksj                         0/1     Error              0                21m
datahub-kafka-789989768f-mhzw5                                   1/1     Running            0                2d8h
datahub-kafka-setup-job--1-6kqhf                                 0/1     Error              0                130m
datahub-kafka-setup-job--1-ggg6d                                 0/1     Error              0                134m
datahub-kafka-setup-job--1-q549v                                 0/1     Error              0                123m
datahub-kafka-setup-job--1-sfmgd                                 0/1     Error              0                128m
datahub-kafka-setup-job--1-svwsj                                 0/1     Error              0                133m
datahub-kafka-setup-job--1-t4vg6                                 0/1     Error              0                132m
datahub-kafka-setup-job--1-v5fzb                                 0/1     Error              0                117m
datahub-mysql-7cfd455897-dp4zb                                   1/1     Running            0                2d22h
datahub-zookeeper-569df875bd-7wgsj                               1/1     Running            0                2d8h
b
Could you share more logs from the pod, and also describe the pod for which you are facing the issue
Copy code
kubectl logs <<pod-name>> -n <<namespace>>
kubectl describe pod <<pod-name>> -n <<namespace>>
f
Hi @bumpy-needle-3184, I solved this issue by setting a nodeSelector. Because our EKS cluster mixed ARM and non-ARM nodes, the ARM nodes are not compatible with the DataHub chart.
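(For reference, that kind of pinning is a per-component nodeSelector in the chart's values.yaml, using the standard kubernetes.io/arch node label; a sketch for one component:)
Copy code
datahub-gms:
  nodeSelector:
    # schedule only onto x86_64 nodes
    kubernetes.io/arch: amd64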
b
good to know
b
Hello @early-lamp-41924 I am deploying DataHub and its components using Docker on AWS ECS. All the containers seem to be stable except datahub-gms and datahub-actions. The datahub-gms container logs have the below error:
Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
at org.apache.http.util.Asserts.check(Asserts.java:46)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:255)
... 19 common frames omitted
21:11:26.190 [pool-6-thread-1] ERROR c.l.m.s.e.query.ESSearchDAO:72 - Search query failed
java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:857)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:259)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:60)
at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.search(ESSearchDAO.java:100)
at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:67)
at com.linkedin.entity.client.JavaEntityClient.search(JavaEntityClient.java:288)
at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:50)
at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:42)
at com.datahub.authorization.DataHubAuthorizer$PolicyRefreshRunnable.run(DataHubAuthorizer.java:222)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
at org.apache.http.util.Asserts.check(Asserts.java:46)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:255)
... 19 common frames omitted
21:11:26.190 [pool-6-thread-1] ERROR c.d.authorization.DataHubAuthorizer:229 - Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
com.datahub.util.exception.ESQueryException: Search query failed:
at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:73)
at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.search(ESSearchDAO.java:100)
at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:67)
at com.linkedin.entity.client.JavaEntityClient.search(JavaEntityClient.java:288)
at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:50)
at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:42)
at com.datahub.authorization.DataHubAuthorizer$PolicyRefreshRunnable.run(DataHubAuthorizer.java:222)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
And the datahub-actions container is failing with the below error:
2022/10/03 17:51:47 Received 503 from http://datahub-gms:8080/health. Sleeping 1s
2022/10/03 17:51:48 Received 503 from http://datahub-gms:8080/health. Sleeping 1s
2022/10/03 17:51:49 Received 503 from http://datahub-gms:8080/health. Sleeping 1s
2022/10/03 17:51:49 Timeout after 4m0s waiting on dependencies to become available: [http://datahub-gms:8080/health]
Can you please help with any suggestions on resolving this?