# troubleshoot
b
I'm trying to deploy DataHub to EC2 using minikube, but it's not working. The prerequisites seem to be working fine, but when I run
helm install datahub datahub/datahub
it times out.
$ kubectl get pods
NAME                                                READY   STATUS    RESTARTS        AGE
elasticsearch-master-0                              1/1     Running   0               5m3s
prerequisites-cp-schema-registry-6f4b5b894f-zd8kn   2/2     Running   0               5m3s
prerequisites-kafka-0                               1/1     Running   1 (4m21s ago)   5m3s
prerequisites-mysql-0                               1/1     Running   0               5m3s
prerequisites-neo4j-community-0                     1/1     Running   0               5m3s
prerequisites-zookeeper-0                           1/1     Running   0               5m3s

$ helm install datahub datahub/datahub
W0318 06:58:33.319740  292308 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
W0318 06:58:33.321488  292308 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
Error: INSTALLATION FAILED: failed pre-install: timed out waiting for the condition
prerequisites log:
$ helm install prerequisites datahub/datahub-prerequisites --values ./values.yaml 
W0318 06:56:24.157886  284983 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0318 06:56:24.221273  284983 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
NAME: prerequisites
LAST DEPLOYED: Fri Mar 18 06:56:23 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
values.yaml:
# Default configuration for pre-requisites to get you started
# Copy this file and update to the configuration of choice
elasticsearch:
  enabled: true   # set this to false, if you want to provide your own ES instance.
  replicas: 1 # <<CHANGED 3 -> 1>>
  minimumMasterNodes: 1
  # Set replicas to 1 and uncomment this to allow the instance to be scheduled on
  # a master node when deploying on a single node Minikube / Kind / etc cluster.
  antiAffinity: "soft" # <<UNCOMMENT>>
...
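(For readers hitting the same timeout: the "failed pre-install" means helm is waiting on the chart's pre-install hook Jobs, so watching pods and events in a second terminal while the install hangs usually shows which one is stuck. A sketch with plain kubectl, nothing DataHub-specific.)
# run these in a second terminal while `helm install` is waiting
kubectl get pods -w
kubectl get events --sort-by=.lastTimestamp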
h
@early-lamp-41924, could you take a look at this?
e
Can you post the result of
kubectl get pods -n <<namespace>>
b
@early-lamp-41924 Sorry, yesterday was a holiday.
$ kubectl get pods -n default
NAME                                                READY   STATUS    RESTARTS        AGE
datahub-elasticsearch-setup-job-skflw               0/1     Pending   0               25m
elasticsearch-master-0                              1/1     Running   1 (3d16h ago)   3d17h
prerequisites-cp-schema-registry-6f4b5b894f-zd8kn   2/2     Running   2 (3d16h ago)   3d17h
prerequisites-kafka-0                               1/1     Running   3 (27m ago)     3d17h
prerequisites-mysql-0                               1/1     Running   1 (3d16h ago)   3d17h
prerequisites-neo4j-community-0                     1/1     Running   1 (3d16h ago)   3d17h
prerequisites-zookeeper-0                           1/1     Running   1 (3d16h ago)   3d17h
e
Seems like the elasticsearch setup job is pending. Do you have enough resources in your Kubernetes cluster?
Ah, this is minikube. Seems like it ran out of resources. You could try the following command to see why it's pending:
kubectl describe pod datahub-elasticsearch-setup-job-skflw
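(For reference, a quick way to see how much CPU and memory the node actually has left, using plain kubectl; nothing here is DataHub-specific.)
# "Allocated resources" compares pod requests against the node's allocatable capacity
kubectl describe nodes | grep -A 8 "Allocated resources"
# or, if the metrics-server addon is enabled in minikube:
kubectl top nodes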
b
$ kubectl describe pod datahub-elasticsearch-setup-job-skflw
Name:           datahub-elasticsearch-setup-job-skflw
Namespace:      default
Priority:       0
Node:           <none>
Labels:         controller-uid=e2f13004-5071-4754-af2a-8b096221dc35
                job-name=datahub-elasticsearch-setup-job
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  Job/datahub-elasticsearch-setup-job
Containers:
  elasticsearch-setup-job:
    Image:      linkedin/datahub-elasticsearch-setup:v0.8.31
    Port:       <none>
    Host Port:  <none>
    Limits:
      cpu:     500m
      memory:  512Mi
    Requests:
      cpu:     300m
      memory:  256Mi
    Environment:
      ELASTICSEARCH_HOST:         elasticsearch-master
      ELASTICSEARCH_PORT:         9200
      DATAHUB_ANALYTICS_ENABLED:  true
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gg8rr (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  kube-api-access-gg8rr:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  4m5s (x144 over 149m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu.
It seems to be running out of CPU.
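(For anyone following along on minikube: besides picking a bigger EC2 instance, the minikube cluster itself can be recreated with more resources. A sketch with example values; changing the allocation generally means deleting and recreating the cluster.)
minikube stop
minikube delete                         # drops the existing cluster and its state
minikube start --cpus 4 --memory 8192   # 4 CPUs, 8 GiB; pick values the EC2 instance can afford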
Thank you very much. I will change the instance type and try again.
The previous error was resolved, but another error occurred.
$ helm install datahub datahub/datahub
W0322 03:05:42.173062   29708 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
W0322 03:05:42.174505   29708 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
W0322 03:07:02.684983   29708 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
W0322 03:07:02.686529   29708 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition

$ kubectl get all
NAME                                                    READY   STATUS             RESTARTS      AGE
pod/datahub-acryl-datahub-actions-6bb4b7c68d-8z72g      1/1     Running            0             48m
pod/datahub-datahub-frontend-5547f7768c-7sgpg           1/1     Running            0             48m
pod/datahub-datahub-gms-56c67fbb74-9sdp6                1/1     Running            0             48m
pod/datahub-datahub-upgrade-job-wm4km                   0/1     ImagePullBackOff   0             48m
pod/datahub-elasticsearch-setup-job-ft2ns               0/1     Completed          0             49m
pod/datahub-kafka-setup-job-mxbhh                       0/1     Completed          0             49m
pod/datahub-mysql-setup-job-5x5n6                       0/1     Completed          0             48m
pod/elasticsearch-master-0                              1/1     Running            2 (53m ago)   54m
pod/prerequisites-cp-schema-registry-6f4b5b894f-n2qlv   2/2     Running            0             54m
pod/prerequisites-kafka-0                               1/1     Running            1 (53m ago)   54m
pod/prerequisites-mysql-0                               1/1     Running            0             54m
pod/prerequisites-neo4j-community-0                     1/1     Running            0             54m
pod/prerequisites-zookeeper-0                           1/1     Running            0             54m

NAME                                       TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/datahub-acryl-datahub-actions      ClusterIP      10.111.190.10    <none>        9093/TCP                     48m
service/datahub-datahub-frontend           LoadBalancer   10.101.154.91    <pending>     9002:31696/TCP               48m
service/datahub-datahub-gms                LoadBalancer   10.101.213.209   <pending>     8080:31870/TCP               48m
service/elasticsearch-master               ClusterIP      10.99.234.177    <none>        9200/TCP,9300/TCP            54m
service/elasticsearch-master-headless      ClusterIP      None             <none>        9200/TCP,9300/TCP            54m
service/kubernetes                         ClusterIP      10.96.0.1        <none>        443/TCP                      3d21h
service/prerequisites-cp-schema-registry   ClusterIP      10.98.221.82     <none>        8081/TCP,5556/TCP            54m
service/prerequisites-kafka                ClusterIP      10.98.148.159    <none>        9092/TCP                     54m
service/prerequisites-kafka-headless       ClusterIP      None             <none>        9092/TCP,9093/TCP            54m
service/prerequisites-mysql                ClusterIP      10.105.186.66    <none>        3306/TCP                     54m
service/prerequisites-mysql-headless       ClusterIP      None             <none>        3306/TCP                     54m
service/prerequisites-neo4j-community      ClusterIP      None             <none>        7474/TCP,7687/TCP            54m
service/prerequisites-zookeeper            ClusterIP      10.107.1.173     <none>        2181/TCP,2888/TCP,3888/TCP   54m
service/prerequisites-zookeeper-headless   ClusterIP      None             <none>        2181/TCP,2888/TCP,3888/TCP   54m

NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/datahub-acryl-datahub-actions      1/1     1            1           48m
deployment.apps/datahub-datahub-frontend           1/1     1            1           48m
deployment.apps/datahub-datahub-gms                1/1     1            1           48m
deployment.apps/prerequisites-cp-schema-registry   1/1     1            1           54m

NAME                                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/datahub-acryl-datahub-actions-6bb4b7c68d      1         1         1       48m
replicaset.apps/datahub-datahub-frontend-5547f7768c           1         1         1       48m
replicaset.apps/datahub-datahub-gms-56c67fbb74                1         1         1       48m
replicaset.apps/prerequisites-cp-schema-registry-6f4b5b894f   1         1         1       54m

NAME                                             READY   AGE
statefulset.apps/elasticsearch-master            1/1     54m
statefulset.apps/prerequisites-kafka             1/1     54m
statefulset.apps/prerequisites-mysql             1/1     54m
statefulset.apps/prerequisites-neo4j-community   1/1     54m
statefulset.apps/prerequisites-zookeeper         1/1     54m

NAME                                                         SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/datahub-datahub-cleanup-job-template           * * * * *   True      0        <none>          48m
cronjob.batch/datahub-datahub-restore-indices-job-template   * * * * *   True      0        <none>          48m

NAME                                        COMPLETIONS   DURATION   AGE
job.batch/datahub-datahub-upgrade-job       0/1           48m        48m
job.batch/datahub-elasticsearch-setup-job   1/1           1s         49m
job.batch/datahub-kafka-setup-job           1/1           72s        49m
job.batch/datahub-mysql-setup-job           1/1           7s         48m
$ kubectl describe pod datahub-datahub-upgrade-job-wm4km
Name:         datahub-datahub-upgrade-job-wm4km
Namespace:    default
Priority:     0
Node:         ip-10-0-0-46.ap-northeast-1.compute.internal/10.0.0.46
Start Time:   Tue, 22 Mar 2022 03:07:02 +0000
Labels:       controller-uid=c2a6d185-25bd-4f48-be64-df6b89b110d4
              job-name=datahub-datahub-upgrade-job
Annotations:  <none>
Status:       Pending
IP:           172.17.0.12
IPs:
  IP:           172.17.0.12
Controlled By:  Job/datahub-datahub-upgrade-job
Containers:
  datahub-upgrade-job:
    Container ID:  
    Image:         acryldata/datahub-upgrade:v0.8.31
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Args:
      -u
      NoCodeDataMigration
      -a
      batchSize=1000
      -a
      batchDelayMs=100
      -a
      dbType=MYSQL
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  512Mi
    Requests:
      cpu:     300m
      memory:  256Mi
    Environment:
      ENTITY_REGISTRY_CONFIG_PATH:  /datahub/datahub-gms/resources/entity-registry.yml
      DATAHUB_GMS_HOST:             datahub-datahub-gms
      DATAHUB_GMS_PORT:             8080
      DATAHUB_MAE_CONSUMER_HOST:    datahub-datahub-mae-consumer
      DATAHUB_MAE_CONSUMER_PORT:    9091
      EBEAN_DATASOURCE_USERNAME:    root
      EBEAN_DATASOURCE_PASSWORD:    <set to the key 'mysql-root-password' in secret 'mysql-secrets'>  Optional: false
      EBEAN_DATASOURCE_HOST:        prerequisites-mysql:3306
      EBEAN_DATASOURCE_URL:         jdbc:mysql://prerequisites-mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8&enabledTLSProtocols=TLSv1.2
      EBEAN_DATASOURCE_DRIVER:      com.mysql.cj.jdbc.Driver
      KAFKA_BOOTSTRAP_SERVER:       prerequisites-kafka:9092
      KAFKA_SCHEMAREGISTRY_URL:     http://prerequisites-cp-schema-registry:8081
      ELASTICSEARCH_HOST:           elasticsearch-master
      ELASTICSEARCH_PORT:           9200
      GRAPH_SERVICE_IMPL:           neo4j
      NEO4J_HOST:                   prerequisites-neo4j-community:7474
      NEO4J_URI:                    bolt://prerequisites-neo4j-community
      NEO4J_USERNAME:               neo4j
      NEO4J_PASSWORD:               <set to the key 'neo4j-password' in secret 'neo4j-secrets'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c88rr (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-c88rr:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  50m                  default-scheduler  Successfully assigned default/datahub-datahub-upgrade-job-wm4km to ip-10-0-0-46.ap-northeast-1.compute.internal
  Warning  Failed     50m                  kubelet            Failed to pull image "acryldata/datahub-upgrade:v0.8.31": rpc error: code = Unknown desc = write /var/lib/docker/tmp/GetImageBlob161382082: no space left on device
  Warning  Failed     49m                  kubelet            Failed to pull image "acryldata/datahub-upgrade:v0.8.31": rpc error: code = Unknown desc = write /var/lib/docker/tmp/GetImageBlob867326921: no space left on device
  Warning  Failed     49m                  kubelet            Failed to pull image "acryldata/datahub-upgrade:v0.8.31": rpc error: code = Unknown desc = write /var/lib/docker/tmp/GetImageBlob693904981: no space left on device
  Normal   Pulling    48m (x4 over 50m)    kubelet            Pulling image "acryldata/datahub-upgrade:v0.8.31"
  Warning  Failed     48m (x4 over 50m)    kubelet            Error: ErrImagePull
  Warning  Failed     48m                  kubelet            Failed to pull image "acryldata/datahub-upgrade:v0.8.31": rpc error: code = Unknown desc = write /var/lib/docker/tmp/GetImageBlob242649088: no space left on device
  Warning  Failed     47m (x6 over 50m)    kubelet            Error: ImagePullBackOff
  Normal   BackOff    49s (x211 over 50m)  kubelet            Back-off pulling image "acryldata/datahub-upgrade:v0.8.31"
I will check the storage.
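(A rough sketch of checking and reclaiming disk space; note that `docker system prune -a` deletes all unused images, so use it deliberately.)
df -h                        # disk usage on the EC2 host
minikube ssh -- df -h        # disk usage inside the minikube node, if a VM/container driver is used
docker system prune -a       # remove unused containers, networks and images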
It seems to be working fine now. I think I need to configure Ingress to access it from outside; could you give me the exact steps?
Once you confirm that the pods are running well, you can set up an ingress for datahub-frontend to expose port 9002 to the public.
e
Ingress is specific to the platform you are running. We have a guide for running on EKS here https://datahubproject.io/docs/deploy/aws#expose-endpoints-using-a-load-balancer For minikube, I would suggest looking into their docs to figure out the exact setup needed! https://minikube.sigs.k8s.io/docs/handbook/addons/ingress-dns/
b
Thank you very much. Okay, so I just need to route external access to "service/datahub-datahub-frontend" via Ingress, right?
e
Yes!!
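(For reference, a minimal Ingress sketch routing to that service. The hostname is a placeholder, and it assumes an NGINX ingress controller such as the one from `minikube addons enable ingress`; on EKS the load balancer guide linked above is the better path.)
# ingress.yaml — apply with: kubectl apply -f ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: datahub-frontend
  namespace: default
spec:
  ingressClassName: nginx          # assumes the NGINX ingress controller
  rules:
    - host: datahub.example.com    # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: datahub-datahub-frontend
                port:
                  number: 9002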
b
I was able to access it from the outside using Ingress. 🙌 The login screen appeared, but datahub-gms seems to be crashing.
$ kubectl describe pod datahub-datahub-gms-56c67fbb74-9sdp6
...
Events:
  Type     Reason          Age                    From     Message
  ----     ------          ----                   ----     -------
  Normal   SandboxChanged  58m                    kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          54m (x2 over 58m)      kubelet  Container image "linkedin/datahub-gms:v0.8.31" already present on machine
  Normal   Created         54m (x2 over 58m)      kubelet  Created container datahub-gms
  Normal   Started         54m (x2 over 58m)      kubelet  Started container datahub-gms
  Warning  Unhealthy       53m (x7 over 57m)      kubelet  Liveness probe failed: Get "http://172.17.0.2:8080/health": dial tcp 172.17.0.2:8080: connect: connection refused
  Warning  Unhealthy       8m33s (x66 over 57m)   kubelet  Readiness probe failed: Get "http://172.17.0.2:8080/health": dial tcp 172.17.0.2:8080: connect: connection refused
  Warning  BackOff         3m19s (x117 over 54m)  kubelet  Back-off restarting failed container
The status of datahub-gms repeats Running → Error → CrashLoopBackOff.
e
Hey can you check the logs of the pod?
kubectl logs <<pod-name>>
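(For reference, since the pod is crash-looping, the logs of the previous, crashed container are usually the informative ones; the pod name below is taken from the describe output above.)
kubectl logs datahub-datahub-gms-56c67fbb74-9sdp6             # current attempt
kubectl logs datahub-datahub-gms-56c67fbb74-9sdp6 --previous  # the last crashed container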
b
I did a helm uninstall of datahub and the prerequisites once, and now elasticsearch-master is no longer READY.
$ kubectl get pod -n default
NAME                                                READY   STATUS    RESTARTS       AGE
elasticsearch-master-0                              0/1     Running   0              2m24s
prerequisites-cp-schema-registry-6f4b5b894f-fmkfr   2/2     Running   0              2m24s
prerequisites-kafka-0                               1/1     Running   1 (2m3s ago)   2m24s
prerequisites-mysql-0                               1/1     Running   0              2m24s
prerequisites-neo4j-community-0                     1/1     Running   0              2m24s
prerequisites-zookeeper-0                           1/1     Running   0              2m24s
$ kubectl describe pod elasticsearch-master-0
...
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  4m32s                 default-scheduler  Successfully assigned default/elasticsearch-master-0 to ip-10-0-0-46.ap-northeast-1.compute.internal
  Normal   Pulled     4m29s                 kubelet            Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
  Normal   Created    4m29s                 kubelet            Created container configure-sysctl
  Normal   Started    4m29s                 kubelet            Started container configure-sysctl
  Normal   Pulled     4m29s                 kubelet            Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
  Normal   Created    4m29s                 kubelet            Created container elasticsearch
  Normal   Started    4m29s                 kubelet            Started container elasticsearch
  Warning  Unhealthy  91s (x19 over 4m11s)  kubelet            Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
e
Hmm, not sure why it's not ready. elasticsearch here is just a vanilla deployment of the official elasticsearch helm chart. I would try restarting the pod to see if it recovers. If not, run
kubectl get pvc
and then delete the PVC corresponding to elasticsearch so that it starts from scratch.
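(A sketch of that, assuming the PVC name the official elasticsearch chart typically gives a single-node elasticsearch-master; check the real name in the `kubectl get pvc` output first, and note the claim is only released once the elasticsearch StatefulSet/pod is gone, e.g. after `helm uninstall prerequisites`.)
kubectl get pvc
# typical name for a single-replica elasticsearch-master; verify it against the output above
kubectl delete pvc elasticsearch-master-elasticsearch-master-0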
b
I removed all the PVCs and ran helm install again, but the result did not change.
I'll look into it some more. Thank you.
I needed to add this to values.yaml:
clusterHealthCheckParams: 'wait_for_status=yellow&timeout=1s'
https://github.com/elastic/helm-charts/issues/783#issuecomment-701037663
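(For reference, in the prerequisites values.yaml this lands under the elasticsearch block, which is passed straight through to the official chart; a sketch based on the values shown earlier in this thread.)
elasticsearch:
  enabled: true
  replicas: 1
  minimumMasterNodes: 1
  antiAffinity: "soft"
  # a single-node cluster never reaches "green", so wait for "yellow" instead
  clusterHealthCheckParams: "wait_for_status=yellow&timeout=1s"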
I could see "Welcome to DataHub"! Thanks for your help! 🙇‍♂️
I made a Pull Request to datahub-helm for the bugs I found. https://github.com/acryldata/datahub-helm/pull/99
This issue is similar to my problem. https://github.com/acryldata/datahub-helm/issues/25