# troubleshooting
f
Hi All, I need help with Kubernetes installation. I ran helm install for Pinot using the chart included in the main Pinot GitHub repo (by cloning the entire Pinot source code). Apparently broker-0 was not able to come up because of
Cluster structure is not set up for cluster: pinot-quickstart
What went wrong here ?
Copy code
cd incubator-pinot/kubernetes/helm/pinot
helm install -n pinot-quickstart2 pinot .
Copy code
Session establishment complete on server pinot-zookeeper/172.20.47.166:2181, sessionid = 0x1012fddef2b0003, negotiated timeout = 30000
zookeeper state changed (SyncConnected)
MBean HelixZkClient:Key=pinot-quickstart.Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart2.svc.cluster.local_8099,Type=SPECTATOR has been registered.
MBean HelixZkClient:Key=pinot-quickstart.Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart2.svc.cluster.local_8099,PATH=Root,Type=SPECTATOR has been registered.
KeeperState: SyncConnected, instance: Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart2.svc.cluster.local_8099, type: SPECTATOR
Handle new session, instance: Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart2.svc.cluster.local_8099, type: SPECTATOR
Handling new session, session id: 1012fddef2b0003, instance: Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart2.svc.cluster.local_8099, instanceTye: SPECTATOR, cluster: pinot-quickstart
fail to createClient.
org.apache.helix.HelixException: Cluster structure is not set up for cluster: pinot-quickstart
        at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:1124) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-47a75e5093129cc280de4c118434ccb337cd3da1]
I found the issue. I'm using AWS EKS, and the default storageClass is "gp2". The helm chart storageClass was set to "" and it didn't choose the default nor map to "gp2"
x
typically start order is: zookeeper/controller/(broker/servers/minion)
I thought an empty storageClass should use the default?
which points to your system default one
e.g.
Copy code
➜ kt get storageclass
NAME                PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
azurefile           kubernetes.io/azure-file   Delete          Immediate              true                   57d
azurefile-premium   kubernetes.io/azure-file   Delete          Immediate              true                   57d
default (default)   kubernetes.io/azure-disk   Delete          WaitForFirstConsumer   true                   57d
managed-premium     kubernetes.io/azure-disk   Delete          WaitForFirstConsumer   true                   57d
f
Looks like I have 2 defaults:
Copy code
Warning  FailedCreate  5m20s (x21 over 46m)  statefulset-controller  create Claim data-pinot-controller-0 for Pod pinot-controller-0 in StatefulSet pinot-controller failed error: persistentvolumeclaims "data-pinot-controller-0" is forbidden: Internal error occurred: 2 default StorageClasses were found
  Warning  FailedCreate  5m20s (x21 over 46m)  statefulset-controller  create Pod pinot-controller-0 in StatefulSet pinot-controller failed error: failed to create PVC data-pinot-controller-0: persistentvolumeclaims "data-pinot-controller-0" is forbidden: Internal error occurred: 2 default StorageClasses were found
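(The usual fix for this error is to leave exactly one class annotated as default. A minimal sketch; the class name here is a placeholder for whichever extra default exists in your cluster.)
Copy code
kubectl get storageclass
# clear the default annotation on the extra class so only one remains
kubectl patch storageclass <extra-class-name> \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'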
x
oic
then this will fail
you need to configure it explicitly
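(For example, a minimal values.yaml sketch that pins every component to gp2; the exact nesting of the storageClass fields depends on the chart version, so match it against the grep output and the zookeeper snippet shown later in this thread.)
Copy code
controller:
  persistence:
    storageClass: "gp2"   # assumption: storageClass sits under a persistence block
server:
  persistence:
    storageClass: "gp2"
zookeeper:
  storageClass: "gp2"     # matches the zookeeper override shared later in this thread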
f
Yes, thanks. What would be a good storage option for a K8s deployment?
Would you recommend EBS?
x
typically for controller you can use s3, server you can use ebs
for testing you can use ebs for both
controller should have access to deep store, which can be mounted ebs or s3
f
Thanks. I'm planning to use S3 for deep storage as well
x
servers will talk to deep store through the controller API (for mounted EBS cases) or the S3 API
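(For the S3 deep store route, a rough sketch of controller settings based on Pinot's S3 PinotFS docs; the bucket, region, and exact property keys are illustrative and should be checked against the docs for your Pinot version.)
Copy code
controller.data.dir=s3://your-pinot-bucket/pinot-data
controller.local.temp.dir=/tmp/pinot-tmp-data
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-east-1
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher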
r
I have this same problem. I changed values.yaml to set storageClass to "gp2". It didn't work. Is this what you meant by configuring it explicitly? Or do I need to change the /templates?
Copy code
values.yaml
76:    storageClass: "gp2"
244:    storageClass: "gp2"
245:    #storageClass: "ssd"
330:    storageClass: "gp2"
331:    #storageClass: "ssd"
also, gp2 is my default:
Copy code
% kubectl get storageclass
NAME            PROVISIONER             AGE
gp2 (default)   kubernetes.io/aws-ebs   143m
x
where does it stop?
can you describe the pod and check
also check if PVCs are already created
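(A quick sketch of both checks, using the pod and PVC names that appear elsewhere in this thread and the pinot-quickstart namespace.)
Copy code
kubectl describe pod pinot-controller-0 -n pinot-quickstart
kubectl get pvc -n pinot-quickstart
kubectl describe pvc data-pinot-controller-0 -n pinot-quickstart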
r
4 pods are in error. pinot-broker-0:
org.apache.helix.HelixException: Cluster structure is not set up for cluster: pinot-quickstart
pinot-controller-0:
Socket error occurred: localhost/127.0.0.1:2181: Connection refused
pinot-minion-0:
org.apache.helix.HelixException: Cluster structure is not set up for cluster: pinot-quickstart
pinot-server-0:
org.apache.helix.HelixException: Cluster structure is not set up for cluster: pinot-quickstart
PVC was created; I can see it in my AWS console.
I guess the pinot-quickstart-standard-workers-Node EBS is gp3. I don't know if that matters.
I even get these errors running this on minikube
x
I see, is the zookeeper pod up?
The startup sequence should be zookeeper -> controller -> broker/server
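(A small sketch to confirm the order and that ZK is healthy; the zookeeper pod name is an assumption based on the pinot-zookeeper service used elsewhere in this thread.)
Copy code
kubectl get pods -n pinot-quickstart
# assumption: the ZK statefulset pod is named pinot-zookeeper-0
kubectl logs pinot-zookeeper-0 -n pinot-quickstart --tail=50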
r
Zookeeper is up. The pods seem to start in that order.
f
Socket error occurred: localhost/127.0.0.1:2181: Connection refused
Seems like the controller's connection to ZK had an issue.
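(One way to see which ZK address the controller actually resolved is to check its logs and rendered config; the pod and configmap names here are the ones shown later in this thread.)
Copy code
kubectl logs pinot-controller-0 -n pinot-quickstart --tail=100 | grep -i zookeeper
kubectl describe configmaps pinot-controller-config -n pinot-quickstart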
r
@Fritz Budiyanto your error did not involve the controller refusing to connect?
f
No, connection to ZK was okay. You can double-check by manually connecting to ZK with zkCli.sh, which comes with the ZK distribution.
🙌 1
ZK also has storageClass. You may want to check if ZK came up okay.
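(A sketch of that manual check; the ZK pod name is an assumption and the path to zkCli.sh may differ depending on the ZooKeeper image.)
Copy code
kubectl exec -it pinot-zookeeper-0 -n pinot-quickstart -- zkCli.sh -server localhost:2181
# inside the ZK shell, the Helix cluster should show up as a znode:
ls /
ls /pinot-quickstart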
r
How did you explicitly define storageClass? Just in values.yaml?
f
Copy code
zookeeper:
  ## If true, install the Zookeeper chart alongside Pinot
  ## ref: https://github.com/kubernetes/charts/tree/master/incubator/zookeeper
  enabled: true
  storageClass: "efs-sc-dyn"
I was using EFS for ZK
x
hmm, why is it connecting to localhost:2181?
it should be zookeeper:2181, right?
r
I have no idea; I'm really just following the quickstart instructions. Would it help if I shared a video of my entire quickstart?
x
hmm, which quickstart?
the helm chart one?
r
yeah
x
have you modified the zookeeper section in values.yaml?
👎 1
r
Nope
x
I see
are you using helm install from the codebase?
r
Yes, I'm using this command you gave me earlier:
Copy code
helm install pinot  --values values.yaml -n pinot-quickstart .
x
can you describe your pinot-controller statefulset and paste the output here?
and describe the content of pinot-controller inside the configmaps
f
@Ryan Clark I was using codebase release-0.7.1, since the top-of-tree branch introduced a new setting for ZK:
urlOverride: "my-zookeeper:2181/my-pinot"
Copy code
git branch
* (HEAD detached at release-0.7.1)
  master

diff --git a/kubernetes/helm/pinot/values.yaml b/kubernetes/helm/pinot/values.yaml
index 5b574aab0..dd170eafe 100644
--- a/kubernetes/helm/pinot/values.yaml
+++ b/kubernetes/helm/pinot/values.yaml
@@ -21,7 +21,7 @@

 image:
   repository: apachepinot/pinot
-  tag: latest
+  tag: release-0.7.1
   pullPolicy: IfNotPresent

 cluster:
r
Copy code
ryanclark@DESKTOP-3TVB8KH pinot % kubectl describe pod/pinot-controller-0 -n pinot-quickstart
Name:         pinot-controller-0
Namespace:    pinot-quickstart
Priority:     0
Node:         ip-192-168-19-85.ec2.internal/192.168.19.85
Start Time:   Tue, 06 Jul 2021 12:14:48 -0500
Labels:       app=pinot
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/version=0.2.3
              chart=pinot-0.2.3
              component=controller
              controller-revision-hash=pinot-controller-654bbcbbf7
              helm.sh/chart=pinot-0.2.3
              heritage=Helm
              release=pinot
              statefulset.kubernetes.io/pod-name=pinot-controller-0
Annotations:  kubernetes.io/psp: eks.privileged
Status:       Running
IP:           192.168.3.200
IPs:
  IP:           192.168.3.200
Controlled By:  StatefulSet/pinot-controller
Containers:
  controller:
    Container ID:  docker://7a1bb1bbcf2f70cbfa1d1ad1314809e8edcd3c7265eeefb29f631d14c142a9a5
    Image:         apachepinot/pinot:latest-jdk11
    Image ID:      docker-pullable://apachepinot/pinot@sha256:d56caffcafd469a7c1f4767b73f4d0b4ecc7f3dbbf2bbebb43e1193b28862322
    Port:          9000/TCP
    Host Port:     0/TCP
    Args:
      StartController
      -configFileName
      /var/pinot/controller/config/pinot-controller.conf
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 06 Jul 2021 17:54:32 -0500
      Finished:     Tue, 06 Jul 2021 17:55:06 -0500
    Ready:          False
    Restart Count:  64
    Environment:
      JAVA_OPTS:  -Xms256M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xlog:gc*:file=/opt/pinot/gc-pinot-controller.log -Dlog4j2.configurationFile=/opt/pinot/conf/log4j2.xml -Dplugins.dir=/opt/pinot/plugins
    Mounts:
      /var/pinot/controller/config from config (rw)
      /var/pinot/controller/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from pinot-token-h5krj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-pinot-controller-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      pinot-controller-config
    Optional:  false
  pinot-token-h5krj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  pinot-token-h5krj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                     From     Message
  ----     ------   ----                    ----     -------
  Warning  BackOff  53s (x1392 over 5h39m)  kubelet  Back-off restarting failed container
x
can you check the output of
Copy code
kubectl get configmaps pinot-controller-config -n pinot-quickstart -o yaml
r
Copy code
ryanclark@DESKTOP-3TVB8KH pinot % kubectl describe configmaps pinot-controller-config -n pinot-quickstart
Name:         pinot-controller-config
Namespace:    pinot-quickstart
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: pinot
              meta.helm.sh/release-namespace: pinot-quickstart

Data
====
pinot-controller.conf:
----
controller.helix.cluster.name=pinot-quickstart
controller.port=9000
controller.data.dir=/var/pinot/controller/data
controller.zk.str=pinot-zookeeper:2181
pinot.set.instance.id.to.hostname=true
controller.task.scheduler.enabled=true
Events:  <none>
x
ok
it shows
Copy code
controller.zk.str=pinot-zookeeper:2181
🙌 1
hmm
r
Several of my colleagues have the same problem on local K8s (minikube) and on AWS deployments.
x
I think I found the root cause
will provide a fix soon
🙌 2
This is fixed
can you try to pull the latest-jdk11 image?
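(A sketch of one way to do that with the existing release; the image.tag and pullPolicy fields are the ones visible in the values.yaml diff above, and the statefulset name comes from the describe output.)
Copy code
helm upgrade pinot . -n pinot-quickstart \
  --set image.tag=latest-jdk11 \
  --set image.pullPolicy=Always
# if the tag was already latest-jdk11, restart the pods so the new image is pulled
kubectl rollout restart statefulset pinot-controller -n pinot-quickstart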
b
this fixed the issue I was experiencing, ty @Xiang Fu 🙏
r
@Xiang Fu It works. Thank you for your help!