#getting-started

Wojciech Wasik

09/28/2022, 5:08 PM
Hey, I’m working on a POC of Pinot for our team, but I have a problem deploying it to AWS. I set up the k8s cluster on EKS by following the instructions at https://docs.pinot.apache.org/basics/getting-started/public-cloud-examples/aws-quickstart, then I moved on to starting Pinot with Helm (https://docs.pinot.apache.org/basics/getting-started/kubernetes-quickstart), but the broker keeps crashing and everything else is stuck in the Pending state. The broker logs show a NullPointerException; more context in the screenshots. I also cannot run it locally, since I have an M1 MacBook; I tried to compile it from source, but it failed with an InvocationTargetException. Any help appreciated.

Kishore G

09/28/2022, 5:36 PM
please check the resources in values.yaml in the helm chart
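For reference, resource requests in the Pinot Helm chart live under each component in values.yaml; a rough sketch of the sections to look at, assuming the controller/broker/server/zookeeper keys used by recent chart versions (exact defaults vary by version):
# per-component resource requests; lower these (or set resources: {}) on a small node
controller:
  resources:
    requests:
      memory: "1.25Gi"
broker:
  resources:
    requests:
      memory: "1.25Gi"
server:
  resources:
    requests:
      memory: "1.25Gi"
zookeeper:
  resources:
    requests:
      memory: "256Mi"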

Wojciech Wasik

09/28/2022, 6:43 PM

Mayank

09/28/2022, 6:48 PM
👀

Xiaobing

09/28/2022, 7:54 PM
it’s interesting that the broker tried to start before ZK was fully up (as in the first screenshot), and the broker error seems related to connecting to ZK

Mayank

09/28/2022, 7:54 PM
Right

Xiaobing

09/28/2022, 8:19 PM
I’ve noticed two recent changes on those helm chart files: 1) one bumped the ZK chart from 7.x to 9.x; 2) one added explicit resource requirements. So I’d wonder: 1) how much resource is provided for your pods? e.g. memory is required to be 1.25Gi in the helm values.yaml file; 2) any errors/logs from that Zookeeper pod?

Xiang Fu

09/28/2022, 8:30 PM
the first start requires the controller to go up
you can give more resources to your k8s cluster, or just update the helm values to start with fewer resources
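One way to do that without editing values.yaml is to override the values at install time; a minimal sketch, assuming the per-component resources keys mentioned above (the exact key names depend on the chart version):
# shrink the requests so everything fits on a small node
helm install pinot pinot/pinot \
    -n pinot-quickstart \
    --set controller.resources.requests.memory=512Mi \
    --set broker.resources.requests.memory=512Mi \
    --set server.resources.requests.memory=512Mi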

Wojciech Wasik

09/28/2022, 8:56 PM
@Xiaobing 1) not sure (I have just a basic understanding of k8s and helm), but the k8s cluster is on a t3.xlarge; 2) no logs

Xiaobing

09/28/2022, 9:16 PM
not sure if EKS has a cmd like kubectl describe pod to describe what’s happening with the pods. also, perhaps give it a try to remove those new resource requirements added in values.yaml by this PR: https://github.com/apache/pinot/pull/9012/files and see how it goes.
"first time requires controller to go up"
cc @Xiang Fu is there a way to enforce the starting order?
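For completeness, EKS works with standard kubectl, so the usual inspection commands apply (pod names taken from later in this thread):
# list pod states and inspect the broker that keeps crashing
kubectl get pods -n pinot-quickstart
kubectl describe pod pinot-broker-0 -n pinot-quickstart
kubectl logs pinot-broker-0 -n pinot-quickstart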

Wojciech Wasik

09/28/2022, 9:25 PM
I tried 1) setting lower resources in values.yaml, 2) removing the resource requirements; it failed for both, but the second change gives a different exception (although still about not being able to connect to ZK)

Xiaobing

09/28/2022, 9:34 PM
got it! at this point, I’d focus on getting ZK up first, then the Controller.

Xiang Fu

09/28/2022, 10:27 PM
you need resources to start all the pods
can you run kubectl describe nodes @Wojciech Wasik
you can also change, wherever it appears,
resources:
    requests:
      memory: "1.25Gi"
to
resources: {}
in values.yaml, then install with
helm install pinot pinot/pinot \
    -n pinot-quickstart \
    --values values.yaml
In case you need it, you can run the command below to get the values.yaml file:
helm inspect values pinot/pinot > values.yaml

Wojciech Wasik

09/29/2022, 7:17 AM
Thank you @Xiang Fu, that’s all very useful
I checked kubectl describe nodes with the default values.yaml file:
Name:               ip-192-168-15-222.ec2.internal
Roles:              <none>
Labels:             alpha.eksctl.io/cluster-name=pinot-quickstart
                    alpha.eksctl.io/nodegroup-name=standard-workers
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=t3.xlarge
                    beta.kubernetes.io/os=linux
                    eks.amazonaws.com/capacityType=ON_DEMAND
                    eks.amazonaws.com/nodegroup=standard-workers
                    eks.amazonaws.com/nodegroup-image=ami-0eb9bd067e5d1e192
                    eks.amazonaws.com/sourceLaunchTemplateId=lt-0f8cc0670a7ea9852
                    eks.amazonaws.com/sourceLaunchTemplateVersion=1
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1c
                    k8s.io/cloud-provider-aws=99efaf7c2c9d38f77b77c4a75e433a01
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-192-168-15-222.ec2.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=t3.xlarge
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/zone=us-east-1c
Annotations:        alpha.kubernetes.io/provided-node-ip: 192.168.15.222
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 28 Sep 2022 11:54:35 +0200
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-192-168-15-222.ec2.internal
  AcquireTime:     <unset>
  RenewTime:       Thu, 29 Sep 2022 09:16:30 +0200
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 29 Sep 2022 09:12:15 +0200   Wed, 28 Sep 2022 11:54:33 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 29 Sep 2022 09:12:15 +0200   Wed, 28 Sep 2022 11:54:33 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 29 Sep 2022 09:12:15 +0200   Wed, 28 Sep 2022 11:54:33 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Thu, 29 Sep 2022 09:12:15 +0200   Wed, 28 Sep 2022 11:56:48 +0200   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   192.168.15.222
  ExternalIP:   54.224.213.7
  Hostname:     ip-192-168-15-222.ec2.internal
  InternalDNS:  ip-192-168-15-222.ec2.internal
  ExternalDNS:  ec2-54-224-213-7.compute-1.amazonaws.com
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         4
  ephemeral-storage:           83873772Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      16203828Ki
  pods:                        58
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         3920m
  ephemeral-storage:           76224326324
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      15186996Ki
  pods:                        58
System Info:
  Machine ID:                 ec2485bc260c53cc76d5e3608c3e4e86
  System UUID:                ec2485bc-260c-53cc-76d5-e3608c3e4e86
  Boot ID:                    91abe21a-fa0a-4701-b593-d129d2afc631
  Kernel Version:             5.4.209-116.367.amzn2.x86_64
  OS Image:                   Amazon Linux 2
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.17
  Kubelet Version:            v1.23.9-eks-ba74326
  Kube-Proxy Version:         v1.23.9-eks-ba74326
ProviderID:                   aws:///us-east-1c/i-08c3cebafc6f15b28
Non-terminated Pods:          (5 in total)
  Namespace                   Name                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                      ------------  ----------  ---------------  -------------  ---
  kube-system                 aws-node-qm7m7            25m (0%)      0 (0%)      0 (0%)           0 (0%)         21h
  kube-system                 coredns-d5b9bfc4-qvb7v    100m (2%)     0 (0%)      70Mi (0%)        170Mi (1%)     21h
  kube-system                 coredns-d5b9bfc4-t5kqp    100m (2%)     0 (0%)      70Mi (0%)        170Mi (1%)     21h
  kube-system                 kube-proxy-lgv9w          100m (2%)     0 (0%)      0 (0%)           0 (0%)         21h
  pinot-quickstart            pinot-broker-0            0 (0%)        0 (0%)      0 (0%)           0 (0%)         99s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests    Limits
  --------                    --------    ------
  cpu                         325m (8%)   0 (0%)
  memory                      140Mi (0%)  340Mi (2%)
  ephemeral-storage           0 (0%)      0 (0%)
  hugepages-1Gi               0 (0%)      0 (0%)
  hugepages-2Mi               0 (0%)      0 (0%)
  attachable-volumes-aws-ebs  0           0
Events:                       <none>
It does not seem to be a resources/memory problem, but I’ll recheck after changing resources to resources: {}

Xiang Fu

09/29/2022, 7:21 AM
yeah, your node is 4 CPU, 16g RAM, so just change resources to {}. Another thing I want to check is your storageclass type. can you run
kubectl get storageclass
double check if you can use EBS as the default storage class
the storage class name should be gp2
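Two quick checks for the storage class setup (the default class is flagged in the output of the first command):
kubectl get storageclass
kubectl describe storageclass gp2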

Wojciech Wasik

09/29/2022, 7:23 AM
right now it might not be gp2, because I reverted my initial changes, but I think I tried that too, let me double check
kubectl get storageclass
gives gp2
but should I change
persistence:
    storageClass: ""
to gp2 in values.yaml?
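For reference, the change being discussed would look roughly like this in values.yaml, assuming the per-component persistence blocks used by the Pinot chart and its Zookeeper subchart (exact keys may differ between chart versions):
controller:
  persistence:
    storageClass: "gp2"
server:
  persistence:
    storageClass: "gp2"
zookeeper:
  persistence:
    storageClass: "gp2"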

Xiang Fu

09/29/2022, 8:01 AM
yes
So zk is starting, right?
hmmm
can you do
kubectl describe pod/pinot-zookeeper-0

Wojciech Wasik

09/29/2022, 8:03 AM
Name:           pinot-zookeeper-0
Namespace:      pinot-quickstart
Priority:       0
Node:           <none>
Labels:         app.kubernetes.io/component=zookeeper
                app.kubernetes.io/instance=pinot
                app.kubernetes.io/managed-by=Helm
                app.kubernetes.io/name=zookeeper
                controller-revision-hash=pinot-zookeeper-546f7fdcdb
                helm.sh/chart=zookeeper-9.2.7
                statefulset.kubernetes.io/pod-name=pinot-zookeeper-0
Annotations:    kubernetes.io/psp: eks.privileged
Status:         Pending
IP:
IPs:            <none>
Controlled By:  StatefulSet/pinot-zookeeper
Containers:
  zookeeper:
    Image:       docker.io/bitnami/zookeeper:3.8.0-debian-11-r5
    Ports:       2181/TCP, 2888/TCP, 3888/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Command:
      /scripts/setup.sh
    Requests:
      cpu:      250m
      memory:   256Mi
    Liveness:   exec [/bin/bash -c echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok] delay=30s timeout=5s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/bash -c echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok] delay=5s timeout=5s period=10s #success=1 #failure=6
    Environment:
      BITNAMI_DEBUG:               false
      ZOO_DATA_LOG_DIR:
      ZOO_PORT_NUMBER:             2181
      ZOO_TICK_TIME:               2000
      ZOO_INIT_LIMIT:              10
      ZOO_SYNC_LIMIT:              5
      ZOO_PRE_ALLOC_SIZE:          65536
      ZOO_SNAPCOUNT:               100000
      ZOO_MAX_CLIENT_CNXNS:        60
      ZOO_4LW_COMMANDS_WHITELIST:  srvr, mntr, ruok
      ZOO_LISTEN_ALLIPS_ENABLED:   no
      ZOO_AUTOPURGE_INTERVAL:      1
      ZOO_AUTOPURGE_RETAIN_COUNT:  5
      ZOO_MAX_SESSION_TIMEOUT:     40000
      ZOO_SERVERS:                 pinot-zookeeper-0.pinot-zookeeper-headless.pinot-quickstart.svc.cluster.local:2888:3888::1
      ZOO_ENABLE_AUTH:             no
      ZOO_HEAP_SIZE:               1024
      ZOO_LOG_LEVEL:               ERROR
      ALLOW_ANONYMOUS_LOGIN:       yes
      POD_NAME:                    pinot-zookeeper-0 (v1:metadata.name)
    Mounts:
      /bitnami/zookeeper from data (rw)
      /scripts/setup.sh from scripts (rw,path="setup.sh")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c9mj5 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-pinot-zookeeper-0
    ReadOnly:   false
  scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      pinot-zookeeper-scripts
    Optional:  false
  kube-api-access-c9mj5:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  3m48s (x3 over 23m)  default-scheduler  running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
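A "binding volumes: timed out waiting for the condition" event like this usually means the PersistentVolumeClaim was never bound to a provisioned volume; commands along these lines narrow down where it is stuck (the claim name is taken from the output above):
kubectl get pvc -n pinot-quickstart
kubectl describe pvc data-pinot-zookeeper-0 -n pinot-quickstart
kubectl get events -n pinot-quickstart --sort-by=.lastTimestamp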

Xiang Fu

09/29/2022, 8:06 AM
hmm, did you set
storageClass: "gp2"
can you save this file as gp2.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2
  annotations:
    storageclass.kubernetes.io/is-default-class: 'true'
parameters:
  fsType: ext4
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
Then run
kubectl apply -f gp2.yaml
it should create an EBS disk for you
something interesting is that your k8s cluster has no default storage class

Wojciech Wasik

09/29/2022, 8:18 AM
There’s no more warning about the volume:
❯ kubectl describe pod/pinot-zookeeper-0
Name:           pinot-zookeeper-0
Namespace:      pinot-quickstart
Priority:       0
Node:           <none>
Labels:         app.kubernetes.io/component=zookeeper
                app.kubernetes.io/instance=pinot
                app.kubernetes.io/managed-by=Helm
                app.kubernetes.io/name=zookeeper
                controller-revision-hash=pinot-zookeeper-64bf956ff8
                helm.sh/chart=zookeeper-7.0.0
                statefulset.kubernetes.io/pod-name=pinot-zookeeper-0
Annotations:    kubernetes.io/psp: eks.privileged
Status:         Pending
IP:
IPs:            <none>
Controlled By:  StatefulSet/pinot-zookeeper
Containers:
  zookeeper:
    Image:       docker.io/bitnami/zookeeper:3.7.0-debian-10-r56
    Ports:       2181/TCP, 2888/TCP, 3888/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Command:
      bash
      -ec
      # Execute entrypoint as usual after obtaining ZOO_SERVER_ID
      # check ZOO_SERVER_ID in persistent volume via myid
      # if not present, set based on POD hostname
      if [[ -f "/bitnami/zookeeper/data/myid" ]]; then
        export ZOO_SERVER_ID="$(cat /bitnami/zookeeper/data/myid)"
      else
        HOSTNAME=`hostname -s`
        if [[ $HOSTNAME =~ (.*)-([0-9]+)$ ]]; then
          ORD=${BASH_REMATCH[2]}
          export ZOO_SERVER_ID=$((ORD + 1 ))
        else
          echo "Failed to get index from hostname $HOST"
          exit 1
        fi
      fi
      exec /entrypoint.sh /run.sh

    Requests:
      cpu:      250m
      memory:   256Mi
    Liveness:   exec [/bin/bash -c echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok] delay=30s timeout=5s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/bash -c echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok] delay=5s timeout=5s period=10s #success=1 #failure=6
    Environment:
      ZOO_DATA_LOG_DIR:
      ZOO_PORT_NUMBER:             2181
      ZOO_TICK_TIME:               2000
      ZOO_INIT_LIMIT:              10
      ZOO_SYNC_LIMIT:              5
      ZOO_MAX_CLIENT_CNXNS:        60
      ZOO_4LW_COMMANDS_WHITELIST:  srvr, mntr, ruok
      ZOO_LISTEN_ALLIPS_ENABLED:   no
      ZOO_AUTOPURGE_INTERVAL:      0
      ZOO_AUTOPURGE_RETAIN_COUNT:  3
      ZOO_MAX_SESSION_TIMEOUT:     40000
      ZOO_SERVERS:                 pinot-zookeeper-0.pinot-zookeeper-headless.pinot-quickstart.svc.cluster.local:2888:3888::1
      ZOO_ENABLE_AUTH:             no
      ZOO_HEAP_SIZE:               1024
      ZOO_LOG_LEVEL:               ERROR
      ALLOW_ANONYMOUS_LOGIN:       yes
      POD_NAME:                    pinot-zookeeper-0 (v1:metadata.name)
    Mounts:
      /bitnami/zookeeper from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fr8kw (ro)
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-pinot-zookeeper-0
    ReadOnly:   false
  kube-api-access-fr8kw:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
but the broker is still crashing because it fails to connect to ZK, and ZK is in the Pending state
and I run
helm install pinot pinot/pinot \
    -n pinot-quickstart \
    --values values.yaml
Ok, I think I solved the issue. The problem was that the k8s cluster, by default, did not have the required volume controller.
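On EKS 1.23 and later, dynamic EBS provisioning is handled by the aws-ebs-csi-driver addon rather than the legacy in-tree controller, so a missing "volume controller" is commonly fixed by installing that addon; a hedged sketch, assuming the cluster name and region from this thread and that suitable IAM permissions are in place:
# enable IAM roles for service accounts so the addon can talk to EBS
eksctl utils associate-iam-oidc-provider --cluster pinot-quickstart --region us-east-1 --approve
# install the EBS CSI driver addon
eksctl create addon --name aws-ebs-csi-driver --cluster pinot-quickstart --region us-east-1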

Xiang Fu

09/29/2022, 4:51 PM
glad you solved it. Is this an AWS EKS cluster? Technically all those things are built in.

Mayank

09/29/2022, 4:52 PM
@Xiaobing let’s document the learnings in the docs page, so it helps the next person

Wojciech Wasik

09/29/2022, 4:53 PM
Yes, it’s AWS EKS, I used exactly this command:
EKS_CLUSTER_NAME=pinot-quickstart
eksctl create cluster \
--name ${EKS_CLUSTER_NAME} \
--version 1.16 \
--region us-west-2 \
--nodegroup-name standard-workers \
--node-type t3.xlarge \
--nodes 1 \
--nodes-min 1 \
--nodes-max 1
actually, one change: I updated the version to 1.23, since 1.16 is not available anymore

Xiang Fu

09/29/2022, 6:20 PM
got it, we will update the script as well and see if we can solve the volume controller issue

reallyonthemove tous

10/14/2022, 9:04 PM
looks like I am hitting the same issue, "binding volumes: timed out waiting for the condition". I have edited the values.yaml file to change the storageClass to "gp2" and set resources: {}. The k8s cluster was created with: eksctl create cluster --name ${EKS_CLUSTER_NAME} --version 1.23 --region us-east-1 --nodegroup-name standard-workers --node-type t3.xlarge --nodes 1 --nodes-min 1 --nodes-max 1
The Pinot cluster was installed with: helm install pinot pinot/pinot -n pinot-quickstart --values values.yaml
@Xiang Fu what updates do we need to make to the script?
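If this is also an EKS 1.23 cluster, the missing EBS CSI driver described above is a likely cause; one way to check whether the addon is installed (cluster name and region taken from the message):
eksctl get addon --cluster ${EKS_CLUSTER_NAME} --region us-east-1
kubectl get pods -n kube-system | grep ebs-csi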