# general
d
Hi! I'm trying to follow the instructions to deploy quickstart on AWS but I'm hitting issues. The pods are crashing regularly. I added some reproduction instructions here. I'm getting a "TLS handshake timeout" error when trying to output the logs. Any help would be appreciated.
x
i will take a look
d
Thanks!
I'm trying a different EKS Cluster create command to see if it works.
I've tried a few variations so far but I'm still getting restarts.
x
I think we may need to give more memory in order to start everything though
d
Ah, okay.
x
also limit the container size
d
I'm unfamiliar with AWS so I'm guessing my log error is actually just a permission issue.
x
so it won’t kill other pods due to resource constraints
d
2020/03/16 21:51:22.936 WARN [ClientCnxn] [main-SendThread(pinot-zookeeper.pinot-quickstart2.svc.cluster.local:2181)] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_242]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_242]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1144) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
2020/03/16 21:51:25.047 WARN [ClientCnxn] [main-SendThread(pinot-zookeeper.pinot-quickstart2.svc.cluster.local:2181)] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_242]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_242]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1144) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
2020/03/16 21:51:27.159 WARN [ClientCnxn] [main-SendThread(pinot-zookeeper.pinot-quickstart2.svc.cluster.local:2181)] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_242]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_242]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1144) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
2020/03/16 21:51:29.271 WARN [ClientCnxn] [main-SendThread(pinot-zookeeper.pinot-quickstart2.svc.cluster.local:2181)] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_242]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_242]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1144) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
2020/03/16 21:51:31.383 WARN [ClientCnxn] [main-SendThread(pinot-zookeeper.pinot-quickstart2.svc.cluster.local:2181)] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_242]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_242]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1144) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
2020/03/16 21:51:33.495 WARN [ClientCnxn] [main-SendThread(pinot-zookeeper.pinot-quickstart2.svc.cluster.local:2181)] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_242]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_242]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1144) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
2020/03/16 21:51:35.607 WARN [ClientCnxn] [main-SendThread(pinot-zookeeper.pinot-quickstart2.svc.cluster.local:2181)] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_242]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_242]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1144) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
2020/03/16 21:51:37.719 WARN [ClientCnxn] [main-SendThread(pinot-zookeeper.pinot-quickstart2.svc.cluster.local:2181)] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_242]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_242]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1144) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
2020/03/16 21:51:39.831 WARN [ClientCnxn] [main-SendThread(pinot-zookeeper.pinot-quickstart2.svc.cluster.local:2181)] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_242]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_242]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1144) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
2020/03/16 21:51:41.943 WARN [ClientCnxn] [main-SendThread(pinot-zookeeper.pinot-quickstart2.svc.cluster.local:2181)] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_242]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_242]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1144) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
2020/03/16 21:51:44.055 WARN [ClientCnxn] [main-SendThread(pinot-zookeeper.pinot-quickstart2.svc.cluster.local:2181)] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_242]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_242]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1144) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
2020/03/16 21:51:46.167 WARN [ClientCnxn] [main-SendThread(pinot-zookeeper.pinot-quickstart2.svc.cluster.local:2181)] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_242]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_242]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1144) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
When I start with EKS managed resources, I get a different error
x
hmm
kubectl get all -n pinot-quickstart
what’s the output for this?
is there a headless service for zookeeper?
d
With the error that I just posted (starting up EKS with --managed), I have the following
NAME                     READY   STATUS              RESTARTS   AGE
pod/pinot-broker-0       0/1     Running             1          84s
pod/pinot-broker-1       0/1     Running             1          84s
pod/pinot-controller-0   0/1     Error               1          84s
pod/pinot-controller-1   0/1     Error               1          84s
pod/pinot-server-0       1/1     Running             0          85s
pod/pinot-server-1       1/1     Running             1          85s
pod/pinot-zookeeper-0    1/1     Running             0          84s
pod/pinot-zookeeper-1    0/1     ContainerCreating   0          12s

NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/pinot-broker                ClusterIP   10.100.37.52     <none>        8099/TCP                     85s
service/pinot-broker-headless       ClusterIP   None             <none>        8099/TCP                     85s
service/pinot-controller            ClusterIP   10.100.221.110   <none>        9000/TCP                     85s
service/pinot-controller-headless   ClusterIP   None             <none>        9000/TCP                     85s
service/pinot-server                ClusterIP   10.100.151.88    <none>        8098/TCP                     85s
service/pinot-server-headless       ClusterIP   None             <none>        8098/TCP                     85s
service/pinot-zookeeper             ClusterIP   10.100.200.144   <none>        2181/TCP                     85s
service/pinot-zookeeper-headless    ClusterIP   None             <none>        2181/TCP,3888/TCP,2888/TCP   85s

NAME                                READY   AGE
statefulset.apps/pinot-broker       0/2     85s
statefulset.apps/pinot-controller   0/2     85s
statefulset.apps/pinot-server       2/2     85s
statefulset.apps/pinot-zookeeper    1/3     85s
With the EKS cluster created per the Pinot instructions, I get the following
NAME                     READY   STATUS             RESTARTS   AGE
pod/pinot-broker-0       0/1     Running            2          6m14s
pod/pinot-broker-1       0/1     Running            2          6m14s
pod/pinot-controller-0   1/1     Running            2          6m14s
pod/pinot-controller-1   0/1     CrashLoopBackOff   2          6m14s
pod/pinot-server-0       0/1     CrashLoopBackOff   4          6m15s
pod/pinot-server-1       1/1     Running            2          6m15s
pod/pinot-zookeeper-0    1/1     Running            0          6m14s
pod/pinot-zookeeper-1    1/1     Running            0          5m12s
pod/pinot-zookeeper-2    0/1     Running            0          4m25s

NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/pinot-broker                ClusterIP   10.100.171.216   <none>        8099/TCP                     6m15s
service/pinot-broker-headless       ClusterIP   None             <none>        8099/TCP                     6m15s
service/pinot-controller            ClusterIP   10.100.28.10     <none>        9000/TCP                     6m15s
service/pinot-controller-headless   ClusterIP   None             <none>        9000/TCP                     6m15s
service/pinot-server                ClusterIP   10.100.13.238    <none>        8098/TCP                     6m15s
service/pinot-server-headless       ClusterIP   None             <none>        8098/TCP                     6m15s
service/pinot-zookeeper             ClusterIP   10.100.58.200    <none>        2181/TCP                     6m15s
service/pinot-zookeeper-headless    ClusterIP   None             <none>        2181/TCP,3888/TCP,2888/TCP   6m15s

NAME                                READY   AGE
statefulset.apps/pinot-broker       0/2     6m15s
statefulset.apps/pinot-controller   0/2     6m15s
statefulset.apps/pinot-server       0/2     6m15s
statefulset.apps/pinot-zookeeper    1/3     6m15s
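A listing like the ones above can be summarized by filtering for pods that are not fully ready. The following is only a rough sketch: `unready_pods` is a hypothetical helper (not part of kubectl or Pinot), and it assumes the `pod/`-prefixed rows and the NAME, READY, STATUS, RESTARTS, AGE column order that `kubectl get all` printed here.

```python
def unready_pods(listing: str) -> list[tuple[str, str, int]]:
    """Return (name, status, restarts) for pods that are not fully ready."""
    result = []
    for line in listing.splitlines():
        fields = line.split()
        # Keep only pod rows; skip headers, services, statefulsets, blank lines.
        if len(fields) < 5 or not fields[0].startswith("pod/"):
            continue
        name, ready, status, restarts = fields[:4]
        ready_now, ready_want = (int(n) for n in ready.split("/"))
        if ready_now < ready_want or status != "Running":
            result.append((name, status, int(restarts)))
    return result


# A few rows copied from the listing in this thread.
sample = """\
NAME                     READY   STATUS             RESTARTS   AGE
pod/pinot-broker-0       0/1     Running            2          6m14s
pod/pinot-controller-1   0/1     CrashLoopBackOff   2          6m14s
pod/pinot-zookeeper-0    1/1     Running            0          6m14s
"""

for name, status, restarts in unready_pods(sample):
    print(f"{name}: {status}, restarts={restarts}")
```

A pod can show `Running` while still `0/1` ready, as the brokers do here, which is why the sketch checks the READY column as well as STATUS.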
x
hmm
I think the reason is that those pods are deployed at the same time
however pinot-zookeeper should be up and running first
you can try changing the replica count of pinot-zookeeper to 1
in order to bring things up faster
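The "Connection refused" warnings earlier in the thread are what a client sees when ZooKeeper is not yet accepting connections. One way to check that dependency by hand is to poll the port until it accepts a TCP connection. This is a minimal sketch, not something the Pinot chart does: `wait_for_port` is a hypothetical helper, and the default timeout and interval are arbitrary assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # interval_s doubles as the per-attempt connect timeout.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Covers connection refused, timeouts, and unresolved host names.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

From a pod in the same namespace, a call such as `wait_for_port("pinot-zookeeper", 2181)` would use the service name and port shown in the listings above. An accepted connection only shows that something is listening; it does not prove the ZooKeeper ensemble is healthy.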
d
I'll try that.
x
diff --git a/kubernetes/helm/values.yaml b/kubernetes/helm/values.yaml
index e25e8f353..85b140b56 100644
--- a/kubernetes/helm/values.yaml
+++ b/kubernetes/helm/values.yaml
@@ -30,7 +30,7 @@ cluster:
 controller:
   name: controller
   port: 9000
-  replicaCount: 2
+  replicaCount: 1

   persistence:
     enabled: true
@@ -85,7 +85,7 @@ broker:

   port: 8099

-  replicaCount: 2
+  replicaCount: 1

   jvmOpts: "-Xms4G -Xmx4G"

@@ -130,7 +130,7 @@ server:
     netty: 8098
     admin: 8097

-  replicaCount: 2
+  replicaCount: 1

   dataDir: /var/pinot/server/data/index
   segmentTarDir: /var/pinot/server/data/segment
@@ -182,7 +182,16 @@ zookeeper:

   ## Configure Zookeeper resource requests and limits
   ## ref: http://kubernetes.io/docs/user-guide/compute-resources/
-  resources: ~
+  resources:
+    requests:
+      memory: "1Gi"
+      cpu: "250m"
+    limits:
+      memory: "1Gi"
+      cpu: "500m"
+
+  ## Replicas
+  replicaCount: 1

   ## Environmental variables to set in Zookeeper
   env:
➜ kubectl get all -n pinot-quickstart
NAME                     READY   STATUS    RESTARTS   AGE
pod/pinot-broker-0       1/1     Running   1          104s
pod/pinot-controller-0   1/1     Running   0          104s
pod/pinot-server-0       1/1     Running   1          104s
pod/pinot-zookeeper-0    1/1     Running   0          104s

NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/pinot-broker                ClusterIP   10.100.155.139   <none>        8099/TCP                     104s
service/pinot-broker-headless       ClusterIP   None             <none>        8099/TCP                     104s
service/pinot-controller            ClusterIP   10.100.82.228    <none>        9000/TCP                     104s
service/pinot-controller-headless   ClusterIP   None             <none>        9000/TCP                     104s
service/pinot-server                ClusterIP   10.100.166.133   <none>        8098/TCP                     104s
service/pinot-server-headless       ClusterIP   None             <none>        8098/TCP                     104s
service/pinot-zookeeper             ClusterIP   10.100.18.40     <none>        2181/TCP                     104s
service/pinot-zookeeper-headless    ClusterIP   None             <none>        2181/TCP,3888/TCP,2888/TCP   104s

NAME                                READY   AGE
statefulset.apps/pinot-broker       1/1     104s
statefulset.apps/pinot-controller   1/1     104s
statefulset.apps/pinot-server       1/1     104s
statefulset.apps/pinot-zookeeper    1/1     104s
d
Yea, I did that
x
ideally it’s better to also set resources for pinot components
d
I got it working using --managed instead of specifying my own EC2 resources
x
so it won’t get restarted when VM resources get tight
d
I'll have to go back and see if the original eks cluster create works
x
cool
got it
i’m on eks
d
Sweet. Okay, I'm unblocked. Thank you!
x
👍
d
When I use the --managed version, it works. When I use the create cluster command from the docs, it fails.
eksctl create cluster --name=pinot-quickstart --nodes=1 --managed --alb-ingress-access --region=${AWS_REGION}
eksctl create cluster --name ${EKS_CLUSTER_NAME} --version 1.14 --region ca-central-1 --nodegroup-name standard-workers --node-type t3.small --nodes 1 --nodes-min 1 --nodes-max 1 --node-ami auto
If I do the one from the docs, pinot-broker-0 never gets READY
x
I think it might be because of resource constraints?
do you have logs for pinot-broker?
are pinot-zookeeper and pinot-controller up?
d
No, I kept getting the TLS issue when I tried to get them
x
hmm, then could you try to get 2 nodes?
d
If I add --managed to the command from the docs, I get a different issue. Zookeeper stays in Pending.
Sure.
x
my test setup is using 3 nodes
d
Ah, okay.
x
3 t3.small
d
It might be a bit. I have to delete some old EKS Clusters.
x
ic
maybe you can create a new cluster
d
With 2 or 3 nodes, some of the stateful sets never become ready.
Ah, the statefulset in the 3 node version became ready around the 6 minute mark.
x
ic
i feel it might be the dependency
e.g. kafka won’t come up until all 3 kafka-zookeepers are up
however zk-1 will wait for zk-0 to be up and ready before it starts
so it may take a while for everything to start up and stabilize
d
Weird. I tried again with 3 nodes on the smalls and I get an OOMKilled on broker
% kubectl get all -n pinot-quickstart12
NAME                     READY   STATUS             RESTARTS   AGE
pod/pinot-broker-0       0/1     CrashLoopBackOff   4          5m4s
pod/pinot-controller-0   0/1     OOMKilled          4          5m4s
pod/pinot-server-0       0/1     Pending            0          32s
pod/pinot-zookeeper-0    1/1     Running            0          5m4s

NAME                                TYPE           CLUSTER-IP       EXTERNAL-IP                                                                 PORT(S)                      AGE
service/pinot-broker                ClusterIP      10.100.49.189    <none>                                                                      8099/TCP                     5m4s
service/pinot-broker-headless       ClusterIP      None             <none>                                                                      8099/TCP                     5m4s
service/pinot-controller            ClusterIP      10.100.210.114   <none>                                                                      9000/TCP                     5m4s
service/pinot-controller-external   LoadBalancer   10.100.248.135   a0dfc01fc681a11ea9d8d028104cd1fd-183295560.ca-central-1.elb.amazonaws.com   9000:31278/TCP               5m4s
service/pinot-controller-headless   ClusterIP      None             <none>                                                                      9000/TCP                     5m4s
service/pinot-server                ClusterIP      10.100.20.168    <none>                                                                      8098/TCP                     5m4s
service/pinot-server-headless       ClusterIP      None             <none>                                                                      8098/TCP                     5m4s
service/pinot-zookeeper             ClusterIP      10.100.99.188    <none>                                                                      2181/TCP                     5m4s
service/pinot-zookeeper-headless    ClusterIP      None             <none>                                                                      2181/TCP,3888/TCP,2888/TCP   5m4s

NAME                                READY   AGE
statefulset.apps/pinot-broker       0/1     5m5s
statefulset.apps/pinot-controller   0/1     5m5s
statefulset.apps/pinot-server       0/1     5m5s
statefulset.apps/pinot-zookeeper    1/1     5m5s
pinot-controller-0 is having issues talking to zk. pinot-zookeeper-0 went from ready to pending without any restarts.
x
hmmm, did you set resources for pinot-controller/broker/server?
also you can change the jvm_opts
the initial JVM heap size (-Xms) is 4G
you can remove that as well
d
I didn't change anything else. I can try changing jvm_opts. Last night, I changed to t3.medium and everything worked.
Sweet. Yea, lowering jvm_opts makes this work much better.
That'll be fine for what I'm doing.
x
cool!
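The arithmetic behind the heap change can be sketched roughly as follows. A t3.small node has 2 GiB of memory, so the chart's `-Xms4G -Xmx4G` setting asks for more heap than the whole node has. `heap_bytes` and `fits` are hypothetical helpers, the 0.75 headroom factor is an arbitrary assumption, and the lowered values in the second call are illustrative rather than the ones actually used in this thread.

```python
import re
from typing import Optional

_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def heap_bytes(jvm_opts: str, flag: str = "-Xmx") -> Optional[int]:
    """Extract a heap size in bytes from a JVM options string, or None if absent."""
    match = re.search(re.escape(flag) + r"(\d+)([kKmMgG]?)", jvm_opts)
    if match is None:
        return None
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def fits(jvm_opts: str, node_memory_gib: float, headroom: float = 0.75) -> bool:
    """True if the max heap stays within a fraction of the node's memory.

    headroom is an assumed safety factor that leaves room for non-heap JVM
    memory, other pods, and system daemons; it is not a Kubernetes rule.
    """
    max_heap = heap_bytes(jvm_opts, "-Xmx")
    if max_heap is None:
        return True  # no explicit max heap, so nothing to check here
    return max_heap <= node_memory_gib * 1024**3 * headroom


T3_SMALL_GIB = 2  # memory of a t3.small instance

print(fits("-Xms4G -Xmx4G", T3_SMALL_GIB))    # chart setting seen in the diff
print(fits("-Xms256M -Xmx1G", T3_SMALL_GIB))  # illustrative lowered values
```

The check looks at one JVM in isolation. Broker, controller, server, and ZooKeeper pods scheduled on the same node share that memory, so the real budget per pod is smaller.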
d
I'm trying to follow the instructions (I replaced some variables though) to create the realtime tables. I'm getting an issue saying I can't find controller:9000
2020/03/17 18:41:59.728 INFO [AddTableCommand] [main] Executing command: AddTable -tableConfigFile /var/pinot/events/events_realtime_table_config.json -schemaFile /var/pinot/events/events_schema.json -controllerHost pinot-controller -controllerPort 9000 -exec
2020/03/17 18:42:05.481 ERROR [PinotAdministrator] [main] Exception caught: java.io.IOException: Server returned HTTP response code: 500 for URL: http://pinot-controller:9000/tables
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1900) ~[?:1.8.0_242]
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498) ~[?:1.8.0_242]
	at org.apache.pinot.tools.admin.command.AbstractBaseAdminCommand.sendPostRequest(AbstractBaseAdminCommand.java:78) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.apache.pinot.tools.admin.command.AddTableCommand.sendTableCreationRequest(AddTableCommand.java:138) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.apache.pinot.tools.admin.command.AddTableCommand.execute(AddTableCommand.java:163) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:154) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:166) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
% kubectl get all -n pinot-events-dev                                                   
NAME                                                   READY   STATUS    RESTARTS   AGE
pod/kafka-0                                            1/1     Running   0          11m
pod/kafka-1                                            1/1     Running   0          10m
pod/kafka-2                                            1/1     Running   0          9m29s
pod/kafka-zookeeper-0                                  1/1     Running   0          11m
pod/kafka-zookeeper-1                                  1/1     Running   0          11m
pod/kafka-zookeeper-2                                  1/1     Running   0          10m
pod/pinot-broker-0                                     1/1     Running   0          45m
pod/pinot-controller-0                                 1/1     Running   0          45m
pod/pinot-realtime-events-pinot-table-creation-x4fc8   1/1     Running   2          32s
pod/pinot-server-0                                     1/1     Running   0          45m
pod/pinot-zookeeper-0                                  1/1     Running   0          45m

NAME                                TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/kafka                       ClusterIP      10.100.223.82    <none>        9092/TCP                     11m
service/kafka-headless              ClusterIP      None             <none>        9092/TCP                     11m
service/kafka-zookeeper             ClusterIP      10.100.12.136    <none>        2181/TCP                     11m
service/kafka-zookeeper-headless    ClusterIP      None             <none>        2181/TCP,3888/TCP,2888/TCP   11m
service/pinot-broker                ClusterIP      10.100.195.162   <none>        8099/TCP                     45m
service/pinot-broker-headless       ClusterIP      None             <none>        8099/TCP                     45m
service/pinot-controller            ClusterIP      10.100.110.231   <none>        9000/TCP                     45m
service/pinot-controller-external   LoadBalancer   10.100.52.128    <pending>     9000:30946/TCP               45m
service/pinot-controller-headless   ClusterIP      None             <none>        9000/TCP                     45m
service/pinot-server                ClusterIP      10.100.50.20     <none>        8098/TCP                     45m
service/pinot-server-headless       ClusterIP      None             <none>        8098/TCP                     45m
service/pinot-zookeeper             ClusterIP      10.100.86.240    <none>        2181/TCP                     45m
service/pinot-zookeeper-headless    ClusterIP      None             <none>        2181/TCP,3888/TCP,2888/TCP   45m

NAME                                READY   AGE
statefulset.apps/kafka              3/3     11m
statefulset.apps/kafka-zookeeper    3/3     11m
statefulset.apps/pinot-broker       1/1     45m
statefulset.apps/pinot-controller   1/1     45m
statefulset.apps/pinot-server       1/1     45m
statefulset.apps/pinot-zookeeper    1/1     45m

NAME                                                   COMPLETIONS   DURATION   AGE
job.batch/pinot-realtime-events-pinot-table-creation   0/1           32s        32s
Is this an issue with running with external.enabled = true?
x
shouldn’t be
that should just expose an external ip so everyone could access from it
d
If I run the ./query-pinot-data, I get access to the UI.
Maybe I have something misconfigured for DNS resolution? I figured other things would be broken too.
x
yes, that is through local port-forwarding
oh?
d
If you've seen the issue before, please let me know. I'll try to see if I can get this working just from the tutorial commands (no modifications on my side).
x
ok
what did you change in the table config?
d
I did a few changes. I copy/pasted in the one I'm using for local development. I can upload it if it helps.
x
sure, or you can check pinot-controller log to see if the request hit the controller
d
Ok, it's something with my setup. The tutorial code worked. I'll iterate on mine.
x
cool
thanks for your update!
d
2020/03/17 19:32:16.021 WARN [PartitionCountFetcher] [grizzly-http-server-1] Could not get partition count for topic events-realtime
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
2020/03/17 19:32:16.021 ERROR [PinotTableIdealStateBuilder] [grizzly-http-server-1] Could not get partition count for events-realtime
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
2020/03/17 19:32:16.021 ERROR [PinotTableRestletResource] [grizzly-http-server-1] org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
java.lang.RuntimeException: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
	at org.apache.pinot.controller.helix.core.PinotTableIdealStateBuilder.getPartitionCount(PinotTableIdealStateBuilder.java:125) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.apache.pinot.controller.helix.core.realtime.PinotLLCRealtimeSegmentManager.getNumPartitions(PinotLLCRealtimeSegmentManager.java:575) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.apache.pinot.controller.helix.core.realtime.PinotLLCRealtimeSegmentManager.setUpNewTable(PinotLLCRealtimeSegmentManager.java:212) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.apache.pinot.controller.helix.core.PinotTableIdealStateBuilder.buildLowLevelRealtimeIdealStateFor(PinotTableIdealStateBuilder.java:114) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.apache.pinot.controller.helix.core.PinotHelixResourceManager.ensureRealtimeClusterIsSetUp(PinotHelixResourceManager.java:1248) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.apache.pinot.controller.helix.core.PinotHelixResourceManager.addTable(PinotHelixResourceManager.java:1127) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.apache.pinot.controller.api.resources.PinotTableRestletResource.addTable(PinotTableRestletResource.java:122) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_242]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_242]
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:80) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:253) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:292) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:274) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:244) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:232) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:679) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:353) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:200) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549) ~[pinot-all-0.3.0-SNAPSHOT-jar-with-dependencies.jar:0.3.0-SNAPSHOT-183d810494f0ac87f1ac14f85bb8e04dca3e99ec]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
My controller has this in the logs
Ah, I probably have a typo.
I'm confused why the logs are saying the topic's name is
metadata
My stream config is
"streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "simple",
          "stream.kafka.topic.name": "events-realtime",
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.hlc.zk.connect.string": "kafka-zookeeper:2181",
          "stream.kafka.zk.broker.url": "kafka-zookeeper:2181",
          "stream.kafka.broker.list": "localhost:9092",
          "realtime.segment.flush.threshold.time": "3600000",
          "realtime.segment.flush.threshold.size": "50000",
          "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
        }
There are a bunch of errors in the logs around zk and kafka. Probably related.
I'm going to create a new EKS and use my configs to see if there is an issue with my EKS cluster.
x
seems like a zookeeper issue
Could not get partition count for events-realtime
this means it fails to read from kafka
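One quick way to narrow this down is to check whether the addresses in stream.kafka.broker.list are reachable from wherever the controller runs. A minimal Python sketch, assuming every entry has the form host:port; the helper names are made up for illustration, and a successful TCP connect only shows that the port is open, not that Kafka itself is healthy:

```python
import socket


def parse_broker_list(broker_list):
    """Split a 'host:port,host:port' string into (host, port) tuples.

    Assumes every entry carries an explicit port.
    """
    brokers = []
    for entry in broker_list.split(","):
        host, _, port = entry.strip().rpartition(":")
        brokers.append((host, int(port)))
    return brokers


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


if __name__ == "__main__":
    # Value copied from the stream config above.
    for host, port in parse_broker_list("localhost:9092"):
        status = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port} is {status}")
```

Run it from inside the controller pod so it sees the same network and DNS as the controller does.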
d
Gotcha
I have to step out for a couple of hours. I can reproduce this issue on a separate, isolated EKS cluster. I'll iterate to figure out the issue.
x
ok, just want to confirm: can the realtime table consume from Kafka?
d
I don't know if it's consuming from Kafka. From what I remember, the create table job succeeded but the load job failed. When I went to the Pinot UI, there was some data populated. The pinot-quickstart got further than my events version.
x
if it’s a streaming table, then it means Pinot could read from Kafka
d
I figured it out. The streaming config I copied over had the wrong Kafka broker.
<           "stream.kafka.broker.list": "kafka:9092",
---
>           "stream.kafka.broker.list": "localhost:9092",
x
🙂
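A small check like the following might catch this kind of copy-paste mistake before the table is created. It is only a sketch: it assumes that a loopback address in stream.kafka.broker.list is unintended when Pinot and Kafka run in separate pods, and the function name is hypothetical rather than part of Pinot:

```python
# Hosts that resolve to the pod itself, so they cannot reach a Kafka
# broker running in a different pod.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def loopback_brokers(stream_configs):
    """Return the broker entries that point at a loopback host."""
    broker_list = stream_configs.get("stream.kafka.broker.list", "")
    flagged = []
    for entry in broker_list.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Take the part before the last ':'; fall back to the whole
        # entry when no port is given.
        host = entry.rpartition(":")[0] or entry
        if host in LOOPBACK_HOSTS:
            flagged.append(entry)
    return flagged


if __name__ == "__main__":
    configs = {"stream.kafka.broker.list": "localhost:9092"}
    for entry in loopback_brokers(configs):
        print(f"suspicious broker address: {entry}")
```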
d
Only weird issue I've had is that the first time I started Kafka and added a table, Pinot didn't pick up the events from Kafka. When I restarted Pinot, it picked it up.
x
do you mean you started Kafka and pushed data into Kafka, and then created the table?
if so, you could try to get the controller log for add table
it’s possible that the controller is not connected to Kafka
actually, did you restart all Pinot components or just the Pinot server?
d
I started (1) Pinot (2) then Kafka (3) then created the realtime tables.
If I hit it again, I'll look at the log
I think I just restarted all of the pinot components.
x
got it
technically we should dump data into kafka first
then create pinot table