Luis Fernandez
04/18/2022, 8:09 PM- name: ZK_PURGE_INTERVAL
value: "1"
- name: ZK_SNAP_RETAIN_COUNT
value: "3"
in the logs i can see things getting set:
2022-04-15 16:14:35,914 [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2022-04-15 16:14:35,915 [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 1
2022-04-15 16:14:35,979 [myid:1] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2022-04-15 16:14:35,980 [myid:1] - INFO [main:ManagedUtil@46] - Log4j found with jmx enabled.
2022-04-15 16:14:35,988 [myid:1] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
however i don’t see any more logs of clean up logs after, is there a reason for this? also, i can see that the space is being chugged by the logs does anyone know why things may not get cleaned up? Thank you would appreciate your help.Mayank
Luis Fernandez
04/18/2022, 8:24 PM/data/log/version-2
Luis Fernandez
04/18/2022, 8:24 PMLuis Fernandez
04/18/2022, 8:24 PM$ ls -lh
total 3.2G
-rw-rw-r-- 1 zookeeper zookeeper 768M Apr 5 00:45 log.700000001
-rw-rw-r-- 1 zookeeper zookeeper 1.0G Apr 11 17:51 log.700014b0e
-rw-rw-r-- 1 zookeeper zookeeper 1.1G Apr 15 16:01 log.8000001e6
-rw-r--r-- 1 zookeeper zookeeper 448M Apr 18 20:11 log.80000ea05
Xiaoman Dong
04/18/2022, 8:25 PMMayank
Luis Fernandez
04/18/2022, 8:31 PM/data/log/version-2
Luis Fernandez
04/18/2022, 8:31 PMMayank
Luis Fernandez
04/18/2022, 8:32 PMLuis Fernandez
04/18/2022, 8:32 PM`ZKLG
?5
"id" : "WorkflowContext",
"simpleFields" : {
"LAST_PURGE_TIME" : "1646699352774",
"NAME" : "TaskQueue_RealtimeToOfflineSegmentsTask",
"START_TIME" : "1646234218213",
"STATE" : "IN_PROGRESS"
},
"mapFields" : {
"JOB_STATES" : {
"TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1646698682995" : "COMPLETED"
},
"StartTime" : {
"TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1646698682995" : "1646698697926"
}
},
"listFields" : { }
}
"id" : "etsyads_metrics_dev__0__5__20220307T2037Z",
"simpleFields" : {
Mayank
Luis Fernandez
04/18/2022, 8:32 PMLuis Fernandez
04/18/2022, 8:32 PMMayank
Xiaoman Dong
04/18/2022, 8:33 PMMayank
Luis Fernandez
04/18/2022, 8:33 PM/data/version-2
Luis Fernandez
04/18/2022, 8:34 PMtotal 3.5M
-rw-r--r-- 1 zookeeper zookeeper 1 Apr 15 19:06 acceptedEpoch
-rw-r--r-- 1 zookeeper zookeeper 1 Apr 15 19:06 currentEpoch
-rw-rw-r-- 1 zookeeper zookeeper 481K Mar 8 19:58 snapshot.500a20f00
-rw-rw-r-- 1 zookeeper zookeeper 1.3M Apr 5 00:45 snapshot.700014b0d
-rw-rw-r-- 1 zookeeper zookeeper 1.8M Apr 11 17:53 snapshot.8000001e5
Luis Fernandez
04/18/2022, 8:34 PMLuis Fernandez
04/18/2022, 8:35 PMapache-zookeeper-3.5.5
Mayank
Luis Fernandez
04/18/2022, 8:36 PMDaniel Lavoie
04/18/2022, 8:37 PMDaniel Lavoie
04/18/2022, 8:37 PMDaniel Lavoie
04/18/2022, 8:37 PM/conf/zoo.cfg
from within the ZK container?Luis Fernandez
04/18/2022, 8:37 PMLuis Fernandez
04/18/2022, 8:38 PMclientPort=2181
dataDir=/data
dataLogDir=/data/log
tickTime=2000
initLimit=10
syncLimit=10
maxClientCnxns=60
minSessionTimeout= 4000
maxSessionTimeout= 40000
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
4lw.commands.whitelist=*
server.1=pinot-zookeeper-0.pinot-zookeeper-headless.pinot-dev.svc.cluster.local:2888:3888
server.2=pinot-zookeeper-1.pinot-zookeeper-headless.pinot-dev.svc.cluster.local:2888:3888
server.3=pinot-zookeeper-2.pinot-zookeeper-headless.pinot-dev.svc.cluster.local:2888:3888
Daniel Lavoie
04/18/2022, 8:41 PMLuis Fernandez
04/18/2022, 8:43 PMDaniel Lavoie
04/18/2022, 8:43 PMDaniel Lavoie
04/18/2022, 8:43 PMLuis Fernandez
04/18/2022, 8:43 PMLuis Fernandez
04/18/2022, 8:43 PMLuis Fernandez
04/18/2022, 8:44 PMZK_DATA_DIR=${ZK_DATA_DIR:-"/data"}
ZK_DATA_LOG_DIR=${ZK_DATA_LOG_DIR:-"/data/log"}
Luis Fernandez
04/18/2022, 8:45 PMDaniel Lavoie
04/18/2022, 8:46 PMDaniel Lavoie
04/18/2022, 8:46 PMDaniel Lavoie
04/18/2022, 8:46 PMDaniel Lavoie
04/18/2022, 8:46 PMDaniel Lavoie
04/18/2022, 8:47 PMLuis Fernandez
04/18/2022, 8:47 PMDaniel Lavoie
04/18/2022, 8:48 PMDaniel Lavoie
04/18/2022, 8:48 PMDaniel Lavoie
04/18/2022, 8:49 PMLuis Fernandez
04/18/2022, 8:50 PMLuis Fernandez
04/18/2022, 8:52 PM/data/log/version-2
total 3.2G
-rw-rw-r-- 1 zookeeper zookeeper 768M Apr 5 00:45 log.700000001
-rw-rw-r-- 1 zookeeper zookeeper 1.0G Apr 11 17:51 log.700014b0e
-rw-rw-r-- 1 zookeeper zookeeper 1.1G Apr 15 16:01 log.8000001e6
-rw-r--r-- 1 zookeeper zookeeper 448M Apr 18 20:37 log.80000ea05
/data/version-2
total 3.5M
-rw-r--r-- 1 zookeeper zookeeper 1 Apr 15 19:06 acceptedEpoch
-rw-r--r-- 1 zookeeper zookeeper 1 Apr 15 19:06 currentEpoch
-rw-rw-r-- 1 zookeeper zookeeper 481K Mar 8 19:58 snapshot.500a20f00
-rw-rw-r-- 1 zookeeper zookeeper 1.3M Apr 5 00:45 snapshot.700014b0d
-rw-rw-r-- 1 zookeeper zookeeper 1.8M Apr 11 17:53 snapshot.8000001e5
Daniel Lavoie
04/18/2022, 8:53 PMDaniel Lavoie
04/18/2022, 8:54 PMDaniel Lavoie
04/18/2022, 8:54 PMLuis Fernandez
04/18/2022, 8:54 PMMayank
Mayank
Harish Bohara
04/19/2022, 2:22 PMLuis Fernandez
04/19/2022, 2:23 PMDaniel Lavoie
04/19/2022, 2:26 PMDaniel Lavoie
04/19/2022, 2:26 PMLuis Fernandez
04/19/2022, 2:26 PMHarish Bohara
04/19/2022, 2:27 PMLuis Fernandez
04/19/2022, 2:27 PMLuis Fernandez
04/19/2022, 2:27 PMenv:
- name: ZK_REPLICAS
value: "3"
- name: JMXAUTH
value: "false"
- name: JMXDISABLE
value: "false"
- name: JMXPORT
value: "1099"
- name: JMXSSL
value: "false"
- name: ZK_HEAP_SIZE
value: "256M"
- name: ZK_SYNC_LIMIT
value: "10"
- name: ZK_TICK_TIME
value: "2000"
- name: ZK_PURGE_INTERVAL
value: "1"
- name: ZK_SNAP_RETAIN_COUNT
value: "3"
- name: ZOO_INIT_LIMIT
value: "5"
- name: ZOO_MAX_CLIENT_CNXNS
value: "60"
- name: ZOO_PORT
value: "2181"
- name: ZOO_STANDALONE_ENABLED
value: "false"
- name: ZOO_TICK_TIME
value: "2000"
Luis Fernandez
04/19/2022, 2:28 PMresources:
limits:
cpu: 500m
memory: 1Gi
requests:
cpu: 500m
memory: 1Gi
volumeMounts:
- name: data
mountPath: /data
- name: config
mountPath: /config-scripts
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: "5Gi"
Daniel Lavoie
04/19/2022, 2:36 PMZK_DATA_DIR
to /data/snapshots
should do the trick. Big big spoiler. Do not attempt this on an environment you cannot affort to lose data. I have reason to believe that this could cause problem if all pods are updated at once. I would recommend update your statefulset update policy to OnDelete
and set the env variable. Delete the pods one by one and checking the quorum status. Once the cluster is rollout, you can apply the changes with helm to ensure they are permanent.Luis Fernandez
04/19/2022, 2:39 PMupdateStrategy:
type: RollingUpdate
Luis Fernandez
04/19/2022, 2:39 PMOnDelete
?Daniel Lavoie
04/19/2022, 2:39 PMDaniel Lavoie
04/19/2022, 2:39 PMDaniel Lavoie
04/19/2022, 2:40 PMDaniel Lavoie
04/19/2022, 2:40 PMDaniel Lavoie
04/19/2022, 2:40 PMLuis Fernandez
04/19/2022, 2:40 PMLuis Fernandez
04/19/2022, 2:40 PMreadinessProbe:
exec:
command:
- sh
- /config-scripts/ready
Daniel Lavoie
04/19/2022, 2:41 PMDaniel Lavoie
04/19/2022, 2:43 PMOnDelete
you can control the rollout yourself.Luis Fernandez
04/19/2022, 2:44 PMOnDelete
on the updateStrategy, is that a no op change?Daniel Lavoie
04/19/2022, 2:44 PMDaniel Lavoie
04/19/2022, 2:44 PMLuis Fernandez
04/19/2022, 2:44 PMLuis Fernandez
04/19/2022, 2:45 PMLuis Fernandez
04/19/2022, 2:45 PMLuis Fernandez
04/19/2022, 2:45 PMDaniel Lavoie
04/19/2022, 2:46 PMDaniel Lavoie
04/19/2022, 2:47 PMLuis Fernandez
04/19/2022, 2:48 PMDaniel Lavoie
04/19/2022, 2:48 PMDaniel Lavoie
04/19/2022, 2:48 PMDaniel Lavoie
04/19/2022, 2:48 PMDaniel Lavoie
04/19/2022, 2:49 PM/data/version-2
manually.Luis Fernandez
04/19/2022, 2:49 PMDaniel Lavoie
04/19/2022, 2:50 PMDaniel Lavoie
04/19/2022, 2:50 PMHarish Bohara
04/19/2022, 2:52 PMlimits:
memory: 7G
requests:
cpu: 1000m
memory: 6G
env:
- name: ZOO_DATA_LOG_DIR
value: ""
- name: ZOO_PORT_NUMBER
value: "2181"
- name: ZOO_TICK_TIME
value: "2000"
- name: ZOO_INIT_LIMIT
value: "10"
- name: ZOO_SYNC_LIMIT
value: "5"
- name: ZOO_MAX_CLIENT_CNXNS
value: "60"
- name: ZOO_4LW_COMMANDS_WHITELIST
value: "srvr, mntr, ruok"
- name: ZOO_LISTEN_ALLIPS_ENABLED
value: "no"
- name: ZOO_AUTOPURGE_INTERVAL
value: "0"
- name: ZOO_AUTOPURGE_RETAIN_COUNT
value: "3"
- name: ZOO_MAX_SESSION_TIMEOUT
value: "40000"
- name: ZOO_SERVERS
value: zookeeper-0.zookeeper-headless.pinot.svc.cluster.local:2888:3888::1 zookeeper-1.zookeeper-headless.pinot.svc.cluster.local:2888:3888::2 zookeeper-2.zookeeper-headless.pinot.svc.cluster.local:2888:3888::3
- name: ZOO_ENABLE_AUTH
value: "no"
- name: ZOO_HEAP_SIZE
value: "1024"
- name: ZOO_LOG_LEVEL
value: "ERROR"
- name: ALLOW_ANONYMOUS_LOGIN
value: "yes"
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
ports:
- name: client
containerPort: 2181
- name: follower
containerPort: 2888
- name: election
containerPort: 3888
livenessProbe:
exec:
command: ['/bin/bash', '-c', 'echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok']
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 6
readinessProbe:
exec:
command: ['/bin/bash', '-c', 'echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok']
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 6
volumeMounts:
- name: data
mountPath: /bitnami/zookeeper
volumes:
volumeClaimTemplates:
- metadata:
name: data
annotations:
spec:
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: "100Gi"
Luis Fernandez
04/19/2022, 2:56 PMDaniel Lavoie
04/19/2022, 2:57 PMDaniel Lavoie
04/19/2022, 2:57 PMLuis Fernandez
04/19/2022, 2:58 PMDaniel Lavoie
04/19/2022, 2:58 PMLuis Fernandez
04/19/2022, 3:03 PMLuis Fernandez
04/19/2022, 3:03 PMDaniel Lavoie
04/19/2022, 3:04 PMHarish Bohara
04/19/2022, 3:18 PMupdateStrategy:
type: RollingUpdate
To
updateStrategy:
type: OnDelete
Also add
env:
- name: ZK_DATA_DIR
value: "/bitnami/zookeeper/data/snapshots"
This will not restart ZK nodes
2. Delete first of the ZK node manually - wait for it to be synced
3. Delete second of the ZK node manually - wait for it to be synced
4. Delete third of the ZK node manually - wait for it to be synced
5. Rollback the changes in StatefulSet and change updateStrategy back to OnDelete
updateStrategy:
type: RollingUpdate
Daniel Lavoie
04/19/2022, 3:19 PMenv:
- name: ZK_DATA_DIR
value: "/data/snapshots"
Daniel Lavoie
04/19/2022, 3:22 PM/data/version-2
on all podsHarish Bohara
04/19/2022, 3:24 PMvolumeMounts:
- name: data
mountPath: /bitnami/zookeeper
I have no name!@zookeeper-0:/$ df -h
Filesystem Size Used Avail Use% Mounted on
overlay 50G 3.4G 47G 7% /
tmpfs 64M 0 64M 0% /dev
tmpfs 3.8G 0 3.8G 0% /sys/fs/cgroup
/dev/nvme0n1p1 50G 3.4G 47G 7% /etc/hosts
/dev/nvme1n1 99G 441M 98G 1% /bitnami/zookeeper
shm 64M 0 64M 0% /dev/shm
tmpfs 3.8G 12K 3.8G 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs 3.8G 0 3.8G 0% /proc/acpi
tmpfs 3.8G 0 3.8G 0% /sys/firmware
zookeeper-0:/bitnami/zookeeper/data$ ls -la
total 16
drwxrwsr-x 3 1001 1001 4096 Apr 19 07:58 .
drwxrwsr-x 4 root 1001 4096 Apr 19 07:58 ..
-rw-rw-r-- 1 1001 1001 2 Apr 19 07:58 myid
drwxrwsr-x 2 1001 1001 4096 Apr 19 15:08 version-2
Harish Bohara
04/19/2022, 3:24 PMDaniel Lavoie
04/19/2022, 3:25 PMHarish Bohara
04/19/2022, 3:25 PMHarish Bohara
04/19/2022, 3:26 PMDaniel Lavoie
04/19/2022, 3:27 PMDaniel Lavoie
04/19/2022, 3:27 PM/data
path.Harish Bohara
04/19/2022, 3:34 PMI think I should do 2 changes: my configs do now have both ZK_DATA_DIR and ZK_DATA_LOG_DIR.
env:
- name: ZK_DATA_DIR
value: /bitnami/zookeeper/data/snapshots
- name: ZK_DATA_LOG_DIR
value: /bitnami/zookeeper/data/log
Daniel Lavoie
04/19/2022, 3:35 PMZK_DATA_LOG_DIR
will default to /data/log/
Daniel Lavoie
04/19/2022, 3:36 PMLuis Fernandez
04/19/2022, 3:38 PMLuis Fernandez
04/19/2022, 3:39 PM/data/snapshot
instead of that so that by default future clients don’t run into itDaniel Lavoie
04/19/2022, 3:39 PMDaniel Lavoie
04/19/2022, 3:40 PMLuis Fernandez
04/19/2022, 3:46 PMDaniel Lavoie
04/19/2022, 3:46 PM/data/log
Luis Fernandez
04/19/2022, 3:46 PMDaniel Lavoie
04/19/2022, 3:47 PMDaniel Lavoie
04/19/2022, 3:47 PMDaniel Lavoie
04/19/2022, 3:47 PMDaniel Lavoie
04/19/2022, 3:47 PMDaniel Lavoie
04/19/2022, 3:48 PMHarish Bohara
04/19/2022, 3:49 PM$printenv
ZK_DATA_LOG_DIR=/bitnami/zookeeper/data/log
ZK_DATA_DIR=/bitnami/zookeeper/data/snapshots
...
I dont see /bitnami/zookeeper/data/snapshots or /bitnami/zookeeper/data/logs dir after restart..
Daniel Lavoie
04/19/2022, 3:49 PM/conf/zoo.cfg
config?Harish Bohara
04/19/2022, 3:50 PMI have no name!@zookeeper-2:/$ cat /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/bitnami/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# <http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance>
#
# The number of snapshots to retain in dataDir
autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=0
## Metrics Providers
#
# <https://prometheus.io> Metrics Exporter
#metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
#metricsProvider.httpPort=7000
#metricsProvider.exportJvmInfo=true
preAllocSize=65536
snapCount=100000
maxCnxns=0
reconfigEnabled=false
quorumListenOnAllIPs=false
4lw.commands.whitelist=srvr, mntr, ruok
maxSessionTimeout=40000
server.1=zookeeper-0.zookeeper-headless.pinot.svc.cluster.local:2888:3888;2181
server.2=zookeeper-1.zookeeper-headless.pinot.svc.cluster.local:2888:3888;2181
server.3=zookeeper-2.zookeeper-headless.pinot.svc.cluster.local:2888:3888;2181
I have no name!@zookeeper-2:/$
Luis Fernandez
04/19/2022, 3:51 PMDaniel Lavoie
04/19/2022, 3:51 PMdataDir
Luis Fernandez
04/19/2022, 3:51 PMdataLogDir
Daniel Lavoie
04/19/2022, 3:52 PMHarish Bohara
04/19/2022, 3:53 PMDaniel Lavoie
04/19/2022, 3:54 PMHarish Bohara
04/19/2022, 4:17 PMDaniel Lavoie
04/19/2022, 4:17 PM/data
dir causes the issueDaniel Lavoie
04/19/2022, 4:18 PMDaniel Lavoie
04/19/2022, 4:18 PMHarish Bohara
04/19/2022, 4:21 PMLuis Fernandez
04/19/2022, 4:23 PMMayank
Mayank
Daniel Lavoie
04/19/2022, 5:20 PMLuis Fernandez
04/19/2022, 5:22 PMLuis Fernandez
04/19/2022, 5:22 PMLuis Fernandez
04/19/2022, 5:45 PMDaniel Lavoie
04/19/2022, 5:45 PMzkCli.sh ls
command on itDaniel Lavoie
04/19/2022, 5:46 PMLuis Fernandez
04/19/2022, 5:48 PMLuis Fernandez
04/19/2022, 5:48 PM2022-04-19 17:47:13,503 [myid:3] - ERROR [main:QuorumPeer@955] - Unable to load database on disk
java.io.IOException: No snapshot found, but there are log entries. Something is broken!
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:919)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:905)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:205)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
2022-04-19 17:47:13,505 [myid:3] - ERROR [main:QuorumPeerMain@101] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:956)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:905)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:205)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
Caused by: java.io.IOException: No snapshot found, but there are log entries. Something is broken!
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:919)
Daniel Lavoie
04/19/2022, 5:49 PMDaniel Lavoie
04/19/2022, 5:49 PMLuis Fernandez
04/19/2022, 5:50 PMDaniel Lavoie
04/19/2022, 5:50 PMkubectl delete pod <pod-name> && kubectl delete pvc <pvc-name>
Daniel Lavoie
04/19/2022, 5:51 PMLuis Fernandez
04/19/2022, 5:57 PMzkCli.sh ls
within the box?Luis Fernandez
04/19/2022, 5:57 PMDaniel Lavoie
04/19/2022, 5:57 PMLuis Fernandez
04/19/2022, 5:57 PMclientPort=2181
dataDir=/data/snapshot
dataLogDir=/data/log
tickTime=2000
initLimit=10
syncLimit=10
maxClientCnxns=60
minSessionTimeout= 4000
maxSessionTimeout= 40000
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
4lw.commands.whitelist=*
server.1=pinot-zookeeper-0.pinot-zookeeper-headless.pinot-dev.svc.cluster.l
ocal:2888:3888
server.2=pinot-zookeeper-1.pinot-zookeeper-headless.pinot-dev.svc.cluster.l
ocal:2888:3888
server.3=pinot-zookeeper-2.pinot-zookeeper-headless.pinot-dev.svc.cluster.l
ocal:2888:3888
Luis Fernandez
04/19/2022, 5:58 PMdata/log
?Daniel Lavoie
04/19/2022, 5:59 PMzkCli.sh ls
show anything?Luis Fernandez
04/19/2022, 6:00 PMConnecting to localhost:2181
2022-04-19 17:59:27,717 [myid:] - INFO [main:Environment@109] - Client environment:zookeeper.version=3.5.5-390fe37ea45dee01bf87dc1c042b5e3dcce88653, built on 05/03/2019 12:07 GMT
2022-04-19 17:59:27,720 [myid:] - INFO [main:Environment@109] - Client environment:host.name=pinot-zookeeper-2.pinot-zookeeper-headless.pinot-dev.svc.cluster.local
2022-04-19 17:59:27,720 [myid:] - INFO [main:Environment@109] - Client environment:java.version=1.8.0_232
2022-04-19 17:59:27,721 [myid:] - INFO [main:Environment@109] - Client environment:java.vendor=Oracle Corporation
2022-04-19 17:59:27,721 [myid:] - INFO [main:Environment@109] - Client environment:java.home=/usr/local/openjdk-8
2022-04-19 17:59:27,721 [myid:] - INFO [main:Environment@109] - Client environment:java.class.path=/apache-zookeeper-3.5.5-bin/bin/../zookeeper-server/target/classes:/apache-zookeeper-3.5.5-bin/bin/../build/classes:/apache-zookeeper-3.5.5-bin/bin/../zookeeper-server/target/lib/*.jar:/apache-zookeeper-3.5.5-bin/bin/../build/lib/*.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/zookeeper-jute-3.5.5.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/zookeeper-3.5.5.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/slf4j-log4j12-1.7.25.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/slf4j-api-1.7.25.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/netty-all-4.1.29.Final.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/log4j-1.2.17.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/json-simple-1.1.1.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jline-2.11.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jetty-util-9.4.17.v20190418.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jetty-servlet-9.4.17.v20190418.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jetty-server-9.4.17.v20190418.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jetty-security-9.4.17.v20190418.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jetty-io-9.4.17.v20190418.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jetty-http-9.4.17.v20190418.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/javax.servlet-api-3.1.0.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jackson-databind-2.9.8.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jackson-core-2.9.8.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jackson-annotations-2.9.0.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/commons-cli-1.2.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/audience-annotations-0.5.0.jar:/apache-zookeeper-3.5.5-bin/bin/../zookeeper-*.jar:/apache-zookeeper-3.5.5-bin/bin/../zookeeper-server/src/main/resources/lib/*.jar:/conf:
2022-04-19 17:59:27,724 [myid:] - INFO [main:Environment@109] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2022-04-19 17:59:27,724 [myid:] - INFO [main:Environment@109] - Client environment:java.io.tmpdir=/tmp
2022-04-19 17:59:27,724 [myid:] - INFO [main:Environment@109] - Client environment:java.compiler=<NA>
2022-04-19 17:59:27,724 [myid:] - INFO [main:Environment@109] - Client environment:os.name=Linux
2022-04-19 17:59:27,724 [myid:] - INFO [main:Environment@109] - Client environment:os.arch=amd64
2022-04-19 17:59:27,724 [myid:] - INFO [main:Environment@109] - Client environment:os.version=5.4.144+
2022-04-19 17:59:27,724 [myid:] - INFO [main:Environment@109] - Client environment:user.name=zookeeper
2022-04-19 17:59:27,725 [myid:] - INFO [main:Environment@109] - Client environment:user.home=/home/zookeeper
2022-04-19 17:59:27,725 [myid:] - INFO [main:Environment@109] - Client environment:user.dir=/apache-zookeeper-3.5.5-bin
2022-04-19 17:59:27,725 [myid:] - INFO [main:Environment@109] - Client environment:os.memory.free=11MB
2022-04-19 17:59:27,726 [myid:] - INFO [main:Environment@109] - Client environment:os.memory.max=247MB
2022-04-19 17:59:27,726 [myid:] - INFO [main:Environment@109] - Client environment:os.memory.total=15MB
2022-04-19 17:59:27,728 [myid:] - INFO [main:ZooKeeper@868] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@3b95a09c
2022-04-19 17:59:27,731 [myid:] - INFO [main:X509Util@79] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2022-04-19 17:59:27,800 [myid:] - INFO [main:ClientCnxnSocket@237] - jute.maxbuffer value is 4194304 Bytes
2022-04-19 17:59:27,806 [myid:] - INFO [main:ClientCnxn@1653] - zookeeper.request.timeout value is 0. feature enabled=
ls [-s] [-w] [-R] path
command terminated with exit code 1
Daniel Lavoie
04/19/2022, 6:00 PMzkCli.sh ls /
Luis Fernandez
04/19/2022, 6:01 PM[pinot, zookeeper]
Luis Fernandez
04/19/2022, 6:02 PMDaniel Lavoie
04/19/2022, 6:02 PMDaniel Lavoie
04/19/2022, 6:02 PMLuis Fernandez
04/19/2022, 6:03 PM2022-04-19 17:54:15,446 [myid:3] - WARN [QuorumPeer[myid=3](plain=/0.0.0.0:2181)(secure=disabled):Learner@282] - Unexpected exception, tries=2, remaining init limit=17995, connecting to pinot-zookeeper-0.pinot-zookeeper-headless.pinot-dev.svc.cluster.local/10.12.177.117:2888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:233)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:262)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:77)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
2022-04-19 17:54:16,448 [myid:3] - WARN [QuorumPeer[myid=3](plain=/0.0.0.0:2181)(secure=disabled):Learner@282] - Unexpected exception, tries=3, remaining init limit=16993, connecting to pinot-zookeeper-0.pinot-zookeeper-headless.pinot-dev.svc.cluster.local/10.12.177.117:2888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:233)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:262)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:77)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
2022-04-19 17:54:17,450 [myid:3] - ERROR [QuorumPeer[myid=3](plain=/0.0.0.0:2181)(secure=disabled):Learner@277] - Unexpected exception, retries exceeded. tries=4, remaining init limit=15991, connecting to pinot-zookeeper-0.pinot-zookeeper-headless.pinot-dev.svc.cluster.local/10.12.177.117:2888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:233)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:262)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:77)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
2022-04-19 17:54:17,450 [myid:3] - WARN [QuorumPeer[myid=3](plain=/0.0.0.0:2181)(secure=disabled):Follower@96] - Exception when following the leader
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:233)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:262)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:77)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
2022-04-19 17:54:17,450 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0.0.0.0:2181)(secure=disabled):Follower@201] - shutdown called
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1275)
2022-04-19 17:54:17,451 [myid:3] - WARN [QuorumPeer[myid=3](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1318] - PeerState set to LOOKING
2022-04-19 17:54:17,451 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1193] - LOOKING
2022-04-19 17:54:17,451 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0.0.0.0:2181)(secure=disabled):FastLeaderElection@885] - New election. My id = 3, proposed zxid=0x0
2022-04-19 17:54:17,452 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0x0 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-04-19 17:54:17,453 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@679] - Notification: 2 (message format version), 2 (n.leader), 0xa00000018 (n.zxid), 0x1 (n.round), FOLLOWING (n.state), 1 (n.sid), 0xb (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-04-19 17:54:17,454 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@679] - Notification: 2 (message format version), 2 (n.leader), 0xa00000018 (n.zxid), 0x1 (n.round), LEADING (n.state), 2 (n.sid), 0xb (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-04-19 17:54:17,454 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1269] - FOLLOWING
Luis Fernandez
04/19/2022, 6:04 PM+ /config-scripts/run
+ exec java -cp '/apache-zookeeper-3.5.5-bin/lib/*:/apache-zookeeper-3.5.5-bin/*jar:/conf:' -Xmx256M -Xms256M org.apache.zookeeper.server.quorum.QuorumPeerMain /conf/zoo.cfg
2022-04-19 17:54:12,120 [myid:] - INFO [main:QuorumPeerConfig@133] - Reading configuration from: /conf/zoo.cfg
2022-04-19 17:54:12,125 [myid:] - INFO [main:QuorumPeerConfig@385] - clientPortAddress is 0.0.0.0/0.0.0.0:2181
2022-04-19 17:54:12,125 [myid:] - INFO [main:QuorumPeerConfig@389] - secureClientPort is not set
2022-04-19 17:54:12,203 [myid:3] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2022-04-19 17:54:12,205 [myid:3] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 1
2022-04-19 17:54:12,205 [myid:3] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2022-04-19 17:54:12,206 [myid:3] - INFO [main:ManagedUtil@46] - Log4j found with jmx enabled.
2022-04-19 17:54:12,213 [myid:3] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2022-04-19 17:54:12,315 [myid:3] - INFO [main:QuorumPeerMain@141] - Starting quorum peer
2022-04-19 17:54:12,320 [myid:3] - INFO [main:ServerCnxnFactory@135] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2022-04-19 17:54:12,321 [myid:3] - INFO [main:NIOServerCnxnFactory@673] - Configuring NIO connection handler with 10s sessionless connection timeout, 1 selector thread(s), 2 worker threads, and 64 kB direct buffers.
2022-04-19 17:54:12,399 [myid:3] - INFO [main:NIOServerCnxnFactory@686] - binding to port 0.0.0.0/0.0.0.0:2181
2022-04-19 17:54:12,419 [myid:3] - INFO [main:Log@193] - Logging initialized @583ms to org.eclipse.jetty.util.log.Slf4jLog
2022-04-19 17:54:12,902 [myid:3] - WARN [main:ContextHandler@1588] - o.e.j.s.ServletContextHandler@1bce4f0a{/,null,UNAVAILABLE} contextPath ends with /*
2022-04-19 17:54:12,902 [myid:3] - WARN [main:ContextHandler@1599] - Empty contextPath
2022-04-19 17:54:12,911 [myid:3] - INFO [main:X509Util@79] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2022-04-19 17:54:12,912 [myid:3] - INFO [main:QuorumPeer@1488] - Local sessions disabled
2022-04-19 17:54:12,912 [myid:3] - INFO [main:QuorumPeer@1499] - Local session upgrading disabled
2022-04-19 17:54:12,912 [myid:3] - INFO [main:QuorumPeer@1466] - tickTime set to 2000
2022-04-19 17:54:12,913 [myid:3] - INFO [main:QuorumPeer@1510] - minSessionTimeout set to 4000
2022-04-19 17:54:12,913 [myid:3] - INFO [main:QuorumPeer@1521] - maxSessionTimeout set to 40000
2022-04-19 17:54:12,913 [myid:3] - INFO [main:QuorumPeer@1536] - initLimit set to 10
2022-04-19 17:54:12,921 [myid:3] - INFO [main:ZKDatabase@117] - zookeeper.snapshotSizeFactor = 0.33
2022-04-19 17:54:12,922 [myid:3] - INFO [main:QuorumPeer@1781] - Using insecure (non-TLS) quorum communication
2022-04-19 17:54:12,997 [myid:3] - INFO [main:QuorumPeer@1787] - Port unification disabled
2022-04-19 17:54:12,997 [myid:3] - INFO [main:QuorumPeer@2154] - QuorumPeer communication is not secured! (SASL auth disabled)
2022-04-19 17:54:12,997 [myid:3] - INFO [main:QuorumPeer@2183] - quorum.cnxn.threads.size set to 20
2022-04-19 17:54:12,998 [myid:3] - INFO [main:FileTxnSnapLog@372] - Snapshotting: 0x0 to /data/snapshot/version-2/snapshot.0
2022-04-19 17:54:13,001 [myid:3] - INFO [main:QuorumPeer@931] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2022-04-19 17:54:13,006 [myid:3] - INFO [main:QuorumPeer@946] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2022-04-19 17:54:13,012 [myid:3] - INFO [main:Server@370] - jetty-9.4.17.v20190418; built: 2019-04-18T19:45:35.259Z; git: aa1c656c315c011c01e7b21aabb04066635b9f67; jvm 1.8.0_232-b09
Daniel Lavoie
04/19/2022, 6:04 PMDaniel Lavoie
04/19/2022, 6:04 PMDaniel Lavoie
04/19/2022, 6:05 PMLuis Fernandez
04/19/2022, 6:07 PMLuis Fernandez
04/19/2022, 6:09 PM[pinot, zookeeper]
Daniel Lavoie
04/19/2022, 6:09 PMls /data/log/
on pod first and second zk pod?Luis Fernandez
04/19/2022, 6:10 PM$ kubectl exec pinot-zookeeper-1 -- ls /data/log/
version-2
Luis Fernandez
04/19/2022, 6:11 PM$ kubectl exec pinot-zookeeper-2 -- ls /data/log/
version-2
Daniel Lavoie
04/19/2022, 6:11 PM/data/log/version-2
Luis Fernandez
04/19/2022, 6:11 PM$ kubectl exec pinot-zookeeper-2 -- ls /data/log/version-2
log.b000011cd
Luis Fernandez
04/19/2022, 6:11 PM$ kubectl exec pinot-zookeeper-1 -- ls /data/log/version-2
log.c00000001
Daniel Lavoie
04/19/2022, 6:12 PMLuis Fernandez
04/19/2022, 6:13 PMLuis Fernandez
04/19/2022, 6:15 PM[pinot, zookeeper]
Luis Fernandez
04/19/2022, 6:15 PM$ kubectl exec pinot-zookeeper-0 -- ls /data/log/version-2
log.c000002a3
Luis Fernandez
04/19/2022, 6:15 PMDaniel Lavoie
04/19/2022, 6:16 PMLuis Fernandez
04/19/2022, 6:17 PMLuis Fernandez
04/19/2022, 6:17 PMDaniel Lavoie
04/19/2022, 6:17 PMLuis Fernandez
04/19/2022, 6:17 PMDaniel Lavoie
04/19/2022, 6:18 PMDaniel Lavoie
04/19/2022, 6:18 PMLuis Fernandez
04/19/2022, 6:20 PMDaniel Lavoie
04/19/2022, 6:20 PMDaniel Lavoie
04/19/2022, 6:20 PMhelm upgrade
. That variable will be missingLuis Fernandez
04/19/2022, 6:23 PMenv:
- name: ZK_REPLICAS
value: "3"
- name: JMXAUTH
value: "false"
- name: JMXDISABLE
value: "false"
Luis Fernandez
04/19/2022, 6:24 PMDaniel Lavoie
04/19/2022, 6:24 PMDaniel Lavoie
04/19/2022, 6:25 PMzookeeper.env.ZK_DATA_LOG_DIR=/data/log
zookeeper.env.ZK_DATA_DIR=/data/snapshots
Luis Fernandez
04/19/2022, 6:26 PMhelm upgrade
Daniel Lavoie
04/19/2022, 6:27 PMLuis Fernandez
04/19/2022, 6:27 PMHarish Bohara
04/20/2022, 10:50 AMLuis Fernandez
04/20/2022, 5:21 PMLuis Fernandez
04/20/2022, 5:21 PMLuis Fernandez
04/20/2022, 5:21 PMDaniel Lavoie
04/20/2022, 5:22 PMDaniel Lavoie
04/20/2022, 5:22 PM