Luis Fernandez
08/24/2022, 8:50 PM
pinot-server-1
at that point pinot-server-1 was getting scaled up and pinot-server-0 was working without issue and serving traffic. after a bit, when pinot-server-1 was coming back up, we started getting the following error:
[
{
"message": "java.net.UnknownHostException: pinot-server-1.pinot-server-headless.pinot.svc.cluster.local\n\tat java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:797)\n\tat java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1509)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1368)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1302)",
"errorCode": 425
},
{
"message": "2 servers [pinot-server-0_R, pinot-server-1_R] not responded",
"errorCode": 427
}
]
and also
2022-08-24 16:47:32
java.net.UnknownHostException: pinot-server-1.pinot-server-headless.pinot.svc.cluster.local
2022-08-24 16:47:32
Caught exception while sending request 183945048 to server: pinot-server-1_R, marking query failed
what i’m trying to understand is how queries were getting routed to pinot-server-1 if it was down. after a bit this problem resolved itself without us doing anything, but we did get some downtime.
Mayank
Luis Fernandez
08/24/2022, 9:21 PM
Mayank
Luis Fernandez
08/24/2022, 9:22 PM
Mayank
Luis Fernandez
08/24/2022, 9:25 PM
NAME             READY   STATUS    RESTARTS   AGE
pinot-server-0   1/1     Running   0          14m
pinot-server-1   0/1     Running   0          2m24s
and the broker was logging requests against pinot-server-1
2022-08-24 16:48:45
java.net.UnknownHostException: pinot-server-1.pinot-server-headless.pinot.svc.cluster.local
2022-08-24 16:48:45
Caught exception while sending request 183950834 to server: pinot-server-1_R, marking query failed
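The `UnknownHostException` above says the broker could not resolve the server's headless-service DNS name at that moment. With a headless Service, the per-pod DNS record is normally published only once the pod is Ready (unless the Service sets `publishNotReadyAddresses`), which would be consistent with the `0/1` READY status shown earlier. A minimal Python sketch for checking resolution from a pod in the same cluster follows. `hostname_resolves` is a hypothetical helper, not part of Pinot; the `resolver` parameter exists only so the function can be exercised without a network; port 8098 is taken from the instance names in this thread.

```python
import socket
from typing import Callable


def hostname_resolves(
    hostname: str,
    port: int = 8098,  # assumption: server port as it appears in the instance names here
    resolver: Callable = socket.getaddrinfo,
) -> bool:
    """Return True if the resolver yields at least one address for hostname."""
    try:
        return len(resolver(hostname, port)) > 0
    except socket.gaierror:
        # Roughly the Python counterpart of java.net.UnknownHostException
        return False
```

Calling `hostname_resolves("pinot-server-1.pinot-server-headless.pinot.svc.cluster.local")` from inside the cluster while the pod is still `0/1` would show whether the name is resolvable before the pod turns Ready.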
Luis Fernandez
08/24/2022, 9:26 PM
pinot-server-0 too
Mayank
Luis Fernandez
08/24/2022, 9:36 PM
Luis Fernandez
08/24/2022, 9:36 PM
kubectl delete pod 😄
Luis Fernandez
08/24/2022, 9:37 PM
Mayank
Xiang Fu
Xiang Fu
livenessProbe:
  failureThreshold: 168
  httpGet:
    path: /health
    port: 8097
    scheme: HTTPS
  initialDelaySeconds: 60
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 1
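To get a feel for what these numbers mean, here is a rough Python sketch of how long a container that never passes `/health` is given before Kubernetes acts on the probe. It is an approximation that assumes the probe fails immediately each time and ignores `timeoutSeconds`; `probe_budget_seconds` is a hypothetical helper name.

```python
def probe_budget_seconds(
    initial_delay_seconds: int, period_seconds: int, failure_threshold: int
) -> int:
    """Approximate seconds until failure_threshold consecutive probe failures accumulate."""
    return initial_delay_seconds + period_seconds * failure_threshold


# Values from the livenessProbe above: 60s delay, one probe every 5s, 168 failures allowed
print(probe_budget_seconds(60, 5, 168))  # 900
```

So the liveness probe above tolerates roughly 15 minutes of startup before restarting the container. The same arithmetic applied to `delay=600s period=10s #failure=3` gives about 630 seconds.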
Luis Fernandez
08/24/2022, 10:52 PM
Luis Fernandez
08/24/2022, 10:55 PM
Luis Fernandez
08/24/2022, 10:55 PM
Xiang Fu
Luis Fernandez
08/24/2022, 10:56 PM
probes+: {
  livenessEnabled: true,
  readinessEnabled: true,
  initialDelaySeconds: 120,
  periodSeconds: 10,
},
Xiang Fu
Xiang Fu
periodSeconds: 10,
Xiang Fu
Luis Fernandez
08/24/2022, 10:57 PM
Luis Fernandez
08/24/2022, 10:57 PM
Liveness: http-get http://:8097/health delay=600s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8097/health delay=600s timeout=1s period=10s #success=1 #failure=3
Luis Fernandez
08/24/2022, 10:58 PM
Xiang Fu
Luis Fernandez
08/24/2022, 10:59 PM
Luis Fernandez
08/24/2022, 11:02 PM
Xiang Fu
Luis Fernandez
08/24/2022, 11:02 PM
Luis Fernandez
08/24/2022, 11:02 PM
Luis Fernandez
08/24/2022, 11:02 PM
Xiang Fu
Xiang Fu
Xiang Fu
Luis Fernandez
08/24/2022, 11:03 PM
kubectl delete pod pinot-server-0
Luis Fernandez
08/24/2022, 11:03 PM
kubectl delete pod pinot-server-1
Luis Fernandez
08/24/2022, 11:04 PM
Luis Fernandez
08/24/2022, 11:04 PM
Luis Fernandez
08/24/2022, 11:04 PM
pinot-server-0 was down, brokers were trying to send traffic to it
Luis Fernandez
08/24/2022, 11:04 PM
pinot-server-1 was trying to come back
Xiang Fu
Xiang Fu
pod.Spec.TerminationGracePeriodSeconds to be some number like 20 secs for enough time
Luis Fernandez
08/24/2022, 11:06 PM
Luis Fernandez
08/24/2022, 11:06 PM
Xiang Fu
Luis Fernandez
08/24/2022, 11:07 PM
Xiang Fu
Xiang Fu
Luis Fernandez
08/24/2022, 11:08 PM
Xiang Fu
Xiang Fu
Xiang Fu
Xiang Fu
Luis Fernandez
08/24/2022, 11:14 PM
Luis Fernandez
08/24/2022, 11:19 PM
Metrics scheduler closed
Closing reporter org.apache.kafka.common.metrics.JmxReporter
Metrics reporters closed
App info kafka.consumer for consumer-null-1926 unregistered
Metrics scheduler closed
Closing reporter org.apache.kafka.common.metrics.JmxReporter
Metrics reporters closed
App info kafka.consumer for consumer-null-1927 unregistered
Shut down table data manager for table: query_metrics_REALTIME
Shut down segment build time lease extender executor
Helix instance data manager shut down
Shutting down metrics registry
Finish shutting down server instance
Stopped tracking server queries disabled.
Deregistering service status handler
Finish shutting down Pinot server for Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098
Pinot [SERVER] Instance [Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098] is Stopped...
Shutting down Pinot Service Manager admin application...
Deregistering service status handler
Xiang Fu
Luis Fernandez
08/25/2022, 1:53 PM
Mayank
Luis Fernandez
08/25/2022, 2:24 PM
Luis Fernandez
08/25/2022, 2:24 PM
Luis Fernandez
08/25/2022, 2:29 PM
Luis Fernandez
08/25/2022, 2:29 PM
/debug/routingTable/{tableWithType}
Luis Fernandez
08/25/2022, 2:30 PM
pinot-server-0 it’s in the process of trying to come back
Mayank
Mayank
Luis Fernandez
08/25/2022, 2:31 PM
Luis Fernandez
08/25/2022, 2:32 PM
Luis Fernandez
08/25/2022, 2:33 PM
"mapFields": {
  "etsyads_metrics__0__220__20220817T1813Z": {
    "Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE",
    "Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE"
  },
  "etsyads_metrics__0__221__20220818T1448Z": {
    "Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE",
    "Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE"
  },
  "etsyads_metrics__0__222__20220819T1031Z": {
    "Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE",
    "Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE"
  },
Luis Fernandez
08/25/2022, 2:33 PM
Luis Fernandez
08/25/2022, 2:36 PM
/debug/routingTable/etsyads_metrics_REALTIME
Luis Fernandez
08/25/2022, 2:36 PM
Luis Fernandez
08/25/2022, 2:37 PM
"Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098":
Luis Fernandez
08/25/2022, 2:37 PM
Luis Fernandez
08/25/2022, 2:37 PM
Luis Fernandez
08/25/2022, 2:37 PM
"Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098":
"Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098":
Luis Fernandez
08/25/2022, 2:37 PM
Luis Fernandez
08/25/2022, 2:37 PM
[
{
"message": "java.net.UnknownHostException: pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local\n\tat java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:797)\n\tat java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1509)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1368)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1302)",
"errorCode": 425
},
{
"message": "4 servers [pinot-server-0_O, pinot-server-1_O, pinot-server-0_R, pinot-server-1_R] not responded",
"errorCode": 427
}
]
Luis Fernandez
08/25/2022, 2:37 PM
pinot-server-0   0/1   Running   0   3m44s
pinot-server-1   1/1   Running   0   15h
Luis Fernandez
08/25/2022, 2:38 PM
Mayank
Luis Fernandez
08/25/2022, 6:18 PM
Luis Fernandez
08/25/2022, 6:18 PM
Luis Fernandez
08/25/2022, 6:26 PM
"mapFields": {
  "etsyads_metrics__0__221__20220818T1448Z": {
    "Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "OFFLINE",
    "Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE"
  },
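When comparing an external view like this against what the broker is routing to, it can help to list, per segment, which servers are not in a serving state. A small Python sketch under stated assumptions: the input is the parsed `mapFields` object, `servers_not_serving` is a hypothetical helper, and treating `ONLINE` and `CONSUMING` as the serving states is my assumption about how to read the view, not something confirmed in this thread.

```python
from typing import Dict, List

# Assumption: states in which a server can answer queries for a segment
SERVING_STATES = {"ONLINE", "CONSUMING"}


def servers_not_serving(map_fields: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each segment to the sorted list of servers whose state is not serving."""
    result: Dict[str, List[str]] = {}
    for segment, states in map_fields.items():
        down = sorted(s for s, state in states.items() if state not in SERVING_STATES)
        if down:
            result[segment] = down
    return result


# Sample taken from the external view fragment above
sample = {
    "etsyads_metrics__0__221__20220818T1448Z": {
        "Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "OFFLINE",
        "Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE",
    }
}
print(servers_not_serving(sample))
```

For the sample, only the pinot-server-0 instance is reported for that segment. If the broker's `/debug/routingTable/{tableWithType}` output still lists such a server, that would point at the routing table lagging the external view.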
Luis Fernandez
08/25/2022, 6:26 PM
pinot-server-0 is trying to come back
Luis Fernandez
08/25/2022, 6:26 PM
Luis Fernandez
08/25/2022, 6:26 PM
pinot-server-0
Luis Fernandez
08/25/2022, 6:27 PM
pinot-server-0   0/1   Running   0   2m44s
Luis Fernandez
08/25/2022, 6:27 PM
[
{
"message": "java.net.UnknownHostException: pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local: Name or service not known\n\tat java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)\n\tat java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)\n\tat java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1519)\n\tat java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)",
"errorCode": 425
},
{
"message": "2 servers [pinot-server-1_R, pinot-server-0_R] not responded",
"errorCode": 427
}
]
Luis Fernandez
08/25/2022, 6:28 PM
Luis Fernandez
08/25/2022, 6:28 PM
Luis Fernandez
08/25/2022, 6:28 PM
Luis Fernandez
08/25/2022, 6:33 PM
Luis Fernandez
08/31/2022, 2:16 PM
Xiang Fu
Xiang Fu
Xiang Fu
Luis Fernandez
08/31/2022, 4:59 PM
Luis Fernandez
08/31/2022, 4:59 PM
Xiang Fu
Luis Fernandez
08/31/2022, 5:02 PM
Xiang Fu
readinessProbe for that
Luis Fernandez
08/31/2022, 6:49 PM
Luis Fernandez
08/31/2022, 6:49 PM
Liveness: http-get http://:8097/health delay=600s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8097/health delay=600s timeout=1s period=10s #success=1 #failure=3
Luis Fernandez
08/31/2022, 6:49 PM
Luis Fernandez
08/31/2022, 6:49 PM
Luis Fernandez
08/31/2022, 6:50 PM
Luis Fernandez
08/31/2022, 6:50 PM
Xiang Fu
Xiang Fu
Luis Fernandez
08/31/2022, 7:00 PM
Luis Fernandez
08/31/2022, 7:00 PM
Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup, or depend on external services after startup. In such cases, you don't want to kill the application, but you don't want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.
Luis Fernandez
08/31/2022, 7:00 PM
Xiang Fu
Luis Fernandez
08/31/2022, 7:02 PM
Xiang Fu
Luis Fernandez
08/31/2022, 7:02 PM
failureThreshold
Luis Fernandez
08/31/2022, 7:03 PM
Luis Fernandez
08/31/2022, 7:03 PM
Xiang Fu
Xiang Fu
Xiang Fu
Xiang Fu
Luis Fernandez
08/31/2022, 7:08 PM
Xiang Fu
Luis Fernandez
08/31/2022, 7:08 PM
Xiang Fu
Luis Fernandez
08/31/2022, 7:13 PM
Luis Fernandez
08/31/2022, 7:17 PM
Luis Fernandez
08/31/2022, 7:17 PM
livenessProbe+: {
  initialDelaySeconds: 30,
  periodSeconds: 10,
  failureThreshold: 50,
},
readinessProbe+: {
  initialDelaySeconds: 60,
  periodSeconds: 10,
  failureThreshold: 50,
},
Xiang Fu
Xiang Fu
Xiang Fu
Luis Fernandez
08/31/2022, 7:24 PM
  livenessProbe+: {
    initialDelaySeconds: 30,
    periodSeconds: 5,
    failureThreshold: 120,
  },
  readinessProbe+: {
    initialDelaySeconds: 60,
    periodSeconds: 10,
    failureThreshold: 60,
  },
}
Luis Fernandez
08/31/2022, 7:24 PM
Xiang Fu
Luis Fernandez
08/31/2022, 7:35 PM
Luis Fernandez
08/31/2022, 7:35 PM
Luis Fernandez
08/31/2022, 7:35 PM
[
{
"message": "java.net.UnknownHostException: pinot-server-0.pinot-server-headless.pinot.svc.cluster.local: Name or service not known\n\tat java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)\n\tat java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)\n\tat java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1519)\n\tat java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)",
"errorCode": 425
},
{
"message": "2 servers [pinot-server-1_R, pinot-server-0_R] not responded",
"errorCode": 427
}
]
Luis Fernandez
08/31/2022, 7:35 PM
Luis Fernandez
08/31/2022, 7:56 PM
Luis Fernandez
08/31/2022, 7:56 PM
Luis Fernandez
08/31/2022, 7:56 PM
Luis Fernandez
08/31/2022, 7:56 PM
Luis Fernandez
08/31/2022, 7:56 PM
Xiang Fu
Xiang Fu
Xiang Fu
Luis Fernandez
08/31/2022, 8:00 PM
Luis Fernandez
08/31/2022, 8:00 PM
livenessProbe+: {
  initialDelaySeconds: 10,
  periodSeconds: 1,
  failureThreshold: 600,
},
readinessProbe+: {
  initialDelaySeconds: 15,
  periodSeconds: 1,
  failureThreshold: 600,
},
Xiang Fu
Luis Fernandez
08/31/2022, 8:01 PM
Xiang Fu
Luis Fernandez
08/31/2022, 8:01 PM
Xiang Fu
Xiang Fu
Xiang Fu
Luis Fernandez
08/31/2022, 8:03 PM
Luis Fernandez
08/31/2022, 8:03 PM
Luis Fernandez
08/31/2022, 8:04 PM
0.11.0? or for alter
Luis Fernandez
08/31/2022, 8:04 PM
Xiang Fu
Xiang Fu
Xiang Fu