# troubleshooting
l
question: I was doing a scaling exercise today and a couple of things happened. Our setup is 2 servers, and we were trying to up the number of cores. The way we did this was by adding the configs in Kubernetes and applying them, then deleting the pod, say pinot-server-1. At that point pinot-server-1 was getting scaled up and pinot-server-0 was working without issue and serving traffic. After a bit, when pinot-server-1 was coming back up, we started getting the following error:
Copy code
[
  {
    "message": "java.net.UnknownHostException: pinot-server-1.pinot-server-headless.pinot.svc.cluster.local\n\tat java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:797)\n\tat java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1509)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1368)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1302)",
    "errorCode": 425
  },
  {
    "message": "2 servers [pinot-server-0_R, pinot-server-1_R] not responded",
    "errorCode": 427
  }
]
and also
Copy code
2022-08-24 16:47:32	
java.net.UnknownHostException: pinot-server-1.pinot-server-headless.pinot.svc.cluster.local
2022-08-24 16:47:32	
Caught exception while sending request 183945048 to server: pinot-server-1_R, marking query failed
what i’m trying to understand is how queries were getting routed to pinot-server-1 if it was down. after a bit this problem resolved itself without us doing anything, but we did get some downtime.
m
If you scale one replica at a time (and wait until it is up), then you shouldn’t get downtime.
l
that's what we did, but it seems like traffic was being routed to one of the servers before it was ready to receive it (?)
m
A server will only receive traffic for segments that it claims it is ready for
l
And in kubernetes it wasn't even ready yet
m
If it is not ready in k8s, then the segments it hosts would be OFFLINE in the external view, and the broker won’t send queries to it.
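(For reference, a minimal way to check that, assuming the controller's REST API on its default port 9000; the hostname and table name below are placeholders:)
Copy code
# fetch the external view for a table from the Pinot controller;
# segments on a server that shut down cleanly should show up as OFFLINE here
curl http://pinot-controller:9000/tables/myTable_REALTIME/externalview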
l
in k8s we were seeing this:
Copy code
NAME                 READY   STATUS    RESTARTS      AGE
pinot-server-0       1/1     Running   0             14m
pinot-server-1       0/1     Running   0             2m24s
and the broker was logging requests against pinot-server-1:
Copy code
2022-08-24 16:48:45	
java.net.UnknownHostException: pinot-server-1.pinot-server-headless.pinot.svc.cluster.local
2022-08-24 16:48:45	
Caught exception while sending request 183950834 to server: pinot-server-1_R, marking query failed
a similar thing happened when we scaled pinot-server-0 too
m
Hmm, it might be that it wasn’t shut down gracefully for the EV to go offline. And the broker might be looking at an obsolete EV
l
that may be it, how do we shut it down gracefully (?)
i def just did kubectl delete pod 😄
also, for some time this works with just the one server, and then eventually it does start failing until the other comes back to life
m
@Xiang Fu on how to call shutdown when deleting a pod.
x
add something like this, based on your own setup:
Copy code
livenessProbe:
  failureThreshold: 168
  httpGet:
    path: /health
    port: 8097
    scheme: HTTPS
  initialDelaySeconds: 60
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 1
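(For reference, with these example values the pod gets a 60s head start and is then allowed up to 168 failed checks at 5s intervals, roughly 840s, before it is restarted.)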
l
do they get a value by default?
x
check values.yaml when you apply the helm chart
l
i just see us configuring the delays and setting them to true
Copy code
probes+: {
  livenessEnabled: true,
  readinessEnabled: true,
  initialDelaySeconds: 120,
  periodSeconds: 10,
},
x
also you can find it from the statefulset
Copy code
periodSeconds: 10,
so I guess that’s the timeout
l
this is what i see
server statefulset:
Copy code
Liveness:   http-get http://:8097/health delay=600s timeout=1s period=10s #success=1 #failure=3
Readiness:  http-get http://:8097/health delay=600s timeout=1s period=10s #success=1 #failure=3
it waits quite a bit to check
x
right
l
but brokers still send traffic to it even tho it may not be ready
so that’s configured hmm
x
how did you try the deletion?
l
oh to apply the change
so in this case it was from 4 cores to 8
should i shut it down differently?
x
typically when the kill signal is sent, the server should deregister itself from Helix, so the broker will rebuild the routing table
ic
so kubectl scale?
l
i did kubectl delete pod pinot-server-0 and kubectl delete pod pinot-server-1
so i ran the first command, waited for the pod to come back, and then ran the second one
but in between we got downtime
even though pinot-server-0 was down, brokers were trying to send traffic to it
and the same thing happened when pinot-server-1 was trying to come back
x
you can set this pod.Spec.TerminationGracePeriodSeconds to be some number, like 20 secs, to give it enough time
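(A minimal sketch of where that setting goes in the pod spec, assuming the usual Pinot server statefulset layout; 30 is just an example value:)
Copy code
spec:
  template:
    spec:
      # how long the Pinot server gets after SIGTERM to deregister from
      # Helix and finish its clean shutdown before it is force-killed
      terminationGracePeriodSeconds: 30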
l
ooo
😮
x
we typically put 30 seconds in prod
l
how do i know if it has that?
x
describe the statefulset of the pinot server
and check if that keyword is there
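(One way to check, assuming the statefulset is named pinot-server and lives in the pinot namespace:)
Copy code
# the stored spec shows the effective value (k8s fills in a default if unset)
kubectl get statefulset pinot-server -n pinot -o yaml | grep terminationGracePeriodSeconds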
l
oh so i didn’t find the keyword so by default it would be 0
x
need to check that
you can tail the pod log and then kill the pod
then see how long it runs and whether the pinot server shuts down cleanly
there are multiple k8s configs you can play around with for the lifecycle
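(A minimal version of that check, with the pod name and namespace adjusted to your setup:)
Copy code
# terminal 1: follow the server log
kubectl logs -f pinot-server-1 -n pinot
# terminal 2: delete the pod, then watch terminal 1 for the
# "Finish shutting down Pinot server ..." lines before the stream ends
kubectl delete pod pinot-server-1 -n pinot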
l
what log do i look at to see if it's a clean shutdown
Copy code
Metrics scheduler closed
Closing reporter org.apache.kafka.common.metrics.JmxReporter
Metrics reporters closed
App info kafka.consumer for consumer-null-1926 unregistered
Metrics scheduler closed
Closing reporter org.apache.kafka.common.metrics.JmxReporter
Metrics reporters closed
App info kafka.consumer for consumer-null-1927 unregistered
Shut down table data manager for table: query_metrics_REALTIME
Shut down segment build time lease extender executor
Helix instance data manager shut down
Shutting down metrics registry
Finish shutting down server instance
Stopped tracking server queries disabled.
Deregistering service status handler
Finish shutting down Pinot server for Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098
Pinot [SERVER] Instance [Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098] is Stopped...
Shutting down Pinot Service Manager admin application...
Deregistering service status handler
x
yes, this means the server shut down cleanly
l
then if this happens properly, why would we be sending traffic to it when it's not ready to serve queries?
m
When server-1 is down, check the external view and the routing table (on the broker debug endpoint)
l
what do i check in the external view, the brokerResource?
and the debug endpoint would be /debug/routingTable/{tableWithType}
so right now i have taken down pinot-server-0
it's in the process of trying to come back
m
No, check the external view for the table
The debug endpoint is on the broker host
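(For reference, the routing-table check looks roughly like this, assuming the broker's default port 8099 and a placeholder broker hostname:)
Copy code
# routing table as one broker sees it; servers listed here receive queries
curl http://pinot-broker:8099/debug/routingTable/etsyads_metrics_REALTIME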
l
this is what i see for one of the tables
Copy code
"mapFields": {
    "etsyads_metrics__0__220__20220817T1813Z": {
      "Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE",
      "Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE"
    },
    "etsyads_metrics__0__221__20220818T1448Z": {
      "Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE",
      "Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE"
    },
    "etsyads_metrics__0__222__20220819T1031Z": {
      "Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE",
      "Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE"
    },
(ExternalView)
using /debug/routingTable/etsyads_metrics_REALTIME
for some time i only got
"Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098":
and things were okay
but now i'm seeing both of them
"Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098":
"Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098":
and queries are failing
Copy code
[
  {
    "message": "java.net.UnknownHostException: pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local\n\tat java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:797)\n\tat java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1509)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1368)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1302)",
    "errorCode": 425
  },
  {
    "message": "4 servers [pinot-server-0_O, pinot-server-1_O, pinot-server-0_R, pinot-server-1_R] not responded",
    "errorCode": 427
  }
]
Copy code
pinot-server-0       0/1     Running   0             3m44s
pinot-server-1       1/1     Running   0             15h
not yet ready in kube
m
Hmm, so it is ready in IS and EV after restart, but not ready in k8s? That doesn’t seem to make sense.
l
anything else i could show
i can recreate on demand
Copy code
"mapFields": {
    "etsyads_metrics__0__221__20220818T1448Z": {
      "Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "OFFLINE",
      "Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE"
    },
pinot-server-0 is trying to come back
while it is showing this, things are not failing
and no traffic is going to pinot-server-0
Copy code
pinot-server-0       0/1     Running   0             2m44s
Copy code
[
  {
    "message": "java.net.UnknownHostException: pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local: Name or service not known\n\tat java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)\n\tat java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)\n\tat java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1519)\n\tat java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)",
    "errorCode": 425
  },
  {
    "message": "2 servers [pinot-server-1_R, pinot-server-0_R] not responded",
    "errorCode": 427
  }
]
ok… so at that point none of the probes have checked the health status
could it be that the pinot-server marks itself ready to receive traffic, or subscribes to the entire cluster, but in kubernetes' eyes it is not yet ready, and we get the exception?
after all we have a 600s delay to do the first probe check
and once that happens everything starts working
trying to revive this in case someone has an idea, given that today we may have to restart the servers and may have some downtime in prod
x
this is a good point
you should make the k8s health check pass before pinot considers itself up
otherwise, pinot thinks pinot-server-0 is up, but k8s says no, so you get this UnknownHostException
l
so basically do the first check earlier, right?
how much delay do you have?
x
we just do 10 seconds.
l
and how many times do you try?
x
but I think if you have a lot of data, and you don't want the next restart to happen before this node is up, then you should define a readinessProbe for that
l
so we do have both of them
Copy code
Liveness:   http-get http://:8097/health delay=600s timeout=1s period=10s #success=1 #failure=3
Readiness:  http-get http://:8097/health delay=600s timeout=1s period=10s #success=1 #failure=3
but both of them have delay of 600s
I think we set that up because whenever we were adding new nodes it took a minute for new servers to get the data onto disk, does that make sense?
so maybe the liveness probe can be delayed
but the readiness probe should probably have a much shorter delay
x
maybe make the liveness delay shorter?
or give it more tolerance on the number of failures.
l
oh i thought we had to make the readiness probe do the first check sooner, given what it says in the documentation
Copy code
Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup, or depend on external services after startup. In such cases, you don't want to kill the application, but you don't want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.
well i guess both of them have to be shorter
x
it means the server is live but not able to serve traffic
l
so yeah, i guess i can basically have both of them do the first check at 30s
x
live means dns registered
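(A quick way to verify that from inside the cluster, assuming an image with nslookup such as busybox; the hostname is the one from the error above:)
Copy code
# the record should resolve only once the pod's DNS entry has been
# published for the headless service
kubectl run -it --rm dns-check --image=busybox:1.36 --restart=Never -- \
  nslookup pinot-server-1.pinot-server-headless.pinot.svc.cluster.local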
l
and then give it a greater failureThreshold
like 50 or something, every 15s
so that gives it 750s
x
right
depends on how long your normal server restart takes
make sure your server is live first, then ready
so dns won't have a problem
l
when we are not adding a new node it's pretty fast
x
yes
l
when we add a new node that has to pull data from gcs and such, it takes a minute
x
that’s ok
l
so i’m gonna switch it up a bit then maybe do liveness in 30s and readiness in 60s
so like
Copy code
livenessProbe+: {
  initialDelaySeconds: 30,
  periodSeconds: 10,
  failureThreshold: 50,
},
readinessProbe+: {
  initialDelaySeconds: 60,
  periodSeconds: 10,
  failureThreshold: 50,
},
x
Yeah
You can make liveness check more frequent
so the server goes live before ready check
l
Copy code
livenessProbe+: {
  initialDelaySeconds: 30,
  periodSeconds: 5,
  failureThreshold: 120,
},
readinessProbe+: {
  initialDelaySeconds: 60,
  periodSeconds: 10,
  failureThreshold: 60,
},
going with this
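(For reference, those numbers give the liveness probe a window of 30s + 120 × 5s ≈ 630s and the readiness probe 60s + 60 × 10s ≈ 660s, so both leave roughly ten minutes for the server to come up before k8s intervenes.)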
x
Right
l
i still got the exception for some time
but it's way faster to recover now
Copy code
[
  {
    "message": "java.net.UnknownHostException: pinot-server-0.pinot-server-headless.pinot.svc.cluster.local: Name or service not known\n\tat java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)\n\tat java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)\n\tat java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1519)\n\tat java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)",
    "errorCode": 425
  },
  {
    "message": "2 servers [pinot-server-1_R, pinot-server-0_R] not responded",
    "errorCode": 427
  }
]
but still getting this
what if i hit the health endpoint every second 😄
the recovery times are definitely better now
but ideally i would like not to get the exception at all
x
I think so
do it every second
I think you hit the time when it’s live but also ready
l
going with this
Copy code
livenessProbe+: {
  initialDelaySeconds: 10,
  periodSeconds: 1,
  failureThreshold: 600,
},
readinessProbe+: {
  initialDelaySeconds: 15,
  periodSeconds: 1,
  failureThreshold: 600,
},
x
Right now you are using the same health check for both probes
l
maybe i should have them both start at the same time (?)
x
We plan to add a new endpoint to reflect the live and ready APIs
l
so 10s (?)
x
Theoretically it doesn’t matter
If you are unlucky enough, your liveness and readiness will pass at the same time
and then you may still see a glitch
l
it has to be live first, yes, that's why we perform that one first (?)
and then ready
is that coming in 0.11.0? or later
x
I believe so
cc: @Rong R
In 0.11