# troubleshooting
l
question: I was doing a scaling exercise today and a couple of things happened. Our setup is 2 servers, and we were trying to up the number of cores. The way we did this was by adding the configs in Kubernetes and applying them, then deleting the pod, say pinot-server-1. At that point pinot-server-1 was getting scaled up and pinot-server-0 was working without issue and serving traffic. After a bit, when pinot-server-1 was coming back up, we started getting the following error:
Copy code
[
  {
    "message": "java.net.UnknownHostException: pinot-server-1.pinot-server-headless.pinot.svc.cluster.local\n\tat java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:797)\n\tat java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1509)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1368)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1302)",
    "errorCode": 425
  },
  {
    "message": "2 servers [pinot-server-0_R, pinot-server-1_R] not responded",
    "errorCode": 427
  }
]
and also
Copy code
2022-08-24 16:47:32	
java.net.UnknownHostException: pinot-server-1.pinot-server-headless.pinot.svc.cluster.local
2022-08-24 16:47:32	
Caught exception while sending request 183945048 to server: pinot-server-1_R, marking query failed
what i’m trying to understand is how queries were getting routed to pinot-server-1 if it was down. after a bit this problem resolved itself without us doing anything, but we did get some downtime.
m
If you scale one replica at a time (and wait until it is up), then you shouldn’t get downtime.
l
that's what we did, but it seems like traffic was being routed to one of the servers before it was ready to receive it (?)
m
A server will only receive traffic for segments that it claims it is ready for
l
And in kubernetes it wasn't even ready yet
m
If it is not ready in k8s, then the segments it hosts would be OFFLINE in the external view, and the broker won’t send queries to it.
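(For reference, a minimal way to check that, assuming the controller's REST API on its default port 9000; the hostname and table name below are placeholders:)
Copy code
# fetch the external view for a table from the Pinot controller;
# segments on a server that shut down cleanly should show up as OFFLINE here
curl http://pinot-controller:9000/tables/myTable_REALTIME/externalview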
l
in k8s we were seeing this:
Copy code
NAME                 READY   STATUS    RESTARTS      AGE
pinot-server-0       1/1     Running   0             14m
pinot-server-1       0/1     Running   0             2m24s
and the broker was logging requests against pinot-server-1:
Copy code
2022-08-24 16:48:45	
java.net.UnknownHostException: pinot-server-1.pinot-server-headless.pinot.svc.cluster.local
2022-08-24 16:48:45	
Caught exception while sending request 183950834 to server: pinot-server-1_R, marking query failed
a similar thing happened when we scaled pinot-server-0 too
m
Hmm, it might be that it wasn’t shut down gracefully for the EV to go offline. And the broker might be looking at an obsolete EV
l
that may be it, how do we shut it down gracefully (?)
i def just did kubectl delete pod 😄
also, for some time this works with just the one server, and then eventually it does start failing until the other comes back to life
m
@Xiang Fu on how to call shutdown when deleting a pod.
x
add something like this, based on your own setup:
Copy code
livenessProbe:
  failureThreshold: 168
  httpGet:
    path: /health
    port: 8097
    scheme: HTTPS
  initialDelaySeconds: 60
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 1
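(For reference, with these example values the pod gets a 60s head start and is then allowed up to 168 failed checks at 5s intervals, roughly 840s, before it is restarted.)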
l
do they get a value by default?
x
check values.yaml when you apply the helm chart
l
i just see us configuring the delays and setting them to true
Copy code
probes+: {
  livenessEnabled: true,
  readinessEnabled: true,
  initialDelaySeconds: 120,
  periodSeconds: 10,
},
x
also you can find it from the statefulset
Copy code
periodSeconds: 10,
so I guess that’s the timeout
l
this is what i see
server statefulset:
Copy code
Liveness:   http-get http://:8097/health delay=600s timeout=1s period=10s #success=1 #failure=3
Readiness:  http-get http://:8097/health delay=600s timeout=1s period=10s #success=1 #failure=3
it waits quite a bit to check
x
right
l
but brokers still send traffic to it even tho it may not be ready
so that’s configured hmm
x
how did you try the deletion?
l
oh to apply the change
so in this case it was from 4 cores to 8
should i shut it down differently?
x
typically when the kill signal is sent, the server should deregister itself from Helix, so the broker will rebuild the routing table
ic
so kubectl scale?
l
i did kubectl delete pod pinot-server-0 and kubectl delete pod pinot-server-1
so i ran the first command, waited for the pod to come back, and then ran the second one
but in between we got downtime
even though pinot-server-0 was down, brokers were trying to send traffic to it
and the same thing happened when pinot-server-1 was trying to come back
x
you can set this pod.Spec.TerminationGracePeriodSeconds to be some number, like 20 secs, to give it enough time
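(A minimal sketch of where that setting goes in the pod spec, assuming the usual Pinot server statefulset layout; 30 is just an example value:)
Copy code
spec:
  template:
    spec:
      # how long the Pinot server gets after SIGTERM to deregister from
      # Helix and finish its clean shutdown before it is force-killed
      terminationGracePeriodSeconds: 30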
l
ooo
😮
x
we typically put 30 seconds in prod
l
how do i know if it has that?
x
describe the statefulset of the pinot server
and check if that keyword is there
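(One way to check, assuming the statefulset is named pinot-server and lives in the pinot namespace:)
Copy code
# the stored spec shows the effective value (k8s fills in a default if unset)
kubectl get statefulset pinot-server -n pinot -o yaml | grep terminationGracePeriodSeconds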
l
oh so i didn’t find the keyword so by default it would be 0
x
need to check that
you can tail the pod log and then kill the pod
then see how long it runs and whether the pinot server shuts down cleanly
there are multiple k8s configs you can play around with for the lifecycle
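(A minimal version of that check, with the pod name and namespace adjusted to your setup:)
Copy code
# terminal 1: follow the server log
kubectl logs -f pinot-server-1 -n pinot
# terminal 2: delete the pod, then watch terminal 1 for the
# "Finish shutting down Pinot server ..." lines before the stream ends
kubectl delete pod pinot-server-1 -n pinot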
l
what log do i look at to see if it's a clean shutdown
Copy code
Metrics scheduler closed
Closing reporter org.apache.kafka.common.metrics.JmxReporter
Metrics reporters closed
App info kafka.consumer for consumer-null-1926 unregistered
Metrics scheduler closed
Closing reporter org.apache.kafka.common.metrics.JmxReporter
Metrics reporters closed
App info kafka.consumer for consumer-null-1927 unregistered
Shut down table data manager for table: query_metrics_REALTIME
Shut down segment build time lease extender executor
Helix instance data manager shut down
Shutting down metrics registry
Finish shutting down server instance
Stopped tracking server queries disabled.
Deregistering service status handler
Finish shutting down Pinot server for Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098
Pinot [SERVER] Instance [Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098] is Stopped...
Shutting down Pinot Service Manager admin application...
Deregistering service status handler
x
yes, this means the server shut down cleanly
l
then if this happens properly, why would we be sending traffic to it when it's not ready to serve queries?
m
When server-1 is down, check the external view and the routing table (on the broker debug endpoint)
l
what do i check in the external view, the brokerResource?
and the debug endpoint would be /debug/routingTable/{tableWithType}
so right now i have taken down pinot-server-0
it's in the process of trying to come back
m
No, check the external view for the table
The debug endpoint is on the broker host
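(For reference, the routing-table check looks roughly like this, assuming the broker's default port 8099 and a placeholder broker hostname:)
Copy code
# routing table as one broker sees it; servers listed here receive queries
curl http://pinot-broker:8099/debug/routingTable/etsyads_metrics_REALTIME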
l
this is what i see for one of the tables
Copy code
"mapFields": {
    "etsyads_metrics__0__220__20220817T1813Z": {
      "Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE",
      "Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE"
    },
    "etsyads_metrics__0__221__20220818T1448Z": {
      "Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE",
      "Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE"
    },
    "etsyads_metrics__0__222__20220819T1031Z": {
      "Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE",
      "Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE"
    },
(ExternalView)
using /debug/routingTable/etsyads_metrics_REALTIME
for some time i only got
"Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098":
and things were okay
but now i'm seeing both of them
"Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098":
"Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098":
and queries are failing
Copy code
[
  {
    "message": "java.net.UnknownHostException: pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local\n\tat java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:797)\n\tat java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1509)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1368)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1302)",
    "errorCode": 425
  },
  {
    "message": "4 servers [pinot-server-0_O, pinot-server-1_O, pinot-server-0_R, pinot-server-1_R] not responded",
    "errorCode": 427
  }
]
Copy code
pinot-server-0       0/1     Running   0             3m44s
pinot-server-1       1/1     Running   0             15h
not yet ready in kube
m
Hmm, so it is ready in IS and EV after restart, but not ready in k8s? That doesn’t seem to make sense.
l
anything else i could show
i can recreate on demand
Copy code
"mapFields": {
    "etsyads_metrics__0__221__20220818T1448Z": {
      "Server_pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "OFFLINE",
      "Server_pinot-server-1.pinot-server-headless.pinot-dev.svc.cluster.local_8098": "ONLINE"
    },
pinot-server-0 is trying to come back
while it is showing this, things are not failing
and no traffic is going to pinot-server-0
Copy code
pinot-server-0       0/1     Running   0             2m44s
Copy code
[
  {
    "message": "java.net.UnknownHostException: pinot-server-0.pinot-server-headless.pinot-dev.svc.cluster.local: Name or service not known\n\tat java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)\n\tat java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)\n\tat java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1519)\n\tat java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)",
    "errorCode": 425
  },
  {
    "message": "2 servers [pinot-server-1_R, pinot-server-0_R] not responded",
    "errorCode": 427
  }
]
ok… so at that point none of the probes have checked the health status
could it be that the pinot-server marks itself ready to receive traffic, or subscribes to the entire cluster, but in kubernetes' eyes it is not yet ready, and we get the exception?
after all we have a 600s delay to do the first probe check
and once that happens everything starts working
trying to revive this in case someone has an idea, given that today we may have to restart the servers and may have some downtime in prod
x
this is a good point
you should make the k8s health check pass before pinot considers itself up
otherwise, pinot thinks pinot-server-0 is up, but k8s says no, so you get this UnknownHostException
l
so basically do the first check earlier, right?
how much delay do you have?
x
we just do 10 seconds.
l
and how many times do you try?
x
but I think if you have a lot of data, and you don't want the next restart to happen before this node is up, then you should define a readinessProbe for that
l
so we do have both of them
Copy code
Liveness:   http-get http://:8097/health delay=600s timeout=1s period=10s #success=1 #failure=3
Readiness:  http-get http://:8097/health delay=600s timeout=1s period=10s #success=1 #failure=3
but both of them have delay of 600s
I think we set that up because whenever we were adding new nodes it took a minute for new servers to get the data onto disk, does that make sense?
so maybe the liveness probe can be delayed
but the readiness probe should probably have a much shorter delay
x
maybe make the liveness delay shorter?
or give it more tolerance on the number of failures.
l
oh i thought we had to make the readiness probe do the first check sooner, given what it says in the documentation
Copy code
Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup, or depend on external services after startup. In such cases, you don't want to kill the application, but you don't want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.
well i guess both of them have to be shorter
x
it means the server is live but not able to serve traffic
l
so yeah, i guess i can basically have both of them do the first check at 30s
x
live means dns registered
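(A quick way to verify that from inside the cluster, assuming an image with nslookup such as busybox; the hostname is the one from the error above:)
Copy code
# the record should resolve only once the pod's DNS entry has been
# published for the headless service
kubectl run -it --rm dns-check --image=busybox:1.36 --restart=Never -- \
  nslookup pinot-server-1.pinot-server-headless.pinot.svc.cluster.local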
l
and then give it a greater failureThreshold
like 50 or something, every 15s
so that gives it 750s
x
right
depends on how long your normal server restart takes
make sure your server is live first, then ready
so dns won't have a problem
l
when we are not adding a new node it's pretty fast
x
yes
l
when we add a new node that has to pull data from gcs and such, it takes a minute
x
that’s ok
l
so i’m gonna switch it up a bit then maybe do liveness in 30s and readiness in 60s
so like
Copy code
livenessProbe+: {
  initialDelaySeconds: 30,
  periodSeconds: 10,
  failureThreshold: 50,
},
readinessProbe+: {
  initialDelaySeconds: 60,
  periodSeconds: 10,
  failureThreshold: 50,
},
x
Yeah
You can make liveness check more frequent
so the server goes live before ready check
l
Copy code
livenessProbe+: {
  initialDelaySeconds: 30,
  periodSeconds: 5,
  failureThreshold: 120,
},
readinessProbe+: {
  initialDelaySeconds: 60,
  periodSeconds: 10,
  failureThreshold: 60,
},
going with this
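(For reference, those numbers give the liveness probe a window of 30s + 120 × 5s ≈ 630s and the readiness probe 60s + 60 × 10s ≈ 660s, so both leave roughly ten minutes for the server to come up before k8s intervenes.)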
x
Right
l
i still got the exception for some time
but it's way faster to recover now
Copy code
[
  {
    "message": "java.net.UnknownHostException: pinot-server-0.pinot-server-headless.pinot.svc.cluster.local: Name or service not known\n\tat java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)\n\tat java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)\n\tat java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1519)\n\tat java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)",
    "errorCode": 425
  },
  {
    "message": "2 servers [pinot-server-1_R, pinot-server-0_R] not responded",
    "errorCode": 427
  }
]
but still getting this
what if i hit the health endpoint every second 😄
the recovery times are definitely better now
but ideally i would like not to get the exception at all
x
I think so
do it every second
I think you hit the time when it’s live but also ready
l
going with this
Copy code
livenessProbe+: {
  initialDelaySeconds: 10,
  periodSeconds: 1,
  failureThreshold: 600,
},
readinessProbe+: {
  initialDelaySeconds: 15,
  periodSeconds: 1,
  failureThreshold: 600,
},
x
Right now you are using the same health check for both probes
l
maybe i should have them both start at the same time (?)
x
We plan to add a new endpoint to reflect the live and ready APIs
l
so 10s (?)
x
Theoretically it doesn’t matter
If you are unlucky enough, your liveness and readiness will pass at the same time
and then you may still see a glitch
l
it has to be live first, yes, that's why we perform that one first (?)
and then ready
is that coming in 0.11.0? or later
x
I believe so
cc: @Rong R
In 0.11