Apache Pinot #troubleshooting

Neha Pawar

05/18/2020, 11:53 PM

Btw, this answer has been added to the FAQ, with an explanation @srisudha @Jackie

Jackie

05/18/2020, 11:53 PM

With replica-group, only half of the servers should be queried

srisudha

05/18/2020, 11:56 PM

As per the FAQ, there is no config required for replica group. Just adding multiple servers would do.

srisudha

05/18/2020, 11:56 PM

I will try that

srisudha

05/18/2020, 11:57 PM

And currently the QPS hitting each server is the same as the total TPS we started with, meaning 10k rps is hitting almost every server. Are you saying that if replica group works, not all servers will see 10k rps but far less?

Jackie

05/18/2020, 11:58 PM

If you use replica-group, each server should see 5K rps
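
As a rough sanity check of the numbers above, here is a minimal Python sketch. It assumes queries are spread evenly across replica groups and that each query only touches the servers of one group. The function name `per_server_qps` is made up for illustration and is not part of Pinot.

```python
def per_server_qps(total_qps: float, num_replica_groups: int) -> float:
    """Estimate the query rate each server sees under replica-group routing.

    Assumption: every query is routed to exactly one replica group, and
    groups are picked evenly, so a server only sees its own group's share.
    """
    if num_replica_groups < 1:
        raise ValueError("num_replica_groups must be at least 1")
    return total_qps / num_replica_groups


# 10k rps over 2 replica groups (one server each, as in this thread)
print(per_server_qps(10_000, 2))  # 5000.0
```

Without replica-group routing, the same estimate with a single group gives the full 10k rps per server, which matches what was observed before the fix.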

srisudha

05/19/2020, 12:00 AM

Okay. That will be great..

srisudha

05/19/2020, 12:01 AM

I will get back on this

srisudha

05/19/2020, 4:39 PM

I checked this. Though we have two servers, with all config the same as above, I still see 10k requests hitting each server.

srisudha

05/19/2020, 4:42 PM

Another observation: we initially had 3 servers, and then I upgraded helm to 2 servers. After that, I executed the rebalance API. The number of server pods was 2, but using the swagger API I could still see that the number of instances was always 3. After executing the rebalance API, the number of instances was 3 and segments were distributed among all 3. I had to manually delete the instance for it to work.

srisudha

05/19/2020, 4:43 PM

http://localhost:9000/tables/jeepusermap/rebalance?type=REALTIME&dryRun=false&reassignInstances=true&includeConsuming=true&bootstrap=true&downtime=true&minAvailableReplicas=1&bestEfforts=false

srisudha

05/19/2020, 4:43 PM

That's the rebalance API URL used.
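
For readability, the same URL can be assembled from its parts. This is only a sketch: the helper name `build_rebalance_url` is hypothetical, the parameter values are copied from the URL above, and the code only builds the string without sending a request to the controller.

```python
from urllib.parse import urlencode

# Query parameters copied from the URL above, in the same order.
REBALANCE_PARAMS = {
    "type": "REALTIME",
    "dryRun": "false",
    "reassignInstances": "true",
    "includeConsuming": "true",
    "bootstrap": "true",
    "downtime": "true",
    "minAvailableReplicas": "1",
    "bestEfforts": "false",
}


def build_rebalance_url(controller: str, table: str, params: dict) -> str:
    """Join the controller address, table name and query string."""
    return f"{controller}/tables/{table}/rebalance?{urlencode(params)}"


url = build_rebalance_url("http://localhost:9000", "jeepusermap", REBALANCE_PARAMS)
print(url)
```

Keeping the parameters in a dict makes it easier to flip a single flag, for example setting `dryRun` to `"true"` to preview the result before running the real rebalance.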

Jackie

05/19/2020, 7:34 PM

@srisudha Have you untagged the third server?

srisudha

05/19/2020, 8:29 PM

Untagged? How do I do that?

Jackie

05/19/2020, 11:34 PM

You need to use the tenant API to remove the current tenant and create a new tenant with the updated number of servers

Jackie

05/19/2020, 11:35 PM

Or if you didn't create any tenant (if you are using the default tenant), then you need to remove the instance from the cluster

srisudha

05/20/2020, 12:45 AM

Thanks @Jackie. And for the replica group, we don't see the load getting split across the 2 servers. Let us know if there is something missing.

Jackie

05/20/2020, 12:48 AM

@srisudha In order to split the load, you should enable the replica-group based routing

Jackie

05/20/2020, 12:49 AM

You can change your table config to include the routing config as follows:

"routing": {
  "instanceSelectorType": "replicaGroup"
}
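
A minimal Python sketch of applying that change to an existing table config before uploading it again. The helper name `enable_replica_group_routing` and the sample config are illustrative assumptions; only the `routing` / `instanceSelectorType` / `replicaGroup` names come from the snippet above.

```python
import json


def enable_replica_group_routing(table_config: dict) -> dict:
    """Return a copy of the table config with replica-group routing enabled.

    Any other keys already present under "routing" are preserved.
    """
    updated = dict(table_config)
    routing = dict(updated.get("routing", {}))
    routing["instanceSelectorType"] = "replicaGroup"
    updated["routing"] = routing
    return updated


# Hypothetical, heavily trimmed table config used only for illustration.
sample_config = {"tableName": "jeepusermap", "tableType": "REALTIME"}
print(json.dumps(enable_replica_group_routing(sample_config), indent=2))
```

The function returns a new dict instead of mutating its input, so the original config stays available for comparison.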

srisudha

05/20/2020, 5:51 PM

Thanks @Jackie this worked well for us...!

Kishore G

05/20/2020, 5:53 PM

Can you please write a small section on using replica group?

Kishore G

05/20/2020, 5:53 PM

We can add it to the docs

srisudha

05/20/2020, 5:53 PM

Surely..

srisudha

05/20/2020, 5:54 PM

Where should I add it?

Buchi Reddy

05/21/2020, 5:49 PM

Okay, reposting here: Hi all, Pinot broker is logging a warning that it can't find brokerResource and then it fails to find servers hosting a segment, though the ideal state and external view clearly say the segment is hosted by a server. This setup has only one controller, broker and server running in k8s.

2020/05/21 16:35:00.579 WARN [ParticipantHealthReportTask] [main] ParticipantHealthReportTimerTask already stopped
2020/05/21 16:35:04.476 WARN [ConfigAccessor] [ZkClient-EventThread-27-zookeeper.test.svc.cluster.local:2181/pinot] No config found at /test-views/CONFIGS/RESOURCE/brokerResource
2020/05/21 16:35:04.503 WARN [CallbackHandler] [main] Callback handler received event in wrong order. Listener: org.apache.helix.messaging.handling.HelixTaskExecutor@69d1227f, path: /test-views/INSTANCES/Broker_pinot-broker-0.pinot-broker.test.svc.cluster.local_8099/MESSAGES, expected types: [CALLBACK, FINALIZE] but was INIT
2020/05/21 16:35:04.607 INFO [HelixBrokerStarter] [main] Registering service status handler
2020/05/21 16:35:26.492 WARN [BaseInstanceSelector] [ClusterChangeHandlingThread] Failed to find servers hosting segment: myView__0__0__20200519T1722Z for table: myView_REALTIME (all online instances: [] are disabled)

Who registers the brokerResource, and when could that be missing? UPDATE: Now the setup is fine and the broker finds the servers for segments. But I want to understand the root cause and how to avoid such issues.
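
When chasing this kind of issue, it can help to pull the affected segments out of the broker log. The sketch below is an assumption-laden example: the regular expressions are inferred only from the log lines pasted above, and the function name `find_unrouted_segments` is hypothetical.

```python
import re

# Line layout inferred from the log excerpt above:
# "<date> <time> <LEVEL> [<logger>] [<thread>] <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<logger>[^\]]+)\] \[(?P<thread>[^\]]+)\] "
    r"(?P<msg>.*)$"
)
UNROUTED = re.compile(
    r"Failed to find servers hosting segment: (?P<segment>\S+) "
    r"for table: (?P<table>\S+)"
)


def find_unrouted_segments(lines):
    """Return (timestamp, segment, table) for each matching WARN line."""
    hits = []
    for line in lines:
        parsed = LOG_LINE.match(line.strip())
        if not parsed or parsed["level"] != "WARN":
            continue
        found = UNROUTED.search(parsed["msg"])
        if found:
            hits.append((parsed["ts"], found["segment"], found["table"]))
    return hits


sample = [
    "2020/05/21 16:35:04.607 INFO [HelixBrokerStarter] [main] Registering service status handler",
    "2020/05/21 16:35:26.492 WARN [BaseInstanceSelector] [ClusterChangeHandlingThread] "
    "Failed to find servers hosting segment: myView__0__0__20200519T1722Z "
    "for table: myView_REALTIME (all online instances: [] are disabled)",
]
for hit in find_unrouted_segments(sample):
    print(hit)
```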

Mayank

05/21/2020, 5:50 PM

How did you solve it?

Buchi Reddy

05/21/2020, 5:50 PM

I didn't do anything. It started working on its own. Please note I'm running with low resources on my laptop, so I'm not sure if that is the issue here.

Mayank

05/21/2020, 5:51 PM

This can happen if there’s a typo in the table name in the query

Mayank

05/21/2020, 5:51 PM

I take it that was not the case?

Buchi Reddy

05/21/2020, 5:51 PM

That's not a possibility here, because I didn't change anything and it started working.