Apache Pinot #general

Alex

11/21/2019, 2:41 AM

Will do once back home

Alex

11/21/2019, 2:43 AM

Our setup is straight from master... :)

Alex

11/21/2019, 2:58 AM

zookeeper spitting into logs:

Alex

11/21/2019, 2:58 AM

2019-11-21 02:57:42,284 [myid:1] - INFO  [pinot-zookeeper-0.pinot-zookeeper-headless.pinot.svc.cluster.local/10.72.2.125:3888:QuorumCnxManager$Listener@888] - Received connection request /10.72.5.54:42472
2019-11-21 02:57:42,285 [myid:1] - WARN  [RecvWorker:5135603447297303924:QuorumCnxManager$RecvWorker@1176] - Connection broken for id 5135603447297303924, my id = 1, error =
java.io.IOException: Received packet with invalid packet: 1919509363
	at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1163)
2019-11-21 02:57:42,285 [myid:1] - WARN  [RecvWorker:5135603447297303924:QuorumCnxManager$RecvWorker@1179] - Interrupting SendWorker
2019-11-21 02:57:42,286 [myid:1] - WARN  [SendWorker:5135603447297303924:QuorumCnxManager$SendWorker@1092] - Interrupted while waiting for message on queue
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
	at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1243)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:78)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1080)
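
One way to read the two numbers in the WARN lines above: if they are treated as raw big-endian bytes (an assumption about how the election port reads its input, not something the log states), they decode to printable ASCII. The sketch below does only that decoding; the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reinterprets the numeric fields from the log as ASCII bytes.
final class DecodeZkPacket {
    // Big-endian bytes of a 64-bit value, rendered as ASCII.
    static String longToAscii(long value) {
        byte[] bytes = ByteBuffer.allocate(Long.BYTES).putLong(value).array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Big-endian bytes of a 32-bit value, rendered as ASCII.
    static String intToAscii(int value) {
        byte[] bytes = ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        long reportedId = 5135603447297303924L; // "id" from the RecvWorker line
        int reportedPacket = 1919509363;        // value from "invalid packet"
        // prints "GET /metrics"
        System.out.println(longToAscii(reportedId) + intToAscii(reportedPacket));
    }
}
```

The decoded text looks like the start of an HTTP request line, which would be consistent with an HTTP client (for example a metrics scraper) connecting to the election port 3888 instead of a ZooKeeper peer. That is a hypothesis to check against the source address 10.72.5.54, not a confirmed diagnosis.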

Alex

11/21/2019, 2:58 AM

servers:

Alex

11/21/2019, 2:59 AM

2019/11/21 01:17:29.611 WARN [ZkBaseDataAccessor] [ZkClient-EventThread-12-pinot-zookeeper:2181] Fail to read record for paths: {/pinot/INSTANCES/Server_pinot-server-0.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/0d38ff9a-f613-48d9-97af-216dea6e6821=-101, /pinot/INSTANCES/Server_pinot-server-0.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/edb08cc7-967e-4875-8c91-dc9e226e9314=-101}

Alex

11/21/2019, 2:59 AM

on one server

Alex

11/21/2019, 3:00 AM

another one looks ok

Alex

11/21/2019, 3:01 AM

third one has a lot of:

Alex

11/21/2019, 3:01 AM

2019/11/21 01:30:59.494 WARN [HelixStateMachineEngine] [ZkClient-EventThread-12-pinot-zookeeper:2181] Fail to create msg-handler because cannot find stateModelFactory for model: SegmentOnlineOfflineStateModel using factoryName: DEFAULT for resource: flattened_orders_hours_REALTIME
2019/11/21 01:30:59.495 WARN [HelixStateMachineEngine] [ZkClient-EventThread-12-pinot-zookeeper:2181] Fail to create msg-handler because cannot find stateModelFactory for model: SegmentOnlineOfflineStateModel using factoryName: DEFAULT for resource: flattened_orders_hours_REALTIME
2019/11/21 01:31:00.396 WARN [ConfigAccessor] [ZkClient-EventThread-12-pinot-zookeeper:2181] No config found at /pinot/CONFIGS/RESOURCE/flattened_orders_hours_REALTIME
2019/11/21 01:31:14.450 WARN [ZkBaseDataAccessor] [ZkClient-EventThread-12-pinot-zookeeper:2181] Fail to read record for paths: {/pinot/INSTANCES/Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/8228a6ef-ad4f-4e27-b530-ac6cffcf4bfd=-101, /pinot/INSTANCES/Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/d95fb4ec-e649-4856-83d9-11a5af89db5c=-101, /pinot/INSTANCES/Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/9fe9fc6a-8c14-46cc-9e1c-db2af680b225=-101, /pinot/INSTANCES/Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/ea23bd5f-1e3a-4e80-ac34-95d626d1cff7=-101}
2019/11/21 01:31:18.018 WARN [ZkBaseDataAccessor] [ZkClient-EventThread-12-pinot-zookeeper:2181] Fail to read record for paths: {/pinot/INSTANCES/Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/cd380a2a-3ac8-45b2-99af-8b039499ed09=-101, /pinot/INSTANCES/Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/7e63c21b-2e3d-44e3-9e83-6ec5a23ab7cf=-101}
2019/11/21 01:31:23.400 WARN [ZkBaseDataAccessor] [ZkClient-EventThread-12-pinot-zookeeper:2181] Fail to read record for paths: {/pinot/INSTANCES/Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/051aa932-f392-49c7-a1dd-63aa468727c4=-101}

Alex

11/21/2019, 9:30 PM

Any idea how to troubleshoot Kafka consumer failures? Easy to repro in our setup: run SELECT bla, bla1 from flattened_orders_hours order by bla1 and all servers go into this loop

Kishore G

11/21/2019, 9:31 PM

what's the mode of the Kafka consumer?

Alex

11/22/2019, 4:05 PM

Low level

Jackie

11/22/2019, 9:32 PM

Submitted a PR to remove the connection-pool based broker routing: https://github.com/apache/incubator-pinot/pull/4850
By default, the Pinot broker uses a single connection per server to route queries in an async (non-blocking) fashion. This PR removes support for connection-pool based routing (blocking calls), which has been shown to consume more CPU/memory and can drop requests under short bursts of traffic. This PR also removes the pinot-transport package. I'll hold the PR for a week before merging it. Please let me know if you have any concerns about the change, thanks!
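
To illustrate the non-blocking style mentioned above, here is a minimal sketch of an async scatter-gather using only the JDK's CompletableFuture. It does not model Pinot's actual transport code; the class name, the scatter method, and the send function are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration of non-blocking fan-out; not Pinot's implementation.
final class AsyncScatterGather {
    // Sends one request per server without blocking the caller and returns
    // a future that completes once every per-server future has completed.
    static <R> CompletableFuture<List<R>> scatter(
            List<String> servers, Function<String, CompletableFuture<R>> send) {
        List<CompletableFuture<R>> pending =
                servers.stream().map(send).collect(Collectors.toList());
        return CompletableFuture
                .allOf(pending.toArray(new CompletableFuture[0]))
                // join() cannot block here: allOf has already completed.
                .thenApply(ignored -> pending.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        // Stand-in for a network call: completes immediately with a fake reply.
        Function<String, CompletableFuture<String>> fakeSend =
                server -> CompletableFuture.completedFuture("reply from " + server);
        System.out.println(scatter(Arrays.asList("server-0", "server-1"), fakeSend).join());
    }
}
```

The caller thread is only occupied while issuing requests; a blocking, pool-based design would instead hold a thread and a connection for the full duration of each call.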

Seunghyun

11/22/2019, 10:11 PM

@here We just cut the second official Apache release, 0.2.0.
The release can be downloaded at: https://pinot.apache.org/download
The release notes are available at: https://github.com/apache/incubator-pinot/releases
Thank you @User for being the release manager 🙂

👏 6

lsabi

11/22/2019, 10:12 PM

🎉🎉🎉🎉🎉🎉🎉🎉🎉

Subbu Subramaniam

11/22/2019, 10:23 PM

Between read-the-docs (for 0.2.0) and release notes, it should be clear what is in the release.

Kishore G

11/22/2019, 10:24 PM

Congratulations! Thanks @User.

Subbu Subramaniam

11/22/2019, 11:42 PM

@User I am not sure if you got past your problem. From what I could gather, you are using the LLC consumer, and your queries are timing out. Is this right? If your queries are timing out, it could be because you have too few servers. How many segments do you have? Does your query search all segments? I think you had a query like SELECT f1, f2 FROM table. The default limit on this is 10 rows, and it should return quickly.

Subbu Subramaniam

11/22/2019, 11:42 PM

Is this because of GC that your servers are timing out?

Kishore G

11/22/2019, 11:45 PM

@User query was

select f1,f2 from T order by f2 top 5

Alex

11/22/2019, 11:47 PM

@User queries are timing out, and consumers are going into a restart loop after the first timeout

Subbu Subramaniam

11/22/2019, 11:49 PM

Are they restarting the segment from the beginning? That should not happen unless the server restarts

Subbu Subramaniam

11/22/2019, 11:49 PM

thanks @User for the clarification

Kishore G

11/22/2019, 11:50 PM

timing out is expected because the default timeout is set to 10 seconds, and GC is also expected because of the amount of data it's bringing into the heap (they can potentially control this via numMaxGroupsLimit)

Kishore G

11/22/2019, 11:50 PM

but what's concerning is that real-time consumption stops

Kishore G

11/22/2019, 11:50 PM

and the only way to recover is to restart the nodes

Subbu Subramaniam

11/22/2019, 11:51 PM

yes, that is concerning. I am guessing that there is a zk disconnect, and so we get another state transition from OFFLINE to CONSUMING. But then we return silently if we already have a CONSUMING segment in that case.

Subbu Subramaniam

11/22/2019, 11:51 PM

I have seen that happen (unless new bugs have been introduced 🙂)
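
The "return silently if we already have a CONSUMING segment" behavior described above can be sketched as an idempotent transition handler. This is a guess at the shape of the logic, not Pinot's or Helix's actual code; ConsumingSegmentTracker and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a repeated OFFLINE -> CONSUMING transition for the same
// segment is ignored, so the data already consumed is not discarded.
final class ConsumingSegmentTracker {
    // segment name -> offset at which the existing consumer started
    private final Map<String, Long> consumingSegments = new ConcurrentHashMap<>();

    // Returns true if a new consumer was registered, false if the segment
    // was already CONSUMING and the transition was ignored.
    boolean onBecomeConsumingFromOffline(String segmentName, long startOffset) {
        return consumingSegments.putIfAbsent(segmentName, startOffset) == null;
    }

    // Start offset of the registered consumer, or null if none exists.
    Long startOffsetOf(String segmentName) {
        return consumingSegments.get(segmentName);
    }

    public static void main(String[] args) {
        ConsumingSegmentTracker tracker = new ConsumingSegmentTracker();
        String segment = "example_segment"; // hypothetical segment name
        System.out.println(tracker.onBecomeConsumingFromOffline(segment, 100L)); // first transition
        System.out.println(tracker.onBecomeConsumingFromOffline(segment, 0L));   // duplicate after a reconnect
        System.out.println(tracker.startOffsetOf(segment));
    }
}
```

If servers really do restart consumption from the beginning after a ZooKeeper disconnect, a guard of this kind is either missing on that path or being bypassed, which is what the thread is trying to establish.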

Subbu Subramaniam

11/22/2019, 11:52 PM

The other part is that Kafka may experience some disconnects, and may re-connect. But the existing data should not be discarded.