# general
  • a

    Alex

    11/21/2019, 2:41 AM
    Will do once back home
  • a

    Alex

    11/21/2019, 2:43 AM
    Our setup is straight from master... :)
  • a

    Alex

    11/21/2019, 2:58 AM
    ZooKeeper is spitting this into its logs:
  • a

    Alex

    11/21/2019, 2:58 AM
    2019-11-21 02:57:42,284 [myid:1] - INFO  [pinot-zookeeper-0.pinot-zookeeper-headless.pinot.svc.cluster.local/10.72.2.125:3888:QuorumCnxManager$Listener@888] - Received connection request /10.72.5.54:42472
    2019-11-21 02:57:42,285 [myid:1] - WARN  [RecvWorker:5135603447297303924:QuorumCnxManager$RecvWorker@1176] - Connection broken for id 5135603447297303924, my id = 1, error =
    java.io.IOException: Received packet with invalid packet: 1919509363
    	at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1163)
    2019-11-21 02:57:42,285 [myid:1] - WARN  [RecvWorker:5135603447297303924:QuorumCnxManager$RecvWorker@1179] - Interrupting SendWorker
    2019-11-21 02:57:42,286 [myid:1] - WARN  [SendWorker:5135603447297303924:QuorumCnxManager$SendWorker@1092] - Interrupted while waiting for message on queue
    java.lang.InterruptedException
    	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
    	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
    	at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
    	at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1243)
    	at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:78)
    	at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1080)
  • a

    Alex

    11/21/2019, 2:58 AM
    servers:
  • a

    Alex

    11/21/2019, 2:59 AM
    019/11/21 01:17:29.611 WARN [ZkBaseDataAccessor] [ZkClient-EventThread-12-pinot-zookeeper:2181] Fail to read record for paths: {/pinot/INSTANCES/Server_pinot-server-0.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/0d38ff9a-f613-48d9-97af-216dea6e6821=-101, /pinot/INSTANCES/Server_pinot-server-0.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/edb08cc7-967e-4875-8c91-dc9e226e9314=-101}
  • a

    Alex

    11/21/2019, 2:59 AM
    on one server
  • a

    Alex

    11/21/2019, 3:00 AM
    another one looks ok
  • a

    Alex

    11/21/2019, 3:01 AM
    third one has a lot of:
  • a

    Alex

    11/21/2019, 3:01 AM
    2019/11/21 01:30:59.494 WARN [HelixStateMachineEngine] [ZkClient-EventThread-12-pinot-zookeeper:2181] Fail to create msg-handler because cannot find stateModelFactory for model: SegmentOnlineOfflineStateModel using factoryName: DEFAULT for resource: flattened_orders_hours_REALTIME
    2019/11/21 01:30:59.495 WARN [HelixStateMachineEngine] [ZkClient-EventThread-12-pinot-zookeeper:2181] Fail to create msg-handler because cannot find stateModelFactory for model: SegmentOnlineOfflineStateModel using factoryName: DEFAULT for resource: flattened_orders_hours_REALTIME
    2019/11/21 01:31:00.396 WARN [ConfigAccessor] [ZkClient-EventThread-12-pinot-zookeeper:2181] No config found at /pinot/CONFIGS/RESOURCE/flattened_orders_hours_REALTIME
    2019/11/21 01:31:14.450 WARN [ZkBaseDataAccessor] [ZkClient-EventThread-12-pinot-zookeeper:2181] Fail to read record for paths: {/pinot/INSTANCES/Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/8228a6ef-ad4f-4e27-b530-ac6cffcf4bfd=-101, /pinot/INSTANCES/Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/d95fb4ec-e649-4856-83d9-11a5af89db5c=-101, /pinot/INSTANCES/Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/9fe9fc6a-8c14-46cc-9e1c-db2af680b225=-101, /pinot/INSTANCES/Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/ea23bd5f-1e3a-4e80-ac34-95d626d1cff7=-101}
    2019/11/21 01:31:18.018 WARN [ZkBaseDataAccessor] [ZkClient-EventThread-12-pinot-zookeeper:2181] Fail to read record for paths: {/pinot/INSTANCES/Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/cd380a2a-3ac8-45b2-99af-8b039499ed09=-101, /pinot/INSTANCES/Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/7e63c21b-2e3d-44e3-9e83-6ec5a23ab7cf=-101}
    2019/11/21 01:31:23.400 WARN [ZkBaseDataAccessor] [ZkClient-EventThread-12-pinot-zookeeper:2181] Fail to read record for paths: {/pinot/INSTANCES/Server_pinot-server-2.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/051aa932-f392-49c7-a1dd-63aa468727c4=-101}
  • a

    Alex

    11/21/2019, 9:30 PM
    Any idea how to troubleshoot Kafka consumer failures? It's easy to repro in our setup: run SELECT bla, bla1 from flattened_orders_hours order by bla1 and all servers go into this loop
  • k

    Kishore G

    11/21/2019, 9:31 PM
    what's the mode of the Kafka consumer?
  • a

    Alex

    11/22/2019, 4:05 PM
    Low level
  • j

    Jackie

    11/22/2019, 9:32 PM
    Submitted a PR to remove the connection-pool based broker routing: https://github.com/apache/incubator-pinot/pull/4850
    By default, the Pinot broker uses a single connection per server to route queries in an async (non-blocking) fashion.
    This PR removes support for connection-pool based routing (blocking calls), which has proven to consume more CPU/memory resources and can drop requests under short bursts of traffic.
    This PR also removes the pinot-transport package.
    I'll hold the PR for a week before merging it. Please let me know if you have any concerns about the change, thanks!
  • s

    Seunghyun

    11/22/2019, 10:11 PM
    @here We just cut the second official Apache release, 0.2.0.
    The release can be downloaded at: https://pinot.apache.org/download
    The release notes are available at: https://github.com/apache/incubator-pinot/releases
    Thank you @User for being the release manager 🙂
    👏 6
  • l

    lsabi

    11/22/2019, 10:12 PM
    🎉🎉🎉🎉🎉🎉🎉🎉🎉
  • s

    Subbu Subramaniam

    11/22/2019, 10:23 PM
    Between read-the-docs (for 0.2.0) and release notes, it should be clear what is in the release.
  • k

    Kishore G

    11/22/2019, 10:24 PM
    Congratulations! Thanks @User.
  • s

    Subbu Subramaniam

    11/22/2019, 11:42 PM
    @User I am not sure if you got past your problem. From what I could gather, you are using LLC consumer, and your queries are timing out. Is this right? If your queries are timing out, it could be because you have too few servers. How many segments do you have? Does your query search all segments? I think you had a query like
    SELECT f1, f2 FROM table
    The default limit on this is 10 rows, and it should return quickly.
  • s

    Subbu Subramaniam

    11/22/2019, 11:42 PM
    Is this because of GC that your servers are timing out?
  • k

    Kishore G

    11/22/2019, 11:45 PM
    @User query was
    select f1,f2 from T order by f2 top 5
  • a

    Alex

    11/22/2019, 11:47 PM
    @User queries are timing out, and consumers are going into a restart loop after the first timeout
  • s

    Subbu Subramaniam

    11/22/2019, 11:49 PM
    Are they restarting the segment from the beginning? That should not happen unless the server restarts
  • s

    Subbu Subramaniam

    11/22/2019, 11:49 PM
    thanks @User for the clarification
  • k

    Kishore G

    11/22/2019, 11:50 PM
    timing out is expected because the default timeout is set to 10 seconds, and GC is also expected because of the amount of data it's bringing into the heap (they can potentially control this via numMaxGroupsLimit)
  • k

    Kishore G

    11/22/2019, 11:50 PM
    but what's concerning is that real-time consumption stops
  • k

    Kishore G

    11/22/2019, 11:50 PM
    and the only way to recover is to restart the nodes
  • s

    Subbu Subramaniam

    11/22/2019, 11:51 PM
    yes, that is concerning. I am guessing that there is a zk disconnect, and so we get another state transition from OFFLINE to CONSUMING. But then we return silently if we already have a CONSUMING segment in that case.
  • s

    Subbu Subramaniam

    11/22/2019, 11:51 PM
    I have seen that happen (unless new bugs have been introduced 🙂)
  • s

    Subbu Subramaniam

    11/22/2019, 11:52 PM
    The other part is that Kafka may experience some disconnects, and may re-connect. But the existing data should not be discarded.
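
The behavior described above (a repeated OFFLINE to CONSUMING transition after a zk disconnect being ignored when the segment is already consuming, so existing data is kept) can be illustrated with a minimal sketch. This is not Pinot or Helix code: `ServerSketch`, `on_offline_to_consuming`, `consume`, and the segment name `segment_0` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ServerSketch:
    """Hypothetical stand-in for a server's per-segment consumer bookkeeping."""

    # Maps segment name -> number of rows consumed so far.
    consumers: dict = field(default_factory=dict)

    def on_offline_to_consuming(self, segment: str) -> bool:
        """Handle an OFFLINE -> CONSUMING transition.

        Returns True if a new consumer was started, or False if the segment
        was already consuming and the duplicate transition was ignored.
        """
        if segment in self.consumers:
            # Already consuming: return silently and keep the existing data.
            return False
        self.consumers[segment] = 0
        return True

    def consume(self, segment: str, rows: int) -> None:
        """Record that `rows` more rows were consumed for `segment`."""
        self.consumers[segment] += rows


server = ServerSketch()
started = server.on_offline_to_consuming("segment_0")
server.consume("segment_0", 100)

# Simulate the same transition being replayed after a reconnect.
replayed = server.on_offline_to_consuming("segment_0")
```

In this sketch the replayed transition neither restarts the consumer nor discards the 100 rows already consumed, which is the expectation stated in the thread. A consumer that restarts from the beginning would point to something other than a duplicate transition.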