# general
  • s

    Subbu Subramaniam

    02/27/2019, 12:20 AM
    where
    _capacity
    is the number of rows we expect to consume. Should never be 0 unless the controller put 0 rows, or somehow we decided we need to consume 0 rows for the segment.
  • s

    Subbu Subramaniam

    02/27/2019, 12:22 AM
    To read the segment stats file, run RealtimeSegmentStatsHistory.main(), giving it the file name
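    A minimal sketch of that invocation (not an official tool): the package path below is an assumption based on the com.linkedin-era classes seen in the stack traces later in this thread and may differ in newer releases; the only thing taken from the message above is that main() takes the stats file name as its argument.
    // Sketch: delegate to the class's own main(), passing the stats file path.
    // The package path is an assumption based on the 0.x (com.linkedin) layout.
    public class PrintStatsHistory {
      public static void main(String[] args) throws Exception {
        // args[0] = path to the serialized segment stats file
        com.linkedin.pinot.core.realtime.impl.RealtimeSegmentStatsHistory.main(args);
      }
    }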
  • a

    Ananth Packkildurai

    02/27/2019, 12:24 AM
    oh, I'm seeing now
    Creating new stream consumer, reason: Idle for too long
    Connecting to bootstrap host <http://metrics-kafka-02d885b15e8e33fbd.nebula.tinyspeck.com:9092|metrics-kafka-02d885b15e8e33fbd.nebula.tinyspeck.com:9092> for topic clog-json
    Switching from state CONNECTING_TO_BOOTSTRAP_NODE to state CONNECTED_TO_BOOTSTRAP_NODE for topic clog-json
    Switching from state CONNECTED_TO_BOOTSTRAP_NODE to state FETCHING_LEADER_INFORMATION for topic clog-json
    Located leader broker <http://metrics-kafka-0ecd28d600ff12a21.nebula.tinyspeck.com:9092|metrics-kafka-0ecd28d600ff12a21.nebula.tinyspeck.com:9092> for topic clog-json, connecting to it.
    Switching from state FETCHING_LEADER_INFORMATION to state CONNECTING_TO_PARTITION_LEADER for topic clog-json
    Trying to fetch leader host and port: <http://metrics-kafka-0ecd28d600ff12a21.nebula.tinyspeck.com:9092|metrics-kafka-0ecd28d600ff12a21.nebula.tinyspeck.com:9092> for topic clog-json
    Switching from state CONNECTING_TO_PARTITION_LEADER to state CONNECTED_TO_PARTITION_LEADER for topic clog-json
    Consumed 0 events from (rate:0.0/s), currentOffset=2590122542, numRowsConsumedSoFar=0, numRowsIndexedSoFar=0
    Consumed 0 events from (rate:0.0/s), currentOffset=2586172612, numRowsConsumedSoFar=0, numRowsIndexedSoFar=0
  • a

    Ananth Packkildurai

    02/27/2019, 12:25 AM
    😕
    numRowsConsumedSoFar=0
    , I guess it is an impact of the previous error. The events are flowing through kafka, so the upstream is looking good.
  • s

    Subbu Subramaniam

    02/27/2019, 12:32 AM
    we poll kafka for rows and found no rows there -- that is what that log indicates. Since it prints the offset, you can hopefully check the kafka topic to see if a later offset is available in that partition, and if so, why that particular broker does not have it
  • s

    Subbu Subramaniam

    02/27/2019, 1:20 AM
    @User can u check the segment metadata to see if the number of rows for this segment is non-zero? Also, check the table config to see if the number of rows threshold is set to 0
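    A minimal sketch (not a Pinot tool) of the table-config half of that check, assuming the realtime table config has been saved to a local JSON file; the streamConfigs key names are the usual LLC flush settings of that era and may differ by version.
    // Sketch: print the flush-threshold settings from a realtime table config
    // so you can see whether the row threshold is set to 0.
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import java.io.File;

    public class CheckFlushThreshold {
      public static void main(String[] args) throws Exception {
        JsonNode tableConfig = new ObjectMapper().readTree(new File(args[0]));
        JsonNode streamConfigs = tableConfig.path("tableIndexConfig").path("streamConfigs");
        System.out.println("realtime.segment.flush.threshold.size = "
            + streamConfigs.path("realtime.segment.flush.threshold.size").asText("<unset>"));
        System.out.println("realtime.segment.flush.threshold.time = "
            + streamConfigs.path("realtime.segment.flush.threshold.time").asText("<unset>"));
      }
    }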
  • k

    Kishore G

    02/28/2019, 12:41 AM
    @User were you able to get past this issue?
  • k

    Kishore G

    02/28/2019, 12:42 AM
    @User is there a dumpRealtimeInfo <table> tool in Pinot?
  • k

    Kishore G

    02/28/2019, 12:42 AM
    we should have one that users can run that will help in remote debugging
  • a

    Ananth Packkildurai

    02/28/2019, 12:43 AM
    nope, it starts to consume for some time, then loops to
    CONNECTED_TO_PARTITION_LEADER
    and
    CONNECTED_TO_BOOTSTRAP_NODE
  • a

    Ananth Packkildurai

    02/28/2019, 12:44 AM
    I still can't figure out why it loops.
  • k

    Kishore G

    02/28/2019, 12:48 AM
    can you print the current offset of the kafka topic?
  • s

    Subbu Subramaniam

    02/28/2019, 12:53 AM
    There is no
    dumpRealtimeInfo
    command. What exactly are you looking to print?
  • k

    Kishore G

    02/28/2019, 12:54 AM
    the questions you are asking Ananth
  • k

    Kishore G

    02/28/2019, 12:54 AM
    latest segment metadata from ZK, table config, kafka offsets etc
  • s

    Subbu Subramaniam

    02/28/2019, 12:54 AM
    that can be obtained by looking at the logs
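    There is no such command, but here is a minimal sketch of what a dumpRealtimeInfo starting point could read from ZooKeeper, assuming the Helix property-store layout Pinot used at the time; the cluster, table, and segment names are illustrative values taken from logs later in this thread.
    // Sketch only: dump the table config and one segment's ZK metadata.
    // Property-store paths are assumptions about the layout of that era.
    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.ZooKeeper;

    public class DumpRealtimeInfo {
      public static void main(String[] args) throws Exception {
        String zkAddress = args[0];                         // e.g. "zk-host:2181"
        String cluster = "PinotCluster";                    // from the Helix logs later in this thread
        String table = "clog_v3_REALTIME";                  // tableNameWithType
        String segment = "clog_v3__3__19__20190228T0338Z";  // from the exception later in this thread

        ZooKeeper zk = new ZooKeeper(zkAddress, 30_000, event -> { });
        String configPath = "/" + cluster + "/PROPERTYSTORE/CONFIGS/TABLE/" + table;
        String segmentPath = "/" + cluster + "/PROPERTYSTORE/SEGMENTS/" + table + "/" + segment;
        System.out.println(new String(zk.getData(configPath, false, null), StandardCharsets.UTF_8));
        System.out.println(new String(zk.getData(segmentPath, false, null), StandardCharsets.UTF_8));
        zk.close();
      }
    }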
  • a

    Ananth Packkildurai

    02/28/2019, 12:55 AM
    oh, the GetOffset tool somehow broke the kafka cluster, but I did examine the timestamp of the message
  • a

    Ananth Packkildurai

    02/28/2019, 12:56 AM
    it is constantly lagging by 3 hours, so I guess the upstream has some issue that ingests an old event timestamp
  • a

    Ananth Packkildurai

    02/28/2019, 12:56 AM
    the topic has a two-hour retention period; could that be a reason it stops consuming the data?
  • k

    Kishore G

    02/28/2019, 12:58 AM
    I see
  • k

    Kishore G

    02/28/2019, 12:58 AM
    yeah, first thing would be to find the earliest and latest offsets available in kafka (any partition will do)
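    A minimal sketch of that check using the plain Kafka consumer API (assuming the brokers are new enough to support it); the broker and topic names are the ones from the Pinot consumer log earlier in this thread.
    // Sketch: print earliest/latest offsets per partition so they can be
    // compared with the currentOffset in the Pinot consumer log. If the
    // consuming segment's offset is below "earliest", retention has already
    // deleted the data it is trying to fetch.
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.stream.Collectors;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class CheckTopicOffsets {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "metrics-kafka-02d885b15e8e33fbd.nebula.tinyspeck.com:9092");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
          List<TopicPartition> partitions = consumer.partitionsFor("clog-json").stream()
              .map(p -> new TopicPartition(p.topic(), p.partition()))
              .collect(Collectors.toList());
          Map<TopicPartition, Long> earliest = consumer.beginningOffsets(partitions);
          Map<TopicPartition, Long> latest = consumer.endOffsets(partitions);
          for (TopicPartition tp : partitions) {
            System.out.printf("%s earliest=%d latest=%d%n", tp, earliest.get(tp), latest.get(tp));
          }
        }
      }
    }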
  • s

    Subbu Subramaniam

    02/28/2019, 12:59 AM
    It will always consume data. It is possible that retention kicks in after the segment is completed.
  • a

    Ananth Packkildurai

    02/28/2019, 1:02 AM
    yes, but that should not stop the consumers from consuming data, should it?
  • a

    Ananth Packkildurai

    02/28/2019, 1:12 AM
    I will try to fix the Kafka tooling and do more debugging; there are no obvious errors, which causes more confusion
  • s

    Subbu Subramaniam

    02/28/2019, 1:16 AM
    Sorry, remove the CONFIGS in the previous msg
  • a

    Ananth Packkildurai

    02/28/2019, 3:35 AM
    another interesting log:
    /var/log/pinot-server/current:26 START:INVOKE /PinotCluster/INSTANCES/Server_10.0.153.112_8098/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@3cd3e762 type: CALLBACK
    /var/log/pinot-server/current:Resubscribe change listener to path: /PinotCluster/INSTANCES/Server_10.0.153.112_8098/MESSAGES, for listener: org.apache.helix.messaging.handling.HelixTaskExecutor@3cd3e762, watchChild: false
    /var/log/pinot-server/current:Subscribing changes listener to path: /PinotCluster/INSTANCES/Server_10.0.153.112_8098/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@3cd3e762
    /var/log/pinot-server/current:26 END:INVOKE /PinotCluster/INSTANCES/Server_10.0.153.112_8098/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@3cd3e762 type: CALLBACK Took: 1ms
  • a

    Ananth Packkildurai

    02/28/2019, 3:37 AM
    ☝️ is also looping; essentially pinot creates the segment for some time and I'm getting
    No Messages to process
    then the loop starts.
  • a

    Ananth Packkildurai

    02/28/2019, 3:37 AM
    seems like it's coming from https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java#L754
  • a

    Ananth Packkildurai

    02/28/2019, 3:46 AM
    another suspect:
    Exception in thread "clog_v3__3__19__20190228T0338Z" java.lang.RuntimeException: Not yet created
        at com.linkedin.pinot.server.realtime.ControllerLeaderLocator.getInstance(ControllerLeaderLocator.java:70)
        at com.linkedin.pinot.server.realtime.ServerSegmentCompletionProtocolHandler.createSegmentCompletionUrl(ServerSegmentCompletionProtocolHandler.java:144)
        at com.linkedin.pinot.server.realtime.ServerSegmentCompletionProtocolHandler.segmentStoppedConsuming(ServerSegmentCompletionProtocolHandler.java:136)
        at com.linkedin.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.postStopConsumedMsg(LLRealtimeSegmentDataManager.java:800)
        at com.linkedin.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:562)
        at java.lang.Thread.run(Thread.java:748)
  • a

    Ananth Packkildurai

    02/28/2019, 3:52 AM
    and I'm frequently seeing https://github.com/apache/incubator-pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java#L571