# general
  • s

    Subbu Subramaniam

    02/27/2019, 12:20 AM
    where
    _capacity
    is the number of rows we expect to consume. Should never be 0 unless the controller put 0 rows, or somehow we decided we need to consume 0 rows for the segment.
  • s

    Subbu Subramaniam

    02/27/2019, 12:22 AM
    To read the segment stats file, run RealtimeSegmentStatsHistory.main(), giving it the file name
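    A minimal sketch of that invocation (not an official tool): the package path below is an assumption based on the com.linkedin-era classes seen in the stack traces later in this thread and may differ in newer releases; the only thing taken from the message above is that main() takes the stats file name as its argument.
    // Sketch: delegate to the class's own main(), passing the stats file path.
    // The package path is an assumption based on the 0.x (com.linkedin) layout.
    public class PrintStatsHistory {
      public static void main(String[] args) throws Exception {
        // args[0] = path to the serialized segment stats file
        com.linkedin.pinot.core.realtime.impl.RealtimeSegmentStatsHistory.main(args);
      }
    }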
  • a

    Ananth Packkildurai

    02/27/2019, 12:24 AM
    oh, I'm seeing now
    Creating new stream consumer, reason: Idle for too long
    Connecting to bootstrap host <http://metrics-kafka-02d885b15e8e33fbd.nebula.tinyspeck.com:9092|metrics-kafka-02d885b15e8e33fbd.nebula.tinyspeck.com:9092> for topic clog-json
    Switching from state CONNECTING_TO_BOOTSTRAP_NODE to state CONNECTED_TO_BOOTSTRAP_NODE for topic clog-json
    Switching from state CONNECTED_TO_BOOTSTRAP_NODE to state FETCHING_LEADER_INFORMATION for topic clog-json
    Located leader broker <http://metrics-kafka-0ecd28d600ff12a21.nebula.tinyspeck.com:9092|metrics-kafka-0ecd28d600ff12a21.nebula.tinyspeck.com:9092> for topic clog-json, connecting to it.
    Switching from state FETCHING_LEADER_INFORMATION to state CONNECTING_TO_PARTITION_LEADER for topic clog-json
    Trying to fetch leader host and port: <http://metrics-kafka-0ecd28d600ff12a21.nebula.tinyspeck.com:9092|metrics-kafka-0ecd28d600ff12a21.nebula.tinyspeck.com:9092> for topic clog-json
    Switching from state CONNECTING_TO_PARTITION_LEADER to state CONNECTED_TO_PARTITION_LEADER for topic clog-json
    Consumed 0 events from (rate:0.0/s), currentOffset=2590122542, numRowsConsumedSoFar=0, numRowsIndexedSoFar=0
    Consumed 0 events from (rate:0.0/s), currentOffset=2586172612, numRowsConsumedSoFar=0, numRowsIndexedSoFar=0
  • a

    Ananth Packkildurai

    02/27/2019, 12:25 AM
    😕
    numRowsConsumedSoFar=0
    , I guess it is an impact of the previous error. The events are flowing through kafka, so the upstream is looking good.
  • s

    Subbu Subramaniam

    02/27/2019, 12:32 AM
    we poll kafka for rows and found no rows there -- that is what that log indicates. Since it prints the offset, you can hopefully check the kafka topic to see if a later offset is available in that partition, and if so, why that particular broker does not have it
  • s

    Subbu Subramaniam

    02/27/2019, 1:20 AM
    @User can u check the segment metadata to see if the number of rows for this segment is non-zero? Also, check the table config to see if the number of rows threshold is set to 0
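    A minimal sketch (not a Pinot tool) of the table-config half of that check, assuming the realtime table config has been saved to a local JSON file; the streamConfigs key names are the usual LLC flush settings of that era and may differ by version.
    // Sketch: print the flush-threshold settings from a realtime table config
    // so you can see whether the row threshold is set to 0.
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import java.io.File;

    public class CheckFlushThreshold {
      public static void main(String[] args) throws Exception {
        JsonNode tableConfig = new ObjectMapper().readTree(new File(args[0]));
        JsonNode streamConfigs = tableConfig.path("tableIndexConfig").path("streamConfigs");
        System.out.println("realtime.segment.flush.threshold.size = "
            + streamConfigs.path("realtime.segment.flush.threshold.size").asText("<unset>"));
        System.out.println("realtime.segment.flush.threshold.time = "
            + streamConfigs.path("realtime.segment.flush.threshold.time").asText("<unset>"));
      }
    }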
  • k

    Kishore G

    02/28/2019, 12:41 AM
    @User were you able to get past this issue?
  • k

    Kishore G

    02/28/2019, 12:42 AM
    @User is there a dumpRealtimeInfo <table> tool in Pinot?
  • k

    Kishore G

    02/28/2019, 12:42 AM
    we should have one that users can run that will help in remote debugging
  • a

    Ananth Packkildurai

    02/28/2019, 12:43 AM
    nope, it starts to consume for some time, then loops to
    CONNECTED_TO_PARTITION_LEADER
    and
    CONNECTED_TO_BOOTSTRAP_NODE
  • a

    Ananth Packkildurai

    02/28/2019, 12:44 AM
    I still can't figure out why it loops.
  • k

    Kishore G

    02/28/2019, 12:48 AM
    can you print the current offset of the kafka topic?
  • s

    Subbu Subramaniam

    02/28/2019, 12:53 AM
    There is no
    dumpRealtimeInfo
    command. What exactly are you looking to print?
  • k

    Kishore G

    02/28/2019, 12:54 AM
    the questions you are asking Ananth
  • k

    Kishore G

    02/28/2019, 12:54 AM
    latest segment metadata from ZK, table config, kafka offsets etc
  • s

    Subbu Subramaniam

    02/28/2019, 12:54 AM
    that can be obtained by looking at the logs
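    There is no such command, but here is a minimal sketch of what a dumpRealtimeInfo starting point could read from ZooKeeper, assuming the Helix property-store layout Pinot used at the time; the cluster, table, and segment names are illustrative values taken from logs later in this thread.
    // Sketch only: dump the table config and one segment's ZK metadata.
    // Property-store paths are assumptions about the layout of that era.
    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.ZooKeeper;

    public class DumpRealtimeInfo {
      public static void main(String[] args) throws Exception {
        String zkAddress = args[0];                         // e.g. "zk-host:2181"
        String cluster = "PinotCluster";                    // from the Helix logs later in this thread
        String table = "clog_v3_REALTIME";                  // tableNameWithType
        String segment = "clog_v3__3__19__20190228T0338Z";  // from the exception later in this thread

        ZooKeeper zk = new ZooKeeper(zkAddress, 30_000, event -> { });
        String configPath = "/" + cluster + "/PROPERTYSTORE/CONFIGS/TABLE/" + table;
        String segmentPath = "/" + cluster + "/PROPERTYSTORE/SEGMENTS/" + table + "/" + segment;
        System.out.println(new String(zk.getData(configPath, false, null), StandardCharsets.UTF_8));
        System.out.println(new String(zk.getData(segmentPath, false, null), StandardCharsets.UTF_8));
        zk.close();
      }
    }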
  • a

    Ananth Packkildurai

    02/28/2019, 12:55 AM
    oh, the GetOffset tool somehow broke the kafka cluster, but I did examine the timestamp of the message
  • a

    Ananth Packkildurai

    02/28/2019, 12:56 AM
    it is constantly lagging by 3 hours, so I guess the upstream has some issue that ingests an old event timestamp
  • a

    Ananth Packkildurai

    02/28/2019, 12:56 AM
    the topic has a two-hour retention period; could that be a reason it stops consuming the data?
  • k

    Kishore G

    02/28/2019, 12:58 AM
    I see
  • k

    Kishore G

    02/28/2019, 12:58 AM
    yeah, first thing would be to find the earliest and latest offsets available in kafka (any partition will do)
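    A minimal sketch of that check using the plain Kafka consumer API (assuming the brokers are new enough to support it); the broker and topic names are the ones from the Pinot consumer log earlier in this thread.
    // Sketch: print earliest/latest offsets per partition so they can be
    // compared with the currentOffset in the Pinot consumer log. If the
    // consuming segment's offset is below "earliest", retention has already
    // deleted the data it is trying to fetch.
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.stream.Collectors;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class CheckTopicOffsets {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "metrics-kafka-02d885b15e8e33fbd.nebula.tinyspeck.com:9092");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
          List<TopicPartition> partitions = consumer.partitionsFor("clog-json").stream()
              .map(p -> new TopicPartition(p.topic(), p.partition()))
              .collect(Collectors.toList());
          Map<TopicPartition, Long> earliest = consumer.beginningOffsets(partitions);
          Map<TopicPartition, Long> latest = consumer.endOffsets(partitions);
          for (TopicPartition tp : partitions) {
            System.out.printf("%s earliest=%d latest=%d%n", tp, earliest.get(tp), latest.get(tp));
          }
        }
      }
    }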
  • s

    Subbu Subramaniam

    02/28/2019, 12:59 AM
    It will always consume data. It is possible that retention kicks in after the segment is completed.
  • a

    Ananth Packkildurai

    02/28/2019, 1:02 AM
    yes, but that should not stop the consumers from consuming data, should it?
  • a

    Ananth Packkildurai

    02/28/2019, 1:12 AM
    I will try to fix the Kafka tooling and do more debugging; there are no obvious errors, which causes more confusion
  • s

    Subbu Subramaniam

    02/28/2019, 1:16 AM
    Sorry, remove the CONFIGS in the previous msg
  • a

    Ananth Packkildurai

    02/28/2019, 3:35 AM
    another interesting log:
    /var/log/pinot-server/current:26 START:INVOKE /PinotCluster/INSTANCES/Server_10.0.153.112_8098/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@3cd3e762 type: CALLBACK
    /var/log/pinot-server/current:Resubscribe change listener to path: /PinotCluster/INSTANCES/Server_10.0.153.112_8098/MESSAGES, for listener: org.apache.helix.messaging.handling.HelixTaskExecutor@3cd3e762, watchChild: false
    /var/log/pinot-server/current:Subscribing changes listener to path: /PinotCluster/INSTANCES/Server_10.0.153.112_8098/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@3cd3e762
    /var/log/pinot-server/current:26 END:INVOKE /PinotCluster/INSTANCES/Server_10.0.153.112_8098/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@3cd3e762 type: CALLBACK Took: 1ms
  • a

    Ananth Packkildurai

    02/28/2019, 3:37 AM
    ☝️ is also looping; essentially pinot creates the segment for some time and I'm getting
    No Messages to process
    then the loop starts.
  • a

    Ananth Packkildurai

    02/28/2019, 3:37 AM
    seems like it's coming from https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java#L754
  • a

    Ananth Packkildurai

    02/28/2019, 3:46 AM
    another suspect:
    Exception in thread "clog_v3__3__19__20190228T0338Z" java.lang.RuntimeException: Not yet created
        at com.linkedin.pinot.server.realtime.ControllerLeaderLocator.getInstance(ControllerLeaderLocator.java:70)
        at com.linkedin.pinot.server.realtime.ServerSegmentCompletionProtocolHandler.createSegmentCompletionUrl(ServerSegmentCompletionProtocolHandler.java:144)
        at com.linkedin.pinot.server.realtime.ServerSegmentCompletionProtocolHandler.segmentStoppedConsuming(ServerSegmentCompletionProtocolHandler.java:136)
        at com.linkedin.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.postStopConsumedMsg(LLRealtimeSegmentDataManager.java:800)
        at com.linkedin.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:562)
        at java.lang.Thread.run(Thread.java:748)
  • a

    Ananth Packkildurai

    02/28/2019, 3:52 AM
    and I'm frequently seeing https://github.com/apache/incubator-pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java#L571