# troubleshooting
d
Hello, I've got an issue with a realtime table which is consuming from a topic with 16 partitions. Pinot is consuming from all partitions except one, and I can't find any issues in the logs. Is there a way to force Pinot to consume from that partition? I've tried rebalancing the servers and reloading all segments, but it still won't consume from this one partition.
s
Can you check if there are CONSUMING segments for all your partitions in ZK? You can use the controller UI to check that. Here's an example. It's under IDEALSTATES -> <tableName>_REALTIME. You should ideally have one segment per partition with state as "CONSUMING". If there are some partitions for which you don't have consuming segments, you might have to manually create them to resume consumption.
The key in mapFields is of the format <tableName>__<partitionGroupId>__<sequenceNumber>__<dateTime>
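To make that check less manual, here is a rough Python sketch that scans the mapFields of the ideal state and lists partitions with no CONSUMING segment. It assumes the fields of a segment name are separated by double underscores, and that you have already loaded mapFields into a dict of segment name to {instance: state}. The function name, the table name `events`, and the server names are all made up for illustration.

```python
def partitions_without_consuming(ideal_state, num_partitions):
    """Return partition ids that have no segment in CONSUMING state.

    ideal_state: dict of segment name -> {server instance: state}, i.e. the
    mapFields content of <tableName>_REALTIME (assumed shape).
    """
    consuming = set()
    for segment, replicas in ideal_state.items():
        # Split from the right so a table name containing "__" still parses.
        parts = segment.rsplit("__", 3)
        if len(parts) != 4:
            continue  # not a <table>__<partition>__<sequence>__<time> name
        try:
            partition = int(parts[1])
        except ValueError:
            continue
        if "CONSUMING" in replicas.values():
            consuming.add(partition)
    return sorted(set(range(num_partitions)) - consuming)


# Hypothetical ideal state for a 3-partition topic; partition 1 has no entry.
example_ideal_state = {
    "events__0__11__20220228T1000Z": {"Server_a": "ONLINE", "Server_b": "ONLINE"},
    "events__0__12__20220301T1000Z": {"Server_a": "CONSUMING", "Server_b": "CONSUMING"},
    "events__2__7__20220301T1000Z": {"Server_a": "CONSUMING", "Server_b": "CONSUMING"},
}
print(partitions_without_consuming(example_ideal_state, 3))
```

Any partition id it prints is one you would expect to see a CONSUMING segment for but don't.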
k
Also, you can trigger the RealtimeSegmentValidation task to detect new partitions. This can be done via an API call to the controller:
curl -X GET "http://localhost:9000/periodictask/run?taskname=RealtimeSegmentValidationManager&tableName=your-table-name" -H "accept: application/json"
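If you want to trigger this from a script rather than curl, a minimal Python sketch of the same request could look like this. The endpoint path and query parameters are copied from the curl command above; the helper names and the timeout are just assumptions for the example, and nothing is sent until you call `run_validation_task`.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def validation_task_url(controller, table_name):
    """Build the same URL as the curl command above."""
    query = urlencode({
        "taskname": "RealtimeSegmentValidationManager",
        "tableName": table_name,
    })
    return f"{controller.rstrip('/')}/periodictask/run?{query}"


def run_validation_task(controller, table_name, timeout=10):
    """Send the GET request and return the raw response body as text."""
    request = Request(
        validation_task_url(controller, table_name),
        headers={"accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Only builds and prints the URL; no request is made here.
print(validation_task_url("http://localhost:9000", "your-table-name"))
```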
d
I had checked the ideal states to confirm the table wasn't consuming; there are no entries in ZK for this partition. I would rather avoid doing operations at this level.
k
@Navina R can you help here. What can be the cause of only one partition not showing up?
d
I've run the segment validation task but still no luck
s
Was any segment delete command run for this partition's segment @Dan DC?
Or are there any segments in OFFLINE state?
d
There are no segments for this partition, nor have any been deleted
I'm still not able to get this sorted, if anyone has any pointers I'll appreciate it
m
This seems quite odd. Are you saying that the partition always existed, but Pinot never recognized it?
Or did something change (e.g. new partition added, server untagged, segment deleted, etc.)?
d
The partition always existed and nothing has changed in this table or the topic but I know in this env sometimes the kubernetes nodes are recreated, not sure if that's got anything to do with this issue. I'm trying to dig in the logs
m
Hmmm that really does not make sense. Do you have monitoring enabled via Prometheus? If so, let's check if the partition shows up in the consumption metric
From what you are describing, it seems like Pinot does not even know there is a partition (assuming no logs about the partition, and no errors).
s
do you know what that metric is @Mayank - i can't see it for the LLC
so we've looked at pinot_server_llcPartitionConsuming_value and that metric is missing for partition 1 of that topic
we have replication of 2 and it is missing on both servers
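A check like this can be scripted against the text a server's metrics endpoint returns. The sketch below is a rough Python version: the metric name is the one mentioned above, but the label names (`table`, `partition`) are assumptions and depend on your JMX exporter config, so adjust them to whatever your scrape output shows. The sample text is made up.

```python
import re


def partitions_missing_metric(metrics_text, num_partitions,
                              metric="pinot_server_llcPartitionConsuming_value",
                              label="partition"):
    """Return partition ids with no series for `metric` in the scrape text."""
    series_re = re.compile(r"^" + re.escape(metric) + r"\{([^}]*)\}\s+\S+")
    label_re = re.compile(r'(\w+)="([^"]*)"')
    seen = set()
    for line in metrics_text.splitlines():
        match = series_re.match(line.strip())
        if not match:
            continue  # comment line or a different metric
        labels = dict(label_re.findall(match.group(1)))
        if labels.get(label, "").isdigit():
            seen.add(int(labels[label]))
    return [p for p in range(num_partitions) if p not in seen]


# Hypothetical scrape output for a 3-partition topic; partition 1 is absent.
sample = """
# TYPE pinot_server_llcPartitionConsuming_value gauge
pinot_server_llcPartitionConsuming_value{table="events_REALTIME",partition="0"} 1.0
pinot_server_llcPartitionConsuming_value{table="events_REALTIME",partition="2"} 1.0
"""
print(partitions_missing_metric(sample, 3))
```

With more than one replica you would run it against each server's output, since a partition should appear on every server that hosts it.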
m
Yes that is the metric. That is really odd. Any ideas @Neha Pawar ?
n
Are we saying there is absolutely no record of this partition in ZK metadata? That's strange!
s
yeah exactly - we are at a bit of a loss as well
m
Can we try to create a new table with exactly same config (except table name)? Just to see if the problem is reproducible consistently? And if so, we can look at logs to see if they reveal anything
n
I don't know how many servers you have. Have you considered looking into the consumer states for this table? Maybe with a thread dump? If you have more than one table on servers then the dump is likely to get noisy
๐Ÿ‘ 1
Idea is to see if the Kafka consumer is not stalled and actually polling for messages
s
we have created a new table this morning @Mayank and the partition is found now - we do only have 7 days retention on kafka so some messages will have expired
we have set up a reconciliation between Pinot and Kafka so we will keep an eye on if the issue arises again
m
@Stuart Coleman for my understanding, in the previous case, the partition never existed since the table creation right? Or did it suddenly disappear?
s
correct, never existed
๐Ÿ‘ 1
m
For the new table, you can set monitoring and alerting on the metric you mentioned.
๐Ÿ‘ 2