# troubleshooting
f
Hi. I’ve found an interesting thing but I’m not sure it’s a good practice 😕 I’m still in a POC and my Kafka crashed, so new message offsets were reset to 0, but Pinot kept looking for the older offsets. I searched and found that it is not possible to reset an offset on the Pinot side. I tried several things unsuccessfully before doing the following:
1. Disable the realtime table
2. Find the consuming segments in ZooKeeper
3. Set the offset values to 0 for the segments from step 2
4. Enable the table
Consumption then resumed as I expected, without data loss and with only a small downtime (a sketch of these steps follows below). Is that a good practice, or at least a workaround until the new development around this topic described in the design docs? 😉
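A minimal sketch of the four steps above, in Python. The controller endpoints, hosts, ZK root path (`/pinot` here), and the segment metadata layout (`simpleFields` with `segment.realtime.status` and `segment.realtime.startOffset`) are all assumptions; verify them against your own cluster before editing anything in ZooKeeper.

```python
import json

import requests
from kazoo.client import KazooClient

CONTROLLER = "http://localhost:9000"  # hypothetical controller address
TABLE = "myTable_REALTIME"            # table name with type suffix (assumption)

# Step 1: disable the realtime table so nothing consumes while we edit ZK.
requests.put(f"{CONTROLLER}/tables/{TABLE}/state",
             params={"state": "disable", "type": "realtime"})

# Steps 2-3: find the CONSUMING (IN_PROGRESS) segments in ZK and reset
# their start offset to 0. The path layout below is an assumption.
zk = KazooClient(hosts="localhost:2181")
zk.start()
segments_path = f"/pinot/PROPERTYSTORE/SEGMENTS/{TABLE}"
for segment in zk.get_children(segments_path):
    data, _ = zk.get(f"{segments_path}/{segment}")
    meta = json.loads(data)
    fields = meta.get("simpleFields", {})
    if fields.get("segment.realtime.status") == "IN_PROGRESS":
        fields["segment.realtime.startOffset"] = "0"
        zk.set(f"{segments_path}/{segment}", json.dumps(meta).encode())
zk.stop()

# Step 4: re-enable the table; consumption should resume from offset 0.
requests.put(f"{CONTROLLER}/tables/{TABLE}/state",
             params={"state": "enable", "type": "realtime"})
```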
m
Thanks for sharing @francoisa. Yes, if the upstream Kafka offsets become inconsistent we run into these issues. In your specific case (a Kafka crash), you could unblock by doing so, but note that Pinot won’t know if there’s any data loss or duplication. I think we have an issue being discussed where we want to at least allow consumption to resume without having to do the ZK edits. cc: @Navina R @Neha Pawar
s
@francoisa for now, the only thing you can do is to drop your table and restart consumption. If you need the data then you may have to save away the segments and upload them again (and that is a little tricky as well). We are working on a design to overcome such issues, the design is under review. cc: @Sajjad Moradi
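For reference, a minimal sketch of the drop-and-recreate path, assuming a controller at `localhost:9000` and that the original table config was saved to `table-config.json` (both hypothetical):

```python
import requests

CONTROLLER = "http://localhost:9000"

# Drop the realtime table; this also deletes its segments, so save them
# away first if you need the data.
requests.delete(f"{CONTROLLER}/tables/myTable", params={"type": "realtime"})

# Recreate the table from the saved config; consumption restarts from the
# offset criteria defined in the stream configs.
with open("table-config.json") as f:
    requests.post(f"{CONTROLLER}/tables", data=f.read(),
                  headers={"Content-Type": "application/json"})
```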
n
@Mayank Yes. It is similar to the offset-out-of-range exception https://github.com/apache/pinot/issues/8219 @francoisa: In your case, is your older invalid message no longer in the Kafka topic, or is it still present and you would like to reset to a different offset?
m
I think the recent cases we have seen are ones where users understand and are OK with temporary data loss from Pinot, but want a way to simply restart the consumption on a best-effort basis, without having to delete the entire table in Pinot.
f
@Navina In my case Kafka reset its offsets back to 0, so the older messages are gone 🙂
n
@francoisa I see. Ok, in that case the solution should be similar to the one dealing with OOR. I think by default the Kafka consumer's `auto.offset.reset` is set to `LATEST`. You could override it to use `EARLIEST` (see the sketch below). We are still discussing whether this should be config-driven behavior or use explicit Kafka APIs to handle resets. Please follow that issue for progress on this work!
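A sketch of that override, assuming the Kafka consumer property is passed through the table's streamConfigs as `stream.kafka.consumer.prop.auto.offset.reset` (the property name, value, and endpoints are assumptions; check your connector version):

```python
import json

import requests

CONTROLLER = "http://localhost:9000"  # hypothetical controller address

# Fetch the current realtime table config.
table_config = requests.get(f"{CONTROLLER}/tables/myTable").json()["REALTIME"]

# Override the reset policy so a consumer with no valid offset starts from
# the earliest available message instead of the latest ("smallest"/"largest"
# for older Kafka consumers, "earliest"/"latest" for newer ones).
stream_configs = table_config["tableIndexConfig"]["streamConfigs"]
stream_configs["stream.kafka.consumer.prop.auto.offset.reset"] = "smallest"

# Push the updated config back to the controller.
requests.put(f"{CONTROLLER}/tables/myTable",
             data=json.dumps(table_config),
             headers={"Content-Type": "application/json"})
```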
n
fyi @francoisa this is the proposed solution https://github.com/apache/pinot/pull/8309 which will help in this case
I don’t think it will help in your case though.. even if we make the Kafka consumer reset to earliest, the Pinot consumer will filter out the records because they’re out of range from Pinot’s perspective. In your case, manually editing ZK is one option; deleting and restarting is the safest option. We should add an API to help with such resets.
n
Ok, so this is a manual "re-position" use case and cannot be handled like the OOR case.