# troubleshooting
t
Hi, we're seeing high disk usage on the minions from the RealtimeToOfflineSegments task. Is it expected that Pinot keeps the segments on disk after finishing the task? Also, when running lsof we noticed that Pinot still has a large number of deleted segment files open, which might be some kind of resource leak?
m
@Xiaobing ^^
x
Hi Tiger, thanks for debugging this. The task shouldn't keep any input segments or newly generated segments on disk after completion. If it does, that'd be a bug. Could you open an issue about this and include your findings from lsof? Thanks!
Re "Pinot is keeping a large number of deleted segments open still": perhaps some cleanup code is not in a finally{} block.
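To illustrate the finally{} point, here is a minimal sketch (not Pinot's actual code) of cleanup that runs whether or not processing fails. The class and method names (SegmentCleanupSketch, processSegment, runTask) are hypothetical stand-ins. The try-with-resources block also closes the file handle before the directory is deleted, which is the kind of thing that would keep deleted files from showing up as still open in lsof.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; none of these names come from the Pinot code base.
final class SegmentCleanupSketch {

  // Stand-in for real segment processing: reads the file and returns its byte count.
  static long processSegment(Path segmentFile) throws IOException {
    // try-with-resources closes the handle even if reading throws,
    // so the file is not held open after it gets deleted.
    try (InputStream in = Files.newInputStream(segmentFile)) {
      long bytes = 0;
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        bytes += n;
      }
      return bytes;
    }
  }

  // Deletes a directory tree, children before parents.
  static void deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return;
    }
    List<Path> ordered;
    try (Stream<Path> paths = Files.walk(dir)) {
      ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : ordered) {
      Files.delete(p);
    }
  }

  // Runs the processing step and always removes the working directory afterwards.
  static long runTask(Path workDir, Path segmentFile) throws IOException {
    try {
      return processSegment(segmentFile);
    } finally {
      // Executes on both success and failure, so a failed task leaves nothing behind.
      deleteRecursively(workDir);
    }
  }
}
```

If processSegment throws, the exception still propagates to the caller, but only after the working directory has been removed.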
t
@Xiaobing if the bucket time period for the RealtimeToOfflineSegments task is set to one day, would Pinot try to download that entire day of segments before running the task? Or would it be done in smaller batches?
x
iirc, the current implementation would download all segments before processing them
t
I see, and as the offline segments are created, are those (and the realtime ones) immediately deleted off the minion disk or would that deletion occur after all the data has been processed for that day?
x
It's deleted after the segment is generated and compressed (link to code). Did you see logs like this in your case?
LOGGER.warn("Failed to delete input segment: {}", inputSegmentDir.getAbsolutePath());
t
haven't seen any log messages like that. But given that it downloads all the segments, I'm thinking it's possible the disk usage is just coming from that
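One way to check that theory would be to measure how much the minion's working directory holds while the task runs and compare it with the total size of one bucket's segments. A rough sketch, assuming you know the directory path; the class name WorkDirUsageSketch and the method directorySizeBytes are hypothetical and nothing here is Pinot API:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper; the directory to inspect is whatever the minion uses as its data dir.
final class WorkDirUsageSketch {

  // Sums the sizes of all regular files under dir, including nested directories.
  static long directorySizeBytes(Path dir) throws IOException {
    try (Stream<Path> paths = Files.walk(dir)) {
      return paths
          .filter(Files::isRegularFile)
          .mapToLong(p -> {
            try {
              return Files.size(p);
            } catch (IOException e) {
              // A file may vanish mid-walk while the task is cleaning up.
              throw new UncheckedIOException(e);
            }
          })
          .sum();
    }
  }
}
```

If the number tracks roughly one day of downloaded segments and drops back after the task finishes, the usage is probably just the download-everything-first behavior rather than a leak.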