Nick M

05/25/2023, 12:30 PM
Hi all, we've encountered an issue where our deep storage had a temporary outage and couldn't provide one of our historicals with a segment. The historical logged a "Failed to load segment for dataSource" error and carried on as normal. Unfortunately, Druid still believes that segment is loaded on that historical (as shown by `select * from sys.segments where segment_id = '<segment_id>'`). When running queries that hit this segment, we're now seeing `org.apache.druid.segment.SegmentMissingException` errors. How do we get out of this situation? Restarting all historicals should force them to re-read all segments from disk and update the coordinator, right? Does the coordinator ever do a periodic check of historicals to ensure they're hosting all the segments it believes they are?
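For reference, a minimal `sys.segments` check along these lines might look like the following; `'<segment_id>'` stays a placeholder for the actual segment id, and the columns shown are the standard ones Druid exposes in its `sys` tables:

```sql
-- What the coordinator believes about one segment:
-- is_published  -> segment is committed to the metadata store
-- is_available  -> some server claims to be serving it
-- num_replicas  -> how many servers the coordinator thinks host it
SELECT "segment_id",
       "datasource",
       "is_published",
       "is_available",
       "num_replicas"
FROM sys.segments
WHERE "segment_id" = '<segment_id>';
```

If `is_available` is 1 but queries still throw `SegmentMissingException`, that matches the mismatch described above: the coordinator's view and the historical's actual cache have diverged.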

John Kowtko

05/25/2023, 3:11 PM
You could try changing the replica count for just that one time chunk to see if it cleans it up ...
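A sketch of what that could look like as a retention/load rule for just the affected interval (the interval and tier name here are illustrative placeholders, not values from this thread):

```json
{
  "type": "loadByInterval",
  "interval": "2023-05-01/2023-05-02",
  "tieredReplicants": {
    "_default_tier": 2
  }
}
```

Bumping `tieredReplicants` for that one interval (and later lowering it again) nudges the coordinator to schedule fresh loads of the segments in that time chunk, which may replace the phantom replica.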

Steve Watkins

05/25/2023, 9:50 PM
It does seem strange that if the historical server encounters an issue when trying to publish the segment then the coordinator is not informed of this… perhaps the coordinator should not mark the segment as ‘available’ until the historical confirms a successful load of the segment?

Satish N

05/28/2023, 4:10 PM
It takes a little work to find out where the datasource is distributed, but if you can remove it from the segment cache on that historical, it will be forced to load the segment from deep storage again.
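A rough sketch of that procedure, assuming a hypothetical cache path and service name (check your `druid.segmentCache.locations` setting and your actual process manager; the segment directory layout below is the usual `<datasource>/<interval>/<version>/<partitionNum>` structure):

```shell
# Hypothetical values -- substitute your own.
CACHE=/var/druid/segment-cache
DATASOURCE=my_datasource

# 1. Stop the historical so it doesn't race with the deletion.
sudo systemctl stop druid-historical

# 2. Inspect the cached segment directories for the datasource to
#    locate the interval/version/partition of the bad segment.
find "$CACHE/$DATASOURCE" -maxdepth 3 -type d

# 3. Remove only the directory for the affected segment
#    (placeholders left as-is; fill in from the find output).
rm -rf "$CACHE/$DATASOURCE/<interval>/<version>/<partitionNum>"

# 4. Restart; on startup the historical announces only what is
#    actually on disk, and the coordinator re-assigns the missing
#    segment, which pulls it from deep storage.
sudo systemctl start druid-historical
```

The key point is step 4: the historical's announcement on restart is what resynchronizes the coordinator's view with reality, which is why a restart alone (without the stale metadata on disk) can also clear the phantom entry.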
👍 1