# troubleshooting
s
Hey team 👋 I'm running into an issue with offline segments in `error` state. When checking the `debug` endpoint and server logs, I'm seeing `java.nio.file.NoSuchFileException` on the local paths of specific segment files. I've tried rebalancing servers w/ `downtime` + `bootstrap` toggled, as well as running reload segment with `forceDownload` set to `true`, but still can't seem to clear the `error`s. Any tips on how to repair this error state?
✅ 1
I've verified that the associated segment `segment.download.url` is valid and contains data.
m
Did you try reload api?
s
Yes I did, I tried the `/segments/{tableName}/{segmentName}/reload` endpoint with a specific segment and `forceDownload` set to `true`. The response indicated success (`Sent reload messages`) but the server still has the same `NoSuchFileException` afterwards.
Just fired off a request to reload all segments via `/segments/{tableName}/reload` w/ `forceDownload` set to `true`.
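For reference, the two reload calls described above can be sketched roughly as follows. This is a minimal sketch, not an official client: the controller address is an assumption (pass in whatever your cluster uses), and `reload_url` and `reload` are hypothetical helper names. Only the endpoint paths and the `forceDownload` parameter come from the messages above.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def reload_url(controller, table, segment=None, force_download=True):
    """Build the reload URL for one segment, or for the whole table if segment is None."""
    path = f"/segments/{quote(table, safe='')}"
    if segment is not None:
        path += f"/{quote(segment, safe='')}"
    path += "/reload"
    query = urlencode({"forceDownload": str(force_download).lower()})
    return f"{controller.rstrip('/')}{path}?{query}"


def reload(controller, table, segment=None, force_download=True):
    """POST the reload request to the controller and return the raw response body."""
    req = Request(reload_url(controller, table, segment, force_download), method="POST")
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

As seen in this thread, a success response only means the reload messages were sent, so the server logs still need checking afterwards.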
m
Does the table debug endpoint show any issues? If not, check the server log for the segment name, and also check disk space.
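When the debug response is large, a small helper can pull out every error it reports. The sketch below assumes the response has already been fetched and parsed into Python dicts and lists. The only field name it relies on is `errorMessage`, which appears in the debug output quoted in this thread. `find_errors` is a hypothetical name.

```python
def find_errors(node, path=""):
    """Yield (json_path, message) for every string 'errorMessage' field in parsed JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key == "errorMessage" and isinstance(value, str):
                yield child, value
            else:
                # Keep descending; errors may be nested per segment and per server.
                yield from find_errors(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_errors(item, f"{path}[{i}]")
```

The yielded path shows where in the response each error sits, which helps tell apart a single bad server from a segment that is failing on every replica.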
s
I'm seeing the same error message in the debug endpoint:
"errorMessage": "Caught exception in state transition from OFFLINE -> ONLINE for resource: <table_name>, partition: <table_name>_2021-07-18_2021-07-18_0",
              "stackTrace": "java.nio.file.NoSuchFileException: ...
n
is it possible that your segment tar got deleted from the segment store? was the segment store on controller disk, or have you set up S3?
👀 1
s
segment store is in s3, i did validate the segment data still exists from the `segment.download.url`. also seeing some segments replicated 1/2 successfully though other segments have 0/2 replicas available (replication factor = 2).
This ended up being a directory permissions issue where the download from the segment store was failing and causing the `NoSuchFileException`.
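A quick way to rule this kind of problem in or out is to test whether the server's data directory is writable by the user the server runs as. This is a minimal sketch using only the Python standard library on a Unix host. `check_dir_access` is a hypothetical helper, and which directory to pass depends on your server configuration.

```python
import os


def check_dir_access(path):
    """Return a list of problems that would stop the current user writing into path."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} is not a directory")
        return problems
    # Creating files in a directory needs both write and execute (traverse) permission.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by uid {os.getuid()}")
    return problems
```

Run it as the same OS user as the server process, since `os.access` answers for the calling user only.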
m
Hmm, I think we recently fixed the error message.
s
After resolving this error and attempting to reload all segments, we are hitting a disk full error on certain servers due to a lot of data stored at the `server/index/` path. Some of this index data exists for tables that no longer exist in our Pinot cluster or segments that were reassigned to other `servers` in the segment routing table. Is there a good way to keep this directory purged of unused data?
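As a starting point for auditing that directory, the sketch below lists sub-directories that do not match any table currently in the cluster. It assumes the index directory holds one sub-directory per table, named after the table, which should be confirmed on your own servers first. `orphaned_table_dirs` is a hypothetical helper, and the list of live table names has to be supplied by the caller. It only reports candidates and deletes nothing.

```python
import os


def orphaned_table_dirs(index_dir, live_tables):
    """Return names of sub-directories of index_dir that are not in live_tables."""
    live = set(live_tables)
    return sorted(
        entry.name
        for entry in os.scandir(index_dir)
        # Plain files are skipped; only directories are treated as table data.
        if entry.is_dir() and entry.name not in live
    )
```

This covers only the dropped-table case. Segments that were reassigned to other servers would need a per-segment comparison against the current assignment, which is not sketched here.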