Hey team wave I m running into an issue with offline segment Apache Pinot #troubleshooting

Scott deRegt

08/17/2022, 10:52 PM

Hey team 👋 I'm running into an issue with offline segments in `error` state. When checking `debug` endpoint and server logs, am seeing `java.nio.file.NoSuchFileException` on specific segment files local paths. I've tried rebalancing servers w/ `downtime` and `bootstrap` toggled as well as running reload segment with `forceDownload` set to `true` but still can't seem to clear the `error`s. any tips on how to repair this error state?

Scott deRegt

08/17/2022, 11:00 PM

I've verified that the associated segment `segment.download.url` is valid and contains data.

Mayank

08/17/2022, 11:01 PM

Did you try reload api?

Scott deRegt

08/17/2022, 11:08 PM

Yes I did, I tried the `/segments/{tableName}/{segmentName}/reload` endpoint with a specific segment and `forceDownload` set to `true`. The response indicated success (`Sent reload messages`) but the servers still have the same `NoSuchFileException` afterwards.
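
For anyone retracing these steps, here is a minimal Python sketch of how the two reload calls mentioned in this thread could be built. The endpoint paths and the `forceDownload` flag come from the messages above; the controller address, the table and segment names, and the assumption that the call is a POST with `forceDownload` passed as a query parameter are illustrative and should be checked against your Pinot version.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def reload_url(controller: str, table: str, segment: Optional[str] = None,
               force_download: bool = True) -> str:
    """Build the controller reload URL for one segment, or for the whole table."""
    path = f"/segments/{quote(table, safe='')}"
    if segment is not None:
        path += f"/{quote(segment, safe='')}"
    path += "/reload"
    # Assumption: forceDownload is passed as a query parameter.
    query = urlencode({"forceDownload": str(force_download).lower()})
    return f"{controller.rstrip('/')}{path}?{query}"


def send_reload(url: str, timeout: float = 30.0) -> str:
    """POST the reload request (assumed method) and return the raw response body."""
    with urlopen(Request(url, method="POST"), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Hypothetical controller address and names, for illustration only.
print(reload_url("http://localhost:9000", "myTable_OFFLINE", "myTable_seg_0"))
print(reload_url("http://localhost:9000", "myTable_OFFLINE"))
```

As this thread shows, a `Sent reload messages` response only confirms that the message was dispatched, so the segment state still needs to be rechecked afterwards.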

Scott deRegt

08/17/2022, 11:09 PM

Just fired off a request to reload all segments via `/segments/{tableName}/reload` with `forceDownload` set to `true`

Mayank

08/18/2022, 12:12 AM

Does the table debug endpoint show any issues? If not, check the server log for the segment name, and also check disk space.
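
A rough sketch of how a saved debug response could be scanned for errors, in case it helps. The only field names taken from this thread are `errorMessage` and `stackTrace`, which appear in the debug output quoted further down. The surrounding JSON shape in `sample` is invented for illustration, so the walker deliberately makes no assumption about where those fields sit.

```python
import json
from typing import Any, Iterator


def find_errors(node: Any) -> Iterator[dict]:
    """Yield every JSON object that carries an "errorMessage" field, at any depth."""
    if isinstance(node, dict):
        if "errorMessage" in node:
            yield node
        for value in node.values():
            yield from find_errors(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_errors(item)


# Hypothetical response shape, for illustration only (not the real schema).
sample = json.loads("""
{"segments": [{"name": "seg_0", "servers": {"server_1": {"error": {
    "errorMessage": "Caught exception in state transition from OFFLINE -> ONLINE",
    "stackTrace": "java.nio.file.NoSuchFileException: ..."}}}}]}
""")

for err in find_errors(sample):
    # Show the message plus the first line of the stack trace, if present.
    first_line = err.get("stackTrace", "").splitlines()[0:1]
    print(err["errorMessage"], "|", *first_line)
```

The first line of each stack trace is usually enough to group segments by failure type before digging into the server logs.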

Scott deRegt

08/18/2022, 3:33 PM

I'm seeing the same error message in the debug endpoint:

"errorMessage": "Caught exception in state transition from OFFLINE -> ONLINE for resource: <table_name>, partition: <table_name>_2021-07-18_2021-07-18_0",
              "stackTrace": "java.nio.file.NoSuchFileException: ...

Neha Pawar

08/18/2022, 5:58 PM

is it possible that your segment tar got deleted from the segment store? was the segment store on controller disk, or have you set up S3?

Scott deRegt

08/19/2022, 4:45 PM

segment store is in S3, i did validate the segment data still exists from the `segment.download.url`. also seeing some segments replicated 1/2 successfully though other segments have 0/2 replicas available (replication factor = 2).

Scott deRegt

08/19/2022, 9:54 PM

This ended up being a directory permissions issue where download from segment store was failing and causing the `NoSuchFileException`
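
A small sketch of the kind of check that can surface a permissions problem like this one. The thread does not say which directory was affected, so the path in the usage line is purely hypothetical. The check is only meaningful when run as the same OS user as the Pinot server process.

```python
import os
import tempfile
from pathlib import Path


def check_writable(directory: str) -> str:
    """Return "ok", or a short reason why files cannot be created in `directory`."""
    path = Path(directory)
    if not path.exists():
        return "missing"
    if not path.is_dir():
        return "not a directory"
    # Creating files needs write + execute (search) permission on the directory.
    if not os.access(path, os.W_OK | os.X_OK):
        return "not writable by current user"
    try:
        # Probe with a real temp file, since os.access can be misleading on some mounts.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        return f"write failed: {exc.strerror}"
    return "ok"


# Hypothetical path, for illustration only.
print(check_writable("/var/pinot/server/index"))
```

If the download target is not writable, the segment file never lands on local disk, which would be consistent with the later `NoSuchFileException` described above.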

Mayank

08/19/2022, 9:55 PM

Hmm, I think we recently fixed the error message.

Scott deRegt

08/19/2022, 9:59 PM

After resolving this error and attempting to reload all segments, we are hitting disk full errors on certain servers due to a lot of data stored at the `server/index/` path. Some of this index data exists for tables that no longer exist in our Pinot cluster or segments that were reassigned to other `servers` in the segment routing table. Is there a good way to keep this directory purged of unused data?
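
The thread ends without an answer to this question. As a starting point for investigation only, here is a dry-run sketch that reports which directories under an index path do not match a list of known tables, along with their size. It assumes one subdirectory per table directly under the index path, which is not confirmed in this thread, and it takes the list of live tables as an input rather than fetching it. It deletes nothing.

```python
import os
from pathlib import Path
from typing import Dict, Iterable


def dir_size(path: Path) -> int:
    """Total size in bytes of regular files under `path` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            try:
                if not os.path.islink(fp):
                    total += os.path.getsize(fp)
            except OSError:
                continue  # file vanished or unreadable; skip it in this report
    return total


def stale_table_dirs(index_dir: str, live_tables: Iterable[str]) -> Dict[str, int]:
    """Map each subdirectory not named in `live_tables` to its size in bytes."""
    live = set(live_tables)
    return {
        child.name: dir_size(child)
        for child in sorted(Path(index_dir).iterdir())
        if child.is_dir() and child.name not in live
    }


if __name__ == "__main__":
    # Hypothetical path and table names, for illustration only.
    index_dir = "server/index"
    if os.path.isdir(index_dir):
        for name, size in stale_table_dirs(index_dir, ["myTable_OFFLINE"]).items():
            print(f"{name}\t{size / 1e9:.2f} GB")
```

Reviewing the report by hand before removing anything seems safer than automated deletion, since a directory could belong to a table that is mid-rebalance.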
