# Troubleshooting
s
We are facing a few issues below and would need some advice:

1. I am seeing this error: "Failed to find segment: offlinebookingnarrow_poc_OFFLINE_7016 in table: offlinebookingnarrow_poc_OFFLINE" when trying to load offline data into Pinot for a few segments. Any suggestions on how to fix it or what might be causing it?
2. When we load large data into Pinot, the query performance for existing tables degrades badly during the load. Any suggestions on how we can ensure query performance stays the same irrespective of what other processes run in the background on the Pinot side?
3. I am using the delete-segments API to delete all segments for a table: `curl -X DELETE "http://controllerurl/segments/offlinebookingnarrow_poc?type=OFFLINE&retention=0d" -H "accept: application/json"`. We are seeing that although it deletes the segment metadata for the table, the underlying data is not deleted from disk, causing the disk to become full. @Priyank Bagrecha please add any more details based on your observation. Is there a way to ensure the underlying data also gets deleted?
Mayank
1. Check the table debug API to see if it surfaces any exceptions (see the example below). If not, check the server logs for the segment name.
2. How large is the data being loaded? Also, are you appending or refreshing? And are the server disks local HDD/SSD, or remote EBS gp2/gp3, etc.?
3. It is expected to delete all the segments from local disk. If not, please file a GH issue.
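For example (the controller host mirrors the placeholder used above; the `/debug/tables` endpoint and the log file path are assumptions that may differ per release/deployment):

```
# Ask the controller for table/segment debug info (hypothetical host; endpoint exists in recent releases).
curl -s "http://controllerurl/debug/tables/offlinebookingnarrow_poc?verbosity=1" \
  -H "accept: application/json"

# Grep the server logs for the failing segment name (log path varies by deployment).
kubectl exec -it pinot-server-0 -n pinot -- \
  grep -i "offlinebookingnarrow_poc_OFFLINE_7016" /opt/pinot/logs/pinot-all.log
```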
s
@Priyank Bagrecha can you please share any more details based on @Mayank's comments?
Priyank Bagrecha
I ended up cleaning the disks manually to get past the issue for now. The disks are EBS gp2. I can add more information if the issue happens again.
Mayank
Especially for 3, if that happens again, please file a GH issue with all the details.
Priyank Bagrecha
Will do
One thing I noticed: the reported size for the offline table is as shown in the screenshot, but despite using 30 instances with a 2 TB disk each, each instance is at ~84% disk utilization.
```
$ kubectl exec -it pinot-server-0 -n pinot -- bash
root@pinot-server-0:/opt/pinot# df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          80G  7.0G   74G   9% /
tmpfs            64M     0   64M   0% /dev
tmpfs            60G     0   60G   0% /sys/fs/cgroup
/dev/xvda1       80G  7.0G   74G   9% /etc/hosts
shm              64M     0   64M   0% /dev/shm
/dev/xvdbl      1.8T  1.5T  300G  84% /var/pinot/server/data
tmpfs            60G   12K   60G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs            60G     0   60G   0% /proc/acpi
tmpfs            60G     0   60G   0% /proc/scsi
tmpfs            60G     0   60G   0% /sys/firmware
```
Does the reported size in the UI include replication? Does it include indexes?
Replication on the table is 2. I am seeing similar counts on each of the 30 server instances:
```
root@pinot-server-0:/var/pinot/server/data/index# ls -lh | awk '{print $9}' | awk -F '_' '{print $1}' | uniq -c
      1
    167 bidrequest
   1739 offlinebookingnarrow
     16 offlinebookingwide
      1 tmp
      3 usersample
```
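As a rough cross-check of where the 1.5 TB is going, a per-table breakdown of the same directory can help; this is plain coreutils/awk, nothing Pinot-specific:

```
# Rough per-table on-disk usage (MB), grouping segment dirs by table-name prefix.
cd /var/pinot/server/data/index
du -sm * | awk '{split($2, a, "_"); sum[a[1]] += $1} END {for (t in sum) printf "%s\t%d MB\n", t, sum[t]}'
```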
According to the UI, there are supposed to be ~670 segments per server.
We are using release-0.11.0.
I deleted all segments using the delete-all-segments API, and now I don't see any segments associated with the offline table in the controller UI. However, I still see segments on the disk:
```
root@pinot-server-0:/opt/pinot# ls -lh /var/pinot/server/data/index | awk '{print $9}' | awk -F '_' '{print $1}' | uniq -c
      1
    167 bidrequest
   1739 offlinebookingnarrow
     16 offlinebookingwide
      1 tmp
      3 usersample
root@pinot-server-0:/opt/pinot#
```
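To cross-check from the controller side, listing the table's segments over the REST API should now come back empty (same placeholder controller host as in the delete call above):

```
# List remaining OFFLINE segments for the table via the controller REST API.
curl -s "http://controllerurl/segments/offlinebookingnarrow_poc?type=OFFLINE" -H "accept: application/json"
```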
According to this snippet from the segment deletion code:
```
// move the segment file to deleted segments first and let retention manager handler the deletion
String deletedFileName = deletedSegmentsRetentionMs == null ? URIUtils.encode(segmentId)
    : getDeletedSegmentFileName(URIUtils.encode(segmentId), deletedSegmentsRetentionMs);
URI deletedSegmentMoveDestURI = URIUtils.getUri(_dataDir, DELETED_SEGMENTS, rawTableName, deletedFileName);
```
I don't see a `Deleted_Segments` folder on the disk, and we don't have any segments in the S3 segment store because we didn't configure the batch ingestion Spark job to copy the generated segments there. Do you think that's why the segment files are not getting deleted? S3PinotFS instead of LocalPinotFS might be used, and since it doesn't find the segment there, it doesn't delete anything.
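One quick way to see which PinotFS the controller would pick for deletions is to look at its data-dir and storage-factory settings; the config file path below is the usual Helm chart location and may differ in your deployment:

```
# Inspect the controller config for the deep-store data dir and PinotFS implementation.
kubectl exec -it pinot-controller-0 -n pinot -- \
  grep -E "controller.data.dir|storage.factory" /var/pinot/controller/config/pinot-controller.conf
```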
I'll debug more next week and file an issue with my findings.
@Mayank I filed https://github.com/apache/pinot/issues/9393. I see that the code is using S3PinotFS to delete from S3, but nothing to delete from the local filesystem on the server. Could you or anyone else please share pointers on how I can go about debugging this and hopefully contribute a fix?
Mayank
Deleting a segment removes it from the IdealState, which in turn lets the server know that it has to delete it. You can set a breakpoint in the server-side code to see when it closes the segment.
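For reference, a minimal sketch of attaching a debugger to a running server using standard JVM remote-debug (JDWP) flags; this assumes the pinot-admin.sh launcher appends JAVA_OPTS to the JVM options, which may vary by version:

```
# Start a server with a debug agent on port 5005, then attach IntelliJ ("Remote JVM Debug")
# and set breakpoints in the SegmentOnlineOfflineStateModel transitions
# (onBecomeOfflineFromOnline / onBecomeDroppedFromOffline) that appear in the logs below.
export JAVA_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005"
bin/pinot-admin.sh StartServer -zkAddress localhost:2181
```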
Priyank Bagrecha
Looking at the server logs for deleting segment `offlinebookingnarrow_poc_OFFLINE_3`, it doesn't seem like it even tries to delete it locally:
```
Scheduling message 47024269-249c-47f7-84ed-e4821e1f8f08: offlinebookingnarrow_poc_OFFLINE:offlinebookingnarrow_poc_OFFLINE_3, ONLINE->OFFLINE
Submit task: 47024269-249c-47f7-84ed-e4821e1f8f08 to pool: java.util.concurrent.ThreadPoolExecutor@6fa4b6cd[Running, pool size = 40, active threads = 0, queued tasks = 0, completed tasks = 5224]
Message: 47024269-249c-47f7-84ed-e4821e1f8f08 handling task scheduled
73 END:INVOKE CallbackHandler 0, /pinot-dev/INSTANCES/Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES listener: org.apache.helix.messaging.handling.HelixTaskExecutor@2b194818 type: CALLBACK Took: 6ms
handling task: 47024269-249c-47f7-84ed-e4821e1f8f08 begin, at: 1663093111984
handling message: 47024269-249c-47f7-84ed-e4821e1f8f08 transit offlinebookingnarrow_poc_OFFLINE.offlinebookingnarrow_poc_OFFLINE_3|[] from:ONLINE to:OFFLINE, relayedFrom: null
Instance Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098, partition offlinebookingnarrow_poc_OFFLINE_3 received state transition from ONLINE to OFFLINE on session 3001f37ca930355, message id: 47024269-249c-47f7-84ed-e4821e1f8f08
SegmentOnlineOfflineStateModel.onBecomeOfflineFromOnline() : ZnRecord=47024269-249c-47f7-84ed-e4821e1f8f08, {CREATE_TIMESTAMP=1663093111968, ClusterEventName=IdealStateChange, EXECUTE_START_TIMESTAMP=1663093111985, EXE_SESSION_ID=3001f37ca930355, FROM_STATE=ONLINE, MSG_ID=47024269-249c-47f7-84ed-e4821e1f8f08, MSG_STATE=read, MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=offlinebookingnarrow_poc_OFFLINE_3, READ_TIMESTAMP=1663093111980, RESOURCE_NAME=offlinebookingnarrow_poc_OFFLINE, RESOURCE_TAG=offlinebookingnarrow_poc_OFFLINE, RETRY_COUNT=3, SRC_NAME=pinot-controller-2.pinot-controller-headless.pinot.svc.cluster.local_9000, SRC_SESSION_ID=1001f37be1302ec, STATE_MODEL_DEF=SegmentOnlineOfflineStateModel, STATE_MODEL_FACTORY_NAME=DEFAULT, TGT_NAME=Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098, TGT_SESSION_ID=3001f37ca930355, TO_STATE=OFFLINE}{}{}, Stat=Stat {_version=0, _creationTime=1663093111974, _modifiedTime=1663093111974, _ephemeralOwner=0}
Removing segment: offlinebookingnarrow_poc_OFFLINE_3 from table: offlinebookingnarrow_poc_OFFLINE
Removing segment: offlinebookingnarrow_poc_OFFLINE_3 from table: offlinebookingnarrow_poc_OFFLINE
Closing segment: offlinebookingnarrow_poc_OFFLINE_3 of table: offlinebookingnarrow_poc_OFFLINE
Trying to destroy segment : offlinebookingnarrow_poc_OFFLINE_3
Closed segment: offlinebookingnarrow_poc_OFFLINE_3 of table: offlinebookingnarrow_poc_OFFLINE
Removed segment: offlinebookingnarrow_poc_OFFLINE_3 from table: offlinebookingnarrow_poc_OFFLINE
Removed segment: offlinebookingnarrow_poc_OFFLINE_3 from table: offlinebookingnarrow_poc_OFFLINE
Message 47024269-249c-47f7-84ed-e4821e1f8f08 completed.
Delete message 47024269-249c-47f7-84ed-e4821e1f8f08 from zk!
message finished: 47024269-249c-47f7-84ed-e4821e1f8f08, took 17
Message: 47024269-249c-47f7-84ed-e4821e1f8f08 (parent: null) handling task for offlinebookingnarrow_poc_OFFLINE:offlinebookingnarrow_poc_OFFLINE_3 completed at: 1663093112002, results: true. FrameworkTime: 5 ms; HandlerTime: 13 ms.
73 START: CallbackHandler 0, INVOKE /pinot-dev/INSTANCES/Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES listener: org.apache.helix.messaging.handling.HelixTaskExecutor@2b194818 type: CALLBACK
CallbackHandler 0 subscribing changes listener to path: /pinot-dev/INSTANCES/Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES, callback type: CALLBACK, event types: [NodeChildrenChanged], listener: org.apache.helix.messaging.handling.HelixTaskExecutor@2b194818, watchChild: false
CallbackHandler0, Subscribing to path: /pinot-dev/INSTANCES/Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES took: 0
No Messages to process
73 END:INVOKE CallbackHandler 0, /pinot-dev/INSTANCES/Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES listener: org.apache.helix.messaging.handling.HelixTaskExecutor@2b194818 type: CALLBACK Took: 1ms
73 START: CallbackHandler 0, INVOKE /pinot-dev/INSTANCES/Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES listener: org.apache.helix.messaging.handling.HelixTaskExecutor@2b194818 type: CALLBACK
CallbackHandler 0 subscribing changes listener to path: /pinot-dev/INSTANCES/Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES, callback type: CALLBACK, event types: [NodeChildrenChanged], listener: org.apache.helix.messaging.handling.HelixTaskExecutor@2b194818, watchChild: false
CallbackHandler0, Subscribing to path: /pinot-dev/INSTANCES/Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES took: 1
The latency of message 34ff3eaf-d363-4f9f-9aca-003df99e519f is 15 ms
Scheduling message 34ff3eaf-d363-4f9f-9aca-003df99e519f: offlinebookingnarrow_poc_OFFLINE:offlinebookingnarrow_poc_OFFLINE_3, OFFLINE->DROPPED
Submit task: 34ff3eaf-d363-4f9f-9aca-003df99e519f to pool: java.util.concurrent.ThreadPoolExecutor@6fa4b6cd[Running, pool size = 40, active threads = 0, queued tasks = 0, completed tasks = 5225]
Message: 34ff3eaf-d363-4f9f-9aca-003df99e519f handling task scheduled
73 END:INVOKE CallbackHandler 0, /pinot-dev/INSTANCES/Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES listener: org.apache.helix.messaging.handling.HelixTaskExecutor@2b194818 type: CALLBACK Took: 7ms
handling task: 34ff3eaf-d363-4f9f-9aca-003df99e519f begin, at: 1663093112183
handling message: 34ff3eaf-d363-4f9f-9aca-003df99e519f transit offlinebookingnarrow_poc_OFFLINE.offlinebookingnarrow_poc_OFFLINE_3|[] from:OFFLINE to:DROPPED, relayedFrom: null
Instance Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098, partition offlinebookingnarrow_poc_OFFLINE_3 received state transition from OFFLINE to DROPPED on session 3001f37ca930355, message id: 34ff3eaf-d363-4f9f-9aca-003df99e519f
SegmentOnlineOfflineStateModel.onBecomeDroppedFromOffline() : ZnRecord=34ff3eaf-d363-4f9f-9aca-003df99e519f, {CREATE_TIMESTAMP=1663093112164, ClusterEventName=MessageChange, EXECUTE_START_TIMESTAMP=1663093112183, EXE_SESSION_ID=3001f37ca930355, FROM_STATE=OFFLINE, MSG_ID=34ff3eaf-d363-4f9f-9aca-003df99e519f, MSG_STATE=read, MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=offlinebookingnarrow_poc_OFFLINE_3, READ_TIMESTAMP=1663093112179, RESOURCE_NAME=offlinebookingnarrow_poc_OFFLINE, RESOURCE_TAG=offlinebookingnarrow_poc_OFFLINE, RETRY_COUNT=3, SRC_NAME=pinot-controller-2.pinot-controller-headless.pinot.svc.cluster.local_9000, SRC_SESSION_ID=1001f37be1302ec, STATE_MODEL_DEF=SegmentOnlineOfflineStateModel, STATE_MODEL_FACTORY_NAME=DEFAULT, TGT_NAME=Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098, TGT_SESSION_ID=3001f37ca930355, TO_STATE=DROPPED}{}{}, Stat=Stat {_version=0, _creationTime=1663093112171, _modifiedTime=1663093112171, _ephemeralOwner=0}
Merging with delta list, recordId = offlinebookingnarrow_poc_OFFLINE other:offlinebookingnarrow_poc_OFFLINE
Message 34ff3eaf-d363-4f9f-9aca-003df99e519f completed.
Delete message 34ff3eaf-d363-4f9f-9aca-003df99e519f from zk!
message finished: 34ff3eaf-d363-4f9f-9aca-003df99e519f, took 15
Message: 34ff3eaf-d363-4f9f-9aca-003df99e519f (parent: null) handling task for offlinebookingnarrow_poc_OFFLINE:offlinebookingnarrow_poc_OFFLINE_3 completed at: 1663093112198, results: true. FrameworkTime: 4 ms; HandlerTime: 11 ms.
73 START: CallbackHandler 0, INVOKE /pinot-dev/INSTANCES/Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES listener: org.apache.helix.messaging.handling.HelixTaskExecutor@2b194818 type: CALLBACK
CallbackHandler 0 subscribing changes listener to path: /pinot-dev/INSTANCES/Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES, callback type: CALLBACK, event types: [NodeChildrenChanged], listener: org.apache.helix.messaging.handling.HelixTaskExecutor@2b194818, watchChild: false
CallbackHandler0, Subscribing to path: /pinot-dev/INSTANCES/Server_pinot-server-4.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES took: 0
No Messages to process
```
Mayank
For fixing it, I suggest finding the place where it closes the segment and seeing what can be done there.
Priyank Bagrecha
Is this the right class to run in debug mode? Asking because https://docs.pinot.apache.org/developers/developers-and-contributors/code-setup#starting-pinot-via-ide points to running via the command line and not via IntelliJ / an IDE.
Mayank
You can run any of the quick-start classes in the IDE.
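For example, the batch quickstart can be launched from a built distribution as below, or from the IDE by running the `org.apache.pinot.tools.Quickstart` main class with a plain run configuration:

```
# From a built Pinot distribution: spins up ZooKeeper, controller, broker, server, and a sample table.
./bin/quick-start-batch.sh
```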