Eduardo Cusa
04/11/2022, 12:59 PM
Caught exception in state transition from OFFLINE -> ONLINE for resource: adv1_OFFLINE, partition: adv1_OFFLINE_2022-03-01_2022-03-01_0
Could this be related to the data itself, or to something like OOMs/resources, as Mayank mentioned in the thread? Any suggestions on how to debug it?
Thanks
Neha Pawar
Neha Pawar
Eduardo Cusa
04/11/2022, 5:58 PM
org.apache.pinot.core.data.manager.BaseTableDataManager.downloadSegmentFromDeepStore(BaseTableDataManager.java:393)
Mark Needham
Neha Pawar
Eduardo Cusa
04/11/2022, 6:03 PM
http://localhost:9000/debug/tables/adv1?type=OFFLINE&verbosity=0
Neha Pawar
Luis Fernandez
04/11/2022, 6:42 PM
Eduardo Cusa
04/11/2022, 7:30 PM
Neha Pawar
Neha Pawar
Eduardo Cusa
04/11/2022, 8:54 PM
root@pinot-server-0:/var/pinot/server/config# cat pinot-server.conf
pinot.server.netty.port=8098
pinot.server.adminapi.port=8097
pinot.server.instance.dataDir=/var/pinot/server/data/index
pinot.server.instance.segmentTarDir=/var/pinot/server/data/segment
Controller:
root@pinot-controller-0:/opt/pinot# cat /var/pinot/controller/config/pinot-controller.conf
controller.helix.cluster.name=pinot-test
controller.port=9000
controller.vip.host=pinot-controller
controller.vip.port=9000
controller.data.dir=/var/pinot/controller/data
controller.zk.str=pinot-zookeeper:2181
Neha Pawar
Eduardo Cusa
04/12/2022, 12:09 PM
$ curl -X POST "http://localhost:9001/segments/adv1_OFFLINE/reset" -H "accept: application/json"
{
"status": "Successfully reset all segments of table: adv1_OFFLINE"
}
Then, in the server logs, I found a similar stack trace:
2022/04/12 12:07:03.473 ERROR [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_thread] Exception while executing a state transition task adv1_OFFLINE_2022-03-01_2022-03-01_1
java.lang.reflect.InvocationTargetException: null
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: org.apache.pinot.spi.utils.retry.AttemptsExceededException: Operation failed after 3 attempts
    at org.apache.pinot.spi.utils.retry.BaseRetryPolicy.attempt(BaseRetryPolicy.java:61) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.common.utils.fetcher.BaseSegmentFetcher.fetchSegmentToLocal(BaseSegmentFetcher.java:72) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a65
    at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchSegmentToLocalInternal(SegmentFetcherFactory.java:148) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9
    at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchSegmentToLocal(SegmentFetcherFactory.java:142) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93
    at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchAndDecryptSegmentToLocalInternal(SegmentFetcherFactory.java:164) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfe
    at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchAndDecryptSegmentToLocal(SegmentFetcherFactory.java:158) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88a
    at org.apache.pinot.core.data.manager.BaseTableDataManager.downloadAndDecrypt(BaseTableDataManager.java:406) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a6
    at org.apache.pinot.core.data.manager.BaseTableDataManager.downloadSegmentFromDeepStore(BaseTableDataManager.java:393) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f6
    at org.apache.pinot.core.data.manager.BaseTableDataManager.downloadSegment(BaseTableDataManager.java:385) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a6506
    at org.apache.pinot.core.data.manager.BaseTableDataManager.addOrReplaceSegment(BaseTableDataManager.java:372) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a
    at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addOrReplaceSegment(HelixInstanceDataManager.java:355) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9
    at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:162) ~[pinot-all
    ... 12 more
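A pasted trace like the one above is easier to scan if the last "Caused by:" entry is pulled out first. The following is only a minimal sketch, not part of Pinot: the function name `root_cause` and the regular expression are assumptions made for illustration. It also strips the terminal border characters that pasted logs can carry. Note that in this trace the innermost cause is only the retry wrapper (`AttemptsExceededException`), so the underlying failure still has to be found elsewhere in the server logs.

```python
import re

# Matches lines such as "Caused by: some.ExceptionClass: message".
CAUSED_BY = re.compile(r"Caused by:\s*(?P<cls>[\w.$]+)(?::\s*(?P<msg>.*))?")


def root_cause(trace: str):
    """Return (exception class, message) of the last 'Caused by:' line, or None."""
    last = None
    for line in trace.splitlines():
        # Drop terminal border characters and padding before matching.
        cleaned = line.strip(" │\t")
        match = CAUSED_BY.search(cleaned)
        if match:
            last = (match.group("cls"), (match.group("msg") or "").strip())
    return last
```

Passing the full pasted log text to `root_cause` returns the class and message of the deepest reported cause, or `None` when the trace has no "Caused by:" line.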
Neha Pawar
Eduardo Cusa
04/12/2022, 7:45 PM
Neha Pawar
Eduardo Cusa
04/13/2022, 7:14 PM
2022/04/13 18:40:17.304 ERROR [SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread] Caught exception in state transition from OFFLINE -> ONL
java.io.IOException: No space left on device
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:?]
    at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62) ~[?:?]
So, after increasing the server pod's disk size, I was able to ingest 167 segments. Some of them were bad, but after removing them, the table was ready for queries.
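Since the failure came down to "No space left on device", a quick free-space check on the server's data directories can confirm or rule out this cause before resetting segments again. This is a rough sketch using only the Python standard library; the function name `free_space_report` and the threshold parameter are hypothetical, and it would need to run inside the server pod.

```python
import shutil


def free_space_report(paths, min_free_bytes):
    """Map each path to (free bytes, True if free space meets the threshold)."""
    report = {}
    for path in paths:
        # disk_usage reports on the filesystem that contains the path.
        usage = shutil.disk_usage(path)
        report[path] = (usage.free, usage.free >= min_free_bytes)
    return report
```

With the configuration shown earlier in the thread, the paths to check would be `/var/pinot/server/data/index` and `/var/pinot/server/data/segment`. A sensible threshold depends on the segment sizes, which the thread does not state.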
Thanks for the help!
Neha Pawar
Eduardo Cusa
04/13/2022, 7:21 PM