Zsolt Takacs
07/12/2021, 11:18 AM"completionMode": "DOWNLOAD"
. The download failed three times on the replica server, and the segment is stuck in error state. If I try to reload it the log says it's reloading a consuming segment, but nothing changes. How can I recover from this state?2021/07/12 03:11:23.622 INFO [AppInfoParser] [viewSegments__6__322__20210712T0211Z] Kafka version : 2.0.0
2021/07/12 03:11:23.622 INFO [AppInfoParser] [viewSegments__6__322__20210712T0211Z] Kafka commitId : 3402a8361b734732
2021/07/12 03:11:23.622 INFO [LLRealtimeSegmentDataManager_viewSegments__6__322__20210712T0211Z] [viewSegments__6__322__20210712T0211Z] Stopping consumption due to time limit start=1626055882438 now=1626059483622 numRowsConsumed=3 numRowsIndexed=3
2021/07/12 03:11:23.628 INFO [FileUploadDownloadClient] [viewSegments__6__322__20210712T0211Z] Sending request: <http://merry-macaw-pinot-controller-0.merry-macaw-pinot-controller-headless.crt.svc.cluster.local:9000/segmentConsumed?name=viewSegments__6__322__20210712T0211Z&offset=58088148&instance=Server_merry-macaw-pinot-server-1.merry-macaw-pinot-server-headless.crt.svc.cluster.local_8098&reason=timeLimit&rowCount=3&streamPartitionMsgOffset=58088148> to controller: merry-macaw-pinot-controller-0.merry-macaw-pinot-controller-headless.crt.svc.cluster.local, version: Unknown
2021/07/12 03:11:23.628 INFO [ServerSegmentCompletionProtocolHandler] [viewSegments__6__322__20210712T0211Z] Controller response {"isSplitCommitType":false,"streamPartitionMsgOffset":"58088148","buildTimeSec":-1,"status":"HOLD","offset":58088148} for <http://merry-macaw-pinot-controller-0.merry-macaw-pinot-controller-headless.crt.svc.cluster.local:9000/segmentConsumed?name=viewSegments__6__322__20210712T0211Z&offset=58088148&instance=Server_merry-macaw-pinot-server-1.merry-macaw-pinot-server-headless.crt.svc.cluster.local_8098&reason=timeLimit&rowCount=3&streamPartitionMsgOffset=58088148>
2021/07/12 03:11:25.074 INFO [HelixTaskExecutor] [ZkClient-EventThread-21-merry-macaw-zookeeper:2181] Scheduling message f0e75749-a1c3-4d9a-a65f-ce5c82ce9dc5: viewSegments_REALTIME:viewSegments__6__322__20210712T0211Z, CONSUMING->ONLINE
2021/07/12 03:11:25.074 INFO [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_thread] handling message: f0e75749-a1c3-4d9a-a65f-ce5c82ce9dc5 transit viewSegments_REALTIME.viewSegments__6__322__20210712T0211Z|[] from:CONSUMING to:ONLINE, relayedFrom: null
2021/07/12 03:11:25.106 INFO [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_thread] Instance Server_merry-macaw-pinot-server-1.merry-macaw-pinot-server-headless.crt.svc.cluster.local_8098, partition viewSegments__6__322__20210712T0211Z received state transition from CONSUMING to ONLINE on session 100e6aa105f006d, message id: f0e75749-a1c3-4d9a-a65f-ce5c82ce9dc5
2021/07/12 03:11:25.110 INFO [LLRealtimeSegmentDataManager_viewSegments__6__322__20210712T0211Z] [HelixTaskExecutor-message_handle_thread] State: HOLDING, transitioning from CONSUMING to ONLINE (startOffset: 58088145, endOffset: 58088148)
2021/07/12 03:11:26.629 INFO [LLRealtimeSegmentDataManager_viewSegments__6__322__20210712T0211Z] [HelixTaskExecutor-message_handle_thread] Consumer thread stopped in state HOLDING
2021/07/12 03:11:26.629 INFO [LLRealtimeSegmentDataManager_viewSegments__6__322__20210712T0211Z] [HelixTaskExecutor-message_handle_thread] State HOLDING. CompletionMode DOWNLOAD. Downloading to replace
2021/07/12 03:11:26.630 INFO [S3PinotFS] [HelixTaskExecutor-message_handle_thread] Copy <s3://data-crt-pinot-us-prod/data/viewSegments/viewSegments__6__322__20210712T0211Z> to local /var/pinot/server/data/index/viewSegments_REALTIME/viewSegments__6__322__20210712T0211Z.tar.gz
2021/07/12 03:11:56.637 WARN [PinotFSSegmentFetcher] [HelixTaskExecutor-message_handle_thread] Caught exception while fetching segment from: <s3://data-crt-pinot-us-prod/data/viewSegments/viewSegments__6__322__20210712T0211Z> to: /var/pinot/server/data/index/viewSegments_REALTIME/viewSegments__6__322__20210712T0211Z.tar.gz
2021/07/12 03:11:57.056 INFO [S3PinotFS] [HelixTaskExecutor-message_handle_thread] Copy <s3://data-crt-pinot-us-prod/data/viewSegments/viewSegments__6__322__20210712T0211Z> to local /var/pinot/server/data/index/viewSegments_REALTIME/viewSegments__6__322__20210712T0211Z.tar.gz
2021/07/12 03:12:27.088 WARN [PinotFSSegmentFetcher] [HelixTaskExecutor-message_handle_thread] Caught exception while fetching segment from: <s3://data-crt-pinot-us-prod/data/viewSegments/viewSegments__6__322__20210712T0211Z> to: /var/pinot/server/data/index/viewSegments_REALTIME/viewSegments__6__322__20210712T0211Z.tar.gz
2021/07/12 03:12:28.192 INFO [S3PinotFS] [HelixTaskExecutor-message_handle_thread] Copy <s3://data-crt-pinot-us-prod/data/viewSegments/viewSegments__6__322__20210712T0211Z> to local /var/pinot/server/data/index/viewSegments_REALTIME/viewSegments__6__322__20210712T0211Z.tar.gz
2021/07/12 03:12:58.196 WARN [PinotFSSegmentFetcher] [HelixTaskExecutor-message_handle_thread] Caught exception while fetching segment from: <s3://data-crt-pinot-us-prod/data/viewSegments/viewSegments__6__322__20210712T0211Z> to: /var/pinot/server/data/index/viewSegments_REALTIME/viewSegments__6__322__20210712T0211Z.tar.gz
2021/07/12 03:13:01.512 WARN [viewSegments_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread] Failed to download segment viewSegments__6__322__20210712T0211Z from deep store:
2021/07/12 03:13:01.513 WARN [viewSegments_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread] Download segment viewSegments__6__322__20210712T0211Z from deepstore uri <s3://data-crt-pinot-us-prod/data/viewSegments/viewSegments__6__322__20210712T0211Z> failed.
2021/07/12 03:13:01.513 ERROR [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_thread] Exception while executing a state transition task viewSegments__6__322__20210712T0211Z
2021/07/12 03:13:01.593 INFO [HelixTask] [HelixTaskExecutor-message_handle_thread] Message: f0e75749-a1c3-4d9a-a65f-ce5c82ce9dc5 (parent: null) handling task for viewSegments_REALTIME:viewSegments__6__322__20210712T0211Z completed at: 1626059581593, results: false. FrameworkTime: 23 ms; HandlerTime: 96496 ms.
2021/07/12 07:52:48.453 INFO [HelixTaskExecutor] [ZkClient-EventThread-21-merry-macaw-zookeeper:2181] Scheduling message ba69fae1-316e-4923-b2b0-3ad4701a60eb: viewSegments_REALTIME:viewSegments__6__322__20210712T0211Z, null->null
2021/07/12 07:52:48.454 INFO [viewSegments_REALTIME-SegmentReloadMessageHandler] [HelixTaskExecutor-message_handle_thread] Handling message: ZnRecord=ba69fae1-316e-4923-b2b0-3ad4701a60eb, {CREATE_TIMESTAMP=1626076368284, EXECUTE_START_TIMESTAMP=1626076368454, MSG_ID=ba69fae1-316e-4923-b2b0-3ad4701a60eb, MSG_STATE=new, MSG_SUBTYPE=RELOAD_SEGMENT, MSG_TYPE=USER_DEFINE_MSG, PARTITION_NAME=viewSegments__6__322__20210712T0211Z, RESOURCE_NAME=viewSegments_REALTIME, RETRY_COUNT=0, SRC_CLUSTER=crt, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_merry-macaw-pinot-controller-1.merry-macaw-pinot-controller-headless.crt.svc.cluster.local_9000, TGT_NAME=Server_merry-macaw-pinot-server-1.merry-macaw-pinot-server-headless.crt.svc.cluster.local_8098, TGT_SESSION_ID=100e6aa105f006d, TIMEOUT=-1}{}{}, Stat=Stat {_version=0, _creationTime=1626076368404, _modifiedTime=1626076368404, _ephemeralOwner=0}
2021/07/12 07:52:48.454 INFO [viewSegments_REALTIME-SegmentReloadMessageHandler] [HelixTaskExecutor-message_handle_thread] Waiting for lock to refresh : viewSegments__6__322__20210712T0211Z, queue-length: 0
2021/07/12 07:52:48.455 INFO [viewSegments_REALTIME-SegmentReloadMessageHandler] [HelixTaskExecutor-message_handle_thread] Acquired lock to refresh segment: viewSegments__6__322__20210712T0211Z (lock-time=1ms, queue-length=0)
2021/07/12 07:52:48.455 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread] Reloading single segment: viewSegments__6__322__20210712T0211Z in table: viewSegments_REALTIME
2021/07/12 07:52:48.463 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread] Reloading segment: viewSegments__6__322__20210712T0211Z in table: viewSegments_REALTIME
2021/07/12 07:52:48.464 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread] Try reloading REALTIME consuming segment: viewSegments__6__322__20210712T0211Z in table: viewSegments_REALTIME
2021/07/12 07:52:48.464 INFO [MutableSegmentImpl_viewSegments__6__322__20210712T0211Z_crt-pinot-segments] [HelixTaskExecutor-message_handle_thread] Newly added columns: {}
2021/07/12 07:52:48.464 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread] Reloaded single segment: viewSegments__6__322__20210712T0211Z in table: viewSegments_REALTIME
2021/07/12 07:52:48.472 INFO [HelixTask] [HelixTaskExecutor-message_handle_thread] Message: ba69fae1-316e-4923-b2b0-3ad4701a60eb (parent: null) handling task for viewSegments_REALTIME:viewSegments__6__322__20210712T0211Z completed at: 1626076368472, results: true. FrameworkTime: 8 ms; HandlerTime: 10 ms.
Bruce Ritchie
07/12/2021, 3:32 PMDownload segment viewSegments__6__322__20210712T0211Z from deepstore uri <s3://data-crt-pinot-us-prod/data/viewSegments/viewSegments__6__322__20210712T0211Z> failed.
Does that segment exist in S3?Zsolt Takacs
07/12/2021, 3:36 PMBruce Ritchie
07/12/2021, 3:39 PMZsolt Takacs
07/12/2021, 3:43 PM2021/07/08 03:11:04.064 WARN [PinotFSSegmentFetcher] [HelixTaskExecutor-message_handle_thread] Caught exception while fetching segment from: <s3://data-crt-pinot-us-prod/data/viewSegments/viewSegments_OFFLINE_1564164000000_1564167600000_34> to: /var/pinot/server/data/segment/viewSegments_OFFLINE/tmp-viewSegments_OFFLINE_1564164000000_1564167600000_34-36849a99-61b3-4643-bf12-afa5908c35cc/viewSegments_OFFLINE_1564164000000_1564167600000_34.enc
software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Read timed out
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:98) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:43) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:199) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:66) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:34) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:189) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:121) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:147) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:52) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:62) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at software.amazon.awssdk.services.s3.DefaultS3Client.getObject(DefaultS3Client.java:3539) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
at org.apache.pinot.plugin.filesystem.S3PinotFS.copyToLocalFile(S3PinotFS.java:490) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
Attempts exceeded when downloading segment
and the segment is stuckJackie
07/12/2021, 4:22 PMZsolt Takacs
07/12/2021, 6:22 PM