Gerrit van Doorn
10/24/2022, 10:22 PMMayank
Gerrit van Doorn
10/24/2022, 11:59 PMSending request: <http://foo.bar.company.com:9000/v2/segments?tableName=events> to controller: <http://foo.bar.company.com|foo.bar.company.com>, version: Unknown
Response for pushing table events segment events_OFFLINE_1606134538000000000_1665361807000000000_1 to location <http://foo.bar.company.com:9000> - 200: {"status":"Successfully uploaded segment: events_OFFLINE_1606134538000000000_1665361807000000000_1 of table: events_OFFLINE"}Gerrit van Doorn
10/25/2022, 12:00 AMMayank
Gerrit van Doorn
10/25/2022, 12:00 AMHaitao Zhang
10/25/2022, 1:08 AMHaitao Zhang
10/25/2022, 1:09 AMGerrit van Doorn
10/25/2022, 2:19 AMincludeFileNamePattern: 'glob:**/partition-000*.recordio'-Dlog4j2.configurationFile=conf/log4j2.xmlMayank
Gerrit van Doorn
10/25/2022, 2:20 AMGerrit van Doorn
10/25/2022, 2:59 AMHaitao Zhang
10/25/2022, 3:00 AMGerrit van Doorn
10/25/2022, 3:01 AMHaitao Zhang
10/25/2022, 3:15 AMHaitao Zhang
10/25/2022, 3:16 AMis it possible that you share the ingestionJobSpec?I can try your spec if you can share
Gerrit van Doorn
10/25/2022, 3:18 AMexecutionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'
# Recommended to set jobType to SegmentCreationAndMetadataPush for production environment where Pinot Deep Store is configured
#jobType: SegmentCreation
#jobType: SegmentTarPush
#jobType: SegmentUriPush
# jobType: SegmentMetadataPush
#jobType: SegmentCreationAndTarPush
#jobType: SegmentCreationAndUriPush
jobType: SegmentCreationAndMetadataPush
# NOTE: use "-" instead of "_"
inputDirURI: '<os://FEED-STAGE/day=2022-10-10/hour=00>'
includeFileNamePattern: 'glob:**/partition-000*.recordio'
outputDirURI: '<s3://pinot-deep-store/segments/offline>'
overwriteOutput: true
pinotFSSpecs:
  - scheme: os
    className: com.company.pinot.plugin.filesystem.object_store.ObjectStorePinotFS
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: us-west-2
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'recordio'
  className: 'com.company.pinot.plugin.inputformat.recordio.RecordIORecordReader'
tableSpec:
  tableName: 'events'
pinotClusterSpecs:
  - controllerURI: '<http://pinot-controller.company.com:9000>'
pushJobSpec:
  pushAttempts: 2
  pushRetryIntervalMillis: 1000Gerrit van Doorn
10/25/2022, 3:25 AMSegmentCreationAndMetadataPushHaitao Zhang
10/25/2022, 3:31 AMbtw does theRight, that's something we can improvefirst upload the data to deepstore, then download it again to fetch metadata? That seems inefficient considering it just was processing that same dataSegmentCreationAndMetadataPush
Haitao Zhang
10/25/2022, 3:34 AMStart pushing segment metadata:Haitao Zhang
10/25/2022, 3:36 AMsegmentMetadataFileHaitao Zhang
10/25/2022, 3:38 AMGerrit van Doorn
10/25/2022, 3:40 AMHaitao Zhang
10/25/2022, 3:41 AMGerrit van Doorn
10/25/2022, 10:09 PM2022-10-25 21:35:57 DEBUG amazon.awssdk.utils.Logger:84 - Received successful response: 200
2022-10-25 21:35:57 DEBUG amazon.awssdk.utils.Logger:84 - Interceptor 'software.amazon.awssdk.services.s3.internal.handlers.EnableTrailingChecksumInterceptor@33e0c716' modified the message with its modifyResponse method.
2022-10-25 21:35:59 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:344 - Connection [id: 5][route: {tls}-><http://proxy-batch.foo.company.com:3128>-><https://project-pinot-deep-store.s3.us-west-2.amazonaws.com:443>] can be kept alive for 60.0 seconds
2022-10-25 21:35:59 DEBUG apache.http.impl.conn.LoggingManagedHttpClientConnection:88 - http-outgoing-5: set socket timeout to 0
2022-10-25 21:35:59 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:351 - Connection released: [id: 5][route: {tls}-><http://proxy-batch.foo.company.com:3128>-><https://project-pinot-deep-store.s3.us-west-2.amazonaws.com:443>][total available: 1; route allocated: 1 of 50; total allocated: 1 of 50]
2022-10-25 21:35:59 INFO  apache.pinot.segment.local.utils.SegmentPushUtils:352 - Trying to untar Metadata file from: [/tmp/segmentTar-8535c540-b8a9-495a-96eb-64852b3646b7.tar.gz] to [/tmp/segmentMetadataDir-8535c540-b8a9-495a-96eb-64852b3646b7]
2022-10-25 21:35:59 INFO  apache.pinot.segment.local.utils.SegmentPushUtils:357 - Trying to untar CreationMeta file from: [/tmp/segmentTar-8535c540-b8a9-495a-96eb-64852b3646b7.tar.gz] to [/tmp/segmentMetadataDir-8535c540-b8a9-495a-96eb-64852b3646b7]
2022-10-25 21:35:59 INFO  apache.pinot.segment.local.utils.SegmentPushUtils:366 - Trying to tar segment metadata dir [/tmp/segmentMetadataDir-8535c540-b8a9-495a-96eb-64852b3646b7] to [/tmp/segmentMetadata-8535c540-b8a9-495a-96eb-64852b3646b7.tar.gz]
2022-10-25 21:35:59 INFO  apache.pinot.segment.local.utils.SegmentPushUtils:252 - Pushing segment: events_OFFLINE_1606134538000000000_1665361807000000000_1 to location: <http://foo8d-rg1-36a.foo.company.com:9000> for table events
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.client.protocol.RequestAddCookies:123 - CookieSpec selected: default
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.client.protocol.RequestAuthCache:77 - Auth cache not set in the context
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager:267 - Connection request: [route: {}-><http://foo8d-rg1-36a.foo.company.com:9000>][total available: 1; route allocated: 1 of 2; total allocated: 1 of 20]
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager:312 - Connection leased: [id: 0][route: {}-><http://foo8d-rg1-36a.foo.company.com:9000>][total available: 0; route allocated: 1 of 2; total allocated: 1 of 20]
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:88 - http-outgoing-0: set socket timeout to 0
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:88 - http-outgoing-0: set socket timeout to 600000
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.execchain.MainClientExec:255 - Executing request POST /v2/segments?tableName=events HTTP/1.1
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.execchain.MainClientExec:260 - Target auth state: UNCHALLENGED
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.execchain.MainClientExec:266 - Proxy auth state: UNCHALLENGED
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:133 - http-outgoing-0 >> POST /v2/segments?tableName=events HTTP/1.1
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> DOWNLOAD_URI: <s3://project-pinot-deep-store/segments/offline/events_OFFLINE_1606134538000000000_1665361807000000000_1.tar.gz>
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> UPLOAD_TYPE: METADATA
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> COPY_SEGMENT_TO_DEEP_STORE: false
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> Content-Length: 8785
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> Content-Type: multipart/form-data; boundary=IBKhI--_wIiCbIqMX5Vonk1I9tdXVGFgYcrP3
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> Host: <http://foo8d-rg1-36a.foo.company.com:9000|foo8d-rg1-36a.foo.company.com:9000>
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> Connection: Keep-Alive
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> User-Agent: Apache-HttpClient/4.5.13 (Java/1.8.0_342)
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> Accept-Encoding: gzip,deflate
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:122 - http-outgoing-0 << HTTP/1.1 200 OK
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:125 - http-outgoing-0 << Pinot-Controller-Host: <http://foo8d-rg1-36a.foo.company.com|foo8d-rg1-36a.foo.company.com>
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:125 - http-outgoing-0 << Pinot-Controller-Version: Unknown
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:125 - http-outgoing-0 << Access-Control-Allow-Origin: *
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:125 - http-outgoing-0 << Content-Type: application/json
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:125 - http-outgoing-0 << Content-Length: 125
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.execchain.MainClientExec:285 - Connection can be kept alive indefinitely
2022-10-25 21:35:59 INFO  apache.pinot.common.utils.http.HttpClient:280 - Sending request: <http://foo8d-rg1-36a.foo.company.com:9000/v2/segments?tableName=events> to controller: <http://foo8d-rg1-36a.foo.company.com|foo8d-rg1-36a.foo.company.com>, version: Unknown
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager:344 - Connection [id: 0][route: {}-><http://foo8d-rg1-36a.foo.company.com:9000>] can be kept alive indefinitely
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:88 - http-outgoing-0: set socket timeout to 0
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager:351 - Connection released: [id: 0][route: {}-><http://foo8d-rg1-36a.foo.company.com:9000>][total available: 1; route allocated: 1 of 2; total allocated: 1 of 20]
2022-10-25 21:35:59 INFO  apache.pinot.segment.local.utils.SegmentPushUtils:277 - Response for pushing table events segment events_OFFLINE_1606134538000000000_1665361807000000000_1 to location <http://foo8d-rg1-36a.foo.company.com:9000> - 200: {"status":"Successfully uploaded segment: events_OFFLINE_1606134538000000000_1665361807000000000_1 of table: events_OFFLINE"}
2022-10-25 21:36:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:36:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:37:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:37:35 DEBUG apache.http.impl.conn.LoggingManagedHttpClientConnection:79 - http-outgoing-4: Close connection
2022-10-25 21:37:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:37:35 DEBUG apache.http.impl.conn.LoggingManagedHttpClientConnection:79 - http-outgoing-5: Close connection
2022-10-25 21:38:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:38:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:39:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:39:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:40:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:40:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:41:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:41:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:42:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:42:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:43:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
etc.Haitao Zhang
10/25/2022, 10:13 PMGerrit van Doorn
10/25/2022, 10:15 PM/tmp/pinot-<some-sha>Gerrit van Doorn
10/25/2022, 10:52 PM$ netstat -natp | grep 31452
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp6       0      0 10.60.96.26:52788       10.112.145.226:5100     ESTABLISHED 31452/java          
tcp6       0      0 10.60.96.26:38794       10.115.108.35:5100      ESTABLISHED 31452/java          
tcp6       0      0 10.60.96.26:19296       10.113.174.10:5100      ESTABLISHED 31452/java          
tcp6       0      0 10.60.96.26:18836       10.112.7.194:5100       ESTABLISHED 31452/java          
tcp6       1      0 10.60.96.26:25764       10.60.96.26:9000        CLOSE_WAIT  31452/java          
tcp6       0      0 127.0.0.1:41196         127.0.0.1:4987          ESTABLISHED 31452/java          
tcp6       0      0 10.60.96.26:36692       10.113.37.2:5100        ESTABLISHED 31452/java          
tcp6       0      0 10.60.96.26:24010       10.113.119.34:5100      ESTABLISHED 31452/javaGerrit van Doorn
10/25/2022, 10:58 PMss -tap state CLOSE-WAITCLOSE-WAITGerrit van Doorn
10/25/2022, 11:01 PMHaitao Zhang
10/25/2022, 11:37 PMHaitao Zhang
10/25/2022, 11:39 PMThe temp data is written tojust minor thing (may not be related), the folder isand is removed/tmp/pinot-<some-sha>
/<prefix>/segmentMetadata-<some sha>Gerrit van Doorn
10/28/2022, 7:10 PMNettyChannelBuilder b = ...
b.build()Haitao Zhang
10/28/2022, 8:18 PM