# getting-started
g
Hi folks, I’m running a standalone batch import job. What is the reason for this job never exiting? It imports my files and then just sits there. Is there a way to have it exit?
m
It should exit when all files are imported. Anything in the logs?
g
no errors. It writes the 8 segments to deep store, and then I see this:
Sending request: http://foo.bar.company.com:9000/v2/segments?tableName=events to controller: foo.bar.company.com, version: Unknown
Response for pushing table events segment events_OFFLINE_1606134538000000000_1665361807000000000_1 to location http://foo.bar.company.com:9000 - 200: {"status":"Successfully uploaded segment: events_OFFLINE_1606134538000000000_1665361807000000000_1 of table: events_OFFLINE"}
8 times (1 for each file). Pressing enter doesn’t do anything; I have to use CTRL+C to make it exit
and it doesn’t exit
m
@Seunghyun @Haitao Zhang
g
note that this is 0.10.0
h
1. Is this release 0.10.0? 2. How many input files are there in total? Is it working on other segments? 3. Did you enable debug-level logging and see anything strange?
Is it possible for you to share the ingestionJobSpec?
g
1. yes 2. this is just 8 files. The directory contains around 8*32 files but I filter using
includeFileNamePattern: 'glob:**/partition-000*.recordio'
3. I did not enable debug. What is the best way to capture everything? I’m currently using
-Dlog4j2.configurationFile=conf/log4j2.xml
with the defaults
m
Also, is it easy for you to try 0.11?
g
I can give that a try
just tried with 0.11.0. Same thing. Note that I’m running JDK 8, but I doubt that matters. It works 99.9% of the way; it just doesn’t exit
h
Is the Pinot cluster also 0.10.0 or 0.11.0?
g
That is running 0.10.0
h
you can modify log4j2.xml to show debug logs
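For example, a minimal log4j2.xml with the root logger raised to debug might look something like this (a sketch only, not the exact file Pinot ships with; the appender name and pattern are illustrative):
<Configuration>
  <Appenders>
    <!-- Console appender; the pattern roughly matches the log lines shown in this thread -->
    <Console name="console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} %-5p %c{2}:%L - %m%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <!-- Root at debug captures everything, including the shaded HTTP client and AWS SDK categories -->
    <Root level="debug">
      <AppenderRef ref="console"/>
    </Root>
  </Loggers>
</Configuration>
Passing it with -Dlog4j2.configurationFile the way you already do should pick it up.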
Is it possible for you to share the ingestionJobSpec?
I can try your spec if you can share it
g
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'

# Recommended to set jobType to SegmentCreationAndMetadataPush for production environment where Pinot Deep Store is configured
#jobType: SegmentCreation
#jobType: SegmentTarPush
#jobType: SegmentUriPush
# jobType: SegmentMetadataPush
#jobType: SegmentCreationAndTarPush
#jobType: SegmentCreationAndUriPush
jobType: SegmentCreationAndMetadataPush

# NOTE: use "-" instead of "_"
inputDirURI: 'os://FEED-STAGE/day=2022-10-10/hour=00'
includeFileNamePattern: 'glob:**/partition-000*.recordio'
outputDirURI: 's3://pinot-deep-store/segments/offline'
overwriteOutput: true
pinotFSSpecs:
  - scheme: os
    className: com.company.pinot.plugin.filesystem.object_store.ObjectStorePinotFS
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: us-west-2
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'recordio'
  className: 'com.company.pinot.plugin.inputformat.recordio.RecordIORecordReader'
tableSpec:
  tableName: 'events'
pinotClusterSpecs:
  - controllerURI: 'http://pinot-controller.company.com:9000'
pushJobSpec:
  pushAttempts: 2
  pushRetryIntervalMillis: 1000
btw does the
SegmentCreationAndMetadataPush
first upload the data to deep store, then download it again to fetch the metadata? That seems inefficient, considering it was just processing that same data
h
btw does the
SegmentCreationAndMetadataPush
first upload the data to deep store, then download it again to fetch the metadata? That seems inefficient, considering it was just processing that same data
Right, that's something we can improve
👌 1
Can you share the log starting with
Start pushing segment metadata:
?
It seems to me that the code got stuck when trying to delete the
segmentMetadataFile
Maybe the Pinot code does not have permission to clean up those files?
g
hmm interesting. I’ll have a look and report back
h
or maybe there is a race condition?
g
When I set the logging to debug and remove some of the .Wire log output, the last messages in the log are:
2022-10-25 21:35:57 DEBUG amazon.awssdk.utils.Logger:84 - Received successful response: 200
2022-10-25 21:35:57 DEBUG amazon.awssdk.utils.Logger:84 - Interceptor 'software.amazon.awssdk.services.s3.internal.handlers.EnableTrailingChecksumInterceptor@33e0c716' modified the message with its modifyResponse method.
2022-10-25 21:35:59 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:344 - Connection [id: 5][route: {tls}->http://proxy-batch.foo.company.com:3128->https://project-pinot-deep-store.s3.us-west-2.amazonaws.com:443] can be kept alive for 60.0 seconds
2022-10-25 21:35:59 DEBUG apache.http.impl.conn.LoggingManagedHttpClientConnection:88 - http-outgoing-5: set socket timeout to 0
2022-10-25 21:35:59 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:351 - Connection released: [id: 5][route: {tls}->http://proxy-batch.foo.company.com:3128->https://project-pinot-deep-store.s3.us-west-2.amazonaws.com:443][total available: 1; route allocated: 1 of 50; total allocated: 1 of 50]
2022-10-25 21:35:59 INFO  apache.pinot.segment.local.utils.SegmentPushUtils:352 - Trying to untar Metadata file from: [/tmp/segmentTar-8535c540-b8a9-495a-96eb-64852b3646b7.tar.gz] to [/tmp/segmentMetadataDir-8535c540-b8a9-495a-96eb-64852b3646b7]
2022-10-25 21:35:59 INFO  apache.pinot.segment.local.utils.SegmentPushUtils:357 - Trying to untar CreationMeta file from: [/tmp/segmentTar-8535c540-b8a9-495a-96eb-64852b3646b7.tar.gz] to [/tmp/segmentMetadataDir-8535c540-b8a9-495a-96eb-64852b3646b7]
2022-10-25 21:35:59 INFO  apache.pinot.segment.local.utils.SegmentPushUtils:366 - Trying to tar segment metadata dir [/tmp/segmentMetadataDir-8535c540-b8a9-495a-96eb-64852b3646b7] to [/tmp/segmentMetadata-8535c540-b8a9-495a-96eb-64852b3646b7.tar.gz]
2022-10-25 21:35:59 INFO  apache.pinot.segment.local.utils.SegmentPushUtils:252 - Pushing segment: events_OFFLINE_1606134538000000000_1665361807000000000_1 to location: http://foo8d-rg1-36a.foo.company.com:9000 for table events
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.client.protocol.RequestAddCookies:123 - CookieSpec selected: default
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.client.protocol.RequestAuthCache:77 - Auth cache not set in the context
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager:267 - Connection request: [route: {}->http://foo8d-rg1-36a.foo.company.com:9000][total available: 1; route allocated: 1 of 2; total allocated: 1 of 20]
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager:312 - Connection leased: [id: 0][route: {}->http://foo8d-rg1-36a.foo.company.com:9000][total available: 0; route allocated: 1 of 2; total allocated: 1 of 20]
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:88 - http-outgoing-0: set socket timeout to 0
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:88 - http-outgoing-0: set socket timeout to 600000
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.execchain.MainClientExec:255 - Executing request POST /v2/segments?tableName=events HTTP/1.1
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.execchain.MainClientExec:260 - Target auth state: UNCHALLENGED
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.execchain.MainClientExec:266 - Proxy auth state: UNCHALLENGED
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:133 - http-outgoing-0 >> POST /v2/segments?tableName=events HTTP/1.1
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> DOWNLOAD_URI: s3://project-pinot-deep-store/segments/offline/events_OFFLINE_1606134538000000000_1665361807000000000_1.tar.gz
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> UPLOAD_TYPE: METADATA
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> COPY_SEGMENT_TO_DEEP_STORE: false
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> Content-Length: 8785
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> Content-Type: multipart/form-data; boundary=IBKhI--_wIiCbIqMX5Vonk1I9tdXVGFgYcrP3
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> Host: foo8d-rg1-36a.foo.company.com:9000
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> Connection: Keep-Alive
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> User-Agent: Apache-HttpClient/4.5.13 (Java/1.8.0_342)
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:136 - http-outgoing-0 >> Accept-Encoding: gzip,deflate
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:122 - http-outgoing-0 << HTTP/1.1 200 OK
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:125 - http-outgoing-0 << Pinot-Controller-Host: foo8d-rg1-36a.foo.company.com
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:125 - http-outgoing-0 << Pinot-Controller-Version: Unknown
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:125 - http-outgoing-0 << Access-Control-Allow-Origin: *
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:125 - http-outgoing-0 << Content-Type: application/json
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:125 - http-outgoing-0 << Content-Length: 125
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.execchain.MainClientExec:285 - Connection can be kept alive indefinitely
2022-10-25 21:35:59 INFO  apache.pinot.common.utils.http.HttpClient:280 - Sending request: http://foo8d-rg1-36a.foo.company.com:9000/v2/segments?tableName=events to controller: foo8d-rg1-36a.foo.company.com, version: Unknown
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager:344 - Connection [id: 0][route: {}->http://foo8d-rg1-36a.foo.company.com:9000] can be kept alive indefinitely
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.LoggingManagedHttpClientConnection:88 - http-outgoing-0: set socket timeout to 0
2022-10-25 21:35:59 DEBUG apache.pinot.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager:351 - Connection released: [id: 0][route: {}->http://foo8d-rg1-36a.foo.company.com:9000][total available: 1; route allocated: 1 of 2; total allocated: 1 of 20]
2022-10-25 21:35:59 INFO  apache.pinot.segment.local.utils.SegmentPushUtils:277 - Response for pushing table events segment events_OFFLINE_1606134538000000000_1665361807000000000_1 to location http://foo8d-rg1-36a.foo.company.com:9000 - 200: {"status":"Successfully uploaded segment: events_OFFLINE_1606134538000000000_1665361807000000000_1 of table: events_OFFLINE"}
2022-10-25 21:36:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:36:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:37:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:37:35 DEBUG apache.http.impl.conn.LoggingManagedHttpClientConnection:79 - http-outgoing-4: Close connection
2022-10-25 21:37:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:37:35 DEBUG apache.http.impl.conn.LoggingManagedHttpClientConnection:79 - http-outgoing-5: Close connection
2022-10-25 21:38:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:38:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:39:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:39:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:40:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:40:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:41:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:41:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:42:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:42:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
2022-10-25 21:43:35 DEBUG apache.http.impl.conn.PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS
etc.
h
That means all jobs are done. Somehow it got stuck. Did you check the folder deletion part?
g
The temp data is written to
/tmp/pinot-<some-sha>
and is removed
$ netstat -natp | grep 31452
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp6       0      0 10.60.96.26:52788       10.112.145.226:5100     ESTABLISHED 31452/java          
tcp6       0      0 10.60.96.26:38794       10.115.108.35:5100      ESTABLISHED 31452/java          
tcp6       0      0 10.60.96.26:19296       10.113.174.10:5100      ESTABLISHED 31452/java          
tcp6       0      0 10.60.96.26:18836       10.112.7.194:5100       ESTABLISHED 31452/java          
tcp6       1      0 10.60.96.26:25764       10.60.96.26:9000        CLOSE_WAIT  31452/java          
tcp6       0      0 127.0.0.1:41196         127.0.0.1:4987          ESTABLISHED 31452/java          
tcp6       0      0 10.60.96.26:36692       10.113.37.2:5100        ESTABLISHED 31452/java          
tcp6       0      0 10.60.96.26:24010       10.113.119.34:5100      ESTABLISHED 31452/java
There is a CLOSE_WAIT socket
there was an FD open. I closed it using gdb, but that didn’t result in the program exiting. When running
ss -tap state CLOSE-WAIT
I still see 9 ports in the
CLOSE-WAIT
state
actually, after exiting the program those sockets remain, so they are not associated with the job and can be ignored. From this we can conclude that it’s probably not a “hanging” socket
👍 1
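At this point, the other usual suspect for a JVM that refuses to exit is a leftover non-daemon thread. A thread dump (jstack <pid>) will show them, or a small helper like the following generic sketch (not Pinot-specific; call it at the point where the job is expected to finish) can list them:
// Generic sketch: list the live non-daemon threads that can keep a JVM from exiting.
import java.util.Map;

public final class ThreadDumpSketch {
    public static void printNonDaemonThreads() {
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (thread.isAlive() && !thread.isDaemon()) {
                System.out.println(thread.getName() + " (state: " + thread.getState() + ")");
            }
        }
    }
}
Any non-daemon thread other than main that shows up there is a candidate for what is blocking the exit.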
h
hmmm, yesterday I tried on my local Pinot cluster (without your customization), and it ran well
The temp data is written to
/tmp/pinot-<some-sha>
and is removed
just a minor thing (may not be related), the folder is
/<prefix>/segmentMetadata-<some sha>
g
@Haitao Zhang just an update. I was debugging this and it seems to be related to our client that is used to access our external filesystem. When using S3 as the input, it exits. I noticed it still instantiated our PinotFS, but that client isn’t instantiated. As a test, I instantiated the client when our PinotFS is instantiated, and it hangs on exit (still using S3 as input). Digging deeper shows me that once we instantiate a Netty channel:
NettyChannelBuilder b = ...
b.build()
once that happens (without even using it), the program will not exit. Maybe this is due to different versions of Netty being used in the same application (Pinot’s version and ours, which is older)?
h
Just to double confirm: 1/ did your code wait on the channel to be closed? 2/ did you start the channel on a separate thread or thread pool, and did you close that thread or thread pool? Either one can prevent the process from shutting down
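For reference, a rough sketch of what that cleanup typically looks like for a client built with gRPC’s NettyChannelBuilder (the endpoint, executor, and class names below are made up, and the exact builder methods vary between gRPC versions):
// Sketch: shut down the gRPC channel and any application-supplied executor,
// so their threads do not keep the JVM alive after the job finishes.
import io.grpc.ManagedChannel;
import io.grpc.netty.NettyChannelBuilder;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ObjectStoreClientShutdownSketch {
    public static void main(String[] args) throws InterruptedException {
        // Hypothetical executor handed to the channel; gRPC does not take ownership of it.
        ExecutorService executor = Executors.newFixedThreadPool(4);

        ManagedChannel channel = NettyChannelBuilder
                .forAddress("object-store.example.com", 9090) // hypothetical endpoint
                .usePlaintext()
                .executor(executor)
                .build();
        try {
            // ... use the channel to talk to the object store ...
        } finally {
            // Ask the channel to stop and wait for it to terminate.
            channel.shutdown();
            if (!channel.awaitTermination(10, TimeUnit.SECONDS)) {
                channel.shutdownNow();
            }
            // The application-supplied executor must be shut down separately;
            // its non-daemon worker threads alone can keep the process alive.
            executor.shutdown();
        }
    }
}
If the channel (or the executor behind it) is never shut down, whether the JVM can exit depends on the gRPC/Netty version and how its threads are configured, which matches the symptom described above.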