# troubleshooting
r
🧵 Backfill times out - `shaded.org.apache.http.NoHttpResponseException: external_controller_uri:9000 failed to respond`
I've run a backfill job on a new _OFFLINE table twice, and both times it hit an HTTP response error about 7 hours in. Is there any reason for this other than a network error? It's hard to believe my connection cut out at the same point each time. The error comes after many successful segment pushes:
```
Response for pushing table mobileEventTable_OFFLINE segment mobileEventTable_OFFLINE_1610437930892_1610438002915_3936 to location http://ab28eadb3980a4dd3aa7b31e86ae6db7-1491970538.us-east-1.elb.amazonaws.com:9000/ - 200:
```
m
How much data are you backfilling? Also, are you pushing the entire payload (segments) via the controller, or doing a metadata+URI push? The latter is much faster.
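(For reference: switching to a metadata push is mostly a job-spec change. A hedged sketch — the jobType value and runner class name here should be verified against your Pinot version:)

```yaml
# Sketch: metadata push instead of pushing full segment tars through the controller.
# With a metadata push, only the segment metadata goes to the controller; servers
# fetch the segment tars directly from deep store (S3 in this setup).
jobType: SegmentCreationAndMetadataPush
executionFrameworkSpec:
  name: 'standalone'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'
```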
r
I'm expecting about 15 million records. Only about 100k records are successfully making it into the table. I'm not sure which kind of push I'm doing:
```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 's3://bucket/example-batch-input/landing/year=2021/month=07/day=01/hour=01/'
includeFileNamePattern: 'glob:**/*.gz'
outputDirURI: 's3://bucket/pinot-data/pinot-quickstart/batch-output/mobile-event-data/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-east-1'
recordReaderSpec:
  dataFormat: 'json'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: 'mobileEventTable_OFFLINE'
  schemaURI: 'http://external_controller_uri:9000/tables/mobileEventTable_OFFLINE/schema'
  tableConfigURI: 'http://external_controller_uri:9000/tables/mobileEventTable_OFFLINE'
pinotClusterSpecs:
  - controllerURI: 'http://external_controller_uri:9000'
```
What is the step after `Pushing segment`? Maybe it's failing on the next step, after it has made thousands of successful pushes.
After running some batches, I now have ZooKeeper failing because it has run out of disk space:
```
java.io.IOException: No space left on device
```
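(The "No space left on device" error is disk, not memory, and with thousands of segments being registered, ZooKeeper's snapshots and transaction logs can grow quickly. A hedged zoo.cfg fragment with the standard autopurge settings — the retention values here are illustrative, not a recommendation:)

```properties
# zoo.cfg fragment (illustrative values): retain only the 3 most recent snapshots
# and purge old snapshots/transaction logs every hour.
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
```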
k
@Ryan Clark how many segments are you trying to push via the standalone job, and what's the size of those segments?
r
@Kulbir Nijjer I tried a much smaller batch job (maybe 5 million records) and got the same result. I'm not sure how to check the size of the segments, but here is the YAML. If I run a very small batch job, I don't get the error.
```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 's3://example/'
includeFileNamePattern: 'glob:**/*.gz'
outputDirURI: 's3://example/pinot-data/pinot-quickstart/batch-output/mobile-event-data/'
overwriteOutput: true
segmentCreationJobParallelism: 4
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-east-1'
recordReaderSpec:
  dataFormat: 'json'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: 'mobileEventTable_OFFLINE'
  schemaURI: 'external_controller_uri:9000/tables/mobileEventTable_OFFLINE/schema'
  tableConfigURI: 'external_controller_uri:9000/tables/mobileEventTable_OFFLINE'
pinotClusterSpecs:
  - controllerURI: 'external_controller_uri:9000'
pushJobSpec:
  pushParallelism: 2
  pushAttempts: 2
  pushRetryIntervalMillis: 5000
```
It looks like the batch job for 5 million records created about 5,500 segments before throwing the HTTP error.
Yeah the segments are tiny. Looking into that. Thanks @Kulbir Nijjer!
m
Typically you want to avoid too many tiny segments or too few huge ones. A good segment size ranges from 100MB to 500MB.
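(To put the numbers above in perspective — this is pure arithmetic, nothing Pinot-specific:)

```python
# Back-of-the-envelope check: rows per segment in the failing 5-million-record job.
records = 5_000_000
segments = 5_500
rows_per_segment = records / segments
print(round(rows_per_segment))  # prints 909

# Under 1,000 rows per segment is tiny; a segment in the 100MB-500MB range
# would typically hold several orders of magnitude more rows than this.
```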
r
The timeout issue was because I was creating thousands of tiny segments.
m
👍