# getting-started

Saoirse Amarteifio

10/11/2021, 5:12 PM
I'm running my first batch ingestion job, ingesting Parquet files from S3. The task was kicked off and the 8 rows of the input sample were read, but then it failed, and I'm not sure what the error message is telling me. What is the illegal argument in this context? Looking at the source for the segment name generator didn't get me any closer...
RecordReader initialized will read a total of 8 records.
at row 0. reading next block
block read in memory in 1 ms. row count = 8
Start building IndexCreator!
Finished records indexing in IndexCreator!
Failed to generate Pinot segment for file - s3://bucket/samples/data/myData/test.parquet
java.lang.IllegalArgumentException: null
        at shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:108) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-11f8550b9b2881ede4d105416ed970a5dd708463]
        at org.apache.pinot.segment.spi.creator.name.SimpleSegmentNameGenerator.generateSegmentName(SimpleSegmentNameGenerator.java:53) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-11f8550b9b2881ede4d105416ed970a5dd708463]
Can anyone tell from this error message what illegal thing I'm doing? Adding my jobSpec in the thread...
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'
jobType: SegmentCreationAndUriPush
inputDirURI: 's3://...'
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: 's3://...'
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-east-1'
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'MY_TABLE'
  schemaURI: 'http://pinot-controller.pinot.svc.cluster.local:9000/tables/MY_TABLE/schema'
  tableConfigURI: 'http://pinot-controller.pinot.svc.cluster.local:9000/tables/MY_TABLE'
pinotClusterSpecs:
  - controllerURI: 'http://pinot-controller.pinot.svc.cluster.local:9000'
pushJobSpec:
  pushAttempts: 2
  pushRetryIntervalMillis: 1000
OK, I'm missing a segmentNameGeneratorSpec.
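For anyone hitting the same error, here is a minimal sketch of what the missing block might look like. It would sit at the top level of the job spec, alongside tableSpec. The generator type and config key shown (`simple`, `segment.name.postfix`) are assumptions based on the Pinot batch-ingestion job spec documentation of this era, and the postfix value is a hypothetical placeholder, so check the docs for your Pinot version before copying it.

```yaml
# Hypothetical fragment: add at the top level of the job spec, next to tableSpec.
segmentNameGeneratorSpec:
  # 'simple' roughly builds names from the table name, the min/max time
  # values (if present), an optional postfix, and a sequence id.
  type: simple
  configs:
    # Placeholder value (assumption); use any postfix valid in a segment name.
    segment.name.postfix: 'batch'
```

The same docs also describe a `normalizedDate` type (with configs such as `segment.name.prefix` and `exclude.sequence.id`) that may suit date-partitioned data better.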
I realize it's helpful to scan through the logs above the error and note where some parameters are null; sometimes it matters!