
Mateus Oliveira

06/16/2021, 7:53 PM
Hello team, I need help with something. I'm trying to load some data from an S3 bucket into Pinot, but it gives me this error:
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
Creating an executor service with 1 threads(Job parallelism: 0, available cores: 1.)
Listed 8 files from URI: s3://landing/bank/, is recursive: true
Got exception to kick off standalone data ingestion job -
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:144) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435]
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:113) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435]
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:132) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435]
	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:166) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435]
	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:186) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435]
Caused by: java.lang.IllegalArgumentException
	at sun.nio.fs.UnixFileSystem.getPathMatcher(UnixFileSystem.java:288) ~[?:1.8.0_292]
	at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:175) ~[pinot-batch-ingestion-standalone-0.8.0-SNAPSHOT-shaded.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435]
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:142) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435]
	... 4 more
This is my job spec:
executionFrameworkSpec:
    name: 'standalone'
    segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
    segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
    segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 's3://landing/bank/'
includeFileNamePattern: '*.json'
outputDirURI: 's3://pinot/'
overwriteOutput: true
pinotFSSpecs:
    - scheme: s3
      className: org.apache.pinot.plugin.filesystem.S3PinotFS
      configs:
        region: 'us-east-1'
        endpoint: 'http://10.0.220.205:9000'
        accessKey: 'access'
        secretKey: 'key'
recordReaderSpec:
    dataFormat: 'json'
    className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
    tableName: 'bank'
pinotClusterSpecs:
    - controllerURI: 'http://localhost:9000'

Aaron Wishnick

06/16/2021, 8:02 PM
Try
includeFileNamePattern: 'glob:**/*.json'

Mayank

06/16/2021, 8:08 PM
Yeah ^^. Seems it is failing here in the code:
if (_spec.getIncludeFileNamePattern() != null) {
  includeFilePathMatcher = FileSystems.getDefault().getPathMatcher(_spec.getIncludeFileNamePattern());
}
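For context, FileSystems.getPathMatcher expects its argument in the form "syntax:pattern" (for example "glob:..." or "regex:...") and throws IllegalArgumentException when that prefix is missing, which is consistent with the stack trace above. A minimal sketch of that behavior follows; the class PatternPrefixCheck and the helper isValidMatcherSpec are hypothetical names for illustration and are not part of Pinot.

```java
import java.nio.file.FileSystems;

public class PatternPrefixCheck {
  // Hypothetical helper: reports whether the default file system accepts the matcher spec.
  static boolean isValidMatcherSpec(String spec) {
    try {
      FileSystems.getDefault().getPathMatcher(spec);
      return true;
    } catch (IllegalArgumentException e) {
      // Raised, for example, when the spec lacks the "syntax:" prefix.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isValidMatcherSpec("*.json"));         // false: no "glob:" or "regex:" prefix
    System.out.println(isValidMatcherSpec("glob:**/*.json")); // true
  }
}
```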

Mateus Oliveira

06/16/2021, 8:10 PM
I don't get any error anymore, but it is not creating segments:
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
Creating an executor service with 1 threads(Job parallelism: 0, available cores: 1.)
Listed 8 files from URI: s3://landing/bank/, is recursive: true
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
Listed 0 files from URI: s3://pinot/, is recursive: true
Start pushing segments: []... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@106cc338] for table bank

Xiang Fu

06/16/2021, 8:23 PM
Can you try Aaron’s suggestion?
Try 
includeFileNamePattern: 'glob:**/*.json'
I feel the pattern doesn’t match any file

Mateus Oliveira

06/16/2021, 8:24 PM
Sure, I tried it and I have no more errors, but it is not creating segments.
Could be, I will take a look at the files.

Xiang Fu

06/16/2021, 8:24 PM
I see.
What are your file names/paths?

Mateus Oliveira

06/16/2021, 8:26 PM
bank_2021_5_19_11_33_43.json

Xiang Fu

06/16/2021, 8:27 PM
hmm

Mateus Oliveira

06/16/2021, 8:27 PM
It even reads the 8 files, as the log message shows, so this is weird.

Xiang Fu

06/16/2021, 8:27 PM
have you set this
  schemaURI: 'http://localhost:9000/tables/bank/schema'
  tableConfigURI: 'http://localhost:9000/tables/bank'
under
tableSpec:

Mateus Oliveira

06/16/2021, 8:28 PM
No, but I will do it now.
Nothing, it doesn't create the segments and the table is empty. I will review the schema; maybe it is something related to that.
SegmentGenerationJobSpec:
!!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
authToken: null
cleanUpOutputDir: false
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
  segmentMetadataPushJobRunnerClassName: null, segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
  segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
failOnEmptySegment: false
includeFileNamePattern: glob:*.json
inputDirURI: s3://landing/bank/
jobType: SegmentCreationAndTarPush
outputDirURI: s3://pinot/
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: 'http://localhost:9000'}
pinotFSSpecs:
- className: org.apache.pinot.plugin.filesystem.S3PinotFS
  configs: {region: us-east-1, endpoint: 'http://10.0.220.205:9000', accessKey: YOURACCESSKEY,
    secretKey: YOURSECRETKEY}
  scheme: s3
pushJobSpec: null
recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.json.JSONRecordReader,
  configClassName: null, configs: null, dataFormat: json}
segmentCreationJobParallelism: 0
segmentNameGeneratorSpec: null
tableSpec: {schemaURI: 'http://localhost:9000/tables/bank/schema', tableConfigURI: 'http://localhost:9000/tables/bank',
  tableName: bank}
tlsSpec: null

Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
Creating an executor service with 1 threads(Job parallelism: 0, available cores: 1.)
Listed 8 files from URI: s3://landing/bank/, is recursive: true
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
Listed 0 files from URI: s3://pinot/, is recursive: true
Start pushing segments: []... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@63f259c3] for table bank
root@pinot-controller-0:/opt/pinot#
That is the output of the job execution.

Xiang Fu

06/16/2021, 8:33 PM
hmm, ok
Your job output shows includeFileNamePattern: glob:*.json
The pattern should be 'glob:**/*.json', not 'glob:*.json'.
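The difference between the two patterns comes from glob semantics: in Java's glob syntax, * does not cross directory separators, while ** does. A minimal sketch follows, assuming the matcher is applied to a path that still contains directory components. The path /bank/bank_2021_5_19_11_33_43.json is an assumption built from the file name and input URI in this thread, and GlobCheck is a hypothetical class, not Pinot code.

```java
import java.nio.file.FileSystems;
import java.nio.file.Paths;

public class GlobCheck {
  // Hypothetical helper: applies a "syntax:pattern" spec to a path on the default file system.
  static boolean matches(String spec, String path) {
    return FileSystems.getDefault().getPathMatcher(spec).matches(Paths.get(path));
  }

  public static void main(String[] args) {
    // Assumed path shape: directory component plus file name.
    String path = "/bank/bank_2021_5_19_11_33_43.json";
    // '*' stops at directory separators, so a path with a directory does not match.
    System.out.println(matches("glob:*.json", path));
    // '**' crosses directory separators, so the same path matches.
    System.out.println(matches("glob:**/*.json", path));
  }
}
```

With glob:*.json no input file passes the filter, which would explain why the job listed 8 files but pushed an empty segment list.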

Mateus Oliveira

06/16/2021, 8:35 PM
It works, @Xiang Fu! Thank you and @Aaron Wishnick for the help.

Mayank

06/16/2021, 8:36 PM
@Mateus Oliveira curious, was this a documentation issue (as in was it not clear enough)?
If so, any suggestions on how to improve it?

Mateus Oliveira

06/16/2021, 8:40 PM
In this case it was my mistake, but it would help if you could detail the configs a little more. For example, this part about the pattern wasn't in the documentation, at least not in the S3 one. Maybe even repeating this info a little would be great. Other than that, it was not a documentation problem; it was my mistake.

Mayank

06/16/2021, 8:41 PM
I see, thanks

Kulbir Nijjer

06/16/2021, 8:44 PM
@Mateus Oliveira by the way, endpoint is an AWS S3-specific client config, not the Pinot controller address, so the current setting is invalid (the AWS SDK is probably overriding it automatically based on the region). You are probably fine not specifying it at all.
endpoint: 'http://10.0.220.205:9000'
In case you are interested in the valid values: https://docs.aws.amazon.com/general/latest/gr/s3.html

Xiang Fu

06/16/2021, 9:16 PM
It might be a different S3-compatible FS endpoint, like MinIO?

Kulbir Nijjer

06/16/2021, 10:38 PM
Yes, good point, it can be, depending on the object storage backend that you are integrating with. Generally, for AWS S3 access, it is only needed for advanced use cases.