Azri Jamil
07/04/2021, 5:31 AM
```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndUriPush
inputDirURI: 'gs://mdm-datalake/ais/sentences/'
outputDirURI: '/tmp/ais-pinot/sentences/'
includeFileNamePattern: 'glob:**/**.parquet'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
  - scheme: gs
    className: org.apache.pinot.plugin.filesystem.GcsPinotFS
    configs:
      projectId: 'aton-analytics'
      gcpKey: '/var/pinot/controller/config/gcs-datalake-key.json'
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'sentence'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```
Ken Krugler
07/04/2021, 8:20 PM
`includeFileNamePattern: 'glob:**/**.parquet'`. I think it should be `includeFileNamePattern: 'glob:**/*.parquet'`.
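Whether the two patterns behave differently depends on how the glob is evaluated. Below is a minimal Java sketch. It assumes the include pattern is passed to the JDK's `FileSystems.getDefault().getPathMatcher(...)` and matched against each listed file path on a Unix-like system. That is an assumption about Pinot's standalone runner, not something stated in this thread. Under the JDK glob rules, `*` stays within one path segment while `**` can cross `/`. So for paths that contain at least one `/`, the two patterns accept the same files, and neither accepts a bare file name with no directory part. The class name and sample paths are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper for comparing include patterns; not part of Pinot.
class GlobCheck {
    // True if the JDK's default file system accepts `path` for `pattern`
    // (e.g. "glob:**/*.parquet"). Assumes Unix-style '/' separators.
    static boolean matches(String pattern, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(pattern);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // The pattern from the posted spec and the suggested replacement.
        String[] patterns = {"glob:**/**.parquet", "glob:**/*.parquet"};
        // Hypothetical sample paths, not taken from the actual bucket.
        String[] paths = {
            "sentences/2021/07/part-0.parquet", // nested Parquet file
            "part-0.parquet",                   // bare name, no '/' at all
            "sentences/_SUCCESS"                // non-Parquet marker file
        };
        for (String pattern : patterns) {
            for (String path : paths) {
                System.out.println(pattern + "  " + path + "  -> " + matches(pattern, path));
            }
        }
    }
}
```

Running `main` prints one line per pattern/path pair. The difference between `*` and `**` only shows up in the last path segment, and there it makes no difference here because both patterns still require the name to end in `.parquet`.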
Azri Jamil
07/05/2021, 1:35 AM
Azri Jamil
07/05/2021, 1:36 AM
Ken Krugler
07/05/2021, 5:28 PM
'gs://mdm-datalake/ais/sentences/'
And these files match the `*.parquet` pattern.
Azri Jamil
07/15/2021, 12:52 PM
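Ken's follow-up about the files under `gs://mdm-datalake/ais/sentences/` matching the `*.parquet` pattern can be checked offline once the object listing is available. The Java sketch below makes the same JDK `PathMatcher` assumption as above. It filters a list of full `gs://` URIs with `glob:**/*.parquet`. The class name and object names are made up for illustration; a real listing would come from the bucket itself. On a Unix-like default file system, `Paths.get` collapses `gs://` to `gs:/`. That does not affect the match, because the `/` separators the glob relies on are kept. An empty result would suggest the job has nothing to ingest.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Pinot.
class ListingFilter {
    // Keeps only the listed file URIs accepted by the include pattern.
    // Assumption: each entry is converted with Paths.get() on a Unix-like
    // default file system ("gs://a/b" becomes "gs:/a/b"; separators survive).
    static List<String> filter(List<String> listedFiles, String includePattern) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(includePattern);
        List<String> kept = new ArrayList<>();
        for (String file : listedFiles) {
            if (matcher.matches(Paths.get(file))) {
                kept.add(file);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical object names under the inputDirURI from the job spec.
        List<String> listed = List.of(
            "gs://mdm-datalake/ais/sentences/dt=2021-07-01/part-00000.parquet",
            "gs://mdm-datalake/ais/sentences/dt=2021-07-01/_SUCCESS",
            "gs://mdm-datalake/ais/sentences/part-00001.snappy.parquet");
        List<String> kept = filter(listed, "glob:**/*.parquet");
        System.out.println(kept.size() + " file(s) would be ingested: " + kept);
    }
}
```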