# troubleshooting
Any update on this?
what’s the issue?
I tried with your setup and it works.
This is my ingestion config:
```yaml
➜ cat examples/batch/jsontype/ingestionJobSpec.yaml
# executionFrameworkSpec: Defines how ingestion jobs are run.
executionFrameworkSpec:

  # name: execution framework name
  name: 'standalone'

  # segmentGenerationJobRunnerClassName: class name that implements the org.apache.pinot.spi.batch.ingestion.runner.SegmentGenerationJobRunner interface.
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'

  # segmentTarPushJobRunnerClassName: class name that implements the org.apache.pinot.spi.batch.ingestion.runner.SegmentTarPushJobRunner interface.
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'

  # segmentUriPushJobRunnerClassName: class name that implements the org.apache.pinot.spi.batch.ingestion.runner.SegmentUriPushJobRunner interface.
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'

# jobType: Pinot ingestion job type.
# Supported job types are:
#   'SegmentCreation'
#   'SegmentTarPush'
#   'SegmentUriPush'
#   'SegmentCreationAndTarPush'
#   'SegmentCreationAndUriPush'
jobType: SegmentCreationAndTarPush

# inputDirURI: Root directory of input data, expected to have scheme configured in PinotFS.
inputDirURI: 'examples/batch/jsontype/rawdata'

# includeFileNamePattern: include file name pattern; glob patterns are supported.
# Sample usage:
#   'glob:*.avro' will include all avro files just under the inputDirURI, not sub directories;
#   'glob:**/*.avro' will include all the avro files under inputDirURI recursively.
includeFileNamePattern: 'glob:**/*.json'

# excludeFileNamePattern: exclude file name pattern; glob patterns are supported.
# Sample usage:
#   'glob:*.avro' will exclude all avro files just under the inputDirURI, not sub directories;
#   'glob:**/*.avro' will exclude all the avro files under inputDirURI recursively.
# _excludeFileNamePattern: ''

# outputDirURI: Root directory of output segments, expected to have scheme configured in PinotFS.
outputDirURI: 'examples/batch/jsontype/segments'

# overwriteOutput: Overwrite output segments if they exist.
overwriteOutput: true

# pinotFSSpecs: defines all related Pinot file systems.
pinotFSSpecs:

  - # scheme: used to identify a PinotFS.
    # E.g. local, hdfs, dbfs, etc
    scheme: file

    # className: Class name used to create the PinotFS instance.
    # E.g.
    #   org.apache.pinot.spi.filesystem.LocalPinotFS is used for local filesystem
    #   org.apache.pinot.plugin.filesystem.AzurePinotFS is used for Azure Data Lake
    #   org.apache.pinot.plugin.filesystem.HadoopPinotFS is used for HDFS
    className: org.apache.pinot.spi.filesystem.LocalPinotFS

# recordReaderSpec: defines all record readers.
recordReaderSpec:

  # dataFormat: Record data format, e.g. 'avro', 'parquet', 'orc', 'csv', 'json', 'thrift' etc.
  dataFormat: 'json'

  # className: Corresponding RecordReader class name.
  # E.g.
  #   org.apache.pinot.plugin.inputformat.avro.AvroRecordReader
  #   org.apache.pinot.plugin.inputformat.csv.CSVRecordReader
  #   org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader
  #   org.apache.pinot.plugin.inputformat.json.JSONRecordReader
  #   org.apache.pinot.plugin.inputformat.orc.ORCRecordReader
  #   org.apache.pinot.plugin.inputformat.thrift.ThriftRecordReader
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'

  # configClassName: Corresponding RecordReaderConfig class name; it's mandatory for the CSV and Thrift file formats.
  # E.g.
  #    org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig
  #    org.apache.pinot.plugin.inputformat.thrift.ThriftRecordReaderConfig
  configClassName:

  # configs: Used to initialize the RecordReaderConfig class; required for the CSV and Thrift data formats.
  configs:


# tableSpec: defines table name and where to fetch corresponding table config and table schema.
tableSpec:

  # tableName: Table name
  tableName: 'myTable'

  # schemaURI: defines where to read the table schema, supports PinotFS or HTTP.
  # E.g.
  #   hdfs://path/to/table_schema.json
  #   http://localhost:9000/tables/myTable/schema
  schemaURI: 'http://localhost:9000/tables/myTable/schema'

  # tableConfigURI: defines where to read the table config.
  # Supports using PinotFS or HTTP.
  # E.g.
  #   hdfs://path/to/table_config.json
  #   http://localhost:9000/tables/myTable
  # Note that the API to read Pinot table config directly from pinot controller contains a JSON wrapper.
  # The real table config is the object under the field 'OFFLINE'.
  tableConfigURI: 'http://localhost:9000/tables/myTable'

# pinotClusterSpecs: defines the Pinot Cluster Access Point.
pinotClusterSpecs:
  - # controllerURI: used to fetch table/schema information and data push.
    # E.g. http://localhost:9000
    controllerURI: 'http://localhost:9000'

# pushJobSpec: defines segment push job related configuration.
pushJobSpec:

  # pushAttempts: number of attempts for push job, default is 1, which means no retry.
  pushAttempts: 2

  # pushRetryIntervalMillis: retry wait in milliseconds, defaults to 1000 (1 second).
  pushRetryIntervalMillis: 1000
```
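
For reference, the input under examples/batch/jsontype/rawdata isn't shown in the thread. Judging from the stats in the job log below (4 documents; columns name, age, subjects_name, subjects_grade), the raw data plausibly nests a subjects list that gets flattened into those column names. Here is a made-up sketch, one JSON record per line, consistent with the value ranges in the log; the actual data.json may differ:

```json
{"name": "Pete",  "age": 23, "subjects": [{"name": "maths", "grade": "A"},  {"name": "maths", "grade": "B--"}]}
{"name": "Pete3", "age": 26, "subjects": [{"name": "maths", "grade": "A-"}, {"name": "maths", "grade": "B+"}]}
```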
This is the job log:
```
➜ bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile examples/batch/jsontype/ingestionJobSpec.yaml

SegmentGenerationJobSpec:
!!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
cleanUpOutputDir: false
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
  segmentMetadataPushJobRunnerClassName: null, segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
  segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
includeFileNamePattern: glob:**/*.json
inputDirURI: examples/batch/jsontype/rawdata
jobType: SegmentCreationAndTarPush
outputDirURI: examples/batch/jsontype/segments
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: 'http://localhost:9000'}
pinotFSSpecs:
- {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
pushJobSpec: {pushAttempts: 2, pushParallelism: 1, pushRetryIntervalMillis: 1000,
  segmentUriPrefix: null, segmentUriSuffix: null}
recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.json.JSONRecordReader,
  configClassName: null, configs: null, dataFormat: json}
segmentCreationJobParallelism: 0
segmentNameGeneratorSpec: null
tableSpec: {schemaURI: 'http://localhost:9000/tables/myTable/schema', tableConfigURI: 'http://localhost:9000/tables/myTable',
  tableName: myTable}
tlsSpec: null

Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Creating an executor service with 1 threads(Job parallelism: 0, available cores: 16.)
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Submitting one Segment Generation Task for file:/Users/xiangfu/workspace/pinot-dev/pinot-distribution/target/apache-pinot-incubating-0.8.0-SNAPSHOT-bin/apache-pinot-incubating-0.8.0-SNAPSHOT-bin/examples/batch/jsontype/rawdata/data.json
Initialized FunctionRegistry with 119 functions: [fromepochminutesbucket, arrayunionint, codepoint, mod, sha256, year, yearofweek, upper, arraycontainsstring, arraydistinctstring, bytestohex, tojsonmapstr, trim, timezoneminute, sqrt, togeometry, normalize, fromepochdays, arraydistinctint, exp, jsonpathlong, yow, toepochhoursrounded, lower, toutf8, concat, ceil, todatetime, jsonpathstring, substr, dayofyear, contains, jsonpatharray, arrayindexofint, fromepochhoursbucket, arrayindexofstring, minus, arrayunionstring, toepochhours, toepochdaysrounded, millisecond, fromepochhours, arrayreversestring, dow, doy, min, toepochsecondsrounded, strpos, jsonpath, tosphericalgeography, fromepochsecondsbucket, max, reverse, hammingdistance, stpoint, abs, timezonehour, toepochseconds, arrayconcatint, quarter, md5, ln, toepochminutes, arraysortstring, replace, strrpos, jsonpathdouble, stastext, second, arraysortint, split, fromepochdaysbucket, lpad, day, toepochminutesrounded, fromdatetime, fromepochseconds, arrayconcatstring, base64encode, ltrim, arraysliceint, chr, sha, plus, base64decode, month, arraycontainsint, toepochminutesbucket, startswith, week, jsonformat, sha512, arrayslicestring, fromepochminutes, remove, dayofmonth, times, hour, rpad, arrayremovestring, now, divide, bigdecimaltobytes, floor, toepochsecondsbucket, toepochdaysbucket, hextobytes, rtrim, length, toepochhoursbucket, bytestobigdecimal, toepochdays, arrayreverseint, datetrunc, minute, round, dayofweek, arrayremoveint, weekofyear] in 733ms
Using class: org.apache.pinot.plugin.inputformat.json.JSONRecordReader to read segment, ignoring configured file format: AVRO
Finished building StatsCollector!
Collected stats for 4 documents
Using fixed length dictionary for column: subjects_grade, size: 20
Created dictionary for STRING column: subjects_grade with cardinality: 5, max length in bytes: 4, range: A to B--
Using fixed length dictionary for column: subjects_name, size: 5
Created dictionary for STRING column: subjects_name with cardinality: 1, max length in bytes: 5, range: maths to maths
Using fixed length dictionary for column: name, size: 20
Created dictionary for STRING column: name with cardinality: 4, max length in bytes: 5, range: Pete to Pete3
Created dictionary for LONG column: age with cardinality: 4, range: 23 to 26
Start building IndexCreator!
Finished records indexing in IndexCreator!
Finished segment seal!
Converting segment: /var/folders/kp/v8smb2f11tg6q2grpwkq7qnh0000gn/T/pinot-4226d743-ee31-417a-806a-2c4752a21343/output/myTable_OFFLINE_0 to v3 format
v3 segment location for segment: myTable_OFFLINE_0 is /var/folders/kp/v8smb2f11tg6q2grpwkq7qnh0000gn/T/pinot-4226d743-ee31-417a-806a-2c4752a21343/output/myTable_OFFLINE_0/v3
Deleting files in v1 segment directory: /var/folders/kp/v8smb2f11tg6q2grpwkq7qnh0000gn/T/pinot-4226d743-ee31-417a-806a-2c4752a21343/output/myTable_OFFLINE_0
Computed crc = 3500070607, based on files [/var/folders/kp/v8smb2f11tg6q2grpwkq7qnh0000gn/T/pinot-4226d743-ee31-417a-806a-2c4752a21343/output/myTable_OFFLINE_0/v3/columns.psf, /var/folders/kp/v8smb2f11tg6q2grpwkq7qnh0000gn/T/pinot-4226d743-ee31-417a-806a-2c4752a21343/output/myTable_OFFLINE_0/v3/index_map, /var/folders/kp/v8smb2f11tg6q2grpwkq7qnh0000gn/T/pinot-4226d743-ee31-417a-806a-2c4752a21343/output/myTable_OFFLINE_0/v3/metadata.properties]
Driver, record read time : 3
Driver, stats collector time : 0
Driver, indexing time : 12
Tarring segment from: /var/folders/kp/v8smb2f11tg6q2grpwkq7qnh0000gn/T/pinot-4226d743-ee31-417a-806a-2c4752a21343/output/myTable_OFFLINE_0 to: /var/folders/kp/v8smb2f11tg6q2grpwkq7qnh0000gn/T/pinot-4226d743-ee31-417a-806a-2c4752a21343/output/myTable_OFFLINE_0.tar.gz
Size for segment: myTable_OFFLINE_0, uncompressed: 5.87K, compressed: 1.62K
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Start pushing segments: [/Users/xiangfu/workspace/pinot-dev/pinot-distribution/target/apache-pinot-incubating-0.8.0-SNAPSHOT-bin/apache-pinot-incubating-0.8.0-SNAPSHOT-bin/examples/batch/jsontype/segments/myTable_OFFLINE_0.tar.gz]... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@6304101a] for table myTable
Pushing segment: myTable_OFFLINE_0 to location: http://localhost:9000 for table myTable
Sending request: http://localhost:9000/v2/segments?tableName=myTable to controller: 192.168.86.73, version: Unknown
Response for pushing table myTable segment myTable_OFFLINE_0 to location http://localhost:9000 - 200: {"status":"Successfully uploaded segment: myTable_OFFLINE_0 of table: myTable"}
```

(screenshot attached: image.png)
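
The 200 response above confirms the segment upload. As a quick sanity check (a suggestion, not something shown in the thread), the ingested rows can be queried through the broker's SQL endpoint, by default POST http://localhost:8099/query/sql (assuming the default broker port), with a request body like:

```json
{"sql": "SELECT name, age, subjects_name, subjects_grade FROM myTable LIMIT 10"}
```

If the segment is being served, this should return the 4 documents reported by the stats collector.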
Here are the table config, schema, and ingestion spec I'm using:
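
(The attached files aren't captured here. Purely as an illustration, a schema consistent with the columns in the job log might look like the hypothetical sketch below; the real schema may differ in field types or single/multi-value settings.)

```json
{
  "schemaName": "myTable",
  "dimensionFieldSpecs": [
    {"name": "name", "dataType": "STRING"},
    {"name": "subjects_name", "dataType": "STRING", "singleValueField": false},
    {"name": "subjects_grade", "dataType": "STRING", "singleValueField": false},
    {"name": "age", "dataType": "LONG"}
  ]
}
```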