Kha
02/05/2021, 9:41 PM
offline_table_config.json and a schema.json file to Pinot; however, creating a segment doesn't appear to be working. A SEGMENT-NAME.tar.gz file isn't being created.
My current docker-job-spec.yml looks like this:
# docker-job-spec.yml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/tmp/pinot-manual-test/rawdata/100k'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot-manual-test/segments/100k'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'rows_100k'
  schemaURI: 'http://pinot-controller-test:9000/tables/rows_100k/schema'
  tableConfigURI: 'http://pinot-controller-test:9000/tables/rows_100k'
pinotClusterSpecs:
  - controllerURI: 'http://pinot-controller-test:9000'
Some of the error messages I'm getting are
Failed to generate Pinot segment for file - file:/tmp/pinot-manual-test/rawdata/100k/rows_100k.csv
Caught exception while gathering stats
java.lang.NumberFormatException: For input string: "5842432235322161941"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_282]
at java.lang.Integer.parseInt(Integer.java:583) ~[?:1.8.0_282]
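The trace above shows Integer.parseInt rejecting "5842432235322161941". That value is larger than the maximum of a 32-bit signed integer, so one plausible reading (an assumption, since the full schema is not shown in this thread) is that the column holding it is declared as INT. A minimal Python sketch of the range check follows; the helper name smallest_pinot_type is hypothetical, and the STRING fallback is only a suggestion:

```python
# Ranges of Java's int and long, which back Pinot's INT and LONG data types.
INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def smallest_pinot_type(raw: str) -> str:
    """Return the narrowest integer type that can hold the raw CSV value.

    Hypothetical helper for illustration; falls back to STRING when the
    value does not fit into a 64-bit signed integer.
    """
    value = int(raw)
    if INT_MIN <= value <= INT_MAX:
        return "INT"
    if LONG_MIN <= value <= LONG_MAX:
        return "LONG"
    return "STRING"


# The value from the stack trace does not fit in an int, but fits in a long.
print(smallest_pinot_type("5842432235322161941"))  # LONG
```

If the column really is declared as INT, changing its dataType to LONG in schema.json would be the corresponding fix.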
Any leads on this would be appreciated. Thanks
Neha Pawar
Ken Krugler
02/05/2021, 10:12 PM
Xiang Fu
Kha
02/05/2021, 10:27 PM
1096429682806
Schema for date is:
# schema.json
"dateTimeFieldSpecs": [{
  "name": "date",
  "dataType": "LONG",
  "format": "1:MILLISECONDS:EPOCH",
  "granularity": "1:MILLISECONDS"
}]
Table config is:
"segmentsConfig": {
  "timeColumnName": "date",
  "timeType": "MILLISECONDS",
  "segmentPushType": "APPEND",
  "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
  "schemaName": "row1",
  "replication": "1"
},
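Since both the schema and the table config declare the date column in epoch milliseconds, a quick sanity check is to decode sample values under that unit. The sketch below uses the two numbers that come up in this thread (1096429682806 from the data, and 1612471718, which Pinot is reported to have found). The helper name as_utc is hypothetical. That the second number is a seconds value being read as milliseconds is only an assumption; the thread does not confirm the cause.

```python
from datetime import datetime, timezone


def as_utc(value: int, unit: str) -> datetime:
    """Decode an epoch value given in SECONDS or MILLISECONDS as a UTC datetime."""
    seconds = value / 1000 if unit == "MILLISECONDS" else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The sample value from the data, read as milliseconds, lands in 2004.
print(as_utc(1096429682806, "MILLISECONDS").year)  # 2004

# The value Pinot reported, read as milliseconds, lands in January 1970 ...
print(as_utc(1612471718, "MILLISECONDS").year)  # 1970

# ... while the same number read as seconds lands in February 2021.
print(as_utc(1612471718, "SECONDS").year)  # 2021
```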
Error in the image attached:
Xiang Fu
1096429682806 <- this value is in the year 2004?
Kha
02/05/2021, 10:33 PM
Xiang Fu
Neha Pawar
1096429682806 is the value for 2004, right? The error says Pinot found 1612471718, which is 1970.
Kha
02/05/2021, 10:44 PM
Neha Pawar
Xiang Fu
Kha
02/05/2021, 10:45 PM
Neha Pawar
Kha
02/05/2021, 11:00 PM
Neha Pawar
tableName: 'rows_100k'
schemaURI: 'http://pinot-controller-test:9000/tables/rows_100k/schema'
tableConfigURI: 'http://pinot-controller-test:9000/tables/rows_100k'
Kha
02/05/2021, 11:08 PM
Neha Pawar
Kha
02/05/2021, 11:09 PM
tableName and schemaName
Xiang Fu
schemaURI: 'http://pinot-controller-test:9000/tables/rows_100k/schema'
schemaURI: 'http://pinot-controller-test:9000/tables/rows_100k/schema'
tableConfigURI: 'http://pinot-controller-test:9000/tables/rows_100k'
Kha
02/05/2021, 11:13 PM
rows_100k to row1 breaks it further
Xiang Fu
Neha Pawar
Xiang Fu
apachepinot/pinot:latest
Kha
02/05/2021, 11:21 PM
Xiang Fu
{
  "tableName": "foo",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "timeColumnName": "date",
    "timeType": "MILLISECONDS",
    "replication": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "HEAP",
    "invertedIndexColumns": [
      "id",
      "hash_one"
    ]
  },
  "metadata": {
    "customConfigs": {}
  }
}
Kha
02/05/2021, 11:27 PM
Neha Pawar
Xiang Fu
docker logs <docker-container-id>
Kha
02/05/2021, 11:36 PM
Xiang Fu
Neha Pawar
s/pinot-controller-test/localhost, and was able to upload
Xiang Fu
docker run \
    --network=pinot-demo \
    --name pinot-quickstart \
    -p 9000:9000 \
    -d apachepinot/pinot:latest QuickStart \
    -type batch
2. Create Table
docker run --rm -ti \
    --network=pinot-demo \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-batch-table-creation \
    apachepinot/pinot:latest AddTable \
    -schemaFile /tmp/pinot-quick-start/foo-schema.json \
    -tableConfigFile /tmp/pinot-quick-start/foo-table-offline.json \
    -controllerHost pinot-quickstart \
    -controllerPort 9000 -exec
3. Start Ingestion job
docker run --rm -ti \
    --network=pinot-demo \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-data-ingestion-job \
    apachepinot/pinot:latest LaunchDataIngestionJob \
    -jobSpecFile /tmp/pinot-quick-start/docker-job-spec-100k.yml
Neha Pawar
Xiang Fu
➜ cat /tmp/pinot-quick-start/docker-job-spec-100k.yml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/tmp/pinot-quick-start/rawdata'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot-manual-test/segments'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'foo'
  schemaURI: 'http://pinot-quickstart:9000/tables/foo/schema'
  tableConfigURI: 'http://pinot-quickstart:9000/tables/foo'
pinotClusterSpecs:
  - controllerURI: 'http://pinot-quickstart:9000'
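In both job specs posted in this thread, schemaURI and tableConfigURI follow the same pattern relative to controllerURI and tableName. A small Python sketch of that pattern, which may help keep the three tableSpec fields consistent when switching between tables or controller hosts, is below. The helper name table_spec is hypothetical, and the URL layout is taken only from the specs shown here.

```python
def table_spec(controller_uri: str, table_name: str) -> dict:
    """Build the tableSpec fields the way the job specs in this thread lay them out."""
    base = controller_uri.rstrip("/")
    return {
        "tableName": table_name,
        "schemaURI": f"{base}/tables/{table_name}/schema",
        "tableConfigURI": f"{base}/tables/{table_name}",
    }


# Reproduces the tableSpec section of docker-job-spec-100k.yml.
spec = table_spec("http://pinot-quickstart:9000", "foo")
for key, value in spec.items():
    print(f"  {key}: '{value}'")
```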
Kha
02/08/2021, 7:56 PM