Abdullah Jaffer
07/03/2022, 2:34 PM
"dateTimeFieldSpecs": [
  {
    "name": "orderingDate",
    "dataType": "STRING",
    "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
    "granularity": "1:DAYS"
  }
]
Table specs
{
  "tableName": "sales_by_order_table",
  "segmentsConfig": {
    "timeColumnName": "orderingDate",
    "timeType": "DAYS",
    "replication": "1",
    "schemaName": "sales_by_order"
  },
  "tableIndexConfig": {
    "invertedIndexColumns": [],
    "loadMode": "MMAP"
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "tableType": "OFFLINE",
  "metadata": {}
}
Abdullah Jaffer
07/03/2022, 6:11 PM
Ken Krugler
07/04/2022, 12:19 AM
Abdullah Jaffer
07/04/2022, 5:34 AM
Ken Krugler
07/04/2022, 3:05 PM
Abdullah Jaffer
07/05/2022, 2:34 PM
Ken Krugler
07/05/2022, 3:20 PM
Abdullah Jaffer
07/05/2022, 5:21 PM
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 'bin/data2/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/data/segments/order_sales/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'sales_by_order_table'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
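The job spec above uses a recursive pattern (glob:**/*.csv) on a relative input directory, so CSV files in subdirectories may also be picked up. A minimal sketch for listing them before running the job is below. It uses Python's pathlib as an approximation; Pinot itself matches files with Java glob syntax, so edge cases may differ. The helper name csv_inputs is hypothetical.

```python
from pathlib import Path


def csv_inputs(input_dir):
    """Split the CSV files under input_dir into top-level and nested ones.

    Hypothetical helper: approximates what a recursive pattern would
    match versus a top-level-only pattern. Not part of Pinot.
    """
    root = Path(input_dir)
    # Files directly inside the input directory.
    top_level = sorted(p for p in root.glob("*.csv") if p.is_file())
    # Files found only by descending into subdirectories.
    nested = sorted(
        p for p in root.rglob("*.csv") if p.is_file() and p.parent != root
    )
    return top_level, nested
```

Calling csv_inputs("bin/data2") and printing both lists would show whether any nested CSV files exist that were not meant to be ingested.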
Ken Krugler
07/05/2022, 5:46 PM
In your inputDirURI field, provide an absolute path to the directory containing your input CSV file. Then in your includeFileNamePattern field, use glob:*.csv. Delete/recreate your sales_by_order_table, and re-run the job. This will help determine whether additional CSV files somewhere on the input path are being processed and thus creating extra segments.
Abdullah Jaffer
07/05/2022, 11:43 PM
sales_by_order_table_OFFLINE_2020-08-22_2022-06-28_0
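The segment name above appears to carry a date range (2020-08-22 to 2022-06-28) followed by a sequence number. A small sketch for splitting such names apart is below. The layout (prefix, optional start and end dates, sequence number) is an assumption inferred from the name in this thread, not taken from Pinot's documentation, and parse_segment_name is a hypothetical helper.

```python
import re

# Assumed layout: <prefix>[_<start>_<end>]_<sequence>, with yyyy-MM-dd dates.
SEGMENT_NAME = re.compile(
    r"^(?P<prefix>.+?)"
    r"(?:_(?P<start>\d{4}-\d{2}-\d{2})_(?P<end>\d{4}-\d{2}-\d{2}))?"
    r"_(?P<seq>\d+)$"
)


def parse_segment_name(name):
    """Return the prefix, optional date range, and sequence number of a name."""
    match = SEGMENT_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized segment name: {name}")
    return match.groupdict()


print(parse_segment_name("sales_by_order_table_OFFLINE_2020-08-22_2022-06-28_0"))
```

A name without the two dates parses with start and end set to None, which makes it easy to spot segments that carry no time range.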
Ken Krugler
07/06/2022, 1:08 AM
sales_by_order_schema_OFFLINE_0 and sales_by_order_table_OFFLINE_0 segment names make no sense to me at all.
Abdullah Jaffer
07/06/2022, 1:18 AM
sales_by_order_schema_OFFLINE_0 is from a now-deleted table; I accidentally added "schema" to the name. Will the table config help?
{
  "tableName": "sales_by_order_table",
  "segmentsConfig": {
    "timeColumnName": "ordering_date",
    "timeType": "DAYS",
    "replication": "1",
    "schemaName": "sales_by_order"
  },
  "tableIndexConfig": {
    "invertedIndexColumns": [],
    "loadMode": "MMAP"
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "tableType": "OFFLINE",
  "metadata": {}
}
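This table config names ordering_date as the time column, while the schema fragment at the top of the thread declares orderingDate. The thread does not show whether the schema was changed as well, so the following is only a sketch of how the two could be compared, using trimmed copies of the posted configs. The function name time_column_in_schema is hypothetical.

```python
def time_column_in_schema(schema, table_config):
    """Check whether the table's time column is declared in the schema.

    Returns the time column name and whether it appears among the
    schema's dateTimeFieldSpecs entries.
    """
    time_column = table_config["segmentsConfig"]["timeColumnName"]
    declared = [spec["name"] for spec in schema.get("dateTimeFieldSpecs", [])]
    return time_column, time_column in declared


# Trimmed copies of the fragments posted in this thread.
schema = {"dateTimeFieldSpecs": [{"name": "orderingDate", "dataType": "STRING"}]}
table_config = {"segmentsConfig": {"timeColumnName": "ordering_date"}}

print(time_column_in_schema(schema, table_config))  # ('ordering_date', False)
```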
Ken Krugler
07/06/2022, 1:19 AM
Ken Krugler
07/06/2022, 1:20 AM
“Delete/recreate your sales_by_order_table, and re-run the job…” Did you not delete the table?
Abdullah Jaffer
07/06/2022, 1:20 AM
Ken Krugler
07/06/2022, 1:21 AM
Abdullah Jaffer
07/06/2022, 1:24 AM