Dan Hill
06/30/2020, 11:11 PM
kubectl get pods --namespace $NAMESPACE
NAME READY STATUS RESTARTS AGE
...
pinot-populate-local-data-hwpdm 0/1 Completed 0 14s
kubectl logs --namespace $NAMESPACE pinot-populate-local-data-hwpdm
...
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
...
kubectl describe --namespace $NAMESPACE pod/pinot-populate-local-data-hwpdm
...
Status: Succeeded
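The output above shows a pod reported as Completed/Succeeded even though its log contains a RuntimeException, so the pod status alone may not tell you whether ingestion worked. A minimal Python sketch of a log check follows. The function name `find_ingestion_errors` and the choice of marker strings are assumptions made for illustration; they are not part of Pinot or kubectl.

```python
from typing import List

# Marker substrings copied from the log excerpt above. Treating them as
# failure indicators is an assumption, not a documented Pinot contract.
ERROR_MARKERS = ("Caught exception during running", "java.lang.RuntimeException")


def find_ingestion_errors(log_text: str) -> List[str]:
    """Return the log lines that contain any of the error markers."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in ERROR_MARKERS)
    ]


if __name__ == "__main__":
    # Hypothetical log text; only the middle line comes from the excerpt above.
    sample = (
        "first line\n"
        "java.lang.RuntimeException: Caught exception during running - "
        "org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner\n"
        "last line\n"
    )
    for line in find_ingestion_errors(sample):
        print(line)
```

One way to use it would be to save the output of the kubectl logs command to a file and pass the file contents to the function; an empty result only means none of the assumed markers appeared.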
Dan Hill
06/30/2020, 11:12 PM
# TODO - is outputDirURI set correctly?
apiVersion: v1
kind: ConfigMap
metadata:
  name: pinot-local-data-config
data:
  local_batch_job_spec.yaml: |-
    executionFrameworkSpec:
      name: 'standalone'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
      segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
      segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
    jobType: SegmentCreationAndTarPush
    inputDirURI: '/home/pinot/local-raw-data/'
    outputDirURI: '/tmp/metrics/segments/'
    overwriteOutput: true
    pinotFSSpecs:
      - scheme: file
        className: org.apache.pinot.spi.filesystem.LocalPinotFS
    recordReaderSpec:
      dataFormat: 'json'
      className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
    tableSpec:
      tableName: 'metrics'
      schemaURI: 'http://pinot-controller:9000/tables/metrics/schema'
      tableConfigURI: 'http://pinot-controller:9000/tables/metrics'
    pinotClusterSpecs:
      - controllerURI: 'http://pinot-controller:9000'
---
apiVersion: batch/v1
kind: Job
metadata:
  name: pinot-populate-local-data
spec:
  template:
    spec:
      containers:
        - name: pinot-populate-local-data
          image: apachepinot/pinot:0.4.0
          args: [ "LaunchDataIngestionJob", "-jobSpecFile", "/home/pinot/pinot-config/local_batch_job_spec.yaml" ]
          volumeMounts:
            - name: pinot-local-data-config
              mountPath: /home/pinot/pinot-config
            - name: pinot-local-data
              mountPath: /home/pinot/local-raw-data
      restartPolicy: OnFailure
      volumes:
        - name: pinot-local-data-config
          configMap:
            name: pinot-local-data-config
        - name: pinot-local-data
          hostPath:
            path: /my/local/path
  backoffLimit: 100
Dan Hill
06/30/2020, 11:12 PM
Xiang Fu
Xiang Fu
Dan Hill
07/01/2020, 8:12 AM
metrics_offline_table_config.json: |-
  {
    "tableName": "metrics",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "schemaName": "metrics",
      "timeColumnName": "timestamp",
      "timeType": "MILLISECONDS",
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "1461",
      "segmentPushType": "APPEND",
      "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
      "replication": "1"
    },
    "tableIndexConfig": {
      "loadMode": "MMAP",
      "noDictionaryColumns": ["impressions"],
      "starTreeIndexConfigs": [
        {
          "dimensionsSplitOrder": [
            "utc_date",
            "platform_id",
            "account_id",
            "campaign_id"
          ],
          "skipStarNodeCreationForDimensions": [],
          "functionColumnPairs": [
            "SUM__impressions"
          ]
        }
      ]
    },
    "tenants": {},
    "metadata": {
      "customConfigs": {}
    }
  }
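A table config embedded in a ConfigMap like this can be checked for strict JSON validity before it is applied; a trailing comma after the last array element is a common slip. The sketch below uses Python's standard `json` module as a strict stand-in. Whether Pinot's own parser tolerates such input is not established here, and the helper name `json_error` is hypothetical.

```python
import json
from typing import Optional


def json_error(text: str) -> Optional[str]:
    """Return None if text parses as strict JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


if __name__ == "__main__":
    # Hypothetical fragments shaped like a functionColumnPairs entry.
    print(json_error('{"functionColumnPairs": ["SUM__impressions"]}'))
    print(json_error('{"functionColumnPairs": ["SUM__impressions",]}'))
```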
The query I'm running looks pretty basic. It's asking for aggregate stats at a high level. In my data, there are 8 unique utc_dates and 1 unique platform.
select utc_date, sum(impressions) from metrics where platform_id = 13 group by utc_date
Recent changes:
• switched from timestamp to my own utc_date (long).
• added "noDictionaryColumns": ["impressions"],
This previously was 50ms-100ms.
I'm going to bed now. No need to rush an answer.
Dan Hill
07/01/2020, 3:51 PM
Kishore G
Kishore G
Dan Hill
07/01/2020, 3:54 PM
Kishore G
Kishore G
Kishore G
Dan Hill
07/01/2020, 3:55 PM
Pradeep
07/01/2020, 11:44 PM
{
  "exceptions": [],
  "numServersQueried": 2,
  "numServersResponded": 0,
  "numSegmentsQueried": 0,
  "numSegmentsProcessed": 0,
  "numSegmentsMatched": 0,
  "numConsumingSegmentsQueried": 0,
  "numDocsScanned": 0,
  "numEntriesScannedInFilter": 0,
  "numEntriesScannedPostFilter": 0,
  "numGroupsLimitReached": false,
  "totalDocs": 0,
  "timeUsedMs": 9999,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 0
}
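In the response above, numServersResponded is 0 while numServersQueried is 2, and timeUsedMs is 9999, just under the timeoutMs=10000 that appears in the server log's queryOptions in this message. A minimal Python sketch of how such a response could be screened for signs of a partial or timed-out result follows. The function name `response_warnings` and the 1% "near the timeout" threshold are assumptions for illustration, not Pinot behavior.

```python
from typing import Any, Dict, List


def response_warnings(resp: Dict[str, Any], timeout_ms: int) -> List[str]:
    """Flag signs that a broker response is partial or hit the timeout."""
    warnings: List[str] = []
    queried = resp.get("numServersQueried", 0)
    responded = resp.get("numServersResponded", 0)
    if responded < queried:
        warnings.append(f"only {responded} of {queried} servers responded")
    # Assumed heuristic: within 1% of the timeout counts as hitting it.
    if resp.get("timeUsedMs", 0) >= 0.99 * timeout_ms:
        warnings.append("timeUsedMs is at or near the timeout")
    if resp.get("exceptions"):
        warnings.append("broker reported exceptions")
    return warnings


if __name__ == "__main__":
    # Field values copied from the response above.
    resp = {
        "exceptions": [],
        "numServersQueried": 2,
        "numServersResponded": 0,
        "timeUsedMs": 9999,
    }
    for warning in response_warnings(resp, timeout_ms=10000):
        print(warning)
```

An empty exceptions list together with zero responding servers suggests that the result should be treated as incomplete rather than as an empty table.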
There are about 34 segments, and all of them seem to be in either the “ONLINE” or “CONSUMING” state.
I just see a timeout exception in one of the server logs:
Caught TimeoutException. (brokerRequest = BrokerRequest(querySource:QuerySource(tableName:searchtable_REALTIME), selections:Selection(selectionColumns:[*], selectionSortSequence:[SelectionSort(column:timestampMillis, isAsc:true)], size:10), enableTrace:true, queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=10000}, pinotQuery:PinotQuery(dataSource:DataSource(tableName:searchtable), selectList:[Expression(type:IDENTIFIER, identifier:Identifier(name:*))], orderByList:[Expression(type:FUNCTION, functionCall:Function(operator:ASC, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:timestampMillis))]))], limit:10), orderBy:[SelectionSort(column:timestampMillis, isAsc:true)], limit:10))
java.util.concurrent.TimeoutException: null
at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:1.8.0_252]
at org.apache.pinot.core.operator.CombineOperator.getNextBlock(CombineOperator.java:169) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
at org.apache.pinot.core.operator.CombineOperator.getNextBlock(CombineOperator.java:47) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:42) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
Wondering if there is a way to improve the query latency? (I tried with a small subset of columns, and the query returns results.)
Kishore G
Kishore G
Pradeep
07/02/2020, 12:54 AM
Kishore G
Kishore G
Jackie
07/02/2020, 2:02 AM
Pradeep
07/02/2020, 2:09 AM
Pradeep
07/02/2020, 2:09 AM
Pradeep
07/02/2020, 2:10 AM
columnMinMaxValueGeneratorMode: TIME
Kishore G
Kishore G
Kishore G
Pradeep
07/02/2020, 7:33 AM
{
  "REALTIME": {
    "tableName": "tablename_REALTIME",
    "tableType": "REALTIME",
    "segmentsConfig": {
      "timeColumnName": "timestampMillis",
      "schemaName": "search",
      "timeType": "MILLISECONDS",
      "replicasPerPartition": "1"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "loadMode": "MMAP",
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.consumer.type": "LowLevel",
        "stream.kafka.topic.name": "INPUT",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.broker.list": "<broker_nodes>:9092",
        "realtime.segment.flush.threshold.size": "0",
        "realtime.segment.flush.threshold.time": "24h",
        "realtime.segment.flush.desired.size": "80M",
        "realtime.segment.flush.autotune.initialRows": "700000",
        "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
      },
      "noDictionaryColumns": [
        "timestampMillis"
      ],
      "enableDefaultStarTree": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": true
    },
    "metadata": {
      "customConfigs": {}
    }
  }
}
I am seeing this NullPointerException; it works fine when I choose a different string column. Should noDictionaryColumns only contain string/bytes fields?
Could not build segment
java.lang.NullPointerException: null
at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.writeMetadata(SegmentColumnarIndexCreator.java:393) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:360) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:216) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:199) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
at org.apache.pinot.core.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:141) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
Xiang Fu
Xiang Fu