Thomas Steinholz
09/28/2022, 3:26 PM
RealtimeToOfflineSegmentsTask running… I’ve been following the guide, added the task config, but the task stays at the status of NOT_STARTED with a {} task config in the task view, giving a 404 error when trying to run. Any idea what is not correctly configured?
Mayank
Neha Pawar
Thomas Steinholz
09/28/2022, 3:40 PM
Initialized TaskExecutorFactoryRegistry with 5 task executor factories: [MergeRollupTask, RealtimeToOfflineSegmentsTask, ConvertToRawIndexTask, PurgeTask, SegmentGenerationAndPushTask] in 1132ms
Registering RealtimeToOfflineSegmentsTask with task executor factory: RealtimeToOfflineSegmentsTaskExecutorFactory, event observer factory: DefaultMinionEventObserverFactory
Thomas Steinholz
09/28/2022, 3:42 PM
Thomas Steinholz
09/28/2022, 4:02 PM
Thomas Steinholz
09/28/2022, 4:03 PM
Thomas Steinholz
09/28/2022, 4:04 PM
Thomas Steinholz
09/28/2022, 4:08 PM
task object is defined in the table
Xiaobing
09/28/2022, 4:11 PM
Xiaobing
09/28/2022, 4:13 PM
Thomas Steinholz
09/28/2022, 4:15 PM
Trying to schedule task type: RealtimeToOfflineSegmentsTask, isLeader: false
Start generating task configs for table: uplinkpayloadevent_REALTIME for task: RealtimeToOfflineSegmentsTask
No realtime-completed segments found for table: uplinkpayloadevent_REALTIME, skipping task generation: RealtimeToOfflineSegmentsTask
but my table has also reached its realtime.segment.flush.segment.size, so I am surprised it is not considered “completed”
Thomas Steinholz
09/28/2022, 4:17 PM
Xiaobing
09/28/2022, 4:21 PM
Thomas Steinholz
09/28/2022, 4:22 PM
Xiaobing
09/28/2022, 4:23 PM
Thomas Steinholz
09/28/2022, 4:25 PM
Thomas Steinholz
09/28/2022, 4:25 PM
{
  "segment.realtime.numReplicas": "1",
  "segment.creation.time": "1664297019057",
  "segment.flush.threshold.size": "100000",
  "segment.realtime.startOffset": "0",
  "segment.realtime.status": "IN_PROGRESS"
}
where that segment currently contains (over) 100000 records
Xiaobing
09/28/2022, 4:27 PM
Thomas Steinholz
09/28/2022, 4:28 PM
{
"REALTIME": {
"tableName": "uplinkpayloadevent_REALTIME",
"tableType": "REALTIME",
"segmentsConfig": {
"schemaName": "uplinkpayloadevent",
"replication": "1",
"replicasPerPartition": "1",
"timeColumnName": "time_string",
"minimizeDataMovement": false
},
"tenants": {
"broker": "DefaultTenant",
"server": "DefaultTenant",
"tagOverrideConfig": {}
},
"tableIndexConfig": {
"invertedIndexColumns": [],
"noDictionaryColumns": [],
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.topic.name": "<kafka topic>",
"stream.kafka.broker.list": "<kafka servers>",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.consumer.prop.auto.offset.reset": "smallest",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
"realtime.segment.flush.threshold.rows": "0",
"realtime.segment.flush.threshold.time": "1m",
"realtime.segment.flush.segment.size": "100K"
},
"rangeIndexColumns": [
"key_range"
],
"rangeIndexVersion": 2,
"autoGeneratedInvertedIndex": false,
"createInvertedIndexDuringSegmentGeneration": false,
"sortedColumn": [
"app_tok",
"moduleaddress"
],
"bloomFilterColumns": [
"key_hash"
],
"loadMode": "MMAP",
"onHeapDictionaryColumns": [],
"varLengthDictionaryColumns": [],
"enableDefaultStarTree": false,
"enableDynamicStarTreeCreation": false,
"aggregateMetrics": false,
"nullHandlingEnabled": false,
"optimizeDictionaryForMetrics": false,
"noDictionarySizeRatioThreshold": 0
},
"metadata": {},
"quota": {},
"task": {
"taskTypeConfigsMap": {
"RealtimeToOfflineSegmentsTask": {
"bufferTimePeriod": "2h",
"bucketTimePeriod": "24h",
"roundBucketTimePeriod": "1m",
"mergeType": "dedup",
"maxNumRecordsPerSegment": "1000000",
"schedule": "0 * * * * ?"
}
}
},
"routing": {},
"query": {},
"fieldConfigList": [],
"ingestionConfig": {},
"isDimTable": false
}
}
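For context on the task block in the config above, here is a minimal sketch of how bucketTimePeriod and bufferTimePeriod are usually described as interacting: one bucket-sized window is processed only once its end is older than the buffer. This is a reading of the Pinot documentation, not something confirmed in this thread. The helper name window_is_ready and the timestamps are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def window_is_ready(window_start, bucket, buffer, now):
    """Return True when the window [window_start, window_start + bucket)
    ends at or before (now - buffer).

    Assumption: this mirrors how the task is documented to pick windows;
    it is not taken from the Pinot source code.
    """
    window_end = window_start + bucket
    return window_end <= now - buffer


# Values from the posted task config: bucketTimePeriod 24h, bufferTimePeriod 2h.
bucket = timedelta(hours=24)
buffer = timedelta(hours=2)

# Hypothetical window start, aligned to the day of the earliest data in the thread.
window_start = datetime(2022, 9, 26)

# Two days later the window is old enough to be processed.
print(window_is_ready(window_start, bucket, buffer, datetime(2022, 9, 28, 16, 28)))

# One hour after the window closes it is still inside the 2h buffer.
print(window_is_ready(window_start, bucket, buffer, datetime(2022, 9, 27, 1, 0)))
```

Note that a check like this only matters once completed segments exist; with every segment still consuming, task generation is skipped before any window is evaluated.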
Thomas Steinholz
09/28/2022, 4:32 PM
Thomas Steinholz
09/28/2022, 4:34 PM
Handled request from <ip> POST http://<url>/tables/uplinkpayloadevent/forceCommit, content-type null status code 200 OK
85 START: CallbackHandler 23, INVOKE /pinot-dev/INSTANCES/Server_pinot-server-0.pinot-server-headless.datalake.svc.cluster.local_8098/MESSAGES listener: org.apache.helix.controller.GenericHelixController@6e829e50 type: CALLBACK
CallbackHandler 23 subscribing changes listener to path: /pinot-dev/INSTANCES/Server_pinot-server-0.pinot-server-headless.datalake.svc.cluster.local_8098/MESSAGES, callback type: CALLBACK, event types: [NodeChildrenChanged], listener: org.apache.helix.controller.GenericHelixController@6e829e50, watchChild: false
CallbackHandler23, Subscribing to path: /pinot-dev/INSTANCES/Server_pinot-server-0.pinot-server-headless.datalake.svc.cluster.local_8098/MESSAGES took: 0
Neha Pawar
Xiaobing
09/28/2022, 4:44 PM
Xiaobing
09/28/2022, 4:46 PM
"realtime.segment.flush.threshold.rows": "1000",
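The generator log earlier in the thread skips the table because it finds no realtime-completed segments, and the posted segment metadata still shows IN_PROGRESS. The sketch below shows that kind of filter, using the metadata from the thread. The helper name completed_segments, the segment name segment_a, and the assumption that a committed segment carries the status DONE are illustrative and not confirmed by the thread.

```python
# Assumption: a committed realtime segment reports the status "DONE".
COMPLETED_STATUSES = {"DONE"}


def completed_segments(segments):
    """Return the names of segments whose realtime status counts as completed."""
    return [
        name
        for name, metadata in segments.items()
        if metadata.get("segment.realtime.status") in COMPLETED_STATUSES
    ]


# Metadata as posted in the thread; "segment_a" is a placeholder name.
segments = {
    "segment_a": {
        "segment.realtime.numReplicas": "1",
        "segment.creation.time": "1664297019057",
        "segment.flush.threshold.size": "100000",
        "segment.realtime.startOffset": "0",
        "segment.realtime.status": "IN_PROGRESS",
    }
}

if not completed_segments(segments):
    print("No completed segments, so there is nothing for the task to move yet")
```

Under these assumptions the task has nothing to schedule until at least one consuming segment has been committed, which is why lowering the flush threshold is a quick way to test the flow.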
Thomas Steinholz
09/28/2022, 5:54 PM
Thomas Steinholz
09/28/2022, 5:56 PM
Thomas Steinholz
09/28/2022, 6:05 PM
Thomas Steinholz
09/28/2022, 6:10 PM
Xiaobing
09/28/2022, 7:23 PM
Thomas Steinholz
09/28/2022, 8:49 PM
realtime.segment.flush.threshold.rows
it essentially freezes ingestion and doesn’t mark the segment as complete, or generate a new segment to continue the real-time data
Thomas Steinholz
09/28/2022, 8:52 PM
Xiaobing
09/28/2022, 8:53 PM
Thomas Steinholz
09/28/2022, 8:55 PM
Using fixed length dictionary for column: app_tok, size: 220
Created dictionary for STRING column: app_tok with cardinality: 11, max length in bytes: 20, range: 082a91eec728d5ababc3 to null
Using fixed length dictionary for column: gatewayaddress, size: 208
Created dictionary for STRING column: gatewayaddress with cardinality: 8, max length in bytes: 26, range: $101$0-0-0-db94abef0 to $101$0-0-0000b82-7ebefd489
Using fixed length dictionary for column: message_str, size: 2639000
Created dictionary for STRING column: message_str with cardinality: 1000, max length in bytes: 2639, range: <DATA> to <data>
Using fixed length dictionary for column: key_hash, size: 24000
Created dictionary for STRING column: key_hash with cardinality: 1000, max length in bytes: 24, range: +/2wCQzSfKCnlIZARRZ5Mw== to zxja3VKp9AD/hdDktP+EMw==
Creating bloom filter with cardinality: 1000, fpp: 0.05
Using fixed length dictionary for column: net_tok, size: 16
Created dictionary for STRING column: net_tok with cardinality: 2, max length in bytes: 8, range: 4f50454e to null
Using fixed length dictionary for column: acctid, size: 20
Created dictionary for STRING column: acctid with cardinality: 5, max length in bytes: 4, range: 2 to null
Using fixed length dictionary for column: id, size: 36000
Created dictionary for STRING column: id with cardinality: 1000, max length in bytes: 36, range: 0003b2a5-6c0d-478a-8f34-96a4517e9955 to fffc7d7f-fac1-4e8c-b0c0-85b2e40f3580
Using fixed length dictionary for column: moduleaddress, size: 5070
Created dictionary for STRING column: moduleaddress with cardinality: 195, max length in bytes: 26, range: $101$0-0-0-db94abef0 to $501$0-0-0000ff2-aaae36fb4
Using fixed length dictionary for column: time_string, size: 23000
Created dictionary for STRING column: time_string with cardinality: 1000, max length in bytes: 23, range: 2022-09-26T14:44:38.480 to 2022-09-26T15:04:55.125
Using fixed length dictionary for column: key_range, size: 60000
Created dictionary for STRING column: key_range with cardinality: 1000, max length in bytes: 60, range: 2022-09-26T14:44:38.480_008ca3de-b00e-4478-bf84-ef371a545e73 to 2022-09-26T15:04:55.125_21c02faf-cdf1-41c8-a48d-fe6a879c23e3
Start building IndexCreator!
Finished records indexing in IndexCreator!
Could not build segment
java.lang.NumberFormatException: For input string: "2022-09-26T14:44:38.480"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:?]
at java.lang.Long.parseLong(Long.java:692) ~[?:?]
at java.lang.Long.parseLong(Long.java:817) ~[?:?]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.writeMetadata(SegmentColumnarIndexCreator.java:742) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:694) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:276) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:248) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.segment.local.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:123) ~[pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:873) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:800) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:699) [pinot-all-0.11.0-jar-with-dependencies.jar:0.11.0-1b4d6b6b0a27422c1552ea1a936ad145056f7033]
at java.lang.Thread.run(Thread.java:829) [?:?]
Could not build segment for uplinkpayloadevent__0__0__20220928T1958Z
Thomas Steinholz
09/28/2022, 8:59 PM
java.lang.NumberFormatException: For input string: "2022-09-26T14:44:38.480"
but my time column is configured to be a string in the datetime format
Xiaobing
09/28/2022, 9:02 PM
Thomas Steinholz
09/28/2022, 9:03 PM
"dateTimeFieldSpecs": [
  {
    "name": "time_string",
    "dataType": "STRING",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }
]
Thomas Steinholz
09/28/2022, 9:05 PM
Xiaobing
09/28/2022, 9:05 PM
SIMPLE_DATE_FORMAT as noted here: https://docs.pinot.apache.org/basics/components/schema#date-time-fields
Xiaobing
09/28/2022, 9:06 PM
"dateTimeFieldSpecs": [
  {
    "name": "time_string",
    "dataType": "STRING",
    "format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSS",
    "granularity": "1:MILLISECONDS"
  }
]
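To illustrate why the EPOCH format produced the NumberFormatException in the stack trace, here is a small sketch in Python rather than Java. With an EPOCH format the value is treated as a number, which fails for an ISO-style string, while a date pattern parses it. The two helper names are hypothetical, and the strptime pattern is only a rough Python counterpart of yyyy-MM-dd'T'HH:mm:ss.SSS, not what Pinot runs internally.

```python
from datetime import datetime

value = "2022-09-26T14:44:38.480"  # sample value from the stack trace


def parse_as_epoch_millis(text):
    """Treat the value as epoch milliseconds, as an EPOCH format implies."""
    return int(text)  # raises ValueError for a non-numeric string


def parse_as_simple_date(text):
    """Parse with a date pattern, roughly matching yyyy-MM-dd'T'HH:mm:ss.SSS."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f")


try:
    parse_as_epoch_millis(value)
except ValueError as error:
    # Python's analogue of the Java NumberFormatException in the log.
    print("epoch parse failed:", error)

print("date parse succeeded:", parse_as_simple_date(value).isoformat())
```

The same mismatch applies regardless of language: the declared format has to describe the strings that are actually stored in the column.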
Thomas Steinholz
09/28/2022, 9:07 PM
Thomas Steinholz
09/28/2022, 9:36 PM
Xiaobing
09/28/2022, 9:37 PM
Thomas Steinholz
09/28/2022, 10:19 PM
Xiaobing
09/28/2022, 10:33 PM
threshold
.segment.size
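The closing message appears to point at the size-based flush key, which in the posted table config is written without the threshold part. The sketch below is a small check that flags flush keys not in a known list. The list reflects the key names as I understand them from the Pinot documentation and is an assumption, not an exhaustive reference. The helper name unknown_flush_keys is hypothetical.

```python
# Assumption: these are the documented flush threshold keys; not exhaustive.
KNOWN_FLUSH_KEYS = {
    "realtime.segment.flush.threshold.rows",
    "realtime.segment.flush.threshold.time",
    "realtime.segment.flush.threshold.segment.size",
}
FLUSH_PREFIX = "realtime.segment.flush."


def unknown_flush_keys(stream_configs):
    """Return flush-related keys that are not in the known list, sorted."""
    return sorted(
        key
        for key in stream_configs
        if key.startswith(FLUSH_PREFIX) and key not in KNOWN_FLUSH_KEYS
    )


# Flush settings as posted in the table config earlier in the thread.
stream_configs = {
    "streamType": "kafka",
    "realtime.segment.flush.threshold.rows": "0",
    "realtime.segment.flush.threshold.time": "1m",
    "realtime.segment.flush.segment.size": "100K",
}

for key in unknown_flush_keys(stream_configs):
    print("unrecognized flush key:", key)
```

A check like this only catches misspelled key names; whether a given value such as 1m or 100K is sensible for the table is a separate question.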