vishal
11/04/2022, 9:20 AM
Start generating task configs for table: events2_REALTIME for task: RealtimeToOfflineSegmentsTask
No realtime-completed segments found for table: events2_REALTIME, skipping task generation: RealtimeToOfflineSegmentsTask
Finished CronJob: table - events2_REALTIME, task - RealtimeToOfflineSegmentsTask, next runtime is 2022-11-04T07:04:00.000+0000
I've pushed a huge amount of data and it's creating multiple segments, but they are not getting converted from the realtime table to the offline table.
Thanks
saurabh dubey
11/04/2022, 9:36 AM
vishal
11/04/2022, 9:36 AM
vishal
11/04/2022, 9:37 AM
saurabh dubey
11/04/2022, 9:41 AM"segment.realtime.status": "DONE",
Example:saurabh dubey
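A minimal sketch of that check against the controller REST API, assuming a metadata endpoint of the form GET /segments/{table}/{segment}/metadata and the default controller address; the exact path varies by Pinot version (check the Swagger UI), and the table/segment names are the events3 ones that come up later in this thread:

import json
import urllib.request

CONTROLLER = "http://localhost:9000"       # assumption: default controller address
TABLE = "events3_REALTIME"
SEGMENT = "events3__0__2__20221104T0918Z"  # segment name from the metadata shared later in the thread

# The exact metadata path can differ across Pinot versions; verify it in Swagger.
url = f"{CONTROLLER}/segments/{TABLE}/{SEGMENT}/metadata"
with urllib.request.urlopen(url) as resp:
    metadata = json.load(resp)

# A sealed realtime segment should report "segment.realtime.status": "DONE";
# segments still in CONSUMING state are ignored by RealtimeToOfflineSegmentsTask.
print(json.dumps(metadata, indent=2))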
saurabh dubey
11/04/2022, 10:06 AM
vishal
11/04/2022, 10:57 AM
vishal
11/04/2022, 10:57 AM
saurabh dubey
11/04/2022, 11:23 AM
No realtime-completed segments found for table: events2_REALTIME, skipping task generation: RealtimeToOfflineSegmentsTask? You should be checking logs on both the controller and the minions.
Also, what Pinot version are you on?
vishal
11/04/2022, 11:34 AM
vishal
11/04/2022, 11:38 AM
vishal
11/04/2022, 11:41 AM
{
"tableName": "events3",
"tableType": "REALTIME",
"segmentsConfig": {
"timeColumnName": "ts",
"schemaName": "events3",
"replication": "1",
"replicasPerPartition": "1",
"retentionTimeUnit": "DAYS",
"retentionTimeValue": "1"
},
"task": {
"taskTypeConfigsMap": {
"RealtimeToOfflineSegmentsTask": {
"bufferTimePeriod": "1m",
"bucketTimePeriod": "5m",
"roundBucketTimePeriod": "1m",
"schedule": "0 * * * * ?",
"mergeType": "rollup",
"count.aggregationType": "max",
"maxNumRecordsPerSegment": "1000"
}
}
},
"tableIndexConfig": {
"invertedIndexColumns": [],
"noDictionaryColumns": [],
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.topic.name": "test",
"stream.kafka.broker.list": "SERVERS",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.consumer.prop.auto.offset.reset": "largest",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
"realtime.segment.flush.threshold.rows": "0",
"realtime.segment.flush.threshold.time": "1h",
"realtime.segment.flush.segment.size": "1M"
},
"createInvertedIndexDuringSegmentGeneration": false,
"rangeIndexColumns": [],
"rangeIndexVersion": 2,
"autoGeneratedInvertedIndex": false,
"sortedColumn": [],
"bloomFilterColumns": [],
"loadMode": "MMAP",
"onHeapDictionaryColumns": [],
"varLengthDictionaryColumns": [],
"enableDefaultStarTree": false,
"enableDynamicStarTreeCreation": false,
"aggregateMetrics": false,
"nullHandlingEnabled": false
},
"tenants": {},
"metadata": {}
}
vishal
11/04/2022, 11:41 AM
saurabh dubey
11/04/2022, 11:42 AM
events3? Log seems to be for events2_REALTIME?
vishal
11/04/2022, 11:43 AM
saurabh dubey
11/04/2022, 11:45 AM
vishal
11/04/2022, 11:46 AM
vishal
11/04/2022, 11:48 AM
saurabh dubey
11/04/2022, 11:48 AM
vishal
11/04/2022, 11:49 AM
saurabh dubey
11/04/2022, 11:51 AM
Try triggering the task manually via /tasks/schedule and verify whether the controller still shows the same logs. It could just be that the logs you shared were from a time when the segments were still in CONSUMING state. But if that's still the case, it might need more investigation.
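For reference, a minimal sketch of hitting that endpoint from Python, assuming the default controller address; the task type and table name are the ones being debugged in this thread:

import urllib.parse
import urllib.request

CONTROLLER = "http://localhost:9000"  # assumption: default controller address
params = urllib.parse.urlencode({
    "taskType": "RealtimeToOfflineSegmentsTask",
    "tableName": "events3_REALTIME",
})
# POST /tasks/schedule asks the controller to run task generation now
# instead of waiting for the next cron tick.
req = urllib.request.Request(f"{CONTROLLER}/tasks/schedule?{params}", method="POST")
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())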
vishal
11/04/2022, 11:53 AM
vishal
11/04/2022, 11:56 AM
Start generating task configs for table: events3_REALTIME for task: RealtimeToOfflineSegmentsTask
No realtime-completed segments found for table: events3_REALTIME, skipping task generation: RealtimeToOfflineSegmentsTask
vishal
11/04/2022, 11:56 AM
vishal
11/04/2022, 12:03 PM
vishal
11/04/2022, 12:03 PM
saurabh dubey
11/04/2022, 12:04 PM
vishal
11/04/2022, 12:06 PM
vishal
11/04/2022, 12:12 PM
Start generating task configs for table: events3_REALTIME for task: RealtimeToOfflineSegmentsTask
Window data overflows into CONSUMING segments for partition of segment: events3__0__2__20221104T0918Z. Skipping task generation: RealtimeToOfflineSegmentsTask
Finished CronJob: table - events3_REALTIME, task - RealtimeToOfflineSegmentsTask, next runtime is 2022-11-04T12:13:00.000+0000
vishal
11/04/2022, 12:13 PM
vishal
11/04/2022, 12:14 PM
{
"id": "events3__0__2__20221104T0918Z",
"simpleFields": {
"segment.crc": "4243541523",
"segment.creation.time": "1667553508578",
"segment.download.url": "URL",
"segment.end.time": "1667374708937",
"segment.flush.threshold.size": "54547",
"segment.index.version": "v3",
"segment.realtime.endOffset": "762871",
"segment.realtime.numReplicas": "1",
"segment.realtime.startOffset": "708324",
"segment.realtime.status": "DONE",
"segment.start.time": "1667374708937",
"segment.time.unit": "MILLISECONDS",
"segment.total.docs": "54547"
},
"mapFields": {},
"listFields": {}
}
vishal
11/04/2022, 12:14 PM
saurabh dubey
11/04/2022, 12:21 PM
saurabh dubey
11/04/2022, 12:22 PM"bufferTimePeriod": "1m",
"bucketTimePeriod": "5m",
settings. These anyway don't look like production settings to me?
@Neha Pawar and @Mark Needham for more?Neha Pawar
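If those periods do get revisited, here is a rough sketch of bumping them through the controller's table config endpoints (GET/PUT /tables/{tableName}); the response shape and the example values "6h" and "1d" are assumptions to adapt, not recommendations:

import json
import urllib.request

CONTROLLER = "http://localhost:9000"  # assumption: default controller address
TABLE = "events3"

# Fetch the current configs; the response is typically keyed by table type.
with urllib.request.urlopen(f"{CONTROLLER}/tables/{TABLE}") as resp:
    configs = json.load(resp)
realtime_cfg = configs["REALTIME"]

# Example values only -- pick bucket/buffer periods that match your data's
# actual time range and how long you can wait before data moves to offline.
task_cfg = realtime_cfg["task"]["taskTypeConfigsMap"]["RealtimeToOfflineSegmentsTask"]
task_cfg["bucketTimePeriod"] = "6h"
task_cfg["bufferTimePeriod"] = "1d"

req = urllib.request.Request(
    f"{CONTROLLER}/tables/{TABLE}",
    data=json.dumps(realtime_cfg).encode(),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())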
Neha Pawar
ts? can you do a select min(ts), max(ts), $segmentName from table group by $segmentName? The entire window of [min timestamp, min timestamp + 5m] should fall in the completed segments. None of the consuming segments should have any values from that range. Only then will the code think it’s safe to process that window. That’s because consuming segments aren’t persisted, so we cannot start using that data to move to offline.
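A rough illustration of that window check (not the actual task-generator code), using the segment.start.time value from the DONE segment metadata pasted above and this table's 5m bucketTimePeriod:

from datetime import datetime, timedelta, timezone

BUCKET = timedelta(minutes=5)       # bucketTimePeriod = "5m"
min_ts_millis = 1667374708937       # segment.start.time of the completed segment above

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
window_start = epoch + timedelta(milliseconds=min_ts_millis)
window_end = window_start + BUCKET
print("window:", window_start.isoformat(), "->", window_end.isoformat())
# window: 2022-11-02T07:38:28.937000+00:00 -> 2022-11-02T07:43:28.937000+00:00

# The task only picks up this window once every row with ts in
# [window_start, window_end) already lives in completed (DONE) segments.
# If a CONSUMING segment still holds rows in that range, the controller logs
# "Window data overflows into CONSUMING segments ..." and skips generation,
# which matches the log seen earlier in the thread.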
11/07/2022, 7:28 AM"dateTimeFieldSpecs": [{
"name": "ts",
"dataType": "TIMESTAMP",
"format" : "1:MILLISECONDS:EPOCH",
"granularity": "1:MILLISECONDS"
}]
vishal
11/07/2022, 7:32 AM
select min(ts), max(ts), $segmentName from events3 group by $segmentName
vishal
11/07/2022, 7:32 AM
vishal
11/07/2022, 7:35 AM
Schemas:
{
  "schemaName": "events3",
  "dimensionFieldSpecs": [
    {
      "name": "uuid",
      "dataType": "STRING"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "count",
      "dataType": "INT"
    }
  ],
  "dateTimeFieldSpecs": [{
    "name": "ts",
    "dataType": "TIMESTAMP",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }]
}
Offline
{
  "tableName": "events3",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "timeColumnName": "ts",
    "schemaName": "events3",
    "replication": "1",
    "replicasPerPartition": "1"
  },
  "ingestionConfig": {
    "batchIngestionConfig": {
      "segmentIngestionType": "APPEND",
      "segmentIngestionFrequency": "HOURLY"
    }
  },
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "tenants": {},
  "metadata": {}
}
Realtime:
{
  "tableName": "events3",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "ts",
    "schemaName": "events3",
    "replication": "1",
    "replicasPerPartition": "1",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "10m"
  },
  "task": {
    "taskTypeConfigsMap": {
      "RealtimeToOfflineSegmentsTask": {
        "bufferTimePeriod": "1m",
        "bucketTimePeriod": "5m",
        "roundBucketTimePeriod": "1m",
        "schedule": "0 * * * * ?",
        "mergeType": "rollup",
        "count.aggregationType": "max",
        "maxNumRecordsPerSegment": "1000"
      }
    }
  },
  "tableIndexConfig": {
    "invertedIndexColumns": [],
    "noDictionaryColumns": [],
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "pinot_test",
      "stream.kafka.broker.list": "SERVER-LIST",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.prop.auto.offset.reset": "largest",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.time": "1h",
      "realtime.segment.flush.segment.size": "1M"
    },
    "createInvertedIndexDuringSegmentGeneration": false,
    "rangeIndexColumns": [],
    "rangeIndexVersion": 2,
    "autoGeneratedInvertedIndex": false,
    "sortedColumn": [],
    "bloomFilterColumns": [],
    "loadMode": "MMAP",
    "onHeapDictionaryColumns": [],
    "varLengthDictionaryColumns": [],
    "enableDefaultStarTree": false,
    "enableDynamicStarTreeCreation": false,
    "aggregateMetrics": false,
    "nullHandlingEnabled": false
  },
  "tenants": {},
  "metadata": {}
}
vishal
11/08/2022, 8:38 AM