Alex
11/16/2019, 9:07 AM
Kishore G
Kishore G
Elon
11/16/2019, 7:06 PM
Alex
11/17/2019, 2:30 AM
Alex
11/18/2019, 4:59 PM
Alex
11/18/2019, 5:00 PM
Alex
11/18/2019, 5:00 PM
zookeeper@pinot-zookeeper-0:/data/log/version-2$ ls -lh
total 8.9G
-rw-rw-r-- 1 zookeeper zookeeper 65M Nov 16 01:37 log.100000001
-rw-rw-r-- 1 zookeeper zookeeper 3.1G Nov 16 19:10 log.200000001
-rw-rw-r-- 1 zookeeper zookeeper 1.8G Nov 16 23:02 log.20000df77
-rw-r--r-- 1 zookeeper zookeeper 4.1G Nov 17 21:21 log.300000001
Kishore G
Alex
11/18/2019, 5:26 PM
{
  "tableName": "flattened_orders_hours",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "updatedAtHours",
    "timeType": "HOURS",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "365",
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "flattened_orders_hours",
    "replication": "1",
    "replicasPerPartition": "1"
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "invertedIndexColumns": [
      "...",
      "...",
      "...",
      "..."
    ],
    "aggregateMetrics": "true",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "highLevel",
      "stream.kafka.topic.name": "flattened-orders-json-seconds",
      "stream.kafka.decoder.class.name": "org.apache.pinot.core.realtime.impl.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.core.realtime.impl.kafka2.KafkaConsumerFactory",
      "stream.kafka.hlc.zk.connect.string": "IP:2181/",
      "stream.kafka.zk.broker.url": "IP:2181/",
      "stream.kafka.broker.list": "IP:9092",
      "stream.kafka.isolation.level": "read_committed",
      "stream.kafka.hlc.bootstrap.server": "IP:9092",
      "realtime.segment.flush.threshold.time": "3600000",
      "realtime.segment.flush.threshold.size": "50000",
      "stream.kafka.consumer.prop.auto.offset.reset": "earliest",
      "stream.kafka.consumer.prop.group.id": "pinot-flattened_orders_hours"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}
Alex
11/18/2019, 5:26 PM
Alex
11/18/2019, 5:27 PM
Kishore G
flush threshold size is 50000, it's too low.
Kishore G
Alex
11/18/2019, 5:30 PM
50000
what should it be?
Kishore G
Kishore G
Neha Pawar
Alex
11/18/2019, 5:38 PM
Alex
11/18/2019, 5:38 PM
4.0K ./consumers
3.7M ./flattened_orders_hours_REALTIME_1573926928266_0__0__1573996522185/v3
3.7M ./flattened_orders_hours_REALTIME_1573926928266_0__0__1573996522185
3.6M ./flattened_orders_hours_REALTIME_1573926928266_0__0__1573939395196/v3
3.6M ./flattened_orders_hours_REALTIME_1573926928266_0__0__1573939395196
3.7M ./flattened_orders_hours_REALTIME_1573926928266_0__0__1573993404560/v3
3.7M ./flattened_orders_hours_REALTIME_1573926928266_0__0__1573993404560
Neha Pawar
"realtime.segment.flush.threshold.time": "24h",
"realtime.segment.flush.threshold.size": "0",
This should enable the segment-size-based threshold. By default, the algorithm tries to create segments of 200M.
Kishore G
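As a rough illustration of the suggestion above, here is a minimal Python sketch that applies the two proposed `streamConfigs` values to a trimmed copy of the table config posted earlier. The helper names `apply_flush_thresholds` and `rows_for_target_size` are hypothetical, not part of Pinot. The row estimate assumes each ~3.7M segment in the listing holds the full 50000 rows and that size scales linearly with rows, neither of which the thread confirms.

```python
import copy
import json


def apply_flush_thresholds(table_config, time_threshold="24h", row_threshold="0"):
    """Return a copy of table_config with the flush thresholds from the thread.

    Hypothetical helper: it only edits a dict and does not talk to Pinot.
    """
    updated = copy.deepcopy(table_config)
    stream = updated["tableIndexConfig"]["streamConfigs"]
    stream["realtime.segment.flush.threshold.time"] = time_threshold
    stream["realtime.segment.flush.threshold.size"] = row_threshold
    return updated


def rows_for_target_size(rows_per_segment, segment_mb, target_mb):
    """Estimate rows needed to reach target_mb, assuming size grows linearly with rows."""
    return int(rows_per_segment * target_mb / segment_mb)


# Trimmed copy of the config from the thread; only the keys being changed.
current = {
    "tableName": "flattened_orders_hours",
    "tableIndexConfig": {
        "streamConfigs": {
            "realtime.segment.flush.threshold.time": "3600000",
            "realtime.segment.flush.threshold.size": "50000",
        }
    },
}

updated = apply_flush_thresholds(current)
print(json.dumps(updated["tableIndexConfig"]["streamConfigs"], indent=2))

# Assumption: a ~3.7 MB segment corresponds to 50000 rows.
estimate = rows_for_target_size(50000, 3.7, 200)
print("rows for a ~200M segment (rough estimate):", estimate)
```

Under those assumptions the estimate lands in the millions of rows, which gives a sense of how far 50000 is from the 200M target mentioned above.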
Subbu Subramaniam
11/18/2019, 5:54 PM
Alex
11/18/2019, 6:08 PM
Events with higher offsets should be more recent (the offsets of events need not be contiguous)
Kishore G
Alex
11/18/2019, 6:16 PM
Kishore G
Subbu Subramaniam
11/18/2019, 6:30 PM
Alex
11/18/2019, 6:31 PM
Alex
11/18/2019, 6:32 PM
Events with higher offsets should be more recent