Kishore G
vmarchaud
01/12/2021, 4:33 PM
Kishore G
vmarchaud
01/12/2021, 5:04 PM
> splitting it into multiple segments will result in inconsistency and data duplication
Well, I agree on this one; that's why I don't get why we have multiple segments.
Kishore G
vmarchaud
01/12/2021, 5:14 PM
Subbu Subramaniam
01/12/2021, 5:19 PM
vmarchaud
01/12/2021, 5:22 PM
Subbu Subramaniam
01/12/2021, 5:30 PM
vmarchaud
01/12/2021, 5:37 PM
Kishore G
vmarchaud
01/12/2021, 8:13 PM
{
  tableName: XXXXXX,
  tableType: 'REALTIME',
  quota: {},
  routing: {},
  segmentsConfig: {
    schemaName: YYYYY,
    timeColumnName: ZZZZZ,
    timeType: ZZZZZ,
    replication: 1,
    replicasPerPartition: 1,
    segmentPushType: 'APPEND',
    segmentPushFrequency: 'HOURLY'
  },
  tableIndexConfig: {
    streamConfigs: {
      'streamType': 'pubsub',
      'stream.pubsub.consumer.type': 'highlevel',
      'stream.pubsub.decoder.class.name': 'com.reelevant.pinot.plugins.stream.pubsub.PubSubMessageDecoder',
      'stream.pubsub.consumer.factory.class.name': 'com.reelevant.pinot.plugins.stream.pubsub.PubSubConsumerFactory',
      'stream.pubsub.project.id': XXXXXX,
      'stream.pubsub.topic.name': 'unused', // unused but required because the plugin extends the Kafka one
      'stream.pubsub.subscription.id': ZZZZZ,
      'realtime.segment.flush.threshold.time': '15d',
      'realtime.segment.flush.threshold.rows': '390000' // 390k rows ~ 200MB (513 bytes / row)
      // 'realtime.segment.flush.threshold.segment.size': '200M' is not set; this option needs `realtime.segment.flush.threshold.rows` to be 0 and doesn't work in 0.6.0 (`Illegal memory allocation 0 for segment ...`)
    },
    nullHandlingEnabled: true,
    invertedIndexColumns: [],
    sortedColumn: [],
    loadMode: 'mmap'
  },
  tenants: {},
  metadata: {}
}
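For context, a realtime table config like this one is normally registered with the Pinot controller, for example through its REST API. The sketch below is a minimal Python example under stated assumptions: the controller address (http://localhost:9000) and the file name table-config.json are placeholders, and the JS-style snippet above has to be saved as strict JSON (double-quoted keys and strings, no // comments) first.

import json

import requests  # third-party HTTP client, assumed to be installed

# Load the realtime table config. It must be strict JSON at this point:
# double-quoted keys/strings and no comments.
with open("table-config.json") as f:
    table_config = json.load(f)

# Create the REALTIME table by POSTing the config to the Pinot controller.
# "http://localhost:9000" is a placeholder for the actual controller address.
response = requests.post("http://localhost:9000/tables", json=table_config)
response.raise_for_status()
print(response.json())

Alternatively, the pinot-admin AddTable command accepts a table config file if the CLI is preferred.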
Depending on the configured number of replicas, multiple stream-level consumers are created, taking care that no two replicas exist on the same server host. Therefore you need to provision exactly as many hosts as the number of replicas configured.
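To make that constraint concrete, here is a small hypothetical check; the helper name and variables are illustrative and not from the thread. With 'stream.pubsub.consumer.type' set to 'highlevel', each replica is a full stream-level consumer, so the table above with replicasPerPartition: 1 needs one server host, and raising the replica count to 2 would require two hosts.

# Hypothetical helper: server hosts needed for a table using a
# stream-level (high-level) consumer.
def required_server_hosts(table_config: dict) -> int:
    segments_config = table_config["segmentsConfig"]
    # Each replica is a separate stream-level consumer and no two replicas
    # may share a server host, so hosts needed == configured replicas.
    return int(segments_config.get("replicasPerPartition",
                                   segments_config.get("replication", 1)))

# Example: the config above (replicasPerPartition: 1) needs exactly 1 host.
# required_server_hosts(table_config) == 1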