Harish Bohara
06/28/2022, 8:11 AM
"routing": {
  "instanceSelectorType": "strictReplicaGroup"
},
"query": {},
"upsertConfig": {
  "mode": "PARTIAL",
  "partialUpsertStrategies": {
    "status": "OVERWRITE",
    "tenant_name": "OVERWRITE",
    "sub_tenant_name": "OVERWRITE"
  },
  "defaultPartialUpsertStrategy": "OVERWRITE",
  "hashFunction": "NONE"
},
Kartik Khare
06/28/2022, 10:27 AM
option(skipUpsert=true)
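For context, `skipUpsert` makes a query read every ingested row for a key rather than only the latest one, which helps when checking what an upsert table actually received. Below is a minimal sketch of appending the option to a query string. The helper name `with_skip_upsert` and the sample query are hypothetical; only the `option(skipUpsert=true)` suffix comes from the message above.

```python
def with_skip_upsert(sql: str) -> str:
    """Append the skipUpsert query option to a SQL string (hypothetical helper)."""
    stripped = sql.strip().rstrip(";")
    # Leave the query alone if the option is already present.
    if "skipupsert" in stripped.lower():
        return stripped
    return f"{stripped} option(skipUpsert=true)"


# Sample query against the table discussed in this thread (column values are made up).
query = with_skip_upsert(
    "SELECT id, id_type, status FROM table_v1 WHERE id = 'abc' LIMIT 10"
)
print(query)
```

Comparing the result of the same query with and without the option shows whether older versions of a key were ingested at all.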
Harish Bohara
06/28/2022, 1:45 PM
Harish Bohara
06/28/2022, 7:25 PM
Harish Bohara
06/28/2022, 7:29 PM
Harish Bohara
06/28/2022, 7:33 PM
{
  "schemaName": "schema_v1",
  "dimensionFieldSpecs": [
    {
      "name": "channel",
      "dataType": "STRING"
    },
    {
      "name": "pipeline",
      "dataType": "STRING"
    },
    {
      "name": "id",
      "dataType": "STRING"
    },
    {
      "name": "id_type",
      "dataType": "STRING"
    },
    {
      "name": "status",
      "dataType": "STRING"
    },
    {
      "name": "lob_name",
      "dataType": "STRING"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestamp",
      "dataType": "STRING",
      "format": "1:HOURS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSS",
      "granularity": "1:MINUTES"
    }
  ],
  "primaryKeyColumns": [
    "id",
    "id_type"
  ]
}
{
  "tableName": "table_v1",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "schemaName": "schema_v1",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "2",
    "replication": "2",
    "timeColumnName": "timestamp",
    "allowNullTimeValue": true,
    "replicasPerPartition": "2"
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant",
    "tagOverrideConfig": {}
  },
  "tableIndexConfig": {
    "invertedIndexColumns": [
      "pipeline",
      "channel"
    ],
    "noDictionaryColumns": [],
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "----MY TOPIC-----",
      "stream.kafka.broker.list": "{{kafka}}",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.prop.auto.offset.reset": "largest",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.time": "1h",
      "realtime.segment.flush.desired.size": "100M",
      "realtime.segment.flush.autotune.initialRows": "10000"
    },
    "sortedColumn": [],
    "bloomFilterColumns": [
      "channel"
    ],
    "loadMode": "MMAP",
    "onHeapDictionaryColumns": [],
    "varLengthDictionaryColumns": [],
    "enableDefaultStarTree": false,
    "enableDynamicStarTreeCreation": false,
    "aggregateMetrics": false,
    "nullHandlingEnabled": true,
    "rangeIndexColumns": [],
    "rangeIndexVersion": 1,
    "autoGeneratedInvertedIndex": false,
    "createInvertedIndexDuringSegmentGeneration": false
  },
  "metadata": {},
  "quota": {},
  "routing": {
    "instanceSelectorType": "strictReplicaGroup"
  },
  "query": {},
  "upsertConfig": {
    "mode": "PARTIAL",
    "partialUpsertStrategies": {
      "status": "OVERWRITE",
      "lob_name": "OVERWRITE"
    },
    "defaultPartialUpsertStrategy": "OVERWRITE",
    "hashFunction": "NONE"
  },
  "ingestionConfig": {},
  "isDimTable": false
}
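A quick local sanity check that may be useful with a schema and table config like the ones above: confirm that every column named in `partialUpsertStrategies` exists in the schema. This is only a sketch. `unknown_upsert_columns` is a hypothetical helper, not a Pinot API, and the dictionaries are trimmed copies of the JSON posted in this thread.

```python
from typing import Any, Dict, List


def unknown_upsert_columns(schema: Dict[str, Any], table_config: Dict[str, Any]) -> List[str]:
    """Return partial-upsert strategy columns that the schema does not define."""
    known = {
        spec["name"]
        for key in ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")
        for spec in schema.get(key, [])
    }
    strategies = table_config.get("upsertConfig", {}).get("partialUpsertStrategies", {})
    return sorted(column for column in strategies if column not in known)


# Trimmed copies of the schema and table config from the messages above.
schema = {
    "dimensionFieldSpecs": [
        {"name": name, "dataType": "STRING"}
        for name in ("channel", "pipeline", "id", "id_type", "status", "lob_name")
    ],
    "dateTimeFieldSpecs": [{"name": "timestamp", "dataType": "STRING"}],
}
table_config = {
    "upsertConfig": {
        "mode": "PARTIAL",
        "partialUpsertStrategies": {"status": "OVERWRITE", "lob_name": "OVERWRITE"},
    }
}

print(unknown_upsert_columns(schema, table_config))
```

An empty list means every strategy column is defined in the schema; any name it returns would be worth double-checking before creating the table.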
Kartik Khare
06/29/2022, 6:15 AM
Harish Bohara
06/29/2022, 6:52 AM
Harish Bohara
06/29/2022, 6:56 AM
Kartik Khare
06/29/2022, 6:58 AM
"defaultPartialUpsertStrategy": "IGNORE"
and mention only the columns that need to be updated in partialUpsertStrategies,
such as status and lob_name, with strategy OVERWRITE
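To illustrate the suggestion, here is a simplified model of how a default of IGNORE with per-column OVERWRITE would merge two records that share a primary key. It is a sketch under stated assumptions, not Pinot's implementation: `merge_partial` is a hypothetical function, the sample rows are made up, and the model assumes a null on either side resolves to the non-null value and leaves out any special handling of the time column.

```python
from typing import Any, Dict

# The upsertConfig shape suggested above.
upsert_config = {
    "mode": "PARTIAL",
    "partialUpsertStrategies": {"status": "OVERWRITE", "lob_name": "OVERWRITE"},
    "defaultPartialUpsertStrategy": "IGNORE",
}


def merge_partial(previous: Dict[str, Any], incoming: Dict[str, Any],
                  config: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two rows with the same primary key (simplified, hypothetical model)."""
    strategies = config["partialUpsertStrategies"]
    default = config["defaultPartialUpsertStrategy"]
    merged: Dict[str, Any] = {}
    for column in previous.keys() | incoming.keys():
        old, new = previous.get(column), incoming.get(column)
        strategy = strategies.get(column, default)
        if strategy not in ("OVERWRITE", "IGNORE"):
            raise ValueError(f"strategy not modelled here: {strategy}")
        if old is None:
            merged[column] = new      # nothing stored yet: take the new value
        elif new is None:
            merged[column] = old      # new value is null: keep the stored one
        elif strategy == "OVERWRITE":
            merged[column] = new      # listed column: latest value wins
        else:
            merged[column] = old      # IGNORE: the first stored value is kept
    return merged


previous = {"id": "1", "id_type": "a", "channel": "sms", "status": "SENT", "lob_name": "x"}
incoming = {"id": "1", "id_type": "a", "channel": "email", "status": "DELIVERED", "lob_name": None}
print(merge_partial(previous, incoming, upsert_config))
```

In this model only `status` changes: `channel` falls under the IGNORE default and `lob_name` keeps its stored value because the incoming one is null.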
Kartik Khare
06/29/2022, 7:00 AM
Harish Bohara
06/29/2022, 7:02 AM
Harish Bohara
06/29/2022, 7:05 AM
Harish Bohara
06/29/2022, 7:34 AM
Kartik Khare
06/29/2022, 7:36 AM
Harish Bohara
06/29/2022, 7:38 AM
"hist_v3__0__0__20220629T0659Z": {
  "Server_pinot-server-0.pinot-server-headless.pinot.svc.cluster.local_8098": "CONSUMING",
  "Server_pinot-server-3.pinot-server-headless.pinot.svc.cluster.local_8098": "CONSUMING"
},
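Segment names like the one above appear to follow the low-level consumer pattern `table__partition__sequence__creationTime`. Here is a small sketch for splitting one apart when reading ideal-state output. `parse_segment_name` and `SegmentName` are hypothetical names, and the parsing assumes the four-part layout with a `yyyyMMdd'T'HHmm'Z'` creation time.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class SegmentName(NamedTuple):
    table: str
    partition: int
    sequence: int
    created_utc: datetime


def parse_segment_name(name: str) -> SegmentName:
    """Split a consuming-segment name into its four assumed parts."""
    # rsplit keeps any "__" inside the table name intact.
    table, partition, sequence, created = name.rsplit("__", 3)
    created_utc = datetime.strptime(created, "%Y%m%dT%H%MZ").replace(tzinfo=timezone.utc)
    return SegmentName(table, int(partition), int(sequence), created_utc)


segment = parse_segment_name("hist_v3__0__0__20220629T0659Z")
print(segment.table, segment.partition, segment.sequence, segment.created_utc.isoformat())
```

A sequence number of 0 that is still CONSUMING would be consistent with the partition not having committed its first segment yet.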
Kartik Khare
06/29/2022, 7:38 AM
Harish Bohara
06/29/2022, 7:38 AM
Harish Bohara
06/29/2022, 7:41 AM
Kartik Khare
06/29/2022, 7:46 AM
saurabh dubey
06/29/2022, 7:50 AM
Harish Bohara
06/29/2022, 7:52 AM
Harish Bohara
06/29/2022, 7:59 AM
Same table:
Table v1: created on 28:
hist_v1__9__0__20220628T0808Z
29330927
Table v2: created on 28 (11:30 PM) - because V1 was stuck
hist_v2__9__0__20220628T1927Z
29596616
Table v3: created on 29 ( - because V2 was stuck
hist_v3__9__0__20220629T0659Z
29840247
All of them have:
"segment.realtime.status": "IN_PROGRESS"
Harish Bohara
06/30/2022, 6:39 AM
Kartik Khare
06/30/2022, 8:27 AM
Harish Bohara
06/30/2022, 10:26 AM
Harish Bohara
06/30/2022, 10:29 AM
Kartik Khare
06/30/2022, 12:29 PM
Harish Bohara
06/30/2022, 12:41 PM
Kartik Khare
06/30/2022, 12:45 PM
Harish Bohara
06/30/2022, 12:46 PM
Harish Bohara
06/30/2022, 12:47 PM
Harish Bohara
06/30/2022, 12:54 PM
Table V2
1656462921
GMT: Wednesday, June 29, 2022 0:35:21
Your time zone: Wednesday, June 29, 2022 6:05:21 GMT+05:30
Relative: 2 days ago
1656470101579
GMT: Wednesday, June 29, 2022 2:35:01.579
Your time zone: Wednesday, June 29, 2022 8:05:01.579 GMT+05:30
--------------------------------------------------------------------
Table V3 (which I created once the V2 table stopped):
1656486065
GMT: Wednesday, June 29, 2022 7:01:05
Your time zone: Wednesday, June 29, 2022 12:31:05 GMT+05:30
1656486423000
GMT: Wednesday, June 29, 2022 7:07:03
Your time zone: Wednesday, June 29, 2022 12:37:03 GMT+05:30
Harish Bohara
06/30/2022, 12:54 PM
I do see an issue in my timestamps - some are "1656462921" and others are "1656470101579" (3 extra digits)
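The two shapes look like epoch seconds (10 digits) and epoch milliseconds (13 digits). Below is a minimal sketch for normalizing both to milliseconds before producing to Kafka, assuming that reading is correct. `to_epoch_millis` and `to_utc` are hypothetical helpers, and the magnitude cutoff is a heuristic that only holds for dates after early 1973.

```python
from datetime import datetime, timedelta, timezone

# Values below this are treated as seconds; 10**11 ms falls in March 1973.
SECONDS_CUTOFF = 100_000_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value: int) -> int:
    """Return the timestamp in milliseconds, guessing the unit from its magnitude."""
    return value * 1000 if value < SECONDS_CUTOFF else value


def to_utc(value: int) -> datetime:
    """Convert a seconds-or-millis epoch value to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=to_epoch_millis(value))


for raw in (1656462921, 1656470101579):
    print(raw, "->", to_epoch_millis(raw), to_utc(raw).isoformat())
```

Both sample values from the messages above land on June 29, 2022 UTC once normalized, matching the converter output pasted earlier.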
Harish Bohara
06/30/2022, 1:26 PM