Alice
05/11/2022, 8:27 AM
Kartik Khare
05/11/2022, 10:46 AM
Kartik Khare
05/11/2022, 10:47 AM
Alice
05/11/2022, 11:57 AM
Kartik Khare
05/11/2022, 12:02 PM
`skipUpsert` option in query, do you get both old data + updated data or just old data?
Alice
05/11/2022, 12:07 PM
Alice
05/11/2022, 12:08 PM
Jackie
05/11/2022, 4:58 PM
Alice
05/12/2022, 12:12 AM
Alice
05/12/2022, 12:17 AM
Alice
05/12/2022, 12:22 AM
Kartik Khare
05/12/2022, 10:26 AM
No records(s) Found error does occur in both. Need to check why that is the case.
The `created_on` value of 0 is not reproducible so far. It shows correct values in all the permutations I tried.
Kartik Khare
05/12/2022, 12:00 PM
`comparisonColumn` is used with a partially updated column set to IGNORE. If the column is set to OVERWRITE, it works as expected. If `comparisonColumn` is not specified, `timeColumn` is used for comparison, which causes the same issue, as in this case.
Kartik Khare
05/12/2022, 2:01 PM
We store the primaryKey to recordLocation mapping in memory. The recordLocation contains the comparison value as well.
The following happens:
Record A arrives with key "key" and created_on as 100, docId 0.
Record B arrives with key "key" and created_on as 200, docId 1.
The final state of the primaryKey store is "key" -> RecordLocation("comparable" -> 200, docId -> 1).
However, the actual record stored in the segment is ("key", docId -> 1, created_on -> 100), since created_on is set to IGNORE.
When the consuming segment gets committed, `addSegment` is called, which iterates over all the previous records and creates a `validDocId` list.
In our case, it finds that `created_on` in the segment's stored record (100) is less than the one in the primaryKey map (200), so it ignores this record and doesn't add it to the `validDocId` list.
Hence, you don't get this record at query time once the segment commits.
Kartik Khare
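The sequence above can be replayed with a small, self-contained Java sketch. It is only a toy model of the behaviour described in this thread, not Pinot's actual code: the class `UpsertCommitSketch`, its nested `RecordLocation`, and the method `validDocIdsAfterCommit` are hypothetical names, and the commit-time check is reduced to "the doc the map points at survives only if its stored comparison value is not less than the value held in the map".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the behaviour described above; not Pinot's real classes.
class UpsertCommitSketch {

  // Entry of the in-memory primary-key map: comparison value plus docId.
  static final class RecordLocation {
    final long comparisonValue;
    final int docId;

    RecordLocation(long comparisonValue, int docId) {
      this.comparisonValue = comparisonValue;
      this.docId = docId;
    }
  }

  /**
   * Ingests rows for a single key "key" with the given incoming created_on values,
   * then replays the commit-time check and returns the docIds that stay valid.
   * ignoreCreatedOn = true models the IGNORE strategy on created_on,
   * false models OVERWRITE.
   */
  static List<Integer> validDocIdsAfterCommit(long[] incomingCreatedOn, boolean ignoreCreatedOn) {
    Map<String, RecordLocation> primaryKeyMap = new HashMap<>();
    List<Long> storedCreatedOn = new ArrayList<>(); // list index == docId
    String key = "key";

    for (int docId = 0; docId < incomingCreatedOn.length; docId++) {
      long incoming = incomingCreatedOn[docId];
      RecordLocation current = primaryKeyMap.get(key);
      // With IGNORE, the merged row keeps the previously stored created_on.
      long stored = (ignoreCreatedOn && current != null)
          ? storedCreatedOn.get(current.docId)
          : incoming;
      storedCreatedOn.add(stored);
      // The map is updated with the comparison value of the incoming row,
      // captured before the merge, so it can diverge from the stored value.
      if (current == null || incoming >= current.comparisonValue) {
        primaryKeyMap.put(key, new RecordLocation(incoming, docId));
      }
    }

    // Commit: re-read the stored rows and compare them against the map.
    List<Integer> validDocIds = new ArrayList<>();
    RecordLocation location = primaryKeyMap.get(key);
    for (int docId = 0; docId < storedCreatedOn.size(); docId++) {
      if (location != null && docId == location.docId
          && storedCreatedOn.get(docId) >= location.comparisonValue) {
        validDocIds.add(docId);
      }
    }
    return validDocIds;
  }

  public static void main(String[] args) {
    long[] createdOn = {100L, 200L}; // Record A, then Record B
    System.out.println(validDocIdsAfterCommit(createdOn, true));  // prints []
    System.out.println(validDocIdsAfterCommit(createdOn, false)); // prints [1]
  }
}
```

With IGNORE the stored value (100) never catches up with the map's value (200), so no doc survives the commit. With OVERWRITE both sides hold 200 and docId 1 stays valid.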
05/12/2022, 2:02 PM
recordInfo is created before updateRecord, where we apply the partial update. Should we be updating this object after updateRecord is called? It's a small change, but I'm not sure what it may break.
PartitionUpsertMetadataManager.RecordInfo recordInfo = getRecordInfo(row, numDocsIndexed);
GenericRow updatedRow = _partitionUpsertMetadataManager.updateRecord(row, recordInfo);
updateDictionary(updatedRow);
addNewRow(numDocsIndexed, updatedRow);
// Update number of documents indexed before handling the upsert metadata so that the record becomes queryable
// once validated
canTakeMore = numDocsIndexed++ < _capacity;
_partitionUpsertMetadataManager.addRecord(this, recordInfo);
Kartik Khare
05/12/2022, 2:50 PM
Jackie
05/12/2022, 4:54 PM
`comparisonColumn`. We use `comparisonColumn` to determine which record to keep, and allowing it to be modified can break the contract.
Kartik Khare
05/12/2022, 4:58 PM
Jackie
05/12/2022, 5:03 PM
`TableConfigUtils.validateUpsertConfig()` to reject table configs with an upsert strategy on primary key columns or the comparison column. In `PartialUpsertHandler`, we may log a warning if we find a strategy configured for these columns, and not add them to the mergers, in order to fix the tables that already exist.
Kartik Khare
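A rough sketch of the two checks suggested above, written against plain Java collections rather than Pinot's real config classes. The class `UpsertStrategyValidationSketch` and the methods `validate` and `usableStrategies` are hypothetical stand-ins for illustration only; they are not the signatures of `TableConfigUtils.validateUpsertConfig()` or `PartialUpsertHandler`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; the names and signatures here are assumptions, not Pinot's API.
class UpsertStrategyValidationSketch {

  // A column must not carry a partial-upsert strategy if it is a primary key
  // column or the comparison column.
  static boolean isProtected(String column, List<String> primaryKeyColumns, String comparisonColumn) {
    return primaryKeyColumns.contains(column) || column.equals(comparisonColumn);
  }

  // Config-time check: reject new table configs that set a strategy on a protected column.
  static void validate(List<String> primaryKeyColumns, String comparisonColumn,
      Map<String, String> partialUpsertStrategies) {
    for (String column : partialUpsertStrategies.keySet()) {
      if (isProtected(column, primaryKeyColumns, comparisonColumn)) {
        throw new IllegalStateException(
            "Partial upsert strategy is not allowed on primary key or comparison column: " + column);
      }
    }
  }

  // Handler-time fallback for tables that already exist: warn and skip the
  // protected columns instead of failing.
  static Map<String, String> usableStrategies(List<String> primaryKeyColumns, String comparisonColumn,
      Map<String, String> partialUpsertStrategies) {
    Map<String, String> usable = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : partialUpsertStrategies.entrySet()) {
      if (isProtected(entry.getKey(), primaryKeyColumns, comparisonColumn)) {
        System.err.println("WARN: ignoring partial upsert strategy on column " + entry.getKey());
      } else {
        usable.put(entry.getKey(), entry.getValue());
      }
    }
    return usable;
  }
}
```

For the table in this thread, a strategy map containing `created_on -> IGNORE` would be rejected by `validate`, while `usableStrategies` would drop that entry and keep the others.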
05/12/2022, 5:04 PM