Alice
05/11/2022, 8:27 AM
Kartik Khare
05/11/2022, 10:46 AM
Kartik Khare
05/11/2022, 10:47 AM
Alice
05/11/2022, 11:57 AM
Kartik Khare
05/11/2022, 12:02 PM
`skipUpsert` option in query, do you get both the old data and the updated data, or just the old data?
Alice
05/11/2022, 12:07 PM
Alice
05/11/2022, 12:08 PM
Jackie
05/11/2022, 4:58 PM
Alice
05/12/2022, 12:12 AM
Alice
05/12/2022, 12:17 AM
Alice
05/12/2022, 12:22 AM
Kartik Khare
05/12/2022, 10:26 AM
The "No records(s) Found" error does occur in both. Need to check why that is the case.
The `created_on` showing as 0 is not reproducible so far. It shows correct values in all the permutations I tried.
Kartik Khare
05/12/2022, 12:00 PM
`comparisonColumn` is set to a column whose partial-upsert strategy is `IGNORE`. If that column is set to `OVERWRITE` instead, it works as expected. If `comparisonColumn` is not specified, `timeColumn` is used for comparison, which again causes the same issue, as in this case.
Kartik Khare
05/12/2022, 2:01 PM
We store a primaryKey-to-recordLocation mapping in memory. The recordLocation also contains the comparison value.
The following happens:
Record A arrives with key "key" and `created_on` = 100, docId 0.
Record B arrives with key "key" and `created_on` = 200, docId 1.
The final state of the primaryKey store is "key" -> RecordLocation("comparable" -> 200, docId -> 1).
However, the actual record stored in the segment is ("key", docId -> 1, created_on -> 100), since `created_on` is set to `IGNORE`.
When the consuming segment is committed, `addSegment` gets called, which iterates over all the previous records and builds a `validDocId` list.
In our case, it finds that `created_on` in the segment's stored record (100) is less than the one in the primaryKey map (200), so it skips this record and doesn't add it to the `validDocId` list.
Hence, you don't get this record at query time once the segment commits.
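The sequence above can be reproduced with a small Java model. This is a hypothetical simplification, not Pinot's actual upsert metadata code: the class `UpsertIgnoreBugDemo`, its `Row`/`RecordLocation` records and `validDocIdsAfterCommit()` are illustrative names. The commit-time check is reduced to "the doc the map points to stays valid only if its stored `created_on` is not older than the comparison value recorded at index time", and the comparison value is taken from the incoming row before the partial-upsert merge. It contrasts `IGNORE` with `OVERWRITE` on the comparison column (Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the described sequence; not Pinot's real classes.
class UpsertIgnoreBugDemo {
    enum Strategy { OVERWRITE, IGNORE }

    // Latest doc for a primary key plus the comparison value seen at index time.
    record RecordLocation(int docId, long comparisonValue) {}

    // A stored row reduced to the two fields that matter here.
    record Row(String key, long createdOn) {}

    final Strategy createdOnStrategy;
    final Map<String, RecordLocation> primaryKeyMap = new HashMap<>();
    final List<Row> segment = new ArrayList<>();

    UpsertIgnoreBugDemo(Strategy createdOnStrategy) {
        this.createdOnStrategy = createdOnStrategy;
    }

    void index(Row incoming) {
        // Comparison value comes from the incoming row, i.e. before the partial-upsert merge.
        long comparisonValue = incoming.createdOn();
        RecordLocation previous = primaryKeyMap.get(incoming.key());
        Row stored = incoming;
        if (previous != null && createdOnStrategy == Strategy.IGNORE) {
            // IGNORE keeps the previously stored created_on value.
            stored = new Row(incoming.key(), segment.get(previous.docId()).createdOn());
        }
        int docId = segment.size();
        segment.add(stored);
        if (previous == null || comparisonValue >= previous.comparisonValue()) {
            primaryKeyMap.put(incoming.key(), new RecordLocation(docId, comparisonValue));
        }
    }

    // Simplified commit: re-read stored rows and compare them with the in-memory map.
    List<Integer> validDocIdsAfterCommit() {
        List<Integer> valid = new ArrayList<>();
        for (int docId = 0; docId < segment.size(); docId++) {
            Row row = segment.get(docId);
            RecordLocation location = primaryKeyMap.get(row.key());
            if (location.docId() == docId && row.createdOn() >= location.comparisonValue()) {
                valid.add(docId);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        for (Strategy strategy : Strategy.values()) {
            UpsertIgnoreBugDemo demo = new UpsertIgnoreBugDemo(strategy);
            demo.index(new Row("key", 100));
            demo.index(new Row("key", 200));
            System.out.println(strategy + ": stored " + demo.segment
                    + ", map " + demo.primaryKeyMap + ", valid docIds " + demo.validDocIdsAfterCommit());
        }
    }
}
```

In this model, `IGNORE` leaves docId 1 stored with `created_on` = 100 while the map holds 200, so no doc survives the commit and the key disappears from query results; with `OVERWRITE` the stored value and the map agree and docId 1 stays valid.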
Kartik Khare
05/12/2022, 2:02 PM
`recordInfo` is created before `updateRecord`, where we apply the partial update. Should we be updating this object after `updateRecord` is called? It's a small change, but I'm not sure what it may break.
```java
// recordInfo (primary key, docId, comparison value) is built from the incoming row, before the partial-upsert merge
PartitionUpsertMetadataManager.RecordInfo recordInfo = getRecordInfo(row, numDocsIndexed);
// updateRecord applies the partial-upsert strategies (e.g. IGNORE, OVERWRITE) against the previous record
GenericRow updatedRow = _partitionUpsertMetadataManager.updateRecord(row, recordInfo);
updateDictionary(updatedRow);
addNewRow(numDocsIndexed, updatedRow);
// Update number of documents indexed before handling the upsert metadata so that the record becomes queryable
// once validated
canTakeMore = numDocsIndexed++ < _capacity;
// The metadata manager still receives the recordInfo built from the pre-merge row
_partitionUpsertMetadataManager.addRecord(this, recordInfo);
```
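The question above (refresh `recordInfo` after `updateRecord`) can be sketched in a simplified Java model. `MergeThenCompareSketch`, its `Row`/`RecordInfo` records and `mergeIgnoringCreatedOn` are hypothetical stand-ins, not Pinot's `MutableSegmentImpl` or `PartitionUpsertMetadataManager`; the merge hard-codes `IGNORE` semantics for `created_on`, which is also the comparison value (Java 16+ for records).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the proposed reordering; not Pinot's real classes.
class MergeThenCompareSketch {
    record Row(String key, long createdOn) {}

    // Mirrors the idea of RecordInfo: primary key, docId and comparison value.
    record RecordInfo(String key, int docId, long comparisonValue) {}

    final Map<String, Row> latestRows = new HashMap<>();
    final Map<String, RecordInfo> primaryKeyMap = new HashMap<>();
    int numDocsIndexed = 0;

    // Stand-in partial-upsert merge: created_on uses IGNORE semantics (keep the previous value).
    static Row mergeIgnoringCreatedOn(Row previous, Row incoming) {
        return previous == null ? incoming : new Row(incoming.key(), previous.createdOn());
    }

    RecordInfo index(Row row) {
        // As in the snippet: record info is first built from the incoming row.
        RecordInfo recordInfo = new RecordInfo(row.key(), numDocsIndexed, row.createdOn());
        Row updatedRow = mergeIgnoringCreatedOn(latestRows.get(recordInfo.key()), row);
        // Proposed change: refresh the comparison value from the merged row after the update.
        recordInfo = new RecordInfo(recordInfo.key(), recordInfo.docId(), updatedRow.createdOn());
        latestRows.put(updatedRow.key(), updatedRow);
        numDocsIndexed++;
        RecordInfo current = primaryKeyMap.get(recordInfo.key());
        if (current == null || recordInfo.comparisonValue() >= current.comparisonValue()) {
            primaryKeyMap.put(recordInfo.key(), recordInfo);
        }
        return recordInfo;
    }

    public static void main(String[] args) {
        MergeThenCompareSketch sketch = new MergeThenCompareSketch();
        sketch.index(new Row("key", 100));
        RecordInfo info = sketch.index(new Row("key", 200));
        System.out.println(info + " stored=" + sketch.latestRows.get("key"));
    }
}
```

In this model the map entry for docId 1 and the stored row now agree (both 100), so a commit-time comparison would keep the record; the trade-off is that the comparison value no longer reflects the newer incoming `created_on` (200).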
Kartik Khare
05/12/2022, 2:50 PM
Jackie
05/12/2022, 4:54 PM
`comparisonColumn`. We use `comparisonColumn` to determine which record to keep, and allowing it to be modified can break the contract.
Kartik Khare
05/12/2022, 4:58 PM
Jackie
05/12/2022, 5:03 PM
`TableConfigUtils.validateUpsertConfig()` to reject table configs with an upsert strategy on primary key columns or the comparison column. In `PartialUpsertHandler`, we may log a warning if we find a strategy configured for these columns, and not add them to the mergers, in order to fix the tables that already exist.
Kartik Khare
05/12/2022, 5:04 PM
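Jackie's two-part proposal (reject at validation time, and warn and skip in the handler for existing tables) could look roughly like the Java sketch below. `UpsertStrategyGuard` and all its methods are hypothetical: the real `TableConfigUtils.validateUpsertConfig()` and `PartialUpsertHandler` work on Pinot's own table config and schema objects, which are not modeled here. Strategies are plain strings, the column `name` is an illustrative example, and the comparison column falls back to the time column when `comparisonColumn` is unset, as described in the 12:00 PM message.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical helpers illustrating the proposal; not Pinot's TableConfigUtils or PartialUpsertHandler.
class UpsertStrategyGuard {
    private static final Logger LOGGER = Logger.getLogger(UpsertStrategyGuard.class.getName());

    // The comparison column falls back to the time column when none is configured.
    static String effectiveComparisonColumn(String comparisonColumn, String timeColumn) {
        return comparisonColumn != null ? comparisonColumn : timeColumn;
    }

    // Primary key columns and the comparison column must not carry a partial-upsert strategy.
    static boolean isProtected(String column, List<String> primaryKeyColumns, String comparisonColumn) {
        return primaryKeyColumns.contains(column) || column.equals(comparisonColumn);
    }

    // Validation-time check: reject the table config outright.
    static void validate(List<String> primaryKeyColumns, String comparisonColumn, Map<String, String> strategies) {
        for (String column : strategies.keySet()) {
            if (isProtected(column, primaryKeyColumns, comparisonColumn)) {
                throw new IllegalStateException(
                        "Partial-upsert strategy not allowed on primary key or comparison column: " + column);
            }
        }
    }

    // Handler-side guard for existing tables: warn and leave protected columns out of the mergers.
    static Map<String, String> mergersToApply(
            List<String> primaryKeyColumns, String comparisonColumn, Map<String, String> strategies) {
        Map<String, String> mergers = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : strategies.entrySet()) {
            if (isProtected(entry.getKey(), primaryKeyColumns, comparisonColumn)) {
                LOGGER.warning("Ignoring strategy " + entry.getValue() + " on column " + entry.getKey());
            } else {
                mergers.put(entry.getKey(), entry.getValue());
            }
        }
        return mergers;
    }

    public static void main(String[] args) {
        Map<String, String> strategies = new LinkedHashMap<>();
        strategies.put("created_on", "IGNORE");
        strategies.put("name", "OVERWRITE");
        String comparison = effectiveComparisonColumn(null, "created_on");
        System.out.println(mergersToApply(List.of("key"), comparison, strategies));
    }
}
```

With `created_on` as the (fallback) comparison column, `validate` would reject this config, while `mergersToApply` keeps only the `name` merger and logs a warning for `created_on`.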