#general

Sávio Salvarino Teles de Oliveira

06/03/2021, 3:16 PM
Hello. What happens during real-time ingestion with upsert when two records have the same primary key and the same event time? The documentation says: "When two records of the same primary key are ingested, the record with the greater event time (as defined by the time column) is used.". But what happens when there is a tie?

Kishore G

06/03/2021, 3:24 PM
@User @User will know the exact answer. As part of partial upsert work we are doing, we will make the merging logic pluggable/configurable

Yupeng Fu

06/03/2021, 3:30 PM
Currently the behavior is undefined, so it depends on the implementation: the message with the largest offset wins. However, there are caveats. For example, when the records are sorted by some column, the order is not determined.
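To make the tie case concrete, here is a minimal sketch of "greater event time wins, and on a tie the later arrival wins". It is an illustration only, not Pinot's actual code: the `Record` class and the `upsert` function are hypothetical names, and the records are assumed to be consumed in offset order from a single partition.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Record:
    kid: str         # primary key (column name taken from this thread)
    order_date: int  # event time, e.g. epoch millis
    offset: int      # position in the stream partition
    payload: str


def upsert(records: Iterable[Record]) -> Dict[str, Record]:
    """Keep one record per primary key (hypothetical helper, not a Pinot API)."""
    latest: Dict[str, Record] = {}
    for rec in records:  # assumed to arrive in offset order
        current = latest.get(rec.kid)
        # ">=" resolves a tie on event time in favour of the later arrival,
        # i.e. the record with the larger offset.
        if current is None or rec.order_date >= current.order_date:
            latest[rec.kid] = rec
    return latest


records = [
    Record("a", 100, 0, "first"),
    Record("a", 100, 1, "second"),  # same key, same event time
    Record("a", 90, 2, "late"),     # older event time, ignored
]
result = upsert(records)
print(result["a"].payload)
```

With a strict `>` instead of `>=`, the first record would win the tie, which is why the outcome should not be relied on while the behavior is undefined.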

Mayank

06/03/2021, 3:32 PM
@User mind adding to docs/FAQ?

Sávio Salvarino Teles de Oliveira

06/03/2021, 3:41 PM
I have an upsert scenario that I can't get my head around. We created an orders table (configuration in the file below) with the primary key "kid" and the time column "_order_date". The table contains duplicated rows (same "kid" and "_order_date"), and upsert still keeps the duplicates. Depending on the query filter, duplicate rows are returned. See the screenshots below.

Jackie

06/03/2021, 4:00 PM
I don't see upsert enabled in the table config. Please follow the instructions here to enable upsert: https://docs.pinot.apache.org/basics/data-import/upsert
Please make sure the Kafka stream is partitioned on the primary key.
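As a rough sketch of what the linked instructions ask for, the fragments below are written as Python dicts so they can be dumped to JSON. The key names (`primaryKeyColumns`, `upsertConfig`, `routing.instanceSelectorType`) are recalled from the upsert documentation and should be checked against the page for your Pinot version. The table name `orders2` and the column `kid` come from this thread, and everything else a real config needs is omitted.

```python
import json

# Schema fragment: declares which column(s) form the primary key.
schema_fragment = {
    "schemaName": "orders2",
    "primaryKeyColumns": ["kid"],
}

# Real-time table config fragment: turns on full upsert and the routing
# mode that the upsert docs pair with it (assumed, verify in the docs).
table_config_fragment = {
    "tableName": "orders2",
    "tableType": "REALTIME",
    "upsertConfig": {"mode": "FULL"},
    "routing": {"instanceSelectorType": "strictReplicaGroup"},
}

print(json.dumps(schema_fragment, indent=2))
print(json.dumps(table_config_fragment, indent=2))
```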

Sávio Salvarino Teles de Oliveira

06/03/2021, 4:04 PM
Sorry @User. Here is the updated file! The one I had sent was from the previous version. The error happens with the table created with this schema below.

Jackie

06/03/2021, 4:08 PM
Is the Kafka stream partitioned with the primary key?
Can you try this query: select kid, $hostName from orders2 where kid = 'ever max-100000009'

Yupeng Fu

06/03/2021, 4:12 PM
@User sure thing

Sávio Salvarino Teles de Oliveira

06/03/2021, 4:21 PM
@User. The results of the query are in the image below. I don't know whether the Kafka stream is partitioned on the primary key. I will check and get back to you. Tks

Jackie

06/03/2021, 4:35 PM
@User The Kafka stream is not properly partitioned, and the same "kid" shows up in 2 different partitions
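One way to check for this situation, sketched under assumptions: given (partition, kid) pairs sampled from the topic, list every "kid" that appears in more than one partition. The function name `keys_in_multiple_partitions` and the sample rows are made up for illustration. In practice the pairs would come from whatever consumer or dump tool you use.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def keys_in_multiple_partitions(
    rows: Iterable[Tuple[int, str]]
) -> Dict[str, Set[int]]:
    """Map each key seen in more than one partition to those partitions."""
    seen: Dict[str, Set[int]] = defaultdict(set)
    for partition, kid in rows:
        seen[kid].add(partition)
    # Keep only the keys that violate "one key, one partition".
    return {kid: parts for kid, parts in seen.items() if len(parts) > 1}


# Illustrative sample: the same kid lands in partitions 0 and 1.
sample = [
    (0, "ever max-100000009"),
    (1, "ever max-100000009"),
    (1, "other-1"),
]
offenders = keys_in_multiple_partitions(sample)
print(offenders)
```

An empty result on a representative sample suggests the producer is keying messages by "kid". Any entry means upsert can keep one copy of that key per partition.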

Sávio Salvarino Teles de Oliveira

06/03/2021, 5:28 PM
Tks, @User.