#general

Jagannath Timma

06/14/2022, 8:18 PM
Hello guys, I am looking at the Pinot upsert/dedup documentation. From what I understand, when upsert is enabled (let's say the PK is a string and the latest ts column is used to determine order), Pinot returns the latest row at query time. But all the older rows are still stored by Pinot. Is that correct? Also, what is the difference between upsert and dedup? Is it that dedup actually discards the older row data when a PK conflict is detected?

Neha Pawar

06/14/2022, 9:51 PM
Correct about upsert. For dedup, the newer row will be discarded, not the older one.
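To make the distinction concrete, here is a toy Python model of the two behaviours as described in this thread: upsert keeps every ingested row but only exposes the latest one per primary key (ordered by a comparison column such as a timestamp), while dedup keeps the first row for a key and drops later ones. The `Row`, `UpsertTable` and `DedupTable` names are hypothetical and this is not Pinot's implementation; letting ties on the comparison column go to the later row is also an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Row:
    pk: str      # primary key, e.g. a string id
    ts: int      # comparison column, e.g. an event timestamp
    value: str


class UpsertTable:
    """Stores every row; queries only see the latest row per primary key."""

    def __init__(self) -> None:
        self.stored: List[Row] = []       # older rows are kept, not deleted
        self.latest: Dict[str, int] = {}  # pk -> index of its current row in `stored`

    def ingest(self, row: Row) -> None:
        self.stored.append(row)
        cur = self.latest.get(row.pk)
        # Assumption of this sketch: ties on the comparison column go to the later row.
        if cur is None or row.ts >= self.stored[cur].ts:
            self.latest[row.pk] = len(self.stored) - 1

    def query(self) -> List[Row]:
        return [self.stored[i] for i in sorted(self.latest.values())]


class DedupTable:
    """Keeps the first row seen per primary key; later duplicates are dropped."""

    def __init__(self) -> None:
        self.stored: List[Row] = []
        self.seen: Set[str] = set()

    def ingest(self, row: Row) -> None:
        if row.pk in self.seen:
            return                        # the newer row is discarded
        self.seen.add(row.pk)
        self.stored.append(row)

    def query(self) -> List[Row]:
        return list(self.stored)


stream = [Row("a", 1, "v1"), Row("b", 1, "x1"), Row("a", 2, "v2")]
upsert, dedup = UpsertTable(), DedupTable()
for r in stream:
    upsert.ingest(r)
    dedup.ingest(r)

print([r.value for r in upsert.query()])  # ['x1', 'v2']
print(len(upsert.stored))                 # 3 (the old "a" row is still stored)
print([r.value for r in dedup.query()])   # ['v1', 'x1']
```

Both tables end up answering queries with one row per key; the difference is which row wins and whether the losing row is still kept.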

Jagannath Timma

06/15/2022, 5:21 AM
Yes, makes sense. @Neha Pawar Does upsert return the newest row across realtime and offline segments?

Neha Pawar

06/15/2022, 2:25 PM
Upsert is only applicable to realtime tables. So it will return the newest row across the consuming and completed segments, within a single partition.
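The partition scoping can be sketched the same way. The model below is hypothetical and is not Pinot's code: each partition remembers, per primary key, where its current row lives and keeps a per-segment set of valid doc ids, so a newer row in the consuming segment hides an older row in a completed segment without deleting it. Nothing is resolved across partitions, so the same key landing in two partitions shows up twice; this is why the Pinot docs ask for the input stream to be partitioned by primary key.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical row shape: (primary key, comparison value, payload).
Row = Tuple[str, int, str]


class PartitionUpsert:
    """Per-partition upsert metadata: pk -> (segment index, doc index, ts)."""

    def __init__(self) -> None:
        self.segments: List[List[Row]] = [[]]  # the last segment is the consuming one
        self.valid: List[Set[int]] = [set()]   # valid doc ids per segment
        self.location: Dict[str, Tuple[int, int, int]] = {}

    def seal_segment(self) -> None:
        # The consuming segment becomes completed; a new consuming one starts.
        self.segments.append([])
        self.valid.append(set())

    def ingest(self, row: Row) -> None:
        pk, ts, _ = row
        seg = len(self.segments) - 1
        doc = len(self.segments[seg])
        self.segments[seg].append(row)        # every row is stored
        old = self.location.get(pk)
        if old is None or ts >= old[2]:
            if old is not None:
                self.valid[old[0]].discard(old[1])  # older row stays stored, just hidden
            self.valid[seg].add(doc)
            self.location[pk] = (seg, doc, ts)
        # An out-of-order row (smaller ts) is stored but never marked valid.

    def query(self) -> List[Row]:
        return [self.segments[s][d]
                for s in range(len(self.segments))
                for d in sorted(self.valid[s])]


def query_table(partitions: Dict[int, PartitionUpsert]) -> List[Row]:
    # Results are simply unioned: there is no cross-partition resolution.
    return [row for p in sorted(partitions) for row in partitions[p].query()]


p0 = PartitionUpsert()
p0.ingest(("a", 1, "v1"))
p0.seal_segment()            # "v1" now lives in a completed segment
p0.ingest(("a", 2, "v2"))    # the newer row in the consuming segment wins
p1 = PartitionUpsert()
p1.ingest(("a", 3, "v3"))    # same key routed to a different partition
print(query_table({0: p0, 1: p1}))  # [('a', 2, 'v2'), ('a', 3, 'v3')]
```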

Jagannath Timma

06/15/2022, 7:26 PM
Thanks @Neha Pawar, that's perfect. I am gonna set up a table with both upsert and dedup enabled. So within a realtime segment, rows get deduped, and at query time, the newest row from across consuming and completed segments gets returned. Does that make sense?

Neha Pawar

06/15/2022, 7:33 PM
Sounds like you could just use upsert? Why do you need dedup?
• Dedup also works across consuming + completed segments, so if you've enabled dedup, you won't have more than 1 record for that primary key.
• As of this time, dedup and upsert cannot be configured together, but some work will be done to change that (can't say when).
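For reference, a rough sketch of the table-config pieces involved, built as Python dicts. The table name `events`, the column `event_id` and the helper `realtime_table_fragment` are hypothetical. The field names (`primaryKeyColumns`, `upsertConfig.mode`, `dedupConfig.dedupEnabled`, `strictReplicaGroup` routing for upsert) follow the Pinot upsert/dedup docs from around this time and should be checked against your Pinot version. The helper returns only one of the two, mirroring the point above that they could not be configured together at the time.

```python
import json

# Hypothetical names ("events", "event_id"); field names follow the Pinot
# upsert/dedup docs from around mid-2022 and may differ in other versions.
SCHEMA_FRAGMENT = {"schemaName": "events", "primaryKeyColumns": ["event_id"]}

UPSERT_FRAGMENT = {
    "upsertConfig": {"mode": "FULL"},
    # Upsert queries need strict replica-group routing.
    "routing": {"instanceSelectorType": "strictReplicaGroup"},
}

DEDUP_FRAGMENT = {
    "dedupConfig": {"dedupEnabled": True},
}


def realtime_table_fragment(mode: str) -> dict:
    """Hypothetical helper: return the config pieces for exactly one feature."""
    if mode == "upsert":
        return dict(UPSERT_FRAGMENT)
    if mode == "dedup":
        return dict(DEDUP_FRAGMENT)
    # At the time of this thread, upsert and dedup could not be combined.
    raise ValueError("mode must be 'upsert' or 'dedup', not both")


print(json.dumps(SCHEMA_FRAGMENT, indent=2))
print(json.dumps(realtime_table_fragment("dedup"), indent=2))
```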

Jagannath Timma

06/15/2022, 7:42 PM
Ah, so yes, I only need dedup.
I actually need just one row.