Hi Folks, We are exploring upsert feature in Pino...
# general
l
Hi folks, we are exploring the upsert feature in Pinot and have a few questions about it. Please help me understand the feature.
1. We are using the managed offline flow with a buffer time of 2 days, which means REALTIME segments get converted to OFFLINE segments after 2 days. However, our REALTIME segments roll up every hour per partition. Can upsert handle any update within this 2-day period?
2. How is this handled in the managed offline flow? Do multiple update records for the same row get merged into a single row?
3. I'm going through the related design documents available here, but access to one document is closed. Can you please provide access?
m
1. Upsert will work only on the RT components. So if your data is being served from RT for 2 days, that is the part that will have upsert.

2. Upsert does not work for offline component.

3. @Yupeng Fu for doc access
l
Thanks @User for the response. Can you please clarify this: what happens to these records during OFFLINE conversion? Say we got 3 records for a row (1 original and 2 partial updates). Won't all 3 get merged into a single record during OFFLINE conversion? If not, which record stays?
cc: @User for design document access
y
hey, due to company policy, that doc cannot have anonymous access. if you want to read it, please request access to the gdoc
m
Hey @User, what I meant was that if a record has already been moved to offline and an upsert for it then comes in on realtime, it will not be treated as an upsert but as a new row in the RT table.
l
I'm clear on that case. Once the record is moved to OFFLINE, there will be no updates. But I'm trying to understand what happens to the updates that came in before the OFFLINE conversion.
Offline table support
We don’t intend to include the offline table in this work. The offline table is created by the periodical external jobs which typically have the capability to compact the updates into the single record. However, it will impact the work of Pinot managed Offline flows to include such compaction logic.
@User: Can you shed some light on this one, please?
y
short answer: OFFLINE tables are not supported by upsert
you can use an offline table, but you should expect duplicates
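For readers wondering what the "compaction" mentioned in the design doc excerpt could look like in an external job, here is a minimal Python sketch. It is not Pinot code and not what the managed offline flow does (per the thread, that flow does not merge updates). The function name `compact_updates`, the column names (`order_id`, `status`, `amount`, `ts`), and the merge rule (a later non-null value overwrites an earlier one) are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def compact_updates(
    records: Iterable[Record],
    primary_key: Tuple[str, ...],
    comparison_column: str,
) -> List[Record]:
    """Merge all records sharing a primary key into a single record.

    Assumed merge rule (hypothetical, for illustration only): records are
    applied in ascending order of `comparison_column`, and a non-null value
    in a later record overwrites the earlier value for that column.
    """
    merged: Dict[Tuple[Any, ...], Record] = {}
    # sorted() is stable, so ties keep their original arrival order.
    for record in sorted(records, key=lambda r: r[comparison_column]):
        key = tuple(record[column] for column in primary_key)
        current = merged.setdefault(key, {})
        for column, value in record.items():
            if value is not None:  # None models a column absent from a partial update
                current[column] = value
    return list(merged.values())


# The case from the thread: 1 original record and 2 partial updates for one row.
rows = [
    {"order_id": 1, "status": "CREATED", "amount": 10.0, "ts": 100},
    {"order_id": 1, "status": "PAID", "amount": None, "ts": 200},
    {"order_id": 1, "status": None, "amount": 12.5, "ts": 300},
    {"order_id": 2, "status": "CREATED", "amount": 5.0, "ts": 150},
]
compacted = compact_updates(rows, primary_key=("order_id",), comparison_column="ts")
```

With this rule, the three records for `order_id` 1 collapse into one row carrying the latest non-null value of each column, which is the kind of deduplication that would have to happen before loading data into the OFFLINE table if duplicates are not acceptable.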