We have a system which stores transactional data i...
# general
a
We have a system which stores transactional data in MongoDB, where updates/deletes are pretty common. To build an analytical reporting system on this data, we are considering leveraging Pinot by streaming data into it from Mongo. While I see Pinot supports streaming ingestion with "upserts", I have the following questions:
1. The docs mention in a few places that Pinot is designed for "immutable" data, which seems to contradict the upsert feature. How do these two concepts hold together?
2. The docs say: "Upsert table maintains an in-memory map from the primary key to the record location". The "record location" could be either in memory or in the segment store, so does this map maintain both kinds of locations? Since it stores all primary keys, will this map keep growing indefinitely in memory and require vertical scaling of servers at some point?
3. If a record in a segment is updated, all servers need to reload it, I guess. Does that make updates expensive?
4. Overall, is our use case well suited for Pinot (where updates/deletes of a record are pretty common)?
k
Hi Abhishek, thanks a lot for showing interest in Pinot. Let me answer your queries:
• The segments remain immutable even with upsert enabled. We simply update metadata to track which segment holds the latest record for each key.
• Yes, it maintains both kinds of location, and yes, it would require vertical scaling. The off-heap metadata store is WIP.
• No. Since the segments are immutable and never actually updated, everything works the same way with or without upserts.
• Yes, we already have some users leveraging Pinot for such high-upsert use cases. You can also check out Partial Upserts.
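To make the first two answers concrete, here is a toy sketch of the idea: segments are append-only, and an in-memory map from primary key to record location decides which row counts as the latest. This is not Pinot's actual code; `Segment`, `UpsertTable`, `key_to_location`, and `rows_per_segment` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy segment: rows are only ever appended, never modified in place."""
    name: str
    rows: list = field(default_factory=list)
    # Doc ids in this segment that currently hold the latest version of their key.
    valid: set = field(default_factory=set)


class UpsertTable:
    """Hypothetical model of upsert bookkeeping, not Pinot's implementation."""

    def __init__(self, rows_per_segment=2):
        self.rows_per_segment = rows_per_segment
        self.segments = [Segment("seg_0")]
        # Primary key -> (segment index, doc id). One entry per distinct key,
        # which is why this map grows with the number of primary keys.
        self.key_to_location = {}

    def ingest(self, key, record):
        # Roll over to a new segment once the current one is full.
        if len(self.segments[-1].rows) >= self.rows_per_segment:
            self.segments.append(Segment(f"seg_{len(self.segments)}"))
        seg_idx = len(self.segments) - 1
        seg = self.segments[seg_idx]

        # Append the new version; the old row stays where it is.
        doc_id = len(seg.rows)
        seg.rows.append((key, record))

        # Only metadata changes: unmark the old location, mark the new one.
        old = self.key_to_location.get(key)
        if old is not None:
            old_seg, old_doc = old
            self.segments[old_seg].valid.discard(old_doc)
        seg.valid.add(doc_id)
        self.key_to_location[key] = (seg_idx, doc_id)

    def scan(self):
        # Queries see only rows marked as the latest version of their key.
        return [seg.rows[d] for seg in self.segments for d in sorted(seg.valid)]


table = UpsertTable(rows_per_segment=2)
table.ingest("a", 1)
table.ingest("b", 2)
table.ingest("a", 3)  # update to key "a" lands in a new segment
print(table.scan())
```

In this sketch the update to key `a` never rewrites `seg_0`; the stale row is still stored there and is simply skipped at query time.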
a
Got it, thanks a lot!