A quick question about realtime table, all data in...
# general
w
A quick question about realtime tables: all data inside the in-memory segment (mutable segment) should be in memory even though Pinot is columnar, right? As for offline segments, are only the columns in use loaded into memory?
s
Pinot keeps all data in memory (mmap) all the time. For consuming segments, the data can be lost on restart; that is the only difference.
k
just to clarify - mmapped is different from storing data in memory.
d
Apologies for posting to this thread, I've got a question about this topic. When a realtime server goes down, does that mean the data that has not been persisted to immutable segments is temporarily unavailable? If I have configured the table with the low-level consumer, I believe the partitions assigned to that server won't be ingested until the server is back up. Is this correct? Or is it the case that each server consumes all the partitions, and when one goes down another server can take over transparently? Are there any metrics or SPI objects that can help me track which partitions and offsets have been permanently persisted? TIA
k
What is the replication factor for the table?
d
Replication factor is 3. I noticed that group.id is set in the stream config. Would this be an issue?
m
@User if replication factor > 1 and one node goes down, the other replicas continue to consume and serve. The bad node catches up when it comes back up. If the other nodes committed more segments in the meantime, it will simply download those and start from the latest saved checkpoint.
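A toy model of that catch-up behaviour, just to make the sequence concrete. This is not Pinot code: `Replica`, `consume_and_commit`, and `catch_up` are hypothetical names, and segments are reduced to `(start_offset, end_offset)` pairs.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # Hypothetical stand-in for one server hosting a replica of a partition.
    name: str
    committed_segments: list = field(default_factory=list)  # (start, end) offset ranges
    checkpoint: int = 0  # next offset to consume
    up: bool = True


def consume_and_commit(replicas, end_offset):
    """Every live replica consumes up to end_offset and commits one segment."""
    for r in replicas:
        if r.up and r.checkpoint < end_offset:
            r.committed_segments.append((r.checkpoint, end_offset))
            r.checkpoint = end_offset


def catch_up(recovering, healthy):
    """Copy the segments the recovering replica missed, then resume from
    the healthy replica's latest checkpoint."""
    missing = [s for s in healthy.committed_segments
               if s not in recovering.committed_segments]
    recovering.committed_segments.extend(missing)
    recovering.checkpoint = healthy.checkpoint
    recovering.up = True
    return missing


replicas = [Replica("s1"), Replica("s2"), Replica("s3")]
consume_and_commit(replicas, 100)   # all three commit offsets 0..100
replicas[2].up = False              # s3 goes down
consume_and_commit(replicas, 200)   # s1 and s2 keep consuming and serving
print(catch_up(replicas[2], replicas[0]))  # [(100, 200)]
```

While `s3` is down, the other two replicas still hold every committed segment, which is why queries keep being served with replication factor 3.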
d
Are there any implications if group.id is set in stream config? Will it keep working the same way?
k
I think it's ignored for partition-level consumption.
w
Sorry to be late to this conversation. If a column is not used, is it loaded into main memory? How about indexes, such as the star-tree index? Are they loaded into main memory only if they are used? @User @User
m
@User For immutable segments, data that is not accessed won’t be loaded in main memory (this is how mmap works).
w
How about the unused startree index? @User
m
Same. The way MMAP works for immutable data is that it is only loaded when needed, and can be flushed out at a later point if not needed
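To illustrate the mmap point in general terms, here is a minimal sketch using Python's standard `mmap` module. It is not how Pinot reads segments; `make_demo_segment` and `read_column_slice` are made-up helpers, and the file is just filler bytes. Mapping the file does not read it; the operating system pages data in when a byte range is actually accessed, and may evict those pages later under memory pressure.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def make_demo_segment(num_pages=256):
    """Write a throwaway file where page i is filled with the byte value i % 256."""
    fd, path = tempfile.mkstemp(suffix=".seg")
    with os.fdopen(fd, "wb") as f:
        for i in range(num_pages):
            f.write(bytes([i % 256]) * PAGE)
    return path


def read_column_slice(path, offset, length):
    """Map the whole file read-only, but touch only the requested byte range."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing copies just these bytes; only the pages backing
            # this range need to be resident in memory.
            return mm[offset:offset + length]


path = make_demo_segment()
try:
    chunk = read_column_slice(path, 10 * PAGE, 4)
    print(chunk)  # four bytes, each with value 10
finally:
    os.remove(path)
```

The same idea applies to an unused column or index stored in a mapped file: if nothing reads that region, its pages never have to be brought into main memory.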