# general
r
I'm curious about how segments are organized internally. From what I've heard, a segment is stored as a single binary file that includes all of the columns and indices for a particular table. I've also heard that StarTree indices are stored as separate binary files (1 file per StarTree Index) but are still considered to be part of the segment. Does anyone have any additional information or resources on how the binary file is organized or used?
I can also add that the column data is dictionary encoded.
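For anyone reading along, here is a toy sketch of what dictionary encoding means for a single column. It only illustrates the general idea and is not Pinot's actual on-disk format; the function names and the choice of a sorted dictionary are assumptions made for this example.

```python
def dictionary_encode(values):
    """Split a column into a dictionary of unique values and a list of ids."""
    # Assumption for this sketch: the dictionary is kept sorted.
    dictionary = sorted(set(values))
    index_of = {value: i for i, value in enumerate(dictionary)}
    # Each row stores a small integer id instead of the full value.
    dict_ids = [index_of[value] for value in values]
    return dictionary, dict_ids


def dictionary_decode(dictionary, dict_ids):
    """Rebuild the original column by looking each id up in the dictionary."""
    return [dictionary[i] for i in dict_ids]


column = ["US", "IN", "US", "DE", "IN", "US"]
dictionary, dict_ids = dictionary_encode(column)
roundtrip = dictionary_decode(dictionary, dict_ids)
```

The repeated strings are stored once in `dictionary`, and each row only carries an integer id.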
And, IIRC, some of the segment data is mmapped (i.e., memory-mapped).
m
Yes, what you described is pretty much how it is. All immutable data is mmapped.
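As a rough illustration of reading immutable data through mmap, here is a minimal sketch using Python's standard `mmap` module. The file layout (a flat array of little-endian 32-bit integers) and the helper names are assumptions for this example only, not how Pinot lays out a segment file.

```python
import mmap
import os
import struct
import tempfile


def write_int_column(path, values):
    """Write a fixed-width column of little-endian 32-bit ints to a file."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}i", *values))


def read_value(path, row):
    """Read one row through a read-only memory map, without loading the file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Fixed-width values make the byte offset a simple multiplication.
            return struct.unpack_from("<i", mm, row * 4)[0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "column.bin")
    write_int_column(path, [10, 20, 30, 40])
    third = read_value(path, 2)
```

Because the mapping is read-only, the OS can page the data in on demand and evict it under memory pressure, which suits data that never changes after it is written.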
🤔 1
r
Are there mutable portions of the segment that aren't mmapped? If so, what is in these mutable portions?
m
Actually, let me give the simple answer first. You can consider all segments to be mmapped.
By immutable/mutable I actually meant a segment that is sealed/committed vs. a segment that is open and ingesting events in real time. Not to be confused with ‘upsert’.
r
So when a server is ingesting into a consuming segment, it's actually serializing each event into binary, and via mmap the binary data is written into memory and eventually to disk?
m
Yes, events are indexed as they are ingested. The indexes are kept in memory for some time (via off-heap memory such as direct memory or mmap). Then, periodically, they are sealed and committed to disk.
r
In other words, all of the columns are appended to and the dictionaries/indexes (including StarTree indexes) are updated in memory when an event is consumed?
m
Yes, except that the StarTree index is created only when the segment is committed to disk.
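To make that lifecycle concrete, here is a toy model of a consuming segment. It is a sketch under heavy assumptions: the class and field names are hypothetical, and a simple per-dimension sum stands in for the StarTree index, which is a much richer structure in practice.

```python
from collections import defaultdict


class ConsumingSegment:
    """Toy model: per-event index updates, pre-aggregation only at commit."""

    def __init__(self):
        self.dictionary = {}                # value -> dict id, grows on ingest
        self.dict_ids = []                  # forward index: row -> dict id
        self.metrics = []                   # metric column, one entry per row
        self.inverted = defaultdict(list)   # dict id -> row ids
        self.preagg = None                  # stand-in for StarTree, unset until commit

    def ingest(self, dim, metric):
        # Dictionary and indexes are updated as each event arrives.
        dict_id = self.dictionary.setdefault(dim, len(self.dictionary))
        row = len(self.dict_ids)
        self.dict_ids.append(dict_id)
        self.metrics.append(metric)
        self.inverted[dict_id].append(row)

    def commit(self):
        # The pre-aggregated structure is built once, when the segment is sealed.
        values = {dict_id: value for value, dict_id in self.dictionary.items()}
        totals = defaultdict(int)
        for dict_id, metric in zip(self.dict_ids, self.metrics):
            totals[values[dict_id]] += metric
        self.preagg = dict(totals)
        return self.preagg


segment = ConsumingSegment()
segment.ingest("US", 5)
segment.ingest("IN", 3)
segment.ingest("US", 2)
before_commit = segment.preagg
after_commit = segment.commit()
```

Until `commit()` runs, queries in this model could only use the dictionary, forward index, and inverted index; the pre-aggregated view does not exist yet.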
🤔 1
r
When segments are bounded by time (i.e., sealed/committed at periodic intervals), is it possible to run into OOM issues when there's a burst in event ingestion?
Last question for the night. Does a StarTree index represent the latest data or does it lag behind the latest consumed events?
m
The StarTree index is created only when committing the segment, so yes, it lags behind the latest consumed events.
r
Thanks for all the answers, Mayank!
👍 1