# general
s
Hello everyone, I have a doubt about the storage and query side of Pinot. Suppose we have 6 months of data as Pinot segments in deep storage (500 GB in total) and I want to run an aggregate query over the last 6 months of data.
1. Does my offline server need 500 GB of memory (RAM) to process the query? Or will queries still run efficiently with 100 GB of RAM and 500 GB of storage?
2. Will my query still work if I don't have 500 GB of storage?
3. Is the memory required to load a segment file from disk the same as the size of the file? I ask because loading the compressed file into memory could blow up RAM usage by 3-4x.

Also, if I want to read a single record from the previous 6 months, will Pinot load segments on demand from deep storage?
m
Servers maintain a local copy of segments on their disk. You do need local disk large enough to store each server's share of the data, but segments are memory-mapped, so you don't need much RAM; you can make do with 64 GB of RAM, for example.
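To illustrate why a memory-mapped file does not need an equal amount of RAM, here is a minimal Python sketch using the standard `mmap` module. It is not Pinot code: the file layout (a flat column of little-endian int64 values) and the helper names `write_int_column` and `mmap_sum` are hypothetical stand-ins for a segment, and Pinot's real segment format is different. Mapping the file reserves address space; the OS pages data in only as it is read and can reclaim those clean pages under memory pressure, and the scan below copies only one chunk at a time into process memory.

```python
import mmap
import os
import struct
import tempfile

ROW_FMT = "<q"  # hypothetical layout: one little-endian int64 per row
ROW_SIZE = struct.calcsize(ROW_FMT)


def write_int_column(path, values):
    """Write int64 values to disk as a hypothetical stand-in for a segment column."""
    values = list(values)
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}q", *values))


def mmap_sum(path, chunk_rows=4096):
    """Aggregate (sum) a column by memory-mapping the file and scanning it in chunks.

    Only the current chunk is copied into Python objects; the OS decides which
    pages of the mapped file are resident at any moment.
    """
    if os.path.getsize(path) == 0:
        return 0  # mmap cannot map an empty file
    total = 0
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        n_rows = len(mm) // ROW_SIZE
        for start in range(0, n_rows, chunk_rows):
            end = min(start + chunk_rows, n_rows)
            chunk = mm[start * ROW_SIZE:end * ROW_SIZE]  # copies just this slice
            total += sum(struct.unpack(f"<{end - start}q", chunk))
    return total


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "column.bin")
    write_int_column(path, range(100_000))
    print(mmap_sum(path))  # 4999950000
```

The same access pattern works for a file much larger than physical RAM, which is the property the answer relies on; what it cannot avoid is disk reads when the pages a query touches are not already cached.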
s
Understood. If my query is a GROUP BY query over the 6-month dataset, will it still work with less memory than the size of the whole dataset?
m
Yes. You don't need RAM as large as the data size.
s
OK, thanks. Just to understand Pinot's performance: do the Pinot benchmarks shown in various places use segments loaded from disk in memory-mapped (MMAP) mode, or in HEAP mode?
m
MMAP mode only
x
Hi @User, if I understood you correctly, even if the query requires 500 GB of segments to be mmap-ed in, 64 GB of RAM is enough?
And the index is built and stored on disk as the segment is loaded into the server?
m
Yes that is correct.
x
In that case, how does one decide how to size servers? Is the number of CPUs a more important metric than memory, since segments can be swapped in? (Is that a significant cost if many segments have to be swapped in?)
m
It depends on data size, the queries, read/write QPS, query selectivity, latency requirements, and cost constraints.
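Of the factors above, local disk is the one the thread pins down (servers keep a local copy of their segments), so a back-of-envelope disk estimate can be sketched in Python. This is an illustrative assumption, not a Pinot formula: it assumes segments are spread evenly across servers and each segment is stored on `replicas` servers, and `per_server_disk_gb`, `replicas`, `num_servers`, and `headroom` are hypothetical names. Because indexes can be built and stored on disk when a segment is loaded, the local footprint may exceed the deep-storage size; CPU and RAM still depend on the query-side factors listed.

```python
import math


def per_server_disk_gb(total_segment_gb, replicas, num_servers, headroom=0.0):
    """Back-of-envelope local disk per server (hypothetical helper, not a Pinot API).

    Assumes an even spread of segments and `replicas` copies of each segment.
    `headroom` is an optional safety margin (e.g. 0.2 for 20%), not a Pinot default.
    """
    if replicas <= 0 or num_servers <= 0:
        raise ValueError("replicas and num_servers must be positive")
    if headroom < 0:
        raise ValueError("headroom must be non-negative")
    stored_gb = total_segment_gb * replicas / num_servers
    return math.ceil(stored_gb * (1 + headroom))


if __name__ == "__main__":
    # 500 GB comes from the question; replicas and server count are made-up inputs.
    print(per_server_disk_gb(500, replicas=2, num_servers=4))  # 250
```

Rounding up with `math.ceil` keeps the estimate on the safe side; real sizing would also account for segments growing over time and uneven assignment.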