Are there any guidelines / estimates to resource r...
# general
p
Are there any guidelines / estimates for Pinot's resource requirements, particularly memory and local node storage? E.g. for a 10 TB table in deep storage, you need X TB locally on the nodes and Y memory for reasonable performance? Is it designed so that the entire table should be cached locally and mmap'd, or can parts go cold and be pulled on demand from deep storage with the extra latency?
x
Usually we recommend r5.2xlarge (8 vCPUs, 64 GB RAM) + 2 TB SSD EBS; you can also use r5.4xlarge + 4 TB.
Currently, Pinot requires all data to be on local disk so that it can be mmap'd.
p
Deep storage is really just DR then?
x
Right now, deep storage is not on the query serving path.
There is an effort to add lazy loading of data: https://github.com/apache/incubator-pinot/pull/6250
It's still ongoing.
p
Okay great, that's what I was looking for. I'm just experimenting now, but I will assume I need local SSD storage equivalent to the data set size for now.
Thank you
x
Right. Just FYI, Pinot data segment size is typically smaller than Avro (50%-60%),
and almost the same as Parquet.
p
Great, I currently have Snappy-compressed Parquet, so it will be easy to compare then.
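A rough sketch of the sizing rule discussed above (all data on local disk, about 2 TB of SSD per r5.2xlarge server). The function name, the `replication` factor, and the `headroom` fraction are illustrative assumptions and do not come from the thread or from Pinot itself; treat the result as a starting point for testing, not a recommendation.

```python
import math


def estimate_servers(data_tb, disk_per_server_tb=2.0, replication=1, headroom=0.0):
    """Estimate how many servers are needed to hold every segment replica on local disk.

    data_tb            -- total Pinot segment size in TB (roughly the Parquet size, per the thread)
    disk_per_server_tb -- local SSD per server in TB (2 TB for the r5.2xlarge setup mentioned)
    replication        -- assumed number of copies of each segment (hypothetical parameter)
    headroom           -- assumed fraction of each disk kept free (hypothetical parameter)
    """
    if data_tb < 0 or disk_per_server_tb <= 0 or replication < 1:
        raise ValueError("sizes must be positive and replication at least 1")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    total_tb = data_tb * replication
    usable_tb = disk_per_server_tb * (1.0 - headroom)
    return math.ceil(total_tb / usable_tb)


# 10 TB of segments, one copy, disks filled completely
print(estimate_servers(10))
# 10 TB of segments, two copies, 20% of each disk kept free
print(estimate_servers(10, replication=2, headroom=0.2))
```

With the numbers from the thread, a 10 TB table with a single copy needs at least 5 such servers; adding replicas or free-space headroom raises the count accordingly.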
g
usually we recommend r5.2xlarge (8 vCPUs, 64 GB RAM) + 2 TB SSD EBS; you can also use r5.4xlarge + 4 TB
Interesting. When you say you are using such server types, is it only for offline / realtime servers? Or are you running all Pinot components on those server types (as you are running Docker containers)?
I will definitely do some testing and sizing with live data, but it would be nice to have some guidelines in the documentation. Something like:
• default minimal requirements for a Pinot server: x vCPU, y GB RAM
• additional CPU / RAM per thousand queries on the broker
• additional CPU / RAM per segment on an offline server
• etc.
It may vary widely between use cases, but at least it gives a rough estimate of where you should start.
x
It's for pinot-server. For brokers and controllers, we use m5.xlarge/m5.2xlarge, depending on the use case.
g
ok
Just for my understanding: you said you are running Docker in production, but you are using r5.2xlarge for Pinot servers.
x
k8s
g
ok
So that's a Kubernetes node, and you dedicate it to running Pinot servers?
(also, it’s really late for you, we can continue that conversation tomorrow morning)
x
Yes, we configure a node group per machine SKU and tag the nodes, then we can assign the corresponding pods to them.
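One common way to do the pod-to-node assignment described above is a Kubernetes `nodeSelector` that matches a label applied to the node group. The sketch below only builds the relevant fragment of a StatefulSet or Deployment spec as a Python dict; the label key `workload` and value `pinot-server` are hypothetical placeholders, not names used in the thread, and the thread does not say which mechanism was actually used.

```python
import json


def node_selector_patch(label_key="workload", label_value="pinot-server"):
    """Build a spec fragment that pins pods to nodes carrying the given label.

    The label key and value are hypothetical; use whatever tag was applied
    to the node group that holds the server machines.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    # Pods are scheduled only on nodes whose labels match every entry here.
                    "nodeSelector": {label_key: label_value}
                }
            }
        }
    }


# Print the fragment as JSON so it can be inspected or merged into a manifest.
print(json.dumps(node_selector_patch(), indent=2))
```

The same idea applies to brokers and controllers: a separate node group with its own label, and a matching selector on those pods.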