# pinot-perf-tuning
e
We noticed that offline tables with a lot of segments require a lot of DirectR buffer references - would this indicate that we need to scale up the number of servers? What % of the heap should DirectR buffer references consume before it is recommended to scale up?
s
I don't think we have ever made such a specific consideration before adding more capacity. Typically it's latency and QPS that guide the number of servers (number of replica groups and servers per group), so as to keep CPU usage per server optimal. Yes, adding more servers will potentially reduce the heap overhead per server. But % overhead for direct buffers seems like a very specific thing to optimize. Typically for Java, the way to tune is to divide memory between heap and direct (native) memory. As an example, one of our very high throughput use cases has the following config for both offline and realtime:
<value>-Xms32g</value>
<value>-Xmx32g</value>
<value>-XX:MaxDirectMemorySize=21g</value>
Another case, where the ratio of direct to heap memory is higher for offline:
<value>-Xms14g</value>
<value>-Xmx14g</value>
<value>-XX:MaxDirectMemorySize=37g</value>
For realtime, the ratio is lower:
<value>-Xms30g</value>
<value>-Xmx30g</value>
<value>-XX:MaxDirectMemorySize=23g</value>
So my suggestion would be to start with a ratio of direct to heap memory and a set of servers, then tune both to arrive at an optimal combination that meets your QPS and latency SLA.
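To compare the three configs above at a glance, here is a minimal Python sketch that computes the direct-to-heap ratio from a list of JVM flags. It is not part of Pinot; the names `parse_size` and `direct_to_heap_ratio` are made up for illustration, and it assumes sizes are written with a `k`, `m`, or `g` suffix as in the examples.

```python
import re

# Multipliers for the JVM size suffixes used in the configs above.
UNITS = {"k": 2**10, "m": 2**20, "g": 2**30}


def parse_size(text):
    """Parse a JVM size such as '32g' or '512m' into bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized JVM size: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS.get(unit, 1)


def direct_to_heap_ratio(jvm_flags):
    """Return MaxDirectMemorySize divided by max heap (-Xmx)."""
    heap = direct = None
    for flag in jvm_flags:
        if flag.startswith("-Xmx"):
            heap = parse_size(flag[len("-Xmx"):])
        elif flag.startswith("-XX:MaxDirectMemorySize="):
            direct = parse_size(flag.split("=", 1)[1])
    if heap is None or direct is None:
        raise ValueError("need both -Xmx and -XX:MaxDirectMemorySize")
    return direct / heap


# The three configs quoted in this thread.
configs = {
    "high throughput": ["-Xms32g", "-Xmx32g", "-XX:MaxDirectMemorySize=21g"],
    "offline": ["-Xms14g", "-Xmx14g", "-XX:MaxDirectMemorySize=37g"],
    "realtime": ["-Xms30g", "-Xmx30g", "-XX:MaxDirectMemorySize=23g"],
}

for name, flags in configs.items():
    print(f"{name}: direct/heap = {direct_to_heap_ratio(flags):.2f}")
```

The ratios come out to roughly 0.66, 2.64, and 0.77, which makes it easier to see that only the offline config in the second case gives direct memory more room than the heap.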
k
Hi @Sidd - if a server has lots of RAM (e.g. 256 GB) then it seems like there’s some max size for the JVM heap beyond which it doesn’t benefit from being bigger, but increasing direct memory would help. What’s in JVM heap space that grows with the size of the dataset?
k
Sidd, why do you have such a large MaxDirectMemorySize? That does not make sense.
k
Hi @Sidd - our ops team is busy setting up servers, so getting some input on the size of direct memory would be great, thanks!