# general
k
Hi All, My server is crashing every time due of OOM, My servers are r5.4xlarge and heap space i have provided is 110GB. I have 2 servers.
d
Reduce your heap size. The server requires at least 50% headroom for non-heap memory
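[Editor's note] A back-of-the-envelope sketch of the sizing advice above. The 128 GB figure is the RAM of an r5.4xlarge; the 50% split is the rule of thumb from the message, not an official Pinot formula.

```shell
# Rule-of-thumb sketch: give the JVM heap at most half of total RAM so the
# other half stays free for non-heap use (mmap'd segments, direct buffers,
# OS page cache). total_gb reflects an r5.4xlarge (128 GB).
total_gb=128
heap_gb=$(( total_gb / 2 ))
echo "suggested -Xmx upper bound: ${heap_gb}G"
```

By this rule, the 110GB heap mentioned above leaves almost no headroom for anything else.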
k
ok
@User thanks
m
It will also help to get the stack trace. If you are using MMAP mode (recommended), then you should not run into OOM.
@User ^^. The one place the server uses heap is query execution, but even there it will try to bail out (unless you increased max num groups) to avoid OOM. So I am really curious about the stack trace.
@User let’s continue here
Seems you have 3500 segments on a single server
k
Yes
Each server with 3500 segments and I have 2 servers
Total 7000 segments
m
The metadata is also stored in memory, but I can’t see how it uses several GB. Would it be possible for you to share the stack trace?
k
I’ll check; I’m away from the system as of now
m
And this is as soon as you bring the server up?
k
No, after some time. It loads almost half (1700 segments) on each server and then dies
m
These are real-time segments that are already written to disk?
I think real-time nodes allocate direct buffer for consuming segments. If you allocated entire memory for heap, then it could run out of direct memory. Unless your queries are doing heavy computation, you should use limited amount of heap (we typically use 16GB for our heavy production loads).
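[Editor's note] One way to make the heap/off-heap split described above explicit is to cap direct memory alongside the heap. A minimal sketch; the 32G direct-memory cap is an illustrative assumption, only the 16GB heap comes from the thread.

```shell
# Sketch: small fixed heap plus an explicit cap on the direct (off-heap)
# buffers that realtime consumption allocates. The 32G value is an assumption,
# not a recommendation from the thread.
export JAVA_OPTS="-Xms16G -Xmx16G -XX:MaxDirectMemorySize=32G -XX:+UseG1GC"
echo "$JAVA_OPTS"
```

Pinning `-Xms` equal to `-Xmx` avoids heap resizing pauses on long-running servers.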
k
@User I have done the same, brought down the heap, and it has been running fine for the last 2 hours
@User I was just checking what to give as the heap size
because at the max it was going down
m
What’s your read QPS for queries? And are queries going to process too much data in memory? We have used just 16GB for most production use cases and it has worked fine
k
as of now there is no query load…
m
Ok, assuming you will have a decent query load that requires processing large amounts of data in memory (say hundreds of thousands of groups), you can still get by with a 16GB heap.
k
@User ok
@User: I have kept the JVM at 16GB and the table at ["loadMode": "MMAP"], but my servers keep getting lost after this error
[attachment: Screenshot 2021-05-22 at 8.24.14 PM.png]
m
Seems it is getting disconnected from ZK. Do you have any GC logs?
Side question: how many partitions in Kafka, and how many per Pinot server?
k
I have 2 servers and 3 zk nodes
@User:
May 22, 2021 3:08:36 PM org.glassfish.grizzly.nio.SelectorRunner doSelect
SEVERE: doSelect exception
java.lang.OutOfMemoryError: Java heap space
m
Hmm, did any query execution happen? Or just consumption?
Can you also paste your JVM settings?
k
Just consumption, No queries get fired
export JAVA_OPTS="-javaagent:/home/ubuntu/apache-pinot-incubating-0.7.1-bin/plugins/jmx_prometheus_javaagent-0.12.0.jar=8080:/home/ubuntu/apache-pinot-incubating-0.7.1-bin/conf/pinot.yml -Xms16G -Xmx16G -XX:+UseG1GC"
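[Editor's note] For the GC logs asked about earlier, these JVM settings could be extended with diagnostic flags. A hedged sketch: the /tmp paths are placeholders, and `-Xlog:gc*` is the JDK 9+ syntax (on JDK 8 the equivalent is `-Xloggc:` with `-XX:+PrintGCDetails`).

```shell
# Sketch: heap/GC flags plus GC logging and an automatic heap dump on OOM.
# The /tmp paths are placeholders; the javaagent flag from the original
# settings is omitted here for brevity.
export JAVA_OPTS="-Xms16G -Xmx16G -XX:+UseG1GC \
  -Xlog:gc*:file=/tmp/pinot-gc.log \
  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
echo "$JAVA_OPTS"
```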
m
Then the heap is used only for storing metadata.
k
@User
m
Any chance to take heap dump?
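[Editor's note] A heap dump can be taken from an already running server with the JDK's own tools. A sketch; the pid is a placeholder, so the command is printed here rather than executed.

```shell
# Sketch: heap-dump command for a running Pinot server. PID is a placeholder;
# find the real one with something like: pgrep -f pinot
PID=12345
# 'live' triggers a full GC first so the dump contains only reachable objects.
DUMP_CMD="jmap -dump:live,format=b,file=/tmp/pinot-server.hprof $PID"
echo "$DUMP_CMD"   # printed, not executed, since the pid is a placeholder
```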
d
16GB might not be enough even if MMAP is default
m
Why is that @User
There is no query execution, so what is occupying the heap?
k
@User I have tried expanding in multiples of 16 up to a 64GB heap, but the same issue happens. The table source is Kafka, there are 50 partitions, and the lag is 10 crore (~100 million) as of now.
m
Are Prometheus and Pinot sharing the same JVM?
k
@User No
m
Yeah something is wrong here if you get heap OOM with 64 GB as well.
You can’t take a heap dump?
Can we move to #C011C9JHN7R ?
k
@User sure