# general
k
Hi All, My server is crashing every time due of OOM, My servers are r5.4xlarge and heap space i have provided is 110GB. I have 2 servers.
d
Reduce your heap size. The server requires at least 50% headroom for non-heap memory
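[Editor's note] A back-of-the-envelope sketch of the sizing advice above. The 128 GB figure is the RAM of an r5.4xlarge; the 50% split is the rule of thumb from the message, not an official Pinot formula.

```shell
# Rule-of-thumb sketch: give the JVM heap at most half of total RAM so the
# other half stays free for non-heap use (mmap'd segments, direct buffers,
# OS page cache). total_gb reflects an r5.4xlarge (128 GB).
total_gb=128
heap_gb=$(( total_gb / 2 ))
echo "suggested -Xmx upper bound: ${heap_gb}G"
```

By this rule, the 110GB heap mentioned above leaves almost no headroom for anything else.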
k
ok
@User thanks
m
It will also help to get the stack trace. If you are using MMAP mode (recommended), then you should not run into OOM.
@User ^^. The one place the server uses heap is query execution, but even there it will try to bail out (unless you increased max num groups) to avoid OOM. So I am really curious about the stack trace.
@User let’s continue here
Seems you have 3500 segments on a single server
k
Yes
Each server with 3500 segments and I have 2 servers
Total 7000 segments
m
The metadata is also stored in memory, but I can’t see how it uses several GB. Would it be possible for you to share the stack trace?
k
I’ll check; I’m away from the system as of now
m
And this is as soon as you bring the server up?
k
No, after some time. It loads almost half (1700 segments) on each server and then dies
m
These are real-time segments that are already written to disk?
I think real-time nodes allocate direct buffer for consuming segments. If you allocated entire memory for heap, then it could run out of direct memory. Unless your queries are doing heavy computation, you should use limited amount of heap (we typically use 16GB for our heavy production loads).
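[Editor's note] One way to make the heap/off-heap split described above explicit is to cap direct memory alongside the heap. A minimal sketch; the 32G direct-memory cap is an illustrative assumption, only the 16GB heap comes from the thread.

```shell
# Sketch: small fixed heap plus an explicit cap on the direct (off-heap)
# buffers that realtime consumption allocates. The 32G value is an assumption,
# not a recommendation from the thread.
export JAVA_OPTS="-Xms16G -Xmx16G -XX:MaxDirectMemorySize=32G -XX:+UseG1GC"
echo "$JAVA_OPTS"
```

Pinning `-Xms` equal to `-Xmx` avoids heap resizing pauses on long-running servers.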
k
@User I have done the same, brought down the heap, and it has been running fine for the last 2 hours
@User I was just checking what to give as the heap size
because at the max it was going down
m
What’s your read QPS for queries? And are queries going to process too much data in memory? We have used just 16GB for most production use cases and it has worked fine
k
as of now there is no query load…
m
Ok, assuming you will have a decent query load that requires processing large amounts of data in memory (say hundreds of thousands of groups), you can still get by with a 16GB heap.
k
@User ok
@User: I have kept the JVM at 16GB and the table at ["loadMode": "MMAP"], but my servers keep getting lost after this error
[attachment: Screenshot 2021-05-22 at 8.24.14 PM.png]
m
Seems it is getting disconnected from ZK. Do you have any GC logs?
Side question: how many partitions in Kafka, and how many per Pinot server?
k
I have 2 servers and 3 zk nodes
@User:
May 22, 2021 3:08:36 PM org.glassfish.grizzly.nio.SelectorRunner doSelect
SEVERE: doSelect exception
java.lang.OutOfMemoryError: Java heap space
m
Hmm, did any query execution happen? Or just consumption?
Can you also paste your JVM settings?
k
Just consumption, No queries get fired
export JAVA_OPTS="-javaagent:/home/ubuntu/apache-pinot-incubating-0.7.1-bin/plugins/jmx_prometheus_javaagent-0.12.0.jar=8080:/home/ubuntu/apache-pinot-incubating-0.7.1-bin/conf/pinot.yml -Xms16G -Xmx16G -XX:+UseG1GC"
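[Editor's note] For the GC logs asked about earlier, these JVM settings could be extended with diagnostic flags. A hedged sketch: the /tmp paths are placeholders, and `-Xlog:gc*` is the JDK 9+ syntax (on JDK 8 the equivalent is `-Xloggc:` with `-XX:+PrintGCDetails`).

```shell
# Sketch: heap/GC flags plus GC logging and an automatic heap dump on OOM.
# The /tmp paths are placeholders; the javaagent flag from the original
# settings is omitted here for brevity.
export JAVA_OPTS="-Xms16G -Xmx16G -XX:+UseG1GC \
  -Xlog:gc*:file=/tmp/pinot-gc.log \
  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
echo "$JAVA_OPTS"
```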
m
Then the heap is used only for storing metadata.
k
@User
m
Any chance to take heap dump?
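[Editor's note] A heap dump can be taken from an already running server with the JDK's own tools. A sketch; the pid is a placeholder, so the command is printed here rather than executed.

```shell
# Sketch: heap-dump command for a running Pinot server. PID is a placeholder;
# find the real one with something like: pgrep -f pinot
PID=12345
# 'live' triggers a full GC first so the dump contains only reachable objects.
DUMP_CMD="jmap -dump:live,format=b,file=/tmp/pinot-server.hprof $PID"
echo "$DUMP_CMD"   # printed, not executed, since the pid is a placeholder
```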
d
16GB might not be enough even if MMAP is default
m
Why is that @User
There is no query execution, so what is occupying the heap?
k
@User I have tried expanding in multiples of 16 up to a 64GB heap, but the same issue happens. The table source is Kafka, there are 50 partitions, and the lag is 10 crore (~100 million) as of now.
m
Are Prometheus and Pinot sharing the same JVM?
k
@User No
m
Yeah something is wrong here if you get heap OOM with 64 GB as well.
You can’t take a heap dump?
Can we move to #C011C9JHN7R ?
k
@User sure