# troubleshooting
s
Our brokers have been running into direct memory allocation OOM errors. We have allocated 128M. Noticed that the brokers don't crash but catch the exception and log it; the only symptom we see is query timeouts. Would like to understand: a) what is the direct memory used for? b) are there any guidelines for sizing it?
k
It's used by netty; 128M is too little if you are moving a lot of data between servers and the broker.
Increase it to 1G.
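The limit itself is the standard HotSpot flag -XX:MaxDirectMemorySize (e.g. -XX:MaxDirectMemorySize=1G on the broker JVM). To size it with data rather than guesswork, you can watch the "direct" buffer pool at runtime. A minimal sketch using the standard JMX buffer-pool beans; the class below is hypothetical, not existing Pinot code, and the same numbers are also exposed over JMX for dashboards:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Sketch: print current direct-buffer usage vs. capacity for this JVM.
// Run inside (or attached to) the broker process, or scrape the same
// BufferPool MXBeans over JMX instead.
public class DirectMemoryStats {
  public static void main(String[] args) {
    for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
      if ("direct".equals(pool.getName())) {
        System.out.printf("direct buffers: count=%d used=%d bytes capacity=%d bytes%n",
            pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
      }
    }
  }
}
```

If peak usage sits close to -XX:MaxDirectMemorySize under normal query load, the limit is too small.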
s
thanks
l
I got some time to do more analysis on this today and found the root cause. Posting my analysis here, since others may hit the same problem in a k8s environment where you try to control the total memory of a pod.

*Background (direct buffers and their garbage collection)*
• The *Pinot Broker* has no explicit requirement (like an off-heap cache) for a large amount of direct memory. In the broker, direct memory is used only in the netty layer (socket NIO).
• Direct buffers are generally collected on a full GC. When usage reaches the max limit, the JVM triggers a full GC even if the heap is not full.
• To avoid that full GC, different JVM vendors have different proprietary, vendor-specific mechanisms to release/collect direct buffers.
• Several open source projects (including hadoop, netty and Pinot) implement hacks on top of these internal mechanisms (example: sun.misc.Unsafe) to clean up direct buffers eagerly (io.netty.buffer.PooledByteBufAllocator, io.netty.util.internal.CleanerJava9, org.apache.pinot.core.util.CleanerUtil). See the first sketch below.

*Root cause:* direct buffers are not getting collected because the JVM flag DisableExplicitGC is set.
• When direct memory is full, the JVM triggers a full GC via System.gc().
• On the other hand, we disabled explicit GC via JVM flags (-XX:+DisableExplicitGC).
• Which means System.gc() is ignored completely and is equivalent to a no-op.
• So the allocation throws OOM as there is no free direct memory. A minimal repro is the second sketch below.

cc: @Kishore G @Suraj
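For anyone curious what those "hacks" look like: on Java 9+ the usual trick is sun.misc.Unsafe.invokeCleaner(), which releases a direct buffer's native memory immediately instead of waiting for a full GC. A minimal sketch of the general pattern; this mirrors what CleanerJava9 / CleanerUtil do, it is not their exact code:

```java
import java.lang.reflect.Field;
import java.nio.ByteBuffer;

// Sketch of the Java 9+ "cleaner hack": free a direct buffer's native memory
// eagerly instead of waiting for a full GC. General pattern only, not the
// exact netty/Pinot implementation.
public final class DirectBufferCleaner {
  private static final sun.misc.Unsafe UNSAFE;
  static {
    try {
      Field f = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      UNSAFE = (sun.misc.Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  /** Releases the native memory backing a direct buffer. The buffer must never be used afterwards. */
  public static void free(ByteBuffer buffer) {
    if (buffer.isDirect()) {
      UNSAFE.invokeCleaner(buffer);
    }
  }
}
```

The caller must guarantee the buffer is no longer referenced anywhere; eager cleanup exists precisely because waiting for a full GC to do this is unreliable.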
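And a minimal repro of the root cause, assuming a typical HotSpot JDK (the class is hypothetical, not Pinot code): the direct-memory reservation path falls back to System.gc() when the limit is hit, and -XX:+DisableExplicitGC turns that call into a no-op.

```java
import java.nio.ByteBuffer;

// Repro sketch (typical HotSpot behavior).
//
// Run:  java -XX:MaxDirectMemorySize=128M -XX:+DisableExplicitGC DirectMemoryOomRepro
//   -> fails after ~8 allocations with an OutOfMemoryError for direct buffer memory
// Run:  java -XX:MaxDirectMemorySize=128M DirectMemoryOomRepro
//   -> completes, because the JVM's System.gc() fallback collects the
//      unreachable buffers and releases their native memory.
public class DirectMemoryOomRepro {
  public static void main(String[] args) {
    for (int i = 0; i < 64; i++) {
      // Each 16 MB buffer becomes unreachable immediately, but its native memory
      // is only released after a GC runs the buffer's Cleaner. The heap footprint
      // of these objects is tiny, so no GC happens on its own; once the 128 MB
      // direct limit is hit the JVM falls back to System.gc(), which
      // -XX:+DisableExplicitGC turns into a no-op, and the reservation throws OOM.
      ByteBuffer.allocateDirect(16 * 1024 * 1024);
      System.out.println("allocated buffer " + i);
    }
    System.out.println("done: direct buffers were reclaimed in time");
  }
}
```

Run it once with and once without -XX:+DisableExplicitGC to see the difference.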