#general

Shounak Kulkarni

04/07/2020, 6:46 AM
Hey all, I have deployed Pinot in Kubernetes. When I start ingesting data into Kafka, the server RAM keeps piling up even after segment creation, and eventually the pod crashes due to OOM. I am keeping the segment threshold size at 1000000 and ingesting about 10 million entries into Kafka; the memory limit is 4GB for the server. Why is the memory not released? Thanks!

Xiang Fu

04/07/2020, 7:11 AM
what’s your JVM setting?
usually we recommend giving the container a fixed memory limit (like init 4g, max 4g)
then for the JVM, we reserve only part of that for heap data, e.g. 2G
the rest will be used for off-heap memory mapping

Shounak Kulkarni

04/07/2020, 7:15 AM
I have allocated init and max as 4g to the JVM and also set the load mode to mmap

Xiang Fu

04/07/2020, 7:15 AM
what’s your k8s container size?
could you make it something like 16g, for both init and max

Shounak Kulkarni

04/07/2020, 7:16 AM
so the initial footprint of the server pod is 4 millicores and around 1.5 G memory

Xiang Fu

04/07/2020, 7:17 AM
right, but if it exceeds the limit, then the pod will be killed
did you set a container memory limit?

Shounak Kulkarni

04/07/2020, 7:17 AM
yes, to 4 G
I even tried without giving a limit, but the memory kept piling up continuously

Xiang Fu

04/07/2020, 7:18 AM
in this case, it could be because of off-heap memory allocation
can you try setting the JVM options to
-Xms2G -Xmx2G
and the container init/limit to
4G

Shounak Kulkarni

04/07/2020, 7:20 AM
yes, I'll try it. Any specific reason?

Xiang Fu

04/07/2020, 7:22 AM
because pinot will allocate off-heap memory
if you give 4g to heap
then it will OOM once pinot allocates off-heap memory
like direct buffers
heap is mostly used for online consuming segments and queries
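The reasoning above can be sketched as simple arithmetic: whatever the container limit leaves over after the JVM heap is all that remains for off-heap allocations. This is only an illustration; the function name `off_heap_headroom` is hypothetical, and the sketch ignores other JVM overhead such as metaspace and thread stacks.

```python
GIB = 1024 ** 3


def off_heap_headroom(container_limit_bytes: int, heap_max_bytes: int) -> int:
    """Bytes left in the container once the JVM heap is fully used.

    Hypothetical helper: ignores metaspace, thread stacks and other overhead.
    """
    return container_limit_bytes - heap_max_bytes


# Setup described at the start of the thread: 4G heap inside a 4G container.
original = off_heap_headroom(4 * GIB, 4 * GIB)
# Suggested setup: -Xms2G -Xmx2G inside a 4G container.
suggested = off_heap_headroom(4 * GIB, 2 * GIB)

print(original / GIB)   # 0.0
print(suggested / GIB)  # 2.0
```

With the heap sized to the whole container there is no room left for off-heap memory, which matches the OOM kills described earlier.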

Shounak Kulkarni

04/07/2020, 7:24 AM
oh ok.. @Xiang Fu one more doubt: will the pinot server load all segments into memory to serve queries?

Xiang Fu

04/07/2020, 7:25 AM
it relies on system paging
so technically, if you have enough spare RAM to hold all the data, then it’s possible
otherwise system swapping will happen

Shounak Kulkarni

04/07/2020, 7:29 AM
is there any configuration to control the number of segments loaded into heap to serve queries?

Xiang Fu

04/07/2020, 7:31 AM
no, there is only the table config
readMode
heap/mmap
heap will try to load all segments into heap
which typically is not the best practice
we mostly rely on the system page cache

Shounak Kulkarni

04/07/2020, 7:35 AM
Ok...thanks a lot @Xiang Fu for the quick responses and help. I'll try what you asked and get back :)

Xiang Fu

04/07/2020, 7:35 AM
np!

Shounak Kulkarni

04/07/2020, 8:24 AM
hey, now it's not going OOM, but the memory utilization is still increasing with each segment creation. Memory went from 975 M to 3892 M for 11 segments of size 1 million each

Xiang Fu

04/07/2020, 8:31 AM
could you check the segment size?
you can check the pinot data directory
is the memory you mentioned JVM usage or k8s container memory usage?

Shounak Kulkarni

04/07/2020, 8:32 AM
k8s container memory
in the data directory, columns.psf is 48.95 MB for the first segment

Xiang Fu

04/07/2020, 8:35 AM
Then I think it’s expected, as pinot uses off-heap memory to create segments and load persisted data
hmm, then there should be only around 500mb on disk
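As a rough check of that estimate, the numbers quoted in the thread can be multiplied out. This sketch assumes every segment is about the same size as the first one (48.95 MB), which the thread does not confirm; the function name is hypothetical.

```python
def estimated_on_disk_mb(segment_mb: float, segment_count: int) -> float:
    """Estimate total on-disk size, assuming all segments are equally large."""
    return segment_mb * segment_count


# Figures quoted in the thread.
on_disk_mb = estimated_on_disk_mb(48.95, 11)   # 11 segments of ~48.95 MB each
container_growth_mb = 3892 - 975               # container memory before vs. after

print(f"estimated on-disk data: {on_disk_mb:.2f} MB")
print(f"container memory growth: {container_growth_mb} MB")
```

Under that assumption the persisted data alone accounts for only part of the growth seen at the container level.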

Shounak Kulkarni

04/07/2020, 8:37 AM
this columns.psf is only the persisted segment right?

Xiang Fu

04/07/2020, 8:38 AM
yes
that’s the data file

Shounak Kulkarni

04/07/2020, 8:39 AM
so when segments are purged will the K8s memory get released?
cause if not then it will grow indefinitely

Xiang Fu

04/07/2020, 8:41 AM
the buffer will be released once it has been used
for persisted segments on disk, pinot loads them in memory-map mode
page swapping will happen, so it won’t grow infinitely

Shounak Kulkarni

04/07/2020, 8:48 AM
what exactly is the buffer that you mentioned?

Xiang Fu

04/07/2020, 8:53 AM
so in heap, there is a data buffer created to host the consuming segment. That buffer will be recycled once the previous segment is sealed
during segment creation, off-heap buffers are created for temporary use and will be released afterwards

Shounak Kulkarni

04/07/2020, 9:00 AM
ok, now I am getting clarity.. Maybe my issue is with the part where the off-heap buffer is not getting released, as k8s memory keeps increasing with incoming data even after segment creation is completed

Xiang Fu

04/07/2020, 9:06 AM
the off-heap part is because of memory mapping
all those columns.psf files are loaded as memory-mapped files
it leverages off-heap memory to cache data and so boost query performance
so not every query will hit disk
pinot can map more data than the off-heap size; at that point, page faults will happen and the query will hit disk to read data.
if you keep ingesting data beyond 2gb then you should see this
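To illustrate what reading through a memory-mapped file looks like, here is a minimal sketch using Python's standard `mmap` module. It is not Pinot's code (Pinot runs on the JVM); the temporary file is a hypothetical stand-in for a segment data file, and `read_via_mmap` is an illustrative name.

```python
import mmap
import os
import tempfile


def read_via_mmap(path: str, offset: int, length: int) -> bytes:
    """Read a byte range from a file through a read-only memory map.

    The OS loads the touched pages on demand and may keep them cached,
    so repeated reads of the same range need not go back to disk.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


# Hypothetical stand-in for a segment data file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pinot" * 1000)
    path = tmp.name

data = read_via_mmap(path, 5, 5)
os.remove(path)
print(data)
```

Only the pages that are actually touched need to be resident, which is why the mapped data can be larger than the memory available for caching it.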

Shounak Kulkarni

04/07/2020, 9:18 AM
ok... so in my case page faults won't happen, as the data is (50 MB segment size) x (10 segments).. but the same amount of memory (i.e. 500 MB) should be reflected in the k8s container memory... am I getting it right?

Xiang Fu

04/07/2020, 9:48 AM
yes
actually the off-heap size, which is 2gb

Shounak Kulkarni

04/07/2020, 9:52 AM
ok got it! thanks a lot that was really helpful
Hey, everything is working fine now. Along with -Xms and -Xmx there was one more option set, -XX:MaxDirectMemorySize, and its value was 10G. I have now reduced it to 2G (seems we need [-Xmx] + [-XX:MaxDirectMemorySize] < container memory limit). Thanks a lot again!
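The rule of thumb in that last message can be checked mechanically. The sketch below parses a JVM options string and tests whether -Xmx plus -XX:MaxDirectMemorySize stays under a container limit. The helper names and the example values other than the 10G setting are hypothetical, and the check covers only these two flags, not memory-mapped files or other JVM overhead.

```python
import re

UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a JVM-style size such as '2G' or '512m' to bytes."""
    match = re.fullmatch(r"(\d+)([kmgKMG]?)", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS.get(unit.lower(), 1)


def fits_in_container(jvm_opts: str, container_limit: str) -> bool:
    """Check: -Xmx + -XX:MaxDirectMemorySize < container limit."""
    heap = re.search(r"-Xmx(\S+)", jvm_opts)
    direct = re.search(r"-XX:MaxDirectMemorySize=(\S+)", jvm_opts)
    if heap is None or direct is None:
        raise ValueError("both -Xmx and -XX:MaxDirectMemorySize must be present")
    total = parse_size(heap.group(1)) + parse_size(direct.group(1))
    return total < parse_size(container_limit)


# The 10G direct-memory setting cannot fit under a 4G limit.
print(fits_in_container("-Xms2G -Xmx2G -XX:MaxDirectMemorySize=10G", "4G"))  # False
# Hypothetical smaller setting that does fit.
print(fits_in_container("-Xms2G -Xmx2G -XX:MaxDirectMemorySize=1G", "4G"))   # True
```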

Xiang Fu

04/07/2020, 7:03 PM
oh, I don’t think you need to specify
-XX:MaxDirectMemorySize
any more
it will default to the max heap size