
Matt

02/23/2021, 6:51 PM
Hello, I have 3 Pinot servers with 4 cores and 48Gi of memory each, using a realtime table. I noticed that when the load/flow increases there is a lag in the search results (inverted index). Once the load is reduced, Pinot catches up. CPU and memory usage both look normal. Wondering why this is happening. Are there any settings to make the Pinot servers process faster?

Subbu Subramaniam

02/23/2021, 7:03 PM
@Matt trying to understand your question. Are you saying that when the input stream increases in volume, the query latency increases? That is sort of intuitive because the same CPU is being used for consumption as well as query processing. Increasing the number of cores may help.

Matt

02/23/2021, 8:45 PM
@Subbu Subramaniam Actually it is not query latency, it is the data time lag. In the normal scenario, data pushed into Kafka is available in Pinot almost at the same time. So Pinot consumes the data from Kafka in near real time (NRT) and makes it available for search. But when the volume increases, Pinot takes more time to ingest, which causes the time lag, and it is not NRT anymore. I would like Pinot to ingest the data as soon as it is available in the stream.

Subbu Subramaniam

02/23/2021, 9:43 PM
I can think of a few scenarios, but it will be useful to check what you are bottlenecked by: network, CPU, I/O, or memory? I am guessing it is CPU. Maybe we are not able to consume fast enough. We start one consuming thread per partition per replica. How many partitions does your topic have? How many replicas have you configured? Does the total number of cores equal numPartitions * numReplicas? If not, then it is likely that the threads consuming some of the partitions are delayed because other threads are not giving up the cores.
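As a rough illustration of that check (the partition and replica counts below are hypothetical and not taken from this thread; only the 3 servers with 4 cores each matches the setup described above):

// Back-of-the-envelope check: one consuming thread per partition per replica.
public class ConsumingThreadMath {
  public static void main(String[] args) {
    int numPartitions = 12;   // hypothetical Kafka topic partition count
    int numReplicas = 3;      // hypothetical Pinot table replication factor
    int numServers = 3;       // from the setup above
    int coresPerServer = 4;   // from the setup above

    int consumingThreads = numPartitions * numReplicas;  // 36 consumer threads
    int totalCores = numServers * coresPerServer;        // 12 cores in total

    System.out.printf("consuming threads = %d, total cores = %d%n", consumingThreads, totalCores);
    if (consumingThreads > totalCores) {
      System.out.println("Consumers will contend for cores; ingestion can lag when the stream is busy.");
    }
  }
}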

Matt

02/23/2021, 10:22 PM
Good to know that 1 core is required per partition per replica. So I assume that could be where the issue is. Let me change my instances and see. Thanks

Subbu Subramaniam

02/23/2021, 10:27 PM
Correction. One core is not "required". Just that one core can be kept busy if the pipeline is full.
We consume by doing something roughly like this:
while (true) {
  msgs = pullMsgsFromKafka();   // fetch whatever is available for this partition
  if (msgs.isEmpty()) {
    sleepALittle();             // back off briefly only when the stream is idle
  }
  // otherwise index the batch and loop again immediately
}
It is also worthwhile to check whether you have a network bottleneck (unlikely, but if you do, it speaks of Pinot's efficiency 🙂).
You may also be blocking on I/O (possibly paging) if you have the memory-mapped setting for your consuming segments.
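For background on why a memory-mapped consuming segment can turn into disk I/O, here is a minimal, self-contained Java sketch (not Pinot code) showing that writes into an mmap'd region dirty pages that the kernel then writes back to disk in the background:

import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapWriteDemo {
  public static void main(String[] args) throws Exception {
    long size = 256L * 1024 * 1024; // 256 MB region, stand-in for a consuming segment buffer
    try (FileChannel ch = FileChannel.open(Path.of("consuming-segment.buf"),
        StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
      MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
      // Touch one byte per 4 KB page so every page becomes dirty; while this runs,
      // vmstat's bo column (or pgpgout metrics) shows the kernel flushing to disk.
      for (long i = 0; i < size; i += 4096) {
        buf.put((int) i, (byte) 1);
      }
    }
  }
}

If the underlying volume cannot absorb that write-back rate, the writing thread stalls, which is consistent with the disk-throughput cap found later in this thread.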

Matt

02/23/2021, 11:06 PM
Good suggestions, let me check I/O and disk as well.
@Subbu Subramaniam, I was able to find the issue and fix it. The io1 SSD disk I was using was attached to an instance that was capped at a max VolumeWriteBytes of 120KB/s. I changed to a better instance and the VolumeWriteBytes spiked to 16G/s. Now there is no lag. Thanks again for your suggestions.
👍 1

Subbu Subramaniam

02/25/2021, 7:44 PM
So, to complete my understanding, this seems to be a case where the consuming memory is acquired using mmap in offheap. Since this piece of memory is always being written to, pages are dirty all the time, and so the OS may aggressively start flushing them to disk. I would imagine this is what you were experiencing. Running vmstat on the box may give you some idea (or, if you have operating system metrics, you can look at the page-in/page-out metrics) to reconfirm. Alternatively, if you are making segments very frequently, this can happen. You may want to look at https://docs.pinot.apache.org/operators/operating-pinot/tuning/realtime#tuning-realtime-performance to set an optimal segment size.

Matt

02/25/2021, 8:16 PM
I will check the segment tuning. However, the main difference is:
Old Instance:
Maximum bandwidth (Mbps) = 850
Maximum throughput (MB/s, 128 KiB I/O) = 106.25
Maximum IOPS (16 KiB I/O) = 6,000

New Instance:
Maximum bandwidth (Mbps) = 4,750
Maximum throughput (MB/s, 128 KiB I/O) = 593.75
Maximum IOPS (16 KiB I/O) = 18,750
Also, I just checked vmstat; this is with normal load:
# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 66201480   3740 56669552    0    0    22   399   32   19  1  0 99  0  0
# free -m
              total        used        free      shared  buff/cache   available
Mem:         127462        7466       64649           1       55346      118860
Swap:             0           0           0