Hi. We have a K8s Pinot deployment and some of our queries are taking > 10s. We found one conspicuous correlation during our investigation: latency spikes happen when there is also a spike in the young-generation (YG) GC count. In the following charts, spikes happened across the board at 15:28. Does this indicate a possible GC issue?
07/06/2021, 3:22 PM
Need more info. Is this server side? What's the read QPS, and the data size on the server? What's the heap size? What kind of queries?
What version of Java?
07/07/2021, 8:14 AM
@Mayank yes this is server side. The QPM never crossed 100. Java opts: -Xms6G -Xmx10G -XX:+UseG1GC -XX:MaxGCPauseMillis=200.
Pinot image: apachepinot/pinot:0.7.1
Total container memory: 85G
As you can see, query latency for the rawServiceView table is > 10s.
This is the heap usage pattern:
Where should we look first to find out why latency is spiking intermittently? Resources don't seem to be an issue, as the load is very low right now and we have allocated ample capacity.
Query: SELECT api_id, service_name, service_id, api_name, COUNT(*) FROM rawServiceView WHERE tenant_id = ? AND ( api_id != ? AND start_time_millis >= ? AND start_time_millis < ? ) GROUP BY api_id, service_name, service_id, api_name ORDER BY PERCENTILETDIGEST99(duration_millis) DESC LIMIT 10000
07/07/2021, 1:59 PM
Heap is used mostly for query processing and some part of real-time (segment generation). If this is an offline server, check what was happening in the logs during the GC pause. If it is real-time, also check whether segment generation was happening. Also look at numDocs scanned, entries scanned in filter, etc. to see how much data was being processed on heap. You also have a percentile in the group-by query; check its size.
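To make the "how much data was being processed" check concrete, here is a minimal Python sketch that summarizes the scan statistics Pinot returns in the broker response metadata (`totalDocs`, `numDocsScanned`, `numEntriesScannedInFilter`, `numEntriesScannedPostFilter`, `timeUsedMs`). It assumes you already have the broker's JSON response as a dict. The helper name `summarize_scan_stats` and the sample numbers are hypothetical and are not taken from the cluster in this thread.

```python
from typing import Any, Dict


def summarize_scan_stats(response: Dict[str, Any]) -> Dict[str, float]:
    """Summarize how much data a query touched, using broker response metadata."""
    total_docs = response.get("totalDocs", 0)
    docs_scanned = response.get("numDocsScanned", 0)
    post_filter = response.get("numEntriesScannedPostFilter", 0)
    return {
        "docs_scanned": docs_scanned,
        # Share of the table that survived the filter and had to be aggregated.
        "fraction_of_table_scanned": docs_scanned / total_docs if total_docs else 0.0,
        "entries_in_filter": response.get("numEntriesScannedInFilter", 0),
        "entries_post_filter": post_filter,
        # Roughly the number of column values read per matching doc.
        "entries_per_doc_post_filter": post_filter / docs_scanned if docs_scanned else 0.0,
        "time_used_ms": response.get("timeUsedMs", 0),
    }


# Made-up values for illustration only (assumption, not real measurements).
sample = {
    "totalDocs": 1_000_000,
    "numDocsScanned": 250_000,
    "numEntriesScannedInFilter": 500_000,
    "numEntriesScannedPostFilter": 1_250_000,
    "timeUsedMs": 10_500,
}
stats = summarize_scan_stats(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

If the scanned fraction or the post-filter entry count is large for the slow queries, the group-by and percentile aggregation is probably building big on-heap structures, which would fit the young-generation GC spikes. Comparing these numbers between a fast run and a slow run of the same query is likely a more useful signal than any single value.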