Hi. We have a Pinot deployment on K8s, and some of our queries are taking > 10s. We found one conspicuous correlation during our investigation: latency spikes happen when there is also a spike in young-generation (YG) GC count. In the following charts, spikes happened across the board at 15:28. Does this indicate a possible GC issue?
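One way to put a number on "conspicuous correlation" is a Pearson coefficient over the two exported series. The sketch below assumes per-minute samples of query latency and YG GC count pulled from the dashboards; the sample values are made up for illustration. A high coefficient still does not say which way the arrow points: a heavy query allocates a lot on heap, so rising YG GC activity can be a symptom of the slow query rather than its cause.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance and variances, left unnormalised since the factors cancel
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / math.sqrt(vx * vy)

# Hypothetical per-minute samples (illustrative only, not from the real charts)
p99_latency_ms = [120, 130, 125, 10400, 140, 118]
yg_gc_count = [2, 3, 2, 14, 3, 2]
print(round(pearson(p99_latency_ms, yg_gc_count), 3))
```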
Mayank
07/06/2021, 3:22 PM
Need more info. Is this server side? What’s the read QPS and data size on the server? What’s the heap size? What kind of queries?
Mayank
07/06/2021, 3:22 PM
What version of Java?
Prashant Pandey
07/07/2021, 8:14 AM
@Mayank Yes, this is server side. The QPM never crossed 100. Java opts: -Xms6G -Xmx10G -XX:+UseG1GC -XX:MaxGCPauseMillis=200.
Pinot image: apachepinot/pinot:0.7.1
Total container memory: 85G
Prashant Pandey
07/07/2021, 8:16 AM
As you can see, for rawServiceView the table query latency is > 10s.
Prashant Pandey
07/07/2021, 8:17 AM
This is the heap usage pattern:
Prashant Pandey
07/07/2021, 8:32 AM
Where should we look first to find out why latency is spiking intermittently? Resources don’t seem to be an issue, as the load is very low right now and we have allocated ample resources.
Prashant Pandey
07/07/2021, 9:11 AM
Query: Select api_id, service_name, service_id, api_name, COUNT(*) FROM rawServiceView WHERE tenant_id = ? AND ( api_id != ? AND start_time_millis >= ? AND start_time_millis < ? ) GROUP BY api_id, service_name, service_id, api_name ORDER BY PERCENTILETDIGEST99(duration_millis) desc limit 10000
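To see how much data this query actually touches, the broker's JSON response carries execution stats next to the result rows. The sketch below assumes the broker's `/query/sql` REST endpoint on port 8099 (the broker's usual default); the host name `pinot-broker` and the helper names are hypothetical. The `?` placeholders would need to be filled in with literal values before sending, since the endpoint takes a plain SQL string. Because the query orders by `PERCENTILETDIGEST99` across a four-column group by, `numGroupsLimitReached` is included as a hint that the number of groups (and so the number of per-group digests held on heap) is large.

```python
import json
import urllib.request

# Hypothetical broker address; adjust host/port to the actual K8s service
BROKER_URL = "http://pinot-broker:8099/query/sql"

def run_query(sql, url=BROKER_URL, timeout_s=30):
    """POST a SQL string to the broker and return the decoded JSON response."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)

# Execution stats that show how much work the servers did for the query
STAT_KEYS = (
    "timeUsedMs",
    "totalDocs",
    "numSegmentsQueried",
    "numSegmentsMatched",
    "numDocsScanned",
    "numEntriesScannedInFilter",
    "numEntriesScannedPostFilter",
    "numGroupsLimitReached",
)

def scan_summary(response):
    """Pick out the scan stats; missing keys come back as None."""
    summary = {k: response.get(k) for k in STAT_KEYS}
    total = response.get("totalDocs") or 0
    scanned = response.get("numDocsScanned") or 0
    # Fraction of the table's documents selected by the filter
    summary["scanFraction"] = scanned / total if total else None
    return summary
```

Comparing `scan_summary(run_query(sql))` for a fast run and a slow run of the same query should show whether the slow runs simply scan more data (for example a wider `start_time_millis` range or a large tenant).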
Mayank
07/07/2021, 1:59 PM
Heap is used mostly for query processing and partly for real-time ingestion (segment generation). If this is an offline server, check what was happening in the logs during the GC pause. If it is real-time, also check whether segment generation was happening at that time. Also look at numDocsScanned, numEntriesScannedInFilter, etc. to see how much data was being processed on heap. You also have a percentile in the group by; check its size.
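To line GC pauses up against the 15:28 latency spikes and the server logs, a GC log with wall-clock timestamps helps. The posted Java opts do not enable GC logging, so this sketch assumes something like `-Xlog:gc:file=<path>:time` has been added, which requires JDK 9+ (the Java version is not stated in the thread; on JDK 8 the flags and log format differ and this regex will not match). The threshold defaults to 200 ms, the `MaxGCPauseMillis` goal from the opts above; `long_pauses` is a hypothetical helper name.

```python
import re

# Matches G1 pause lines from JDK 9+ unified logging, e.g.
# [2021-07-07T15:28:01.123+0000] GC(42) Pause Young (Normal) (G1 Evacuation Pause) 3072M->512M(6144M) 45.678ms
# The first bracketed decorator is taken as the timestamp.
PAUSE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*?GC\((?P<gc_id>\d+)\) "
    r"(?P<kind>Pause .+?) "
    r"(?P<before>\d+[KMG])->(?P<after>\d+[KMG])\((?P<capacity>\d+[KMG])\) "
    r"(?P<ms>\d+(?:\.\d+)?)ms$"
)

def long_pauses(lines, threshold_ms=200.0):
    """Yield (timestamp, pause kind, pause ms) for pauses above threshold_ms."""
    for line in lines:
        m = PAUSE_RE.match(line.strip())
        if m and float(m.group("ms")) > threshold_ms:
            yield m.group("ts"), m.group("kind"), float(m.group("ms"))

# Usage: scan a log file and print the pauses that exceeded the goal
# with open("gc.log") as f:
#     for ts, kind, ms in long_pauses(f):
#         print(ts, kind, ms)
```

If no pause near 15:28 comes close to the multi-second query latency, GC is more likely a side effect of the query's heap usage than the source of the delay.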