Hi team, I would like to understand whether Pinot has some query caching/warm-up mechanism behind the scenes. Asking because I noticed that the first run of a query is always the slowest. For example, when I run a count group-by query against a table for the first time it takes 3000ms, but if I run it again within the next couple of minutes, the same query consistently takes less than 100ms.
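This is typically due to Pinot MMAP’ing segments: the first run has to page the segment data in from disk, while repeat runs are served from the OS page cache. Could you share: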
- Are you on local disk or remote/EBS?
- Are those SSDs or HDDs?
- What's the total RAM and what are your Xms/Xmx settings?
- Is the query triggering a ton of random reads? This can sometimes be avoided by picking the right sorted index.
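To illustrate the MMAP point, here is a minimal Python sketch (not Pinot code). It maps a throwaway file, touches one byte per page, and times the scan twice. The helper names `scan_mmap` and `make_file` are made up for this example. Note that a freshly written temp file is usually already in the page cache, so both scans will likely be fast here; the cold-vs-warm gap only shows up when the pages really have to come from disk, e.g. on a large file after a restart.

```python
import mmap
import os
import tempfile
import time


def scan_mmap(path):
    """Touch every page of a memory-mapped file; return (checksum, seconds)."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = time.perf_counter()
            total = 0
            # Mapping alone reads nothing. Reading one byte per page forces
            # the OS to fault that page in (from disk if it is not cached).
            for offset in range(0, len(mm), mmap.PAGESIZE):
                total += mm[offset]
            return total, time.perf_counter() - start


def make_file(size_bytes):
    """Create a throwaway file standing in for a segment (hypothetical)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x01" * size_bytes)
    return path


if __name__ == "__main__":
    path = make_file(64 * mmap.PAGESIZE)
    try:
        first = scan_mmap(path)
        second = scan_mmap(path)
        print("first scan: %.6fs, second scan: %.6fs" % (first[1], second[1]))
    finally:
        os.remove(path)
```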
Hi @Mayank, we are using an attached EBS volume with gp2 (SSD) storage, memory: 80G, -Xms40G -Xmx50G. This observation is not a problem for us for now, but we are curious to know what’s happening behind the scenes.
Yeah, you have too much heap. I’d say don’t go beyond 24GB (with Xms = Xmx), which leaves more of the RAM to the OS page cache that backs the MMAP’ed segments.
Also, do you have any sorted index?
Reducing heap may not help with this. We should try to reduce the random seeks.
No, we don't have a sorted index other than the default time column, and the specific query I am playing with scans data across the whole time range, so it probably wouldn't make much difference?
Yeah, don’t set time as the sorted column. If you have any other column, like userId, that appears in all queries, e.g.
where userId = xxx
that would be a good one. Sorting will improve locality and hence reduce random seeks.
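A rough way to see the locality argument, as a Python sketch rather than anything Pinot-specific: count how many distinct pages hold rows matching `userId = 42` when the column is in arrival order versus sorted. The page size of 1024 rows, the made-up data, and the helper names `pages_touched` and `build_column` are all assumptions for illustration.

```python
import random

ROWS_PER_PAGE = 1024  # assumption: 4 KiB pages holding 4-byte values


def pages_touched(user_ids, target):
    """Count distinct pages holding at least one row where userId == target."""
    return len({doc_id // ROWS_PER_PAGE
                for doc_id, uid in enumerate(user_ids) if uid == target})


def build_column(num_rows, num_users, seed=0):
    """Return a made-up userId column in arrival (unsorted) order."""
    rng = random.Random(seed)
    return [rng.randrange(num_users) for _ in range(num_rows)]


if __name__ == "__main__":
    unsorted_col = build_column(1_000_000, 100)
    sorted_col = sorted(unsorted_col)
    # Unsorted: matches are scattered, so nearly every page gets touched.
    print("unsorted:", pages_touched(unsorted_col, 42))
    # Sorted: matches form one contiguous run, so only a few pages are touched.
    print("sorted:  ", pages_touched(sorted_col, 42))
```

Fewer pages touched means fewer random reads against the disk on a cold query, which is the effect the sorted column is meant to buy.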
Yeah, that makes sense. I will play around with the different query patterns we have and decide on further optimizations. Thanks for the suggestion!