# getting-started
g
Hi team, I would like to understand whether Pinot has a query caching or warm-up mechanism behind the scenes. I ask because I noticed that the first run of a query is always the slowest. For example, when I run a count group-by query against a table for the first time it takes 3000 ms, but if I run it again within the next couple of minutes, the same query consistently takes less than 100 ms.
m
This is typically due to Pinot MMAP’ing segments. Could you share:
- Are you using local disk or remote/EBS?
- Is it SSD or HDD?
- What's the total RAM, and what are your Xms/Xmx settings?
- Is the query triggering a ton of random reads? This can sometimes be avoided by picking the right sorted index.
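To illustrate what reading a memory-mapped file involves, here is a minimal Python sketch. It is not Pinot code: the helper names and the temporary demo file are hypothetical. It maps a file and touches one byte per page. When a page is not yet in the OS page cache, that access has to go to disk, which is the kind of cost a first query run would pay; later accesses to the same pages are served from memory. Note that a freshly written demo file is usually already cached, so this shows the access pattern rather than reproducing the cold timing.

```python
import mmap
import os
import tempfile


def make_demo_file(num_pages=256, page_size=mmap.PAGESIZE):
    """Create a temporary file of num_pages pages filled with 0x01 bytes (hypothetical stand-in for a segment)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x01" * (num_pages * page_size))
    return path


def sum_first_bytes_of_pages(path, page_size=mmap.PAGESIZE):
    """Memory-map the file read-only, touch the first byte of every page, and return the sum."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            total = 0
            for offset in range(0, len(mm), page_size):
                # Indexing the map may trigger a page fault if the page is not resident.
                total += mm[offset]
            return total
```

Calling `sum_first_bytes_of_pages(make_demo_file())` twice exercises the same pages; on a cold file only the first pass would need disk reads.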
g
Hi @Mayank, we are using an attached EBS volume with gp2 (SSD) storage. Memory: 80G, -Xms40G -Xmx50G. This observation is not a problem for us right now, but we are curious to know what’s happening behind the scenes.
m
Yeah, you have too much heap. I’d say don’t go beyond 24GB (with Xms = Xmx).
Also, do you have any sorted index?
Reducing heap may not help with this. We should try to reduce the random seeks.
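As a rough back-of-the-envelope check of the heap numbers in this thread, the sketch below computes how much RAM is left outside the JVM heap, assuming that memory-mapped segments are served from whatever the heap does not claim. The function name is hypothetical, and the result is only an upper bound: it ignores JVM non-heap memory, the OS itself, and other processes.

```python
def page_cache_headroom_gb(total_ram_gb, heap_max_gb):
    """Upper bound on RAM (in GB) left outside the JVM heap for the OS page cache."""
    if heap_max_gb > total_ram_gb:
        raise ValueError("heap cannot exceed total RAM")
    return total_ram_gb - heap_max_gb


# Values from the thread: 80G RAM with -Xmx50G, versus the suggested 24GB cap.
current_headroom = page_cache_headroom_gb(80, 50)
suggested_headroom = page_cache_headroom_gb(80, 24)
```

Under these assumptions the smaller heap leaves noticeably more room for mapped segment pages, although, as noted above, it does not by itself reduce random seeks.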
g
No, we don't have a sorted index other than the default time column, and the specific query I am playing with scans data across the whole time range, so it probably wouldn't make much difference?
m
Yeah, don’t set time as the sorted column. If you have any other column, like userId, that appears in all queries (for example, where userId = xxx), that would be a good one. Sorting will improve locality and hence reduce random seeks.
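The locality argument can be sketched with a toy model in Python. This is not how Pinot lays out segments: the page size, row counts, and function names are assumptions made for illustration. The idea is to count how many distinct "pages" hold rows matching a filter on a column, first with rows in arbitrary order and then with rows sorted on that column.

```python
import random


def pages_touched(column, value, rows_per_page):
    """Count distinct pages that hold at least one row where column == value."""
    return len({i // rows_per_page for i, v in enumerate(column) if v == value})


def demo(num_rows=10_000, num_users=100, rows_per_page=100, seed=0):
    """Return (pages touched when unsorted, pages touched when sorted) for one user id."""
    rng = random.Random(seed)
    unsorted_col = [rng.randrange(num_users) for _ in range(num_rows)]
    sorted_col = sorted(unsorted_col)
    target = unsorted_col[0]  # pick a user id that is guaranteed to exist
    return (
        pages_touched(unsorted_col, target, rows_per_page),
        pages_touched(sorted_col, target, rows_per_page),
    )
```

In the sorted layout all matching rows are contiguous, so they fall on a handful of adjacent pages; in the unsorted layout they are scattered, which corresponds to the random seeks mentioned above.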
g
Yeah, that makes sense. I will play around with the different query patterns we have and decide on further optimizations. Thanks for the suggestion!