# getting-started
j
Do you have it configured explicitly? The config key is pinot.server.query.executor.num.groups.limit
a
@Jackie - I believe you mean in pinot-server.conf? No, I haven’t set it.
j
Hmm.. That is unexpected
Do you run the query in PQL mode or SQL mode?
a
SQL mode
j
I just checked the code and we don't set it in SQL mode..
Could you please file a GitHub issue and include the details?
a
Sure, let me file an issue. Just so that I understand, you mean the 100k limit is not set? Then what is the default limit in SQL mode today?
j
We don't put the numGroupsLimitReached in SQL mode. I don't know how it shows up in the response
a
got it
As for the second part of the question: given there was a limit of 10 on the query, shouldn’t this be handled by the engine (even for a column with more than 100k distinct values)?
j
If there are over 100k (by default) distinct groups within a single segment, we will only store the first 100k groups to prevent servers from running out of memory on extremely expensive queries
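A minimal sketch of the per-segment cap described above, not Pinot's actual code. The function name `aggregate_segment` and the `limit_reached` flag are hypothetical, and the sketch assumes that rows for already-admitted groups keep aggregating while rows for new groups are dropped once the cap is hit:

```python
def aggregate_segment(rows, num_groups_limit):
    """Sum values per group, admitting at most num_groups_limit distinct groups."""
    groups = {}
    limit_reached = False
    for key, value in rows:
        if key in groups:
            # Group already admitted: keep aggregating into it.
            groups[key] += value
        elif len(groups) < num_groups_limit:
            # Still under the cap: admit the new group.
            groups[key] = value
        else:
            # Cap hit: this row's group is dropped (assumed behavior).
            limit_reached = True
    return groups, limit_reached


rows = [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("d", 5)]
groups, limit_reached = aggregate_segment(rows, num_groups_limit=3)
```

With a cap of 3, group "d" never gets stored, so the result can be incomplete whenever the flag is set.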
a
@Jackie @Kishore G - What I am trying to say is that, given the query has a limit of 10, can’t the engine keep only the top 10 groups in memory while doing the calculation? Just curious whether this could be an enhancement, or is there a reason why this cannot be done?
k
it will be wrong to keep only 10 groups
results will be incorrect
j
In order to aggregate the values for the groups, we need to keep all groups, and sort the aggregated values in the end to get the 10 final groups
k
@Jackie we don't keep 100k groups by default, right? I think it's more like min(limit * some scaling factor, 100k)
j
Within the segment, we keep all the groups. Then we aggregate on these groups, sort them, and then keep max(limit * 5, 5000) groups
Without sorting, we have to keep all the groups to get the correct result
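To illustrate why the groups cannot be pruned to the query limit while aggregating, here is a rough sketch. All function names are hypothetical; only the max(limit * 5, 5000) formula comes from the thread. A group that looks small early on can end up with the largest total, so evicting it early gives a wrong answer:

```python
import heapq
from collections import defaultdict


def trim_size(limit):
    # Number of groups kept after sorting, per the formula in the thread.
    return max(limit * 5, 5000)


def top_groups_full(rows, limit):
    """Correct: aggregate every group first, then pick the top ones."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return heapq.nlargest(limit, totals.items(), key=lambda kv: kv[1])


def top_groups_pruned(rows, limit):
    """Incorrect: evict the smallest group whenever more than `limit` are held."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
        if len(totals) > limit:
            smallest = min(totals, key=totals.get)
            del totals[smallest]
    return heapq.nlargest(limit, totals.items(), key=lambda kv: kv[1])


# "y" arrives in small increments but has the larger total (6 vs 5).
rows = [("x", 5)] + [("y", 1)] * 6
full = top_groups_full(rows, limit=1)
pruned = top_groups_pruned(rows, limit=1)
```

Here `full` picks "y" with a total of 6, while `pruned` keeps evicting "y" before it can accumulate and returns "x" with 5.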
a
I thought about this a little and realized the issue with the proposal I was making. Thanks for the clarification 👍