#getting-started

Jackie

01/13/2021, 6:53 PM
Do you have it configured explicitly? The config key is pinot.server.query.executor.num.groups.limit

Amit Chopra

01/13/2021, 6:55 PM
@Jackie - I believe you mean in pinot-server.conf? No, I haven’t set it.

Jackie

01/13/2021, 6:57 PM
Hmm.. That is unexpected
Do you run the query in PQL mode or SQL mode?

Amit Chopra

01/13/2021, 6:58 PM
SQL mode

Jackie

01/13/2021, 7:02 PM
I just checked the code and we don't set it in SQL mode..
Could you please file a GitHub issue with the details?

Amit Chopra

01/13/2021, 7:03 PM
Sure, let me file an issue. Just so that I understand: you mean the 100k limit is not set? What is the default limit in SQL mode today, then?

Jackie

01/13/2021, 7:05 PM
We don't put the numGroupsLimitReached in SQL mode. I don't know how it shows up in the response

Amit Chopra

01/13/2021, 7:05 PM
got it
As for the second part of the question: given there was a limit of 10 on the query, shouldn’t this be handled by the engine (even if it was a column with more than 100k distinct values)?

Jackie

01/13/2021, 7:20 PM
If there are more than 100k (by default) distinct groups within a single segment, we only store the first 100k groups, to prevent servers from running out of memory on extremely expensive queries
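To make the per-segment cap concrete, here is a minimal Python sketch of a group-by that stops admitting new groups once a limit is hit. It is only an illustration of the behavior described above, not Pinot's actual code: the function name `aggregate_segment`, the `(key, value)` row shape, and the use of SUM are all assumptions made for the example.

```python
NUM_GROUPS_LIMIT = 100_000  # default cap mentioned in the thread


def aggregate_segment(rows, num_groups_limit=NUM_GROUPS_LIMIT):
    """Sum values per group, admitting at most num_groups_limit groups.

    Hypothetical helper: returns (sums, limit_reached). Rows whose group is
    first seen after the cap is hit are dropped, so the result can be
    partial whenever limit_reached is True.
    """
    sums = {}
    limit_reached = False
    for key, value in rows:
        if key in sums:
            sums[key] += value          # existing group: always aggregated
        elif len(sums) < num_groups_limit:
            sums[key] = value           # room left: admit the new group
        else:
            limit_reached = True        # cap hit: new group is ignored
    return sums, limit_reached
```

With a cap of 2, the rows `("a", 1), ("b", 2), ("c", 3), ("a", 4)` keep only groups `a` and `b`; group `c` is dropped and the flag is set, which is the kind of situation a numGroupsLimitReached indicator would report.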

Amit Chopra

01/13/2021, 7:23 PM
@Jackie @Kishore G - What I am trying to say is that, given the query has a limit of 10, can’t the engine keep only the top 10 groups in memory while doing the calculation? Just curious if this could be an enhancement, or is there a reason why this cannot be done?

Kishore G

01/13/2021, 7:23 PM
It would be wrong to keep only 10 groups
The results would be incorrect

Jackie

01/13/2021, 7:28 PM
In order to aggregate the values for the groups, we need to keep all groups, and sort the aggregated values at the end to get the 10 final groups
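A small Python sketch may help show why keeping only `limit` groups during aggregation goes wrong. It assumes a query that sums a value per group and orders by that sum, descending; both function names are hypothetical and neither is taken from Pinot.

```python
import heapq


def top_groups_exact(rows, limit):
    """Aggregate every group first, then pick the top `limit` by sum."""
    sums = {}
    for key, value in rows:
        sums[key] = sums.get(key, 0) + value
    return heapq.nlargest(limit, sums.items(), key=lambda kv: kv[1])


def top_groups_bounded(rows, limit):
    """Flawed shortcut: never hold more than `limit` groups in memory."""
    sums = {}
    for key, value in rows:
        sums[key] = sums.get(key, 0) + value
        if len(sums) > limit:
            # Evict the currently smallest group; its partial sum is lost.
            del sums[min(sums, key=sums.get)]
    return heapq.nlargest(limit, sums.items(), key=lambda kv: kv[1])


rows = [("a", 5), ("b", 4), ("b", 4)]
print(top_groups_exact(rows, 1))    # [('b', 8)]
print(top_groups_bounded(rows, 1))  # [('a', 5)]
```

Group `b` totals 8 and should win, but the bounded version evicts it each time it appears because its partial sum is still below `a`'s. A group's final value is not known until all of its rows have been seen, so early eviction can discard the true top groups.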

Kishore G

01/13/2021, 7:29 PM
@Jackie we don't keep 100k groups by default, right? I think it's more like min(limit * some scaling factor, 100k)

Jackie

01/13/2021, 7:36 PM
Within the segment, we keep all the groups. Then we aggregate on these groups, sort them, and then keep max(limit * 5, 5000) groups
Without sorting, we have to keep all the groups to get the correct result
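The trim step described above can be sketched in Python as follows. The formula `max(limit * 5, 5000)` comes from the thread; the function names and the assumption that groups are ordered by their aggregated value, descending, are illustrative only.

```python
def trim_size(limit):
    """Number of groups kept after sorting, per the formula in the thread."""
    return max(limit * 5, 5000)


def trim_groups(sums, limit):
    """Sort fully aggregated groups by value (descending) and keep the top ones.

    Hypothetical helper: `sums` maps group key -> aggregated value and must
    already contain every group, since trimming is only safe after sorting.
    """
    ordered = sorted(sums.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ordered[:trim_size(limit)])
```

For a query with a limit of 10 this keeps 5000 groups, and the `limit * 5` term only takes over once the limit exceeds 1000. Keeping more groups than the query asks for leaves headroom for groups that rank lower in one segment but higher once results are merged.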

Amit Chopra

01/13/2021, 8:42 PM
I thought about this a little and realized the issue with the proposal I was making. Thanks for the clarification 👍