Jonathan Meyer
06/15/2021, 6:45 PM
select DISTINCT(kpi) from kpis
takes ~6ms (with 100M docs and numDocsScanned: 100000). This query returns only 45 strings.
But doing
select DISTINCT(kpi) from kpis ORDER BY kpi
takes >300ms (50 times slower). It scans every document (numDocsScanned: 101250000).
I guess the ORDER BY breaks some optimizations.
But from the outside it seems like pretty surprising behavior (sorting 45 strings "should not take this long" is what I mean)
Anyway, not here to complain, just wanted to point it out in case it is considered worth investigating.
Mayank
Jonathan Meyer
06/15/2021, 6:50 PM
Mayank
Jonathan Meyer
06/15/2021, 6:51 PM
Mayank
numDocsScanned: 100000 seems to suggest early bailout
Jonathan Meyer
06/15/2021, 6:51 PM
select COUNT(DISTINCT(kpi)) from kpis
-> 45
select COUNT(DISTINCT(kpi)) from kpis ORDER BY kpi
-> 45
With COUNT, queries are equally fast
Mayank
Jonathan Meyer
06/15/2021, 6:53 PM
ORDER BY
ORDER BY
are you seeing the second query to be consistently slower?
Yes, consistently in the 300-350ms range, while the other one is in the 7-13ms range.
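To illustrate the "early bailout" idea mentioned above, here is a toy Java sketch of one way an unordered DISTINCT with a row limit can stop scanning early, while the ordered variant has to read every row. The thread does not confirm that this is what Pinot does for these queries. The class and method names (DistinctScanSketch, distinctUnordered, distinctOrdered) and the limit parameter are hypothetical and are not part of Pinot.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not Pinot code.
final class DistinctScanSketch {

    /** Distinct values found, plus how many rows were read to find them. */
    static final class Result {
        final List<String> values;
        final int rowsScanned;

        Result(List<String> values, int rowsScanned) {
            this.values = values;
            this.rowsScanned = rowsScanned;
        }
    }

    /** Unordered DISTINCT: any `limit` distinct values are acceptable, so stop once we have them. */
    static Result distinctUnordered(List<String> column, int limit) {
        Set<String> seen = new LinkedHashSet<>();
        int scanned = 0;
        for (String value : column) {
            if (seen.size() >= limit) {
                break; // early bailout: the remaining rows cannot change the answer
            }
            scanned++;
            seen.add(value);
        }
        return new Result(new ArrayList<>(seen), scanned);
    }

    /** Ordered DISTINCT: a smaller value may still appear later, so every row is read. */
    static Result distinctOrdered(List<String> column, int limit) {
        TreeSet<String> smallest = new TreeSet<>();
        int scanned = 0;
        for (String value : column) {
            scanned++;
            smallest.add(value);
            if (smallest.size() > limit) {
                smallest.pollLast(); // keep only the `limit` smallest values
            }
        }
        return new Result(new ArrayList<>(smallest), scanned);
    }
}
```

In this sketch the cost of the ordered query comes from reading all the rows, not from sorting the handful of distinct values, which would match the jump in numDocsScanned reported above.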
Kishore G
Mayank
Kishore G
Mayank
public static boolean isFitForDictionaryBasedComputation(String functionName) {
  //@formatter:off
  return functionName.equalsIgnoreCase(AggregationFunctionType.MIN.name())
      || functionName.equalsIgnoreCase(AggregationFunctionType.MAX.name())
      || functionName.equalsIgnoreCase(AggregationFunctionType.MINMAXRANGE.name())
      || functionName.equalsIgnoreCase(AggregationFunctionType.DISTINCTCOUNT.name())
      || functionName.equalsIgnoreCase(AggregationFunctionType.SEGMENTPARTITIONEDDISTINCTCOUNT.name());
  //@formatter:on
}
distinct
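The check pasted above lists the functions treated as fit for dictionary-based computation: MIN, MAX, MINMAXRANGE, DISTINCTCOUNT and SEGMENTPARTITIONEDDISTINCTCOUNT. As a rough illustration of why a dictionary helps, here is a minimal Java sketch of a dictionary-encoded column. It assumes a sorted dictionary that holds exactly the values present in the rows, no filter on the query, and at least one row. DictionaryColumnSketch and its methods are hypothetical names, not Pinot classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not Pinot code.
final class DictionaryColumnSketch {
    private final String[] sortedDictionary; // unique values in ascending order
    private final int[] dictIds;             // one dictionary id per row

    DictionaryColumnSketch(List<String> rows) {
        // Build the sorted dictionary once, then encode each row as an index into it.
        sortedDictionary = new TreeSet<>(rows).toArray(new String[0]);
        dictIds = new int[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            dictIds[i] = Arrays.binarySearch(sortedDictionary, rows.get(i));
        }
    }

    // The methods below read only the dictionary and never touch the per-row ids.
    String min() { return sortedDictionary[0]; }

    String max() { return sortedDictionary[sortedDictionary.length - 1]; }

    int distinctCount() { return sortedDictionary.length; }

    /** Distinct values in ascending order: the dictionary is already sorted. */
    List<String> distinctOrdered() { return Arrays.asList(sortedDictionary.clone()); }

    int rowCount() { return dictIds.length; }
}
```

Under these assumptions, an ordered list of distinct values costs time proportional to the dictionary size (45 entries in the case discussed here) rather than to the number of rows.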
Kishore G
Jonathan Meyer
06/15/2021, 7:30 PM