Ken Krugler
03/04/2021, 4:50 PM
`DistinctCountHLL` only works for single-value fields. It seems like a simple change in `DistinctCountHLLAggregationFunction.aggregate()` to check if the `BlockValSet` is multi-valued, and if so then call `BlockValSet.getXXXMV()` and do a sub-iteration on the secondary array it returns. Does that make sense?
Kishore G
Ken Krugler
03/04/2021, 5:09 PM
"message": "QueryExecutionError:\njava.lang.UnsupportedOperationException\n\tat org.apache.pinot.core.segment.index.readers.ForwardIndexReader.readDictIds(ForwardIndexReader.java:84)\n\tat org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readStringValues(DataFetcher.java:439)\n\tat org.apache.pinot.core.common.DataFetcher.fetchStringValues(DataFetcher.java:146)\n\tat org.apache.pinot.core.common.DataBlockCache.getStringValuesForSVColumn(DataBlockCache.java:194)\n\tat org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getStringValuesSV(ProjectionBlockValSet.java:94)\n\tat org.apache.pinot.core.query.aggregation.function.DistinctCountHLLAggregationFunction.aggregate(DistinctCountHLLAggregationFunction.java:103)\n\tat org.apache.pinot.core.query.aggregation.DefaultAggregationExecutor.aggregate(DefaultAggregationExecutor.java:47)\n\tat org.apache.pinot.core.operator.query.AggregationOperator.getNextBlock(AggregationOperator.java:66)\n\tat org.apache.pinot.core.operator.query.AggregationOperator.getNextBlock(AggregationOperator.java:35)\n\tat org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator$1.runJob(BaseCombineOperator.java:94)\n\tat org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)"
Ken Krugler
03/04/2021, 5:09 PM
Mayank
`distinctCountHLLMV`?
Mayank
`MV` suffix in the name.
Ken Krugler
03/04/2021, 6:59 PM
`aggregate`, `aggregateGroupBySV`, and `aggregateGroupByMV`. Made me think there was a missing `aggregateMV` function. I see now that the `BySV` and `ByMV` methods are for doing aggregations when the grouping column is SV vs. MV.
Mayank
Ken Krugler
03/04/2021, 6:59 PM
`BlockValSet` could be used to determine whether to handle it as an SV or an MV column.
Mayank
Ken Krugler
03/04/2021, 7:00 PM
Mayank
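
Note (illustration only, not a message from the thread): a minimal sketch of the idea discussed above, where a single aggregate step checks whether the column block is single-valued and, if not, sub-iterates over the per-document value arrays. `ValueBlock` is a hypothetical stand-in for Pinot's `BlockValSet`, a `HashSet` stands in for the HyperLogLog sketch, and the helper names are assumptions. This is not Pinot's implementation; the thread itself points to `distinctCountHLLMV` for MV columns.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for Pinot's BlockValSet; only the shape matters here.
interface ValueBlock {
    boolean isSingleValue();
    String[] getStringValuesSV();    // one value per document
    String[][] getStringValuesMV();  // one array of values per document
}

final class MvDistinctSketch {
    // Adds every value from the first `length` documents to `seen`.
    // `seen` is an exact set used in place of an HLL sketch for simplicity.
    static void aggregate(int length, ValueBlock block, Set<String> seen) {
        if (block.isSingleValue()) {
            String[] values = block.getStringValuesSV();
            for (int i = 0; i < length; i++) {
                seen.add(values[i]);
            }
        } else {
            // MV column: sub-iterate over the secondary array of each document.
            String[][] valuesPerDoc = block.getStringValuesMV();
            for (int i = 0; i < length; i++) {
                for (String value : valuesPerDoc[i]) {
                    seen.add(value);
                }
            }
        }
    }

    // Test helper: a single-valued block that rejects MV access.
    static ValueBlock svBlock(String... values) {
        return new ValueBlock() {
            public boolean isSingleValue() { return true; }
            public String[] getStringValuesSV() { return values; }
            public String[][] getStringValuesMV() {
                throw new UnsupportedOperationException();
            }
        };
    }

    // Test helper: a multi-valued block that rejects SV access,
    // mirroring the UnsupportedOperationException in the trace above.
    static ValueBlock mvBlock(String[]... valuesPerDoc) {
        return new ValueBlock() {
            public boolean isSingleValue() { return false; }
            public String[] getStringValuesSV() {
                throw new UnsupportedOperationException();
            }
            public String[][] getStringValuesMV() { return valuesPerDoc; }
        };
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        aggregate(2, svBlock("a", "b"), seen);
        aggregate(2, mvBlock(new String[] {"b", "c"}, new String[] {"c", "d"}), seen);
        System.out.println(seen.size());
    }
}
```

In this sketch the SV/MV branch is taken once per block rather than once per value, so the SV path keeps its tight loop.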