Apache Pinot

Currently `DistinctCountHLL` only works for single value fields. It seems like a simple change in `DistinctCountHLLAggregationFunction.aggregate()` to check if the `BlockValSet` is multi-valued, and if so then call `BlockValSet.getXXXMV()` and do a sub-iteration on the secondary array it returns. Does that make sense?

Surprised that’s it’s not supported as of now

If you run this query on a MVF, you get:
```    "message": "QueryExecutionError:\njava.lang.UnsupportedOperationException\n\tat org.apache.pinot.core.segment.index.readers.ForwardIndexReader.readDictIds(ForwardIndexReader.java:84)\n\tat org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readStringValues(DataFetcher.java:439)\n\tat org.apache.pinot.core.common.DataFetcher.fetchStringValues(DataFetcher.java:146)\n\tat org.apache.pinot.core.common.DataBlockCache.getStringValuesForSVColumn(DataBlockCache.java:194)\n\tat org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getStringValuesSV(ProjectionBlockValSet.java:94)\n\tat org.apache.pinot.core.query.aggregation.function.DistinctCountHLLAggregationFunction.aggregate(DistinctCountHLLAggregationFunction.java:103)\n\tat org.apache.pinot.core.query.aggregation.DefaultAggregationExecutor.aggregate(DefaultAggregationExecutor.java:47)\n\tat org.apache.pinot.core.operator.query.AggregationOperator.getNextBlock(AggregationOperator.java:66)\n\tat org.apache.pinot.core.operator.query.AggregationOperator.getNextBlock(AggregationOperator.java:35)\n\tat org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49)\n\tat org.apache.pinot.core.operator.combine.BaseCombineOperator$1.runJob(BaseCombineOperator.java:94)\n\tat org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)"```


<@U01ECR21TEE> can you try `distinctCountHLLMV`?

Aggregation functions on MV columns have an `MV` suffix in the name.

<@UDQU92KBK> Thanks for clarifying, I was confused by seeing `aggregate`, `aggregateGroupBySV`, and `aggregateGroupByMV`. Made me think there was a missing `aggregateMV` function. I see now that the `BySV` and ByMV` methods are for doing aggregations when the grouping column is SV vs. MV.

<@UDQU92KBK> But why does there need to be a different function? In the implementations the function signatures are the same, and (I assume) the `BlockValSet` could be used to determine whether to handle it as an SV or an MV column.

OK, I’ll change my issue description :slightly_smiling_face: