Apache Pinot

If we want to get the total number of groups for a `group by` , I assume currently we have to do a separate `distinctcount` or `distinctcounthll`, right? But if the group by uses multiple columns, what’s the best approach to getting this total group count?

We can do `select distinctcount(concat(col1, col2, '|'))` and that works, but isn’t very fast.

Is the total number of groups very large? If not, you can set a relatively high `LIMIT` for the `GROUP BY` query or `DISTINCT` query to get all the groups

It can be large (e.g. &gt; 16M for a test query I just tried)

Do you need accurate result or approximate is fine?

Currently `distinctCount` family only support single column, we can probably extend that to support multiple columns

`distinctcounthll(concat(col1, col2, '|'))` should be faster

Yes, that’s what I’ve been using. But I would expect built-in support for multiple columns to be faster still, yes?

And there’s no standard SQL support for returning the total number of groups from the original aggregation query, right?

In SQL, I think you still need to use `count(distinct col1, col2)` to get the distinct count

Which is mostly equivalent to `DISTINCT(col1, col2)` with high limit in Pinot

`distinctcounthll(concat(col1, col2, '|'))` will suffer from Concat scalar udf 

May be better to enhance distinctCountHLL to take multiple columns 

Yes, that would be more performant. Should I file an issue?

And you can give it a shot, it’s not hard

I filed <https://github.com/apache/incubator-pinot/issues/6676>, hoping to get time to work on it :slightly_smiling_face: