This message was deleted Apache Druid #dev

Join Slack

This message was deleted.

# dev

Slackbot

09/02/2022, 10:05 AM

This message was deleted.

Gian Merlino

09/02/2022, 3:50 PM

are you talking about the byRow mode? might it make sense to tracks nulls there in some cases? like we want to add 1 for (a, null)

Abhishek Agarwal

09/02/2022, 4:46 PM

yes. do we? MySQL would exclude this tuple from the count. http://sqlfiddle.com/#!9/faf2f/38171 • Select count(distinct FName, LName) from Employees - Returns 4 • Select count(distinct FName, LName, ManagerId) from Employees - Returns 3

Gian Merlino

09/02/2022, 4:58 PM

hmm: two thoughts on that 1) if it's standard behavior for COUNT DISTINCT with multiple args then we should follow the standard 2) but i don't find it very useful! for me my main use case of counting unique tuples is to know how many rows we would have if we rolled up on those dimensions: so counting nulls is valuable putting 1 + 2 together: perhaps we want two functions so the second one can behave the way that's useful for rollup estimation?

Abhishek Agarwal

09/05/2022, 1:54 PM

yeah. something like

count(unique)

described here - https://www.ibm.com/docs/en/informix-servers/12.10?topic=expressions-count-distinct-count-unique-functions

Laksh Singla

09/18/2022, 1:50 PM

While the SQL Fiddle returns 3 for the query, I think the wording in the above IBM’s spec says:

If the COUNT DISTINCT function encounters NULL values, it ignores them unless every value in the specified column is NULL. If every column value is NULL, the COUNT DISTINCT function returns zero (0).

So, IMO a row like (null, null, null) should not be counted and else like (’A, null, null) should be This might not be a reliable source of information but FWIW here, the output includes the row with NULL value in a single column (attached screenshot). I didn’t find much documentation on handling of the null values esp with multiple columns.

Open in Slack

Previous Next