This message was deleted.
# dev
s
This message was deleted.
g
are you talking about the byRow mode? might it make sense to tracks nulls there in some cases? like we want to add 1 for (a, null)
a
yes. do we? MySQL would exclude this tuple from the count. http://sqlfiddle.com/#!9/faf2f/38171 • Select count(distinct FName, LName) from Employees - Returns 4 • Select count(distinct FName, LName, ManagerId) from Employees - Returns 3
g
hmm: two thoughts on that 1) if it's standard behavior for COUNT DISTINCT with multiple args then we should follow the standard 2) but i don't find it very useful! for me my main use case of counting unique tuples is to know how many rows we would have if we rolled up on those dimensions: so counting nulls is valuable putting 1 + 2 together: perhaps we want two functions so the second one can behave the way that's useful for rollup estimation?
a
l
While the SQL Fiddle returns 3 for the query, I think the wording in the above IBM’s spec says:
If the COUNT DISTINCT function encounters NULL values, it ignores them unless every value in the specified column is NULL. If every column value is NULL, the COUNT DISTINCT function returns zero (0).
So, IMO a row like (null, null, null) should not be counted and else like (’A, null, null) should be This might not be a reliable source of information but FWIW here, the output includes the row with NULL value in a single column (attached screenshot). I didn’t find much documentation on handling of the null values esp with multiple columns.