Hi , Wanted to understand what should we do in reg...
# general
h
Hi , Wanted to understand what should we do in regards to handling "nulls" in aggregation queries. So Pinot saves default values instead of nulls but it will effect the final result where the default value coincides with a data point, how are other folks handling this and what can we do here?
d
At the company I work for we use null as the null value, and then it's possible to aggregate values filtered by not null.
m
If you use default nulls, you will need to filter them out. There is also native null support, but it does not yet support groups-by to filter them out
d
@Mayank but we can use
WHERE x IS NOT NULL
to filter them out even when grouping the non-null values, right?
k
You can also use filter in aggregation function
m
@Diogo Baeder I meant implicit filtering. Yes you can always use explicit Filter clause
d
Ah, ok then 🙂
h
i am still not sure how do i filter, Lets suppose i have data points [1,2,4,5,...99,null,null] - now pinot will store it at[1,2,4,5,...99, 0, 0] resulting in wrong aggregations such as count distinct. Also if i need to perform groupby , i cant just use (where x is not null) as the information where a data point is null is useful for business reasons. Also i cant give default value (such as -99 instead of 0 ) as it will messup other aggreafate functions such as SUM
m
You can filter out default values in the query. or enable native null handling and filter using IS NOT NULL. https://docs.pinot.apache.org/developers/advanced/null-value-support#need-for-special-null-value-handling
d
As a Pinot user, I don't see how this is any different from other databases that handle SQL. Why not just use nulls and filter them out when doing mathematical aggregations? Then if you need to run another query just to count the nulls, for example, you still have them.