Hi Wanted to understand what should we do in regards to hand Apache Pinot #general

harry singh

05/18/2022, 9:41 AM

Hi, I wanted to understand how we should handle "nulls" in aggregation queries. Pinot stores default values instead of nulls, but that affects the final result when the default value coincides with a real data point. How are other folks handling this, and what can we do here?

Diogo Baeder

05/18/2022, 11:48 AM

At the company I work for we use null as the null value, and then it's possible to aggregate values filtered by not null.

Mayank

05/18/2022, 1:03 PM

If you use default null values, you will need to filter them out. There is also native null support, but it does not yet support group-by to filter them out.

Diogo Baeder

05/18/2022, 1:06 PM

@Mayank but we can use

WHERE x IS NOT NULL

to filter them out even when grouping the non-null values, right?

Kishore G

05/18/2022, 2:04 PM

You can also use filter in aggregation function
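As a rough illustration of a filter inside an aggregation function, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in SQL engine (it assumes SQLite 3.30 or newer for the FILTER clause). The `events` table, the `region` column, and the sample rows are hypothetical, and whether Pinot accepts the same query depends on the Pinot version and its null handling configuration. The point is that no rows are dropped from the group-by, so the null rows stay visible per group.

```python
import sqlite3

# In-memory stand-in database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, x INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("eu", 1), ("eu", None), ("eu", 4), ("us", 5), ("us", None), ("us", None)],
)

# Each aggregate applies its own filter, so every row still belongs to its
# group: one aggregate sums the non-null values, the other counts the nulls.
rows = conn.execute(
    """
    SELECT region,
           SUM(x)   FILTER (WHERE x IS NOT NULL) AS sum_non_null,
           COUNT(*) FILTER (WHERE x IS NULL)     AS null_count
    FROM events
    GROUP BY region
    ORDER BY region
    """
).fetchall()

print(rows)
```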

Mayank

05/18/2022, 2:09 PM

@Diogo Baeder I meant implicit filtering. Yes, you can always use an explicit filter clause.

Diogo Baeder

05/18/2022, 2:10 PM

Ah, ok then 🙂

harry singh

05/19/2022, 11:15 AM

I am still not sure how to filter. Suppose I have data points [1,2,4,5,...99,null,null]. Pinot will store them as [1,2,4,5,...99,0,0], resulting in wrong aggregations such as count distinct. Also, if I need to perform a group-by, I can't just use (WHERE x IS NOT NULL), because knowing which data points are null is useful for business reasons. And I can't choose a different default value (such as -99 instead of 0), as it would mess up other aggregate functions such as SUM.
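The effect described in this message can be reproduced with a small plain-Python sketch. It uses a shortened version of the example list and simply assumes that missing values are replaced by a default at ingestion; it does not call Pinot, and the variable names are made up for illustration.

```python
# Shortened version of the example data: real values plus two missing ones.
values = [1, 2, 4, 5, 99, None, None]

# Assumption: missing values are replaced with a default of 0 when stored.
DEFAULT = 0
stored = [DEFAULT if v is None else v for v in values]

# Distinct count: the default 0 is counted as if it were a real value.
distinct_with_default = len(set(stored))
distinct_true = len({v for v in values if v is not None})

# SUM: a default of 0 happens to be harmless, a sentinel such as -99 is not.
true_sum = sum(v for v in values if v is not None)
sum_with_zero = sum(stored)
sum_with_sentinel = sum(-99 if v is None else v for v in values)

print(distinct_with_default, distinct_true)
print(true_sum, sum_with_zero, sum_with_sentinel)
```

With 0 as the default the distinct count is off by one while the sum is unaffected; with -99 as the default the sum is wrong as well. No single default value is safe for every aggregation.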

Mayank

05/19/2022, 12:12 PM

You can filter out default values in the query, or enable native null handling and filter using IS NOT NULL. https://docs.pinot.apache.org/developers/advanced/null-value-support#need-for-special-null-value-handling

Diogo Baeder

05/19/2022, 1:47 PM

As a Pinot user, I don't see how this is any different from other databases that handle SQL. Why not just use nulls and filter them out when doing mathematical aggregations? Then if you need to run another query just to count the nulls, for example, you still have them.
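The two-query approach suggested here can be sketched as follows, again with Python's built-in sqlite3 module standing in for a generic SQL database. The `metrics` table and its contents are hypothetical; in Pinot the same pattern would rely on native null handling being enabled for the table.

```python
import sqlite3

# In-memory stand-in database with real NULLs; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (x INTEGER)")
conn.executemany(
    "INSERT INTO metrics VALUES (?)",
    [(1,), (2,), (4,), (5,), (99,), (None,), (None,)],
)

# Query 1: mathematical aggregations over the non-null values only.
total, distinct = conn.execute(
    "SELECT SUM(x), COUNT(DISTINCT x) FROM metrics WHERE x IS NOT NULL"
).fetchone()

# Query 2: the nulls are still stored, so they can be counted separately.
null_rows = conn.execute(
    "SELECT COUNT(*) FROM metrics WHERE x IS NULL"
).fetchone()[0]

print(total, distinct, null_rows)
```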
