Apache Pinot

Hi ! We are evaluating Pinot (`7c15bc`) to replace trino for low latency geospatial aggregation queries.
We have a table with  about `10^8` records, using the star tree indexes we are able to perform our queries orders of magnitude faster than trino :rocket:.

But when using geospatial filtering, `ST_Contains` we experience performance degradation (from &lt;200ms to over 10 seconds) even if we are using H3 geospatial indexes with different resolutions (which are triggered as expected) and the geometres are not complex.

Our queries are pretty straightforward:
```select sum(x) from &lt;table&gt; where ST_Contains(ST_GeogFromText('...'),location_st_point) = 1 and ...```
Is there any way we can improve latencies for geospatial aggregations  ?
Should we just avoid geospatial filters with this number of records ?

Can you also share the query and response metadata 

Sure, thank you for the quick response :smile:
 *query*
```select count(*) from xxxx_OFFLINE where ST_Contains(ST_GeogFromText('POLYGON((10.121899631484375 43.78360214117981,12.992289079218752 44.4190176976316,14.654093496328125 42.34891239577081,12.582233443828127 41.157530371951424,10.121899631484375 43.78360214117981))'),location_st_point) = 1```
*metadata*
```timeUsedMs: 12101
numDocsScanned: 6476678
totalDocs: 71106843
numServersQueried: 3
numServersResponded: 3
numSegmentsQueried: 5
numSegmentsProcessed: 5
numSegmentsMatched: 5
numConsumingSegmentsQueried: 0
numEntriesScannedInFilter: 9567118```
*plan*
```BROKER_REDUCE(limit:10)	1	0
COMBINE_AGGREGATE	2	1
PLAN_START(numSegmentsForThisPlan:5)	-1	-1
AGGREGATE(aggregations:count(*))	3	2
TRANSFORM_PASSTHROUGH()	4	3
PROJECT()	5	4
DOC_ID_SET	6	5
INCLUSION_FILTER_H3_INDEX(inclusionIndex:h3_index,operator:EQ,predicate:stcontains('8400000001000000050000000040243e69a0e53d5c4045e44d1330b8b04029fc0d506ed851404635a25f35e2f0402d4ee557be82dd40452ca92950809840292a1a807fc22e40449429f489d43c40243e69a0e53d5c4045e44d1330b8b0',location_st_point) = '1')```

Yeah, the query does seem expensive. Any ideas on how to further optimize it <@UUZ3KPZ8T>, <@UDT7GFEG6>?

st_contain can be expensive, depending on the polygon size and complexity

try to use h3index, and set the resolution accordingly

<https://h3geo.org/docs/core-library/restable/>

Thank you, we are going to try different resolutions for the H3 index.
Ideally  should we size it to trigger 1 H3 partition per query ?
Can we help the query planner to prune docs by providing additional filters before the ST_Contains evaluation ?