Tommaso Garuglieri
07/06/2022, 9:18 PM7c15bc
) to replace trino for low latency geospatial aggregation queries.
We have a table with about 10^8
records, using the star tree indexes we are able to perform our queries orders of magnitude faster than trino 🚀.
But when using geospatial filtering, ST_Contains
we experience performance degradation (from <200ms to over 10 seconds) even if we are using H3 geospatial indexes with different resolutions (which are triggered as expected) and the geometres are not complex.
Our queries are pretty straightforward:
select sum(x) from <table> where ST_Contains(ST_GeogFromText('...'),location_st_point) = 1 and ...
Is there any way we can improve latencies for geospatial aggregations ?
Should we just avoid geospatial filters with this number of records ?Mayank
Mayank
Tommaso Garuglieri
07/06/2022, 9:40 PMselect count(*) from xxxx_OFFLINE where ST_Contains(ST_GeogFromText('POLYGON((10.121899631484375 43.78360214117981,12.992289079218752 44.4190176976316,14.654093496328125 42.34891239577081,12.582233443828127 41.157530371951424,10.121899631484375 43.78360214117981))'),location_st_point) = 1
metadata
timeUsedMs: 12101
numDocsScanned: 6476678
totalDocs: 71106843
numServersQueried: 3
numServersResponded: 3
numSegmentsQueried: 5
numSegmentsProcessed: 5
numSegmentsMatched: 5
numConsumingSegmentsQueried: 0
numEntriesScannedInFilter: 9567118
plan
BROKER_REDUCE(limit:10) 1 0
COMBINE_AGGREGATE 2 1
PLAN_START(numSegmentsForThisPlan:5) -1 -1
AGGREGATE(aggregations:count(*)) 3 2
TRANSFORM_PASSTHROUGH() 4 3
PROJECT() 5 4
DOC_ID_SET 6 5
INCLUSION_FILTER_H3_INDEX(inclusionIndex:h3_index,operator:EQ,predicate:stcontains('8400000001000000050000000040243e69a0e53d5c4045e44d1330b8b04029fc0d506ed851404635a25f35e2f0402d4ee557be82dd40452ca92950809840292a1a807fc22e40449429f489d43c40243e69a0e53d5c4045e44d1330b8b0',location_st_point) = '1')
Mayank
Yupeng Fu
07/07/2022, 12:57 PMYupeng Fu
07/07/2022, 12:57 PMYupeng Fu
07/07/2022, 12:57 PMTommaso Garuglieri
07/07/2022, 1:35 PMKishore G