Ken Krugler
12/29/2020, 3:17 PMChinmay Soman
12/29/2020, 6:20 PMJackie
12/29/2020, 7:50 PMnumDocsScanned
to see if the query is scanning the pre-aggregated recordsKen Krugler
12/29/2020, 8:00 PMnumDocsScanned
= 1B (out of 1.7B) for case where the filter matches a few records, 362M for case where the filter matches no records, and 362M for the case of no filtering. But time taken is almost identical in all cases (27 seconds, +/- 1 second).Jackie
12/29/2020, 8:13 PMJackie
12/29/2020, 8:14 PMKen Krugler
12/29/2020, 8:14 PM"starTreeIndexConfigs": [{
"dimensionsSplitOrder": [
"advertiser",
"adHash",
"network",
"imageSize",
"adType",
"platform",
"country",
"crawlDays"
],
"skipStarNodeCreationForDimensions": [
],
"functionColumnPairs": [
"SUM__adSpend",
"SUM__impressions",
"DistinctCountHLL__adHash",
"DistinctCountHLL__crawlDays",
"MIN__crawlDays",
"MAX__crawlDays"
],
"maxLeafRecords": 10000
},
Ken Krugler
12/29/2020, 8:14 PMselect adHash,advertiser,sum(adSpend) from crawldata group by adHash,advertiser order by sum(adSpend) desc limit 100
Jackie
12/29/2020, 8:16 PMadHash
and advertiser
has very high cardinality?Ken Krugler
12/29/2020, 8:16 PMJackie
12/29/2020, 8:17 PMJackie
12/29/2020, 8:18 PMJackie
12/29/2020, 8:18 PMadHash
and advertiser
can be pre-aggregatedKen Krugler
12/29/2020, 8:19 PMJackie
12/29/2020, 8:20 PMselect adHash, sum(adSpend) from crawldata group by adHash order by sum(adSpend) desc limit 100
and see if it is faster?Ken Krugler
12/29/2020, 8:20 PMKen Krugler
12/29/2020, 8:21 PMJackie
12/29/2020, 8:21 PMKen Krugler
12/29/2020, 8:25 PMKen Krugler
12/29/2020, 8:25 PMKen Krugler
12/29/2020, 8:26 PMJackie
12/29/2020, 8:30 PMnumDocsScanned
for star-tree queries, maybe ~300M unique adHashes 😉Ken Krugler
12/29/2020, 10:04 PMKen Krugler
12/31/2020, 12:42 AMKen Krugler
12/31/2020, 12:42 AMAnother way is to enable the tracing for the query and see if it uses theStarTreeFilterOperator
Ken Krugler
12/31/2020, 12:42 AMJackie
12/31/2020, 12:43 AMKen Krugler
12/31/2020, 12:49 AM{\"0_10\":[
{\"StarTreeFilterOperator Time\":21}
{\"DocIdSetOperator Time\":69}
{\"ProjectionOperator Time\":69}
{\"TransformOperator Time\":69}
{\"DocIdSetOperator Time\":17}
{\"ProjectionOperator Time\":17}
{\"TransformOperator Time\":17}
{\"AggregationGroupByOrderByOperator Time\":97}]}
Jackie
12/31/2020, 12:58 AMAggregationGroupByOrderByOperator
time includes the time for other operators because it is nestedKen Krugler
12/31/2020, 12:59 AMJackie
12/31/2020, 1:01 AMAggregationGroupByOrderByOperator
TransformOperator
ProjectionOperator
DocIdSetOperator
StarTreeFilterOperator
And each one should include the time for the previous layer + time for the extra operationsJackie
12/31/2020, 1:03 AM{\"TransformOperator Time\":17}
{\"AggregationGroupByOrderByOperator Time\":97}]}
We know the engine spends 17ms running the transform operator + 80ms aggregating the recordsKen Krugler
12/31/2020, 1:08 AM