hey friends i come to you with a query optimization question Apache Pinot #troubleshooting

hey friends, i come to you with a query optimizati...

Luis Fernandez

08/22/2022, 7:28 PM

hey friends, i come to you with a query optimization question, we have the following query:

select sum(impression_count) from metrics where user_id = xxx and product_id = xxxx

this query even tho it has more selectivity is slower than this one:

select sum(impression_count) from metrics where user_id = xxx

we currently are partitioning on the

user_id

and have a bloom filter on

product_id,user_id

is it because of the partitioning (?) also we do see an elevation of

numEntriesScannedInFilter

when we also add the

product_id

in the query but without it is pretty much 0, what do you recommend to do in this case.

Kishore G

08/22/2022, 7:33 PM

What’s the cardinality of product_id. Bloom filter is not useful in this case. You should try inverted index on product_id

Kishore G

08/22/2022, 7:33 PM

do you have sortex index on user_id column?

Luis Fernandez

08/22/2022, 7:34 PM

yes, user_id has sorted index

Kishore G

08/22/2022, 7:35 PM

try inverted index on product_id, it should be fast

Luis Fernandez

08/22/2022, 7:35 PM

this one basically: https://docs.pinot.apache.org/basics/indexing/inverted-index#bitmap-inverted-index

Luis Fernandez

08/22/2022, 7:38 PM

i guess i was going off of the matching EQUALITY on a predicate https://docs.pinot.apache.org/basics/indexing/bloom-filter

Luis Fernandez

08/22/2022, 7:39 PM

so partitioning plus inverted should work fine yes?

Kishore G

08/22/2022, 7:41 PM

Yes

Kishore G

08/22/2022, 7:42 PM

You can combine have multiple index on the same column if needed and Pinot will pick the right one based on query..

😱 1

Luis Fernandez

08/22/2022, 7:44 PM

so i could potentially just leave the bloom filter there? there’s bloom filter for both

user_id

and

product_id

Kishore G

08/22/2022, 7:47 PM

you can but its not needed

Luis Fernandez

08/22/2022, 8:11 PM

will that improve the

numEntriesScannedInFilter

Luis Fernandez

08/22/2022, 8:11 PM

i just applied the change but still see it elevated

Zhuangda Z

07/20/2023, 3:53 PM

@Kishore G bloom filter not being useful is it because

product_id

is not a partition key so bloom filter wouldn’t prune many segments? Is it a rule of thumb that bloom filter should only enable for cols that are part of the primary key in general?

Zhuangda Z

07/20/2023, 3:56 PM

and a col can have both BF and inverted index at the same time b/c BF just prunes segments while inverted helps to pick the matched rows? Or it is not suggested to have them enabled for a col at the same time since inverted can already find all the matched rows in constant time?

Open in Slack

Previous Next