hey friends, i come to you with a query optimizati...
# troubleshooting
l
hey friends, i come to you with a query optimization question, we have the following query:
select sum(impression_count) from metrics where user_id = xxx and product_id = xxxx
this query even tho it has more selectivity is slower than this one:
select sum(impression_count) from metrics where user_id = xxx
we currently are partitioning on the
user_id
and have a bloom filter on
product_id,user_id
is it because of the partitioning (?) also we do see an elevation of
numEntriesScannedInFilter
when we also add the
product_id
in the query but without it is pretty much 0, what do you recommend to do in this case.
k
What’s the cardinality of product_id. Bloom filter is not useful in this case. You should try inverted index on product_id
do you have sortex index on user_id column?
l
yes, user_id has sorted index
k
try inverted index on product_id, it should be fast
i guess i was going off of the matching EQUALITY on a predicate https://docs.pinot.apache.org/basics/indexing/bloom-filter
so partitioning plus inverted should work fine yes?
k
Yes
You can combine have multiple index on the same column if needed and Pinot will pick the right one based on query..
😱 1
l
so i could potentially just leave the bloom filter there? there’s bloom filter for both
user_id
and
product_id
k
you can but its not needed
l
will that improve the
numEntriesScannedInFilter
?
i just applied the change but still see it elevated
z
@Kishore G bloom filter not being useful is it because
product_id
is not a partition key so bloom filter wouldn’t prune many segments? Is it a rule of thumb that bloom filter should only enable for cols that are part of the primary key in general?
and a col can have both BF and inverted index at the same time b/c BF just prunes segments while inverted helps to pick the matched rows? Or it is not suggested to have them enabled for a col at the same time since inverted can already find all the matched rows in constant time?