Hi team, I am seeing inaccurate result in pinot ag...
# troubleshooting
g
Hi team, I am seeing inaccurate result in pinot aggregation query 👀. For this table that has 7B records, I try to run a group by uuid to see how many records for each uuid, and for most of the uuid they should have around 100 records. But when I run something like
select uuid, count(*) from table group by 1
I get very inaccurate aggregation result, for example, uuid
a
will show it only has 3 records in count() here, but if I only query specifically for this uuid, like `select uuid, count() from table where uuid='a' group by 1` , it will show the correct result which is 100. Can someone help me understand what is going on here?🙏
r
hi which version are you using?
g
0.9.2
r
can you just check what happens when you write
group by uuid
instead? Very likely the same thing, but doesn't hurt to check
I think what's probably happening here is truncation
because uuid cardinality is so high, the group by is truncated
🤔 1
can you add
limit 10
to the faulty query please?
👀 1
g
yes
group by uuid
return the same result
limit 10
also gives similar undercounting result
r
if you upgrade to 0.10.0 you can get an explain plan
do you have a startree on uuid?
g
because uuid cardinality is so high, the group by is truncated
Is there any writeup help me understand the truncate behavior here? Is this a feature? If so why it seems to introduce such high discrepancy?
No I don’t think I’ve enabled any index for uuid, whatever default should be it
if you upgrade to 0.10.0 you can get an explain plan
yeah I can try to do an ungrade. What do you think the potential solution could be?
r
it's not a solution, but it allows you to give us an explain plan
which would help identify the problem
🔍 1
can you add
order by uuid
please?
👀 1
g
in the middle of upgrade now, will try add order and query again after it’s done
lol seems like
order by uuid
is too heavy to run? I kept getting 502 bad gateway or server not respond error
but I ran the explain plan
explain plan for group by:
explain plan for group by + order by:
h
hey @User I guess it's related to this https://docs.pinot.apache.org/users/user-guide-query/grouping-algorithm#group-by-behavior. Group by is by default an approximation, in your case, it will lead to problems in your case
g
Hi Haitao thanks for the reply! Yeah I’ve also landed in this page during investigation but haven’t got time to try out the configurations suggested in this wiki yet. I plan to play around with the related confs here to improve the accuracy with reasonable cost and will report back how it goes. Let me know if you happened to have experience similar problem before or if you have any other suggestions 🙏.
cc @Mingmin Xu