https://pinot.apache.org/ logo
k

Ken Krugler

07/16/2021, 9:59 PM
I’m trying to understand an interesting anomaly with the results of a query. I do
select sum(metric), key, min(date) as firstSeen, max(date) as lastSeen from table where date >= <lowDate> AND date <= <highDate> group by key order by firstSeen desc limit 1
. I get a single row as expected, but with a
firstSeen
and
lastSeen
both equal to
<highDate>
. I was expecting the
firstSeen
result to be equal to
<lowDate>
. If I then run the exact same query, but add in
AND key = '<key value from the previous result>'
, I get the a single row with the requested key value, but now the
firstSeen
result is equal to
<lowDate>
(as expected), and
sum(metric)
is larger (also as expected). Any ideas what is going on?
j

Jackie

07/16/2021, 10:06 PM
How about
select date from table where date >= <lowDate> AND date <= <highDate> AND key = '<key value from the previous result>'
?
Is
key
a very high cardinality column? If so, some groups might be dropped due to Pinot's approximation algorithm for group-by queries
k

Ken Krugler

07/16/2021, 10:09 PM
It’s high cardinality, but
numGroupsLimitReached: false
in response. And the difference is for a specific key value, not a missing key value.
If I do
select date,metric from table WHERE date >= <lowDate> AND date <= <highDate> AND key = '<key value from previous result>'
, I get four rows. The sum of the metric values is the larger (expected) value. The one row with
date = <highDate>
has the
metric
value = the
sum(metric)
returned from the first query.
This is with Pinot 0.7.1
Hmm, if I do the first query, but change to
order by lastSeen
(instead of by
firstSeen
), the results are more correct.
firstSeen
seems right for the various rows I get back, and the
sum(metric)
almost matches what I get when I add the
AND key = 'specific key value'
filter.
Something does seem broken, sadly 😞 Happy to do a screen share if that would help (I can set up a Zoom session if that’s easier)