I’m running into a situation where Pinot is using ...
# troubleshooting
w
I’m running into a situation where Pinot is using a star-tree index to satisfy a query in one case, but not in another, and the queries are almost identical. This one does not use the star tree:
Copy code
SELECT dimension, SUM(metric) AS totalMetrics FROM myTable WHERE otherDimension='filterValue' AND eventTimestamp >= cast(now() - 172800000 as long) GROUP BY 1 ORDER BY 2 DESC LIMIT 10
This one uses the star tree:
Copy code
SELECT dimension, SUM(metric) AS totalMetrics FROM myTable WHERE otherDimension='filterValue' AND eventTimestamp >= 1611161288000 GROUP BY 1 ORDER BY 2 DESC LIMIT 10
It looks like the use of a dynamically-computed timestamp value is confusing the optimizer somehow? the
eventTimestamp
column is not part of my star-tree index in either case.
m
How did you find out that StarTree was used for one query?
w
Trace
m
You are right, the code to check if star-tree can be used or not does not handle the expression
cast(now() - 172800000 as long)
. You can file an issue for the same.
w
@Mayank I’m new to the Pinot codebase, but if you can point me in the right direction, I’d be happy to see if I can put together a PR to address this.
m
Nice
Let me post a pointer here
Note, after optimization lhs and rhs for the time predicate are swapped
w
@Mayank Correct me if I’m wrong, but based on a quick look, it appears that the problem might actually be in
extractPredicateEvaluatorsMap
, and not in
isFitForStarTree
?
m
yes
isFitForStarTree
is the high level entry point for you to get the algorithm. The code that passes the predicateColumns is the one that needs to be checked.
w
Got it, perfect, and thank you.
👍 1
m
You'll jump a few classes from this entry point to go to the exact place that needs fix, but you'll get a better picture of what is going on
w
Yup, looks like the various Aggregation*PlanNode instances. I’ll spend some time on it after I get done my day job, thanks again!
m
They'll probably lead you to
Copy code
StarTreeUtils.extractPredicateEvaluatorsMap
w
Yeah, they did, that was what I referenced above 🙂
m
Oh yeah you did
w
@Jackie Saw you commented on the time column predicate issue… wanted to tag you here for context.
j
@Will Briggs I think the problem here is that
1611161288000
is in micros instead of millis
Then everything is pruned out
m
Copy code
I did a simple test to compile the query to BrokerRequest, and do see cast(now() - 172800000 as long) as LHS for predicate. My test did not actually go through any query execution.
w
@Jackie 1611161288000 is ms since epoch
j
Oh, sorry my bad
👍 1
The problem is not how the predicate is evaluated, but why the first query is not converted to the second query on the broker side
m
Because of use of a function (now()) as opposed to a constant eval? (just guessing)
j
Seems working without the cast
w
yes, it seems to be the cast. The only reason the cast is necessary is being addressed here: https://github.com/apache/incubator-pinot/pull/6403
m
Yeah, I am working with @Amrish Lal on ^^
w
Yes, I was just thanking him earlier because I noticed that PR 🙂
I didn’t realize these were so closely intertwined, though
m
I have concerns with this PR, so we would probably find other ways to fix the problem this PR is trying to address.
j
The cast is not invoked on broker side because cast is not registered as a scalar function
w
So presumably there’s something buried in the
PinotQuery2BrokerRequestConverter
(or thereabouts) that is reifying eligible expressions into constants (e.g.,
now()
) before handing the query off from the Broker for execution, and it isn’t handling
cast
function expressions?
ah
j
We can add a scalar function for cast, then it should work on broker side (compile time)