Hi team,
I have two questions about the star tree index. Can anyone help me? Thank you in advance.
the streaming table has an with star-tree index config as below:
"starTreeIndexConfigs": [
{
"dimensionsSplitOrder": [
"event",
"epoch_minute",
"metric_source",
"tenant",
"topic"
],
"skipStarNodeCreationForDimensions": [],
"functionColumnPairs": [
"SUM__m1_rate",
"MAX__p50"
],
"maxLeafRecords": 5000000
}
and the column "epoch_minute" is an dimension column with data type "long", and derived from table timestamp column with transformation of "ToEpochMinutes(clock)".
1. If the query clause include range condition, star tree index cannot work.
(Explain statement show star tree index is not trigged, while some data can be returned when statement run).
(If epoch_minute range condition is removed, explain statement show star tree is trigged.)
Here is the sql statement:
select event, epoch_minute, topic, tenant
from testtable
where event='myevent'
and (epoch_minute > 27606367)
and (epoch_minute < 27606372)
GROUP BY event, epoch_minute, topic, tenant
ORDER BY event, epoch_minute
LIMIT 500000
My first question is: Why range condition blocked the selection of star tree index?
2. In above question, although the star tree is not trigged during the sql execution, its segment selection result shows the epoch_minute range condition filtered some segments which run the real query. its response parameter numSegmentsProcessed is much smaller than below statement:
select event, epoch_minute, topic, tenant
from testtable
where event='myevent'
GROUP BY event, epoch_minute, topic, tenant
ORDER BY event, epoch_minute
LIMIT 500000
My second question is:
What is the segment selection scenario works? So far as I know, pinot server uses timestamp column to filter segments, but epoch_minute is an dimension column, instead of timestamp column, why it works in the segment selection phase?
I am looking forward for an pinot expert to help me. Thank you very much.
Regards,
Ming