Hello Is it possible to benefit from Pinot's Star-...
# general
j
Hello Is it possible to benefit from Pinot's Star-Tree index when performing aggregation queries on known time ranges ? For example, if I know in advance that I will get queries like this :
SELECT SUM(value) FROM values WHERE timeString BETWEEN '2021-01-01' AND '2021-01-08'
(ex: rolling week) Can some configuration of StarTree index precompute this sort of query ? (or even part of it) [I know this looks like a TSDB use case, but still, I'm hopeful 😄]
m
You could do that, I'd recommend first checking the perf of out-of-the-box Pinot.
Reason being you'll do a better storage vs latency cost balance.
âž• 1
j
Yep, actually I've tried it out of the box, and performance was already very good, but I was thinking ahead in case we need even more (or have greater volume of data) Good to know it's possible ! Thanks @User 🙂
m
Is this a realtime only use case?
j
Should be mostly batch in that case
m
I asked because you can also do some pre-aggregation in your batch job. In short, multiple options to explore
j
That would be outside Pinot, right ? Or is there some ingestion option for this ?
m
Good question - There's a pre-aggregate metrics option for RT tables to do it within Pinot. For batch, you could do it more efficiently outside.
🙂 1
j
You can simply add
timeString
into the star-tree split order and star-tree is able to handle the query
What is the granularity of the time column? Based on the query, may I assume it is in DAYS?
j
@User you're right, daily data
j
Then including
timeString
in the star-tree dimension split order should work well
âž• 1
j
I'll keep that in mind, thanks !
m
Yeah, the out-of-box should already give you a lot of mileage. But as Jackie said, if you need more you have this option.
👌 1