# general
k
Hi all, I have a question regarding historical data. How will Pinot handle data that has existed for, say, quite a few years? Will there be any change in query performance when such historical data is queried after a very long time?
d
To my knowledge, it has no performance impact, because Pinot first uses the time range in the query to work out which segment(s) hold your data, and then fetches the data only from those segments.
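A rough sketch of that idea, in case it helps: this is not Pinot's actual code, just an illustration of picking segments by time-range overlap. The `Segment` class, the `prune_by_time` function, and the segment names are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical stand-in for the per-segment time metadata.
    name: str
    start_ms: int  # earliest timestamp stored in the segment
    end_ms: int    # latest timestamp stored in the segment


def prune_by_time(segments, query_start_ms, query_end_ms):
    """Keep only the segments whose time range overlaps the query's range."""
    return [
        s for s in segments
        if s.start_ms <= query_end_ms and s.end_ms >= query_start_ms
    ]


segments = [
    Segment("events_2019", 0, 99),
    Segment("events_2020", 100, 199),
    Segment("events_2021", 200, 299),
]

# A query over [120, 180] only needs to touch the middle segment.
selected = prune_by_time(segments, 120, 180)
print([s.name for s in selected])
```

The point is that the amount of work follows the time range being queried rather than the age of the data, so old segments cost the same to scan as new ones.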
k
So will segments continue to exist on the servers, or will they be paged out after a certain time? And how will performance be impacted if a segment is not found on the server and has to be fetched from the segment store?
d
It depends on whether you configure data retention or not. If you don't, the default behavior is to keep the data forever, in which case you'll always have the segments available - provided that you configured the deep store properly, of course.
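For reference, a minimal sketch of what turning retention on could look like. As far as I know, retention lives under `segmentsConfig` in the table config as `retentionTimeUnit` and `retentionTimeValue`, but please check the table config docs for your Pinot version. The table name `events`, the time column `ts`, and the `with_retention` helper are hypothetical.

```python
import json


def with_retention(table_config, unit, value):
    """Return a copy of the table config with retention settings added."""
    updated = dict(table_config)
    segments_config = dict(updated.get("segmentsConfig", {}))
    # Assumed key names; leaving them out means data is kept indefinitely.
    segments_config["retentionTimeUnit"] = unit
    segments_config["retentionTimeValue"] = str(value)
    updated["segmentsConfig"] = segments_config
    return updated


# Hypothetical table config with no retention set.
base = {
    "tableName": "events",
    "tableType": "OFFLINE",
    "segmentsConfig": {"timeColumnName": "ts"},
}

# Keep roughly two years of data; older segments become eligible for cleanup.
config = with_retention(base, "DAYS", 730)
print(json.dumps(config, indent=2))
```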
k
So will there be any upper bound for storage in servers?
d
It depends on how you set up the deep store. My project's Pinot cluster has its deep store set up to use S3, which gives me theoretically unlimited storage. I did bump into upper bounds in the past, though, before I had the deep store set up, because I ended up using all of the disk space.
k
Got it, thank you
d
No problem 🙂
m
@User what’s the data size we are talking about here?
k
The data is about 45-50GB
m
Yeah, that is quite small to be thinking about a tiered-storage-like solution
k
Yeah, I realised that after I went through the blog post on the tiered storage solution.
m
Cool