Currently we are deploying Pinot for customers fac...
# general
c
Currently we are deploying Pinot for customer-facing online queries. We also have a use case to store 2 years of data, which could be hundreds of millions of records every day, and to build an offline report generator that queries the offline data, aggregates on different dimensions, and converts the result to a CSV report. Is Pinot able to handle this kind of use case? Would the offline report queries affect online customer query latency? How cost-efficient is it to host a Pinot cluster for this kind of use case?
m
At a high level, what you have described is possible using Pinot. We would need more concrete information to suggest how to make it work.
c
Great to know. Basically, for the offline report generator, we are building a backend service that queries Pinot to aggregate on time and other dimensions over at most 2 years of data. Users can be notified when the report is ready. Should we separate the Pinot cluster serving offline reports from the Pinot cluster serving online web queries?
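For context, a rough sketch of what the report service would do, assuming a hypothetical table `events` with a millisecond time column `ts` (all table, column, and function names here are made up for illustration). It builds a GROUP BY query over a time window and turns a broker-style response (a `resultTable` with `dataSchema.columnNames` and `rows`) into CSV text. The HTTP call to the broker is left out on purpose.

```python
import csv
import io


def build_report_query(table, dimensions, metric, start_ms, end_ms,
                       time_col="ts", limit=100000):
    """Build a GROUP BY query over a time window (all names are hypothetical)."""
    dims = ", ".join(dimensions)
    return (
        f"SELECT {dims}, SUM({metric}) AS total "
        f"FROM {table} "
        f"WHERE {time_col} >= {start_ms} AND {time_col} < {end_ms} "
        f"GROUP BY {dims} "
        f"LIMIT {limit}"
    )


def result_table_to_csv(response):
    """Convert a broker-style response containing `resultTable` into CSV text."""
    table = response["resultTable"]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Header row comes from the response schema, data rows follow as-is.
    writer.writerow(table["dataSchema"]["columnNames"])
    writer.writerows(table["rows"])
    return buf.getvalue()
```

The explicit `LIMIT` is there because Pinot returns only a small number of rows when no limit is given, which would silently truncate a report.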
k
Try with the same cluster for now. You can try a star-tree index to minimize the impact.
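A minimal sketch of what a star-tree entry in the table config might look like, written as a Python dict that serializes to JSON. The dimension and metric names (`country`, `device`, `revenue`) are assumptions, and `maxLeafRecords` is just an illustrative value; the split order and function/column pairs should match the dimensions and aggregations the report queries actually use.

```python
import json


def star_tree_index_config(dimensions, function_column_pairs, max_leaf_records=10000):
    """Build one star-tree index entry (column names are hypothetical)."""
    return {
        # Dimensions the tree splits on, in order.
        "dimensionsSplitOrder": list(dimensions),
        "skipStarNodeCreationForDimensions": [],
        # Pre-aggregated metrics, written as FUNCTION__column.
        "functionColumnPairs": list(function_column_pairs),
        "maxLeafRecords": max_leaf_records,
    }


# This fragment would sit under `tableIndexConfig` in the table config.
table_index_config = {
    "starTreeIndexConfigs": [
        star_tree_index_config(["country", "device"], ["SUM__revenue", "COUNT__*"])
    ]
}

print(json.dumps(table_index_config, indent=2))
```

Since the tree stores pre-aggregated values, report queries that group by these dimensions should scan far fewer records, which is what limits their impact on the online queries.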