Query regarding Apache Pinot: what is the typical OLAP cube size one can host in Apache Pinot? We have a cube of almost 50 TB. It has some dimensions with very high cardinality, but since the raw data is more than 5 petabytes, 50-100 TB is still a reasonable aggregation. We want interactive performance from our OLAP layer, since it will power important dashboards and drill-downs. So we want to know how much data we can push into Apache Pinot.
Mayank
12/18/2020, 5:20 PM
With Pinot, you don't need to precompute all the cubes upfront (at write time). You can pre-aggregate your raw data, and the cubing will happen at read time based on the query.
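A minimal sketch of the read-time cubing idea, in plain Python rather than Pinot itself: one pre-aggregated table is stored once, and each query picks its own group-by dimensions. The rows, the column names, and the `group_sum` helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical pre-aggregated rows: (country, device, day, clicks).
COLUMNS = ("country", "device", "day", "clicks")
ROWS = [
    ("US", "mobile", "2020-12-17", 10),
    ("US", "desktop", "2020-12-17", 5),
    ("IN", "mobile", "2020-12-17", 7),
    ("IN", "mobile", "2020-12-18", 3),
]


def group_sum(rows, dims, metric="clicks"):
    """Sum `metric` over the requested dimensions at query time."""
    dim_idx = [COLUMNS.index(d) for d in dims]
    metric_idx = COLUMNS.index(metric)
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in dim_idx)
        totals[key] += row[metric_idx]
    return dict(totals)


# Two different "cubes" answered from the same stored rows.
by_country = group_sum(ROWS, ["country"])
by_country_device = group_sum(ROWS, ["country", "device"])
```

Neither roll-up was materialized ahead of time; both are computed from the same stored rows when the query arrives.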
Mayank
12/18/2020, 5:26 PM
If you are saying your single cube (which you will further drill down on) is 50 TB, then Pinot scales horizontally, so theoretically there isn't a limit on data size. However, depending on your use case, you may need to enable some optimizations (e.g., partitioning). If you can share a bit more about your use case, we can help figure out how to solve it using Pinot.
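As a rough illustration of why partitioning can help at this scale, the sketch below hashes rows into partitions on a key and then answers an equality filter on that key by scanning only the matching partition. This is plain Python, not Pinot's implementation: `NUM_PARTITIONS`, the `customer_id` column, and the use of `zlib.crc32` as the hash are assumptions made for the example.

```python
import zlib

NUM_PARTITIONS = 4  # assumed value, for illustration only


def partition_of(key: str) -> int:
    # crc32 is a stand-in hash, not the partition function Pinot uses.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_segments(rows):
    """Place each row in the segment for its key's partition."""
    segments = {p: [] for p in range(NUM_PARTITIONS)}
    for row in rows:
        segments[partition_of(row["customer_id"])].append(row)
    return segments


def query_clicks(segments, customer_id):
    """Sum clicks for one customer, scanning only its partition.

    Returns (total_clicks, rows_scanned).
    """
    scanned = segments[partition_of(customer_id)]
    total = sum(r["clicks"] for r in scanned if r["customer_id"] == customer_id)
    return total, len(scanned)


# Hypothetical data: 100 rows spread over 10 customers.
rows = [{"customer_id": f"c{i % 10}", "clicks": i} for i in range(100)]
segments = build_segments(rows)
total, rows_scanned = query_clicks(segments, "c3")
```

The filter on the partition key touches one partition instead of all of them, so the amount of data scanned per query grows more slowly than the table itself.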
dhurandar
12/18/2020, 5:27 PM
I see. We already have the cube deployed, but it is persisted in S3 with Presto in front.
dhurandar
12/18/2020, 5:28 PM
We expect it to grow to around 150 TB in a year or so, and then it should stabilise.
dhurandar
12/18/2020, 5:28 PM
But we are not getting interactive performance, which is why we are looking at Apache Pinot.
Mayank
12/18/2020, 5:28 PM
Let's move to #troubleshooting
dhurandar
12/18/2020, 5:29 PM
Mostly sub-second latency. Our current latency with Presto and S3 is around 10+ seconds given the volume