Hi, Looking to understand the current capabilities...
# general
g
Hi, I'm looking to understand the current capabilities for getting large amounts of data out of Pinot in chunks. The docs are hard to sort through. Whichever of these is available, I'd need it to work with aggregation queries. Can Pinot stream results? Page through them? Shard the query? If not now, is any of this in the works? TIA
g
You can use Apache Spark to read data from Pinot. There is a connector, documented here: https://github.com/apache/pinot/tree/master/pinot-connectors/pinot-spark-connector
g
@Susan Candela
k
I'd suggest breaking it up by time. There is also a hidden feature that lets you restrict a query to a specific segment.
g
That could be an option. We were hoping to find one query pattern/solution that would work for both Elasticsearch and Pinot.
k
I mean splitting it by time range.
I would do this in multiple steps:
• select count(*) from T where time between t1 and t2
• If the count is large (above a threshold), break the query up into multiple time ranges. You can assume the data is evenly spread across the time ranges.
• Otherwise, if the count is small (below the threshold), send one query.
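A minimal Python sketch of the count-then-split steps above, assuming an integer time column (e.g. epoch millis) and a hypothetical `count_rows(start, end)` callable that runs the COUNT(*) query against Pinot. `plan_time_ranges`, `build_range_queries`, `myTable` and `ts` are illustrative names, not a Pinot API. The sketch uses half-open ranges (`>= start AND < end`) instead of BETWEEN, because BETWEEN is inclusive on both ends and would fetch rows sitting exactly on a split boundary twice.

```python
from typing import Callable, List, Tuple

# Hypothetical: count_rows(start, end) runs
# "SELECT COUNT(*) FROM T WHERE time >= start AND time < end" and returns the count.
CountFn = Callable[[int, int], int]


def plan_time_ranges(count_rows: CountFn, t1: int, t2: int, threshold: int) -> List[Tuple[int, int]]:
    """Split [t1, t2) into sub-ranges of roughly <= threshold rows each,
    assuming rows are spread evenly over time."""
    total = count_rows(t1, t2)
    span = t2 - t1
    if total <= threshold or span <= 1:
        return [(t1, t2)]  # small enough (or can't split further): send one query
    # Chunks needed if data is evenly spread (integer ceiling), never finer than one time unit.
    n = min((total + threshold - 1) // threshold, span)
    # span / n >= 1, so these integer boundaries are strictly increasing and end at t2.
    bounds = [t1 + (i * span) // n for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(n)]


def build_range_queries(table: str, time_col: str, ranges: List[Tuple[int, int]],
                        select: str = "*") -> List[str]:
    """Render one query per sub-range using half-open bounds."""
    return [
        f"SELECT {select} FROM {table} WHERE {time_col} >= {start} AND {time_col} < {end}"
        for start, end in ranges
    ]


if __name__ == "__main__":
    # Stand-in for a real COUNT(*) call: 10 rows per time unit.
    def fake_count(start: int, end: int) -> int:
        return (end - start) * 10

    for q in build_range_queries("myTable", "ts", plan_time_ranges(fake_count, 0, 100, 250)):
        print(q)
```

If each sub-range runs an aggregation, the per-range results can only be merged client-side when the aggregation is decomposable (SUM, COUNT, MIN, MAX); an exact distinct count, for example, cannot simply be added across ranges.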
g
Ok thanks
x
Another option is to use Presto/Trino on top of Pinot to stream data back.