# getting-started
a
Hey folks, beginner's question here - what's the best practice to get a large number of rows out of Pinot? We have this use case for a client of our service. The rows we have are quite chunky (many string columns with a lot of data) and we would like to stream it all back to the client without aggregation on the Pinot side. Looking at the broker REST query, I can see it returns a JSON object of the entire result (rather than streaming it), and I'm thinking that may be a problem for us. Does the client deal with this? Is there another way to fetch data from Pinot?
g
Hi @Albert Latacz, right now the options are 1. the data plane APIs (broker URL) or 2. using the SDKs within your microservice, e.g. Java or Python. What would be the ideal solution for you? If you want this for showing results in a UI, would Pinot -> WebSocket publish make sense? Or are you trying to use Pinot like a transformation layer here? I am trying to unpack what you mean by 'stream it all back'
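For reference, option 1 (the data plane API) is a plain HTTP POST against the broker's `/query/sql` endpoint. A minimal sketch, assuming a broker at `localhost:8099` (the default port) and a hypothetical table name:

```python
import json
import urllib.request

# Hypothetical broker address; Pinot brokers listen on 8099 by default.
BROKER = "http://localhost:8099"

def build_query_request(broker: str, sql: str):
    """Build the URL and JSON payload for the broker's /query/sql endpoint."""
    url = f"{broker}/query/sql"
    payload = json.dumps({"sql": sql}).encode("utf-8")
    return url, payload

def run_query(broker: str, sql: str) -> dict:
    """POST the query and return the parsed broker response (one JSON object)."""
    url, payload = build_query_request(broker, sql)
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The rows live under resultTable in the broker response:
# run_query(BROKER, "SELECT * FROM myTable LIMIT 10")["resultTable"]["rows"]
```

Note the whole result arrives as one JSON document, which is the crux of the question above.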
a
Ah sorry @guru, I wasn't very clear so I'll try to explain better... we are connecting to Pinot from another server. Other servers we load data from would stream it back, and clients would handle it without much complexity (e.g. for plain SQL databases we use a cursor, and for REST services we normally send ndjson back to simplify parsing and avoid loading a single JSON doc with all rows). As far as I can tell, the broker API returns a single object with the nested results as an array. The Pinot clients use the same API, and all results are in resultsTable in BrokerResponse, which is then used in ResultSet. There also appears to be a limit of 1M rows per statement in the JDBC client, if I'm reading the code right, but it would be good to clarify. So if we want to get, let's say, 10M large rows out, is querying the broker API the best way? Will Pinot even return it? Do we need a streaming parser to parse it out of the single object?
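Since the broker returns one JSON object, one workaround along the lines described above is to reshape `resultTable` into ndjson in your own service before forwarding it to clients. A minimal sketch, assuming the documented broker response shape (`resultTable.dataSchema.columnNames` plus `resultTable.rows`); the table contents here are made up:

```python
import json
from typing import Iterator

def broker_rows_to_ndjson(broker_response: dict) -> Iterator[str]:
    """Yield one JSON line per row from a Pinot broker response.

    Assumes the documented response shape:
    {"resultTable": {"dataSchema": {"columnNames": [...]}, "rows": [[...], ...]}}
    """
    table = broker_response["resultTable"]
    columns = table["dataSchema"]["columnNames"]
    for row in table["rows"]:
        # Pair each positional value with its column name, emit as one JSON doc.
        yield json.dumps(dict(zip(columns, row)))

# Example with a hand-made response (shape only; values are illustrative):
response = {
    "resultTable": {
        "dataSchema": {"columnNames": ["id", "payload"]},
        "rows": [[1, "a"], [2, "b"]],
    }
}
lines = list(broker_rows_to_ndjson(response))
# each line is an independent JSON document the client can parse row by row
```

This doesn't solve the memory cost of the broker materializing the full response in the first place, only the client-side parsing; for truly large results a streaming JSON parser (e.g. ijson) over the HTTP body would be the complementary piece.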
m
I think there is a grpc endpoint on broker that can stream @Rong R ?
r
the documentation says Presto, but you can programmatically access it in a similar way via your own client
m
Although @Albert Latacz, streaming lots of raw data is not the best use case for Pinot. Do you have aggregation queries as well?
a
Thanks, yes we do have aggregations that we run as well, which are mostly time/version based at the moment. Right now we are trying to prove that Pinot holds up for our other use cases. Basically we need to be able to bootstrap some services with data from Pinot, but I appreciate that it may not be the optimal way to use it.
I haven't used Presto, but we have gRPC in the stack so I'll take a look at it
m
Oh ok, so you want to stream out all the data from Pinot? I'd still say that isn't the best use
a
well, the data will be the result of a query, but there may be a lot of it
At present we store different 'versions' of data depending on the priority of processing upstream. We are selecting them as part of the query using LASTWITHTIME, which seems to do the trick. The result of that query may return lots of data.
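For anyone following along, a query like the one described might be built as below. This is only a sketch: the table and column names (`events`, `entityId`, `payload`, `updatedAt`) are hypothetical, while `LASTWITHTIME(dataColumn, timeColumn, 'dataType')` is Pinot's last-value-by-time aggregation:

```python
def latest_version_query(table: str, limit: int) -> str:
    """Build a Pinot SQL query that picks the latest payload per entity.

    LASTWITHTIME(col, timeCol, 'dataType') returns the value of col from the
    row with the greatest timeCol within each group.
    """
    return (
        "SELECT entityId, "
        "LASTWITHTIME(payload, updatedAt, 'STRING') AS latestPayload "
        f"FROM {table} "
        "GROUP BY entityId "
        f"LIMIT {limit}"
    )
```

Because this is a GROUP BY aggregation, the broker still assembles the full result set before responding, so the single-object-response concern raised earlier applies here too.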