#general

kelv

05/26/2021, 5:35 PM
Hi! What is the reason for having Pinot queries implicitly default to `limit 10`? Is there a way to remove the limit in the query without specifying an arbitrary limit value?

Mayank

05/26/2021, 6:20 PM
Without a default, a simple-looking query (to some) like `select *` could end up fetching all of Pinot's data?
👍 1

kelv

05/26/2021, 6:38 PM
Ok, so it's purely for safety. Is there a way to remove the limit without specifying an arbitrary limit value?

Mayank

05/26/2021, 6:40 PM
No. What's your use case? Is it purely for convenience, or does your application really need to pull all of the data out of Pinot?

kelv

05/26/2021, 7:31 PM
Data is streamed into Pinot, and over the course of the day application instances want to get all the messages of interest starting from 0000 hours. The number of messages differs according to the query criteria. We can put in a "large enough" limit value, but that assumes it will still suffice in the future.
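For illustration, the "large enough" limit approach described above might look roughly like the sketch below. The table name `messages`, the time column `ts` (assumed to hold epoch milliseconds), and the helper names are all hypothetical and not taken from the thread; the sketch only builds the query string and does not talk to a Pinot cluster.

```python
from datetime import datetime, timezone


def start_of_day_millis(now: datetime) -> int:
    """Return epoch milliseconds for 00:00 on the same day as `now`."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return int(midnight.timestamp() * 1000)


def build_query(table: str, time_column: str, since_millis: int, limit: int) -> str:
    """Build a query with an explicit LIMIT, since omitting it falls back to the default of 10."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        f"SELECT * FROM {table} "
        f"WHERE {time_column} >= {since_millis} "
        f"LIMIT {limit}"
    )


# Hypothetical table and column names, used for illustration only.
now = datetime(2021, 5, 26, 19, 31, tzinfo=timezone.utc)
sql = build_query("messages", "ts", start_of_day_millis(now), 1_000_000)
```

The limit here is still a guess: if more rows match than the limit allows, the result is cut off at the limit, which is the concern raised in the message above.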

Mayank

05/26/2021, 7:32 PM
There's a syntactic challenge there. If we make the default unlimited, then we have the problem above. Once the default is limited, any non-default value has to be specified explicitly, because that is what the syntax allows.

kelv

05/26/2021, 7:33 PM
I think if anyone does it explicitly, they know what they are getting into.

Mayank

05/26/2021, 7:34 PM
Yes, the worry is about new users who don't know that they may end up fetching all the data from Pinot just by doing `select *`. Since most new folks won't specify `limit`, we are erring on the safer side here.

kelv

05/26/2021, 7:35 PM
I think it's fine to retain the implicit default limit. Then offer something to the effect of `limit unlimited` for users wanting to query with knives.

Mayank

05/26/2021, 7:36 PM
I see. What's the SQL way of doing it?

kelv

05/26/2021, 7:37 PM
set a session parameter?
Something to think about? I'm not sure whether there is a notion of session parameters or environment variables that we can leverage. I may be able to look into making a PR if it's considered "easy level".

Mayank

05/27/2021, 6:03 PM
So, there isn't any cross-query state.

kelv

05/27/2021, 6:05 PM
how about setting HTTP headers?

Mayank

05/27/2021, 9:17 PM
Typically, folks set a really large value like 1M or 100M. But they are also aware that doing so means a really expensive query might put heavy load on the cluster.
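One way a client could reduce the risk of a large limit silently truncating results is to treat "returned exactly `limit` rows" as a sign that more rows may exist, and retry with a larger limit. The sketch below is only an assumption about how an application might do this, not something Pinot provides: `run_query` is a hypothetical callable that sends SQL to the cluster and returns a list of rows, and the 1M and 100M defaults simply mirror the values mentioned above.

```python
from typing import Callable, List, Sequence


def fetch_all(
    run_query: Callable[[str], List[Sequence]],
    base_sql: str,
    initial_limit: int = 1_000_000,
    max_limit: int = 100_000_000,
) -> List[Sequence]:
    """Re-run the query with a growing LIMIT until the result is smaller than the limit."""
    limit = initial_limit
    while True:
        rows = run_query(f"{base_sql} LIMIT {limit}")
        # Fewer rows than the limit means nothing was cut off.
        if len(rows) < limit:
            return rows
        # Hitting the limit at the cap means we cannot rule out truncation.
        if limit >= max_limit:
            raise RuntimeError(f"result may be truncated at LIMIT {limit}")
        limit = min(limit * 2, max_limit)
```

Each retry re-executes the whole query, so this trades extra cluster load for certainty that nothing was dropped; the cap is there so a runaway query fails loudly instead of growing without bound.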