https://pinot.apache.org/ logo
j

Jonathan Meyer

06/10/2021, 5:33 PM
Hello 🙂 Are subqueries supported in Pinot ? Such as 1.
SELECT * FROM table WHERE x in (SELECT ...)
and 2.
SELECT * FROM (SELECT ...)
(i.e. working with the output of a subquery)
My use case in particular right now is 2. Like:
SELECT SUM(agg_value) FROM (SELECT dateString, AVG(value) as agg_value FROM table GROUP BY dateString)
Or maybe there's another way to express this query ?
The data looks like this
Copy code
date    | 01 | 02 | 03
entity1 | 1  |  2 |  3 
entity2 | 3  |  4 |  5
The idea is to AVG over entity1 and entity2 per day, then SUM those results over all days
m

Mayank

06/10/2021, 5:37 PM
Pinot does not support generic nested queries. There's some limited use case like: https://docs.pinot.apache.org/basics/indexing/json-index#nested-filter-expression
thankyou 1
🆗 1
@Jackie is there way to model this query using post-aggregation?
j

Jonathan Meyer

06/10/2021, 5:41 PM
I've tried to find some info on post-aggregation (I've found a PR about this), but couldn't find much ^^ Would really appreciate @Jackie input then 😄
j

Jackie

06/10/2021, 6:23 PM
Unfortunately this cannot be achieved with the post-aggregation
j

Jonathan Meyer

06/10/2021, 6:24 PM
@Mayank @Jackie Is there another approach to solving that query ? Ideally as much as possible in realtime (i.e. no precomputation), within Pinot
j

Jackie

06/10/2021, 6:25 PM
In the subquery, we need to return the results for all groups
j

Jonathan Meyer

06/10/2021, 6:26 PM
Yes Can this subquery result be "exploited" further in any way ? Without resorting to client level aggregations ?
j

Jackie

06/10/2021, 6:26 PM
As a workaround, you may issue
SELECT AVG(value) FROM table GROUP BY dateString LIMIT 10000
(assuming the cardinality of
dateString
is smaller than 10000), then do the sum on the client side
j

Jonathan Meyer

06/10/2021, 6:27 PM
Yes, luckily dateString cardinality is low (<1000) I'll see how performance fares in that case then Do you think any upcoming feature would allow this query to be done in Pinot ?
j

Jackie

06/10/2021, 6:31 PM
We don't have a feature already planned to support this
m

Mayank

06/10/2021, 6:32 PM
One option is to use Pinot + Presto if you have nested queries like this.
j

Jackie

06/10/2021, 6:32 PM
We can probably build a simple query engine on the broker to support simple subquery like this
Which takes a result table as the source
Yes, using Presto might be an available option supported now
✔️ 1
j

Jonathan Meyer

06/10/2021, 6:33 PM
One option is to use Pinot + Presto if you have nested queries like this.
Sounds a bit overkill for that use case as it's not currently available in my environment but thanks for the idea 🙂
@Jackie I see, maybe some day then ! 😄
j

Jackie

06/10/2021, 6:33 PM
Can you please file an issue about this feature so that we keep track of it
j

Jonathan Meyer

06/10/2021, 6:36 PM
Sure, will do
Thank you very much for the feedback @Mayank & @Jackie
m

Mayank

06/10/2021, 6:37 PM
👍
j

Jonathan Meyer

06/10/2021, 6:49 PM
https://github.com/apache/incubator-pinot/issues/7043 I'm having a bit of trouble finding the appropriate terms to describe the feature request, feel free to edit & add anything 🙂