I have a use case where I need to perform a group by based o Apache Pinot #troubleshooting


Abdullah Jaffer

07/06/2022, 12:05 AM

I have a use case where I need to perform a group by based on some filters and then average out the result of the group by so:


select sum(col1) as sum, col2 from table group by col2
result:
sum    col2
1      1
2      2
3      3

and then average the result: (1 + 2 + 3)/3 = 2. I need a subquery for this, and I think that is not supported in Pinot. Can this be accomplished with the Trino connector? If so, how efficient is it? I don't want to average the result in code, since that is not scalable: the number of results from the group by is unpredictable.

Mayank

07/06/2022, 12:07 AM

Do you need Trino for other reasons, or just for this? If the sub-query is written in a way that it can be entirely executed by Pinot, Trino will push it down completely, and the only operation happening on Trino would be the averaging.
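For illustration, the kind of query discussed here might look like the sketch below when run through Trino. The catalog name `pinot`, the schema `default`, and the table name `my_table` are assumptions; `col1` and `col2` come from the question above. The inner query is the part that could be pushed down to Pinot, and the outer `avg` is the part that would run in Trino.

```sql
-- Hypothetical names: catalog "pinot", schema "default", table "my_table".
-- Inner query: per-group sums (candidate for pushdown to Pinot).
-- Outer query: average of those sums (computed by Trino).
SELECT avg(group_sum) AS avg_of_sums
FROM (
    SELECT col2, sum(col1) AS group_sum
    FROM pinot.default.my_table
    GROUP BY col2
) AS per_group;
```

With the sample result above (sums 1, 2, 3), this would return 2.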

Abdullah Jaffer

07/06/2022, 12:09 AM

@Mayank That would be ideal. We want most of the query to run in Pinot if possible; we just have a Trino connector to provide additional functionality, like subqueries, for cases like this.

Mayank

07/06/2022, 12:10 AM

Yes, the connector should push down as much as Pinot can evaluate.
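One way to check how much of a query is pushed down is Trino's `EXPLAIN` statement, sketched below. The catalog, schema, and table names are hypothetical, and the exact plan output depends on the Trino and connector versions. If an aggregation node for the inner `sum` still appears in the Trino plan, that part was likely not pushed down.

```sql
-- Hypothetical names: catalog "pinot", schema "default", table "my_table".
-- EXPLAIN prints the query plan without executing the query.
EXPLAIN
SELECT avg(group_sum) AS avg_of_sums
FROM (
    SELECT col2, sum(col1) AS group_sum
    FROM pinot.default.my_table
    GROUP BY col2
) AS per_group;
```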

Abdullah Jaffer

07/06/2022, 12:12 AM

The problem is that col2 (used in the group by) can have millions of unique values. My concern is that this data should not be loaded in the application layer. Will Trino ensure this? To summarise: where will this dynamic table (based on the group by) be stored, in application memory?

Mayank

07/06/2022, 12:14 AM

Not sure if I follow. Pinot will compute the groups and send them back to Trino, which in turn will compute the avg and return it to the application.

Abdullah Jaffer

07/06/2022, 12:15 AM

Ah I see, thanks, that clears it up actually
