I'm trying to optimize a query that currently cont...
# general
s
I'm trying to optimize a query that currently contains aggregation functions on top of
CASE ...
statements. I found this thread that case statements will not work with star tree - since the thread is a couple of years old just wanted to double-check if this is still the case?
m
Yes, still the case. If there’s a way to avoid the
case..
statements (by say creating derived columns), then that would work.
👍 1
s
Currently, the data model we are evaluating (preferred for our downstream users) uses a single, large, wide table rather than smaller, normalized tables. This is preferred mainly as a way to reduce complexity of application logic (needing to fire off multiple queries and join result sets client-side). This is also causing the query logic to be quite complex and reliant on
case
statements. i.e.
Copy code
select 
  max(case when source = 'x' and date = '2022-07-01' then foo else 0 end),
  sum(case when source = 'y' and date > '2022-06-25' and date <= '2022-06-30' then bar else 0 end)
  ...
from ...
where <where criteria using indexed columns>
We are seeing quite poor cold-start performance on this type of query. Is this data model / query pattern an antipattern in pinot? It seems like relying heavily on
case
statements takes away the ability to use star-tree index, which is really limiting what we can do on the index tuning side. Some of our
case
logic can be created as derived cols, but the above is example where I don't see this as an option w/o splitting into multiple queries.
k
I think this can be a good enhancement in Pinot to internally break down the query into multiple queries and stitch the results together
🙌 1
do you have an app layer above Pinot or are these queries coming from a BI tool
you should also try the agg filter and see if the performance is better
👀 1
s
We have an app layer above Pinot. But our application engineers (aforementioned downstream users) are hesitant to commit to the complexity of firing multiple queries to multiple tables and stitching results together unless there is no other way to meet our latency goals.
Thanks for the suggestion, do you have any docs/resources you can point me to for the agg filter that you mentioned?