hi team, we have a realtime pinot table configured...
# getting-started
k
hi team, we have a realtime pinot table configured with ingestion-aggregation. can we also setup start-tree index on this table?
j
Yes you can. This is a pattern we've used internally to get even further latency improvement.
k
great thank you! i see from this thread https://apache-pinot.slack.com/archives/C01H1S9J5BJ/p1663620167716589 , ingestion aggregation only aggregates at segment level and not for the whole table. so we should still be structuring our queries to do aggregation right? for the example here https://docs.pinot.apache.org/developers/advanced/ingestion-level-aggregations to find out total sales for day. we need to group by product_name for given day and do sum(total_sales). instead of just looking at total_sales?
j
It depends on your query patterns. We have some cases where we only every query an hour of data. So we transform the time column to hourly granularity, aggregate everything else, and it's effectively a one row lookup. Some tables might be aggregated hourly, but we query days of data. That's worked well enough.
I modeled this as a table with 2 dimensions: what minimum granularity do you need, and what maximum range are you going to query over. As that grows, your query will get slower. Using a startree might make it faster, but only if your query spans full segments.
And yes, you still group by day and do the sum of sales. It's definitely redundant. Our queries internally look like sum(sum_field). We had planned to make this look better at the broker layer but did not have resourcing
🙏 1
k
this context is very helpful. thank you Johan. our granularity is at day-level. based on the performance we are planning to set up star-tree index over this aggregated data.