# general
o
First question: when we enable `aggregateMetrics` to pre-aggregate data as it is consumed, Pinot aggregates on the fields defined in `dimensionFieldSpecs` and `dateTimeFieldSpecs`. Can Pinot aggregate only on the fields defined in `dimensionFieldSpecs` when pre-aggregating with `aggregateMetrics`?
Second question: we can set how often segments are generated for a real-time table using the `realtime.segment.flush.threshold.time` config. Let's assume the current time is 10:25. When I set `realtime.segment.flush.threshold.time` to `1 hour`, Pinot creates a segment with start time 10:25 and closes it at 11:25, so the segment's start/end time is 1025 1125. But I want Pinot to close the segment when a new hour starts, so that the segment's start/end time is 1000 1100. How can I achieve that?
k
1. Aggregating only on `dimensionFieldSpecs` can produce wrong results, right? If you want aggregation on specific dimensions, use a star-tree index.
2. Use the realtime-to-offline converter minion task. It periodically merges multiple segments and creates time-partitioned segments.
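To illustrate the second point: a segment flushed by `realtime.segment.flush.threshold.time` spans whatever wall-clock interval it was open for, whereas a time-partitioned segment covers a bucket aligned to the hour. Below is a minimal sketch of that alignment arithmetic only. `hour_bucket` is a hypothetical helper, not a Pinot API, and the date is made up for illustration.

```python
from datetime import datetime, timedelta


def hour_bucket(ts: datetime, bucket: timedelta = timedelta(hours=1)):
    """Return the [start, end) bucket containing ts, aligned to the epoch.

    Hypothetical helper for illustration only; not part of Pinot.
    """
    epoch = datetime(1970, 1, 1)
    # Floor the elapsed time to a whole number of buckets.
    n = (ts - epoch) // bucket
    start = epoch + n * bucket
    return start, start + bucket


# A segment opened at 10:25 with a 1 hour flush threshold closes at 11:25 ...
opened = datetime(2021, 1, 1, 10, 25)  # assumed date, for illustration
flush_window = (opened, opened + timedelta(hours=1))

# ... while the hour-aligned bucket for the same moment is 10:00 to 11:00.
aligned_window = hour_bucket(opened)

print(flush_window[0].strftime("%H%M"), flush_window[1].strftime("%H%M"))      # 1025 1125
print(aligned_window[0].strftime("%H%M"), aligned_window[1].strftime("%H%M"))  # 1000 1100
```

The flush threshold only controls how long a consuming segment stays open, so it cannot produce the second window on its own; the aligned buckets come from re-partitioning the data afterwards.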
o
When I use the star-tree index, Pinot only looks at the star-tree documents, right? It still manages the time boundary using the time column. So, if I have columns like these:
dimensions: sellerId, brand, category, productId, productName

metrics: totalCount

timeFields: eventDate, orderDate
I want to get aggregated results based on orderDate, brand, category, and productId for a specific seller, category, or brand. Some possible queries:
select productId, sum(totalCount)
from t 
where sellerId = x and orderDate > Y and orderDate < Z and category = '
group by productId

select category, sum(totalCount)
from t 
where sellerId = x and orderDate > Y and orderDate < Z 
group by category

select brand, sum(totalCount)
from t 
where sellerId = x and orderDate > Y and orderDate < Z 
group by brand
I have to create a star-tree index on sellerId, brand, category, productId, and orderDate, right? In that case, what happens when I want to get productName?
Also, we can only create one index on a column, right?
k
That’s right. You can create an index on a specific column.
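A rough sketch of how the star-tree pieces in this thread fit together, under stated assumptions. The dict mirrors the shape of one `starTreeIndexConfigs` entry in the table config, using the column names from the schema above; the split order and the `maxLeafRecords` value are illustrative guesses, not tuned recommendations. `can_use_star_tree` is a hypothetical helper, not a Pinot API. It mimics the general rule that a star-tree can serve a query only when every filter and group-by column is in the split order and every aggregation is a configured function-column pair. It ignores further restrictions, such as which predicate types are supported.

```python
# Shape of one entry under tableIndexConfig.starTreeIndexConfigs.
# Column names come from the thread; the ordering and maxLeafRecords
# value are assumptions chosen for illustration.
star_tree_config = {
    "dimensionsSplitOrder": ["sellerId", "category", "brand", "productId", "orderDate"],
    "skipStarNodeCreationForDimensions": [],
    "functionColumnPairs": ["SUM__totalCount"],
    "maxLeafRecords": 10000,
}


def can_use_star_tree(config, filter_cols, group_by_cols, aggregations):
    """Hypothetical eligibility check; a simplification, not Pinot's planner."""
    dims = set(config["dimensionsSplitOrder"])
    pairs = set(config["functionColumnPairs"])
    columns_covered = (set(filter_cols) | set(group_by_cols)) <= dims
    aggregations_covered = set(aggregations) <= pairs
    return columns_covered and aggregations_covered


# The "group by category" query from the thread: all columns are covered.
print(can_use_star_tree(
    star_tree_config, ["sellerId", "orderDate"], ["category"], ["SUM__totalCount"]
))  # True

# Grouping by productName: the column is not in the split order.
print(can_use_star_tree(
    star_tree_config, ["sellerId", "orderDate"], ["productName"], ["SUM__totalCount"]
))  # False
```

Under this rule, a query that touches productName would not be answered from the star-tree and would be processed against the regular segment data instead, unless productName were added to the split order.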