also, what is the time bucket over which aggregation happens?
01/04/2022, 11:41 PM
The bucketing right now is the time granularity. In other words, the dimension/time values of rows have to match exactly for metrics to be aggregated.
01/04/2022, 11:46 PM
i see. what i am thinking of doing is aggregating metrics per 5 min time bucket. i can create a derived column that marks the start of the 5 min window using the event timestamp, but the event timestamp is still in the row. i am guessing that will stop this from working, right? unless of course i manage that outside of pinot.
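A minimal sketch of that concern in plain Python (not Pinot code). It assumes a millisecond epoch timestamp; the names `event_ts`, `bucket_ts`, `country`, and `clicks` are made up for illustration. Rows are merged only when every key column matches, so keeping the raw timestamp in the key leaves nothing to merge.

```python
from collections import defaultdict

BUCKET_MS = 5 * 60 * 1000  # 5-minute window, in milliseconds


def bucket_start(event_ts_ms: int, bucket_ms: int = BUCKET_MS) -> int:
    """Floor a millisecond timestamp to the start of its bucket."""
    return event_ts_ms - (event_ts_ms % bucket_ms)


def preaggregate(rows, key_columns, metric="clicks"):
    """Sum the metric over rows whose key columns all match."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        totals[key] += row[metric]
    return dict(totals)


# Hypothetical events: two in the same 5-minute window, one in the next.
rows = [
    {"country": "US", "event_ts": 1_641_340_800_000, "clicks": 1},
    {"country": "US", "event_ts": 1_641_340_860_000, "clicks": 2},
    {"country": "US", "event_ts": 1_641_341_100_000, "clicks": 4},
]
for row in rows:
    row["bucket_ts"] = bucket_start(row["event_ts"])

# Raw timestamp kept in the key: every row is unique, nothing is merged.
with_raw = preaggregate(rows, ["country", "event_ts", "bucket_ts"])

# Raw timestamp dropped: rows collapse to one per 5-minute bucket.
without_raw = preaggregate(rows, ["country", "bucket_ts"])

print(len(with_raw), len(without_raw))
```

With the raw timestamp in the key, the three rows stay separate. Without it, the first two collapse into one bucket.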
01/05/2022, 12:41 AM
i’ve experimented with this a bit. you can use a transform function in the table config to make derived columns with whatever granularities you want. but then you have to make sure to pick the lowest-granularity derived column as the time column, and leave the original column out of your schema
01/05/2022, 1:01 AM
wait, so i don't have to keep the original column in the schema when using a derived column? i thought i needed to define both for the transform function in the table config to work.
01/05/2022, 1:14 AM
my personal experiments used an event source that had a millisecond-granularity time column. I used 2 transform functions to convert it to hourly and daily. I did not include the original field in the schema and it worked correctly.
when I kept the original column, it also tried to preaggregate on the millisecond column, which defeated the purpose
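A rough sketch of what such a setup might look like, written as Python dicts that serialize to JSON config fragments. The column names (`eventTimeMs`, `hoursSinceEpoch`, `daysSinceEpoch`, `country`, `clicks`) and the choice of the hourly column as the time column are assumptions for illustration, not the config from the experiment above. The fragments are partial, so check them against the Pinot docs for your version before using them.

```python
import json

# Hypothetical source field carrying epoch milliseconds. It is referenced by
# the transform functions but deliberately NOT declared in the schema.
RAW_FIELD = "eventTimeMs"

# Fragment of a table config: two ingestion transforms deriving coarser
# time columns from the raw millisecond field.
table_config_fragment = {
    "segmentsConfig": {
        # Assumption: the hourly column serves as the table's time column.
        "timeColumnName": "hoursSinceEpoch",
    },
    "ingestionConfig": {
        "transformConfigs": [
            {
                "columnName": "hoursSinceEpoch",
                "transformFunction": f"toEpochHours({RAW_FIELD})",
            },
            {
                "columnName": "daysSinceEpoch",
                "transformFunction": f"toEpochDays({RAW_FIELD})",
            },
        ]
    },
}

# Fragment of the schema: only the derived time columns are declared.
schema_fragment = {
    "dimensionFieldSpecs": [{"name": "country", "dataType": "STRING"}],
    "metricFieldSpecs": [{"name": "clicks", "dataType": "LONG"}],
    "dateTimeFieldSpecs": [
        {
            "name": "hoursSinceEpoch",
            "dataType": "LONG",
            "format": "1:HOURS:EPOCH",
            "granularity": "1:HOURS",
        },
        {
            "name": "daysSinceEpoch",
            "dataType": "LONG",
            "format": "1:DAYS:EPOCH",
            "granularity": "1:DAYS",
        },
    ],
}

# Collect every declared column name to confirm the raw field is absent.
schema_columns = [
    spec["name"]
    for group in schema_fragment.values()
    for spec in group
]

print(json.dumps(table_config_fragment, indent=2))
print(json.dumps(schema_fragment, indent=2))
```

Leaving the raw millisecond field out of the schema means it never becomes part of the key that rows are matched on.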