quick question: Does pinot support custom aggregat...
# general
b
quick question: Does Pinot support custom aggregation functions as UDFs? For example, when I'm aggregating the results of a group, can I pick the latest record by timestamp?
x
so you want to pick the last msg per group, or you want to do an aggregation on top of that?
the first part is already an aggregation function technically
Pinot doesn’t support aggregation on top of aggregation for now
b
last msg per group as the selection.
Example: SUM(col) is summing up all values in the group, LATEST(col) is just picking the latest value in that group
yes, LATEST is an aggregation. There is no aggregation of aggregations here
x
right, then query should be something like
SELECT last(ordered_col, columns...) FROM myTable group by a
b
I'm thinking like this:
select id, latest(metric) FROM myTable group by id
latest is an aggregation on only one column, i'm not thinking across columns, if that's what you're thinking
x
right, i was just thinking how to order those events
based on value or timestamp
especially how to merge between segments
b
hmm... aren't these values already ordered based on timestamp in the raw segment?
@Xiang Fu what are the classes or modules in the source code to look at? I think a latest() aggregation will be pretty useful and will unlock a bunch of use cases, at least for our open source project Hypertrace, where we are using Pinot.
x
it's ordered in one segment, but we need to merge from multiple segments and from multiple servers 🙂
I think we need to carry over the timestamp with it
you can check org.apache.pinot.core.query.aggregation.function.AvgAggregationFunction
It has the definition of intermediate results
and the final merge
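To make the "carry over the timestamp" idea concrete, here is a minimal standalone sketch, not Pinot code. It assumes a hypothetical intermediate result `TimedValue` that pairs the value with its timestamp, so that partial results from different segments or servers can be merged by keeping the pair with the larger timestamp. The names `LatestSketch`, `TimedValue`, `aggregate`, and `merge` are made up for illustration and do not mirror Pinot's actual `AggregationFunction` interface.

```java
public class LatestSketch {
    /** Hypothetical intermediate result: the value together with its timestamp. */
    static final class TimedValue {
        final long timestamp;
        final double value;

        TimedValue(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Per-segment step: scan the rows and keep the value with the largest timestamp. */
    static TimedValue aggregate(long[] timestamps, double[] values) {
        TimedValue best = null;
        for (int i = 0; i < timestamps.length; i++) {
            if (best == null || timestamps[i] > best.timestamp) {
                best = new TimedValue(timestamps[i], values[i]);
            }
        }
        return best;
    }

    /** Merge step across segments or servers: the later timestamp wins. */
    static TimedValue merge(TimedValue a, TimedValue b) {
        if (a == null) {
            return b;
        }
        if (b == null) {
            return a;
        }
        return b.timestamp > a.timestamp ? b : a;
    }

    public static void main(String[] args) {
        // Two segments, each ordered internally, but interleaved in time with each other.
        TimedValue seg1 = aggregate(new long[] {100, 200}, new double[] {1.0, 2.0});
        TimedValue seg2 = aggregate(new long[] {150, 300}, new double[] {5.0, 7.0});
        TimedValue merged = merge(seg1, seg2);
        System.out.println(merged.value); // prints 7.0
    }
}
```

Without the timestamp in the intermediate result, the merge step would have no way to decide which segment's value is the latest; with it, the merge is associative and order-independent (up to ties), which is what a multi-segment, multi-server merge needs.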
b
Thanks