Question about hierarchical aggregations. In Elast...
# general
k
Question about hierarchical aggregations. In Elasticsearch we can (for example) group by day, and then sub-group in each day by some attribute, and sum a metric, and get the top 10 results (for that sum of that metric) per day. It doesn’t seem possible to do this with Pinot, but wanted to confirm, as my SQL skills are pretty rusty, thanks!
k
order by within a sub group is not possible in Pinot. If you need this, it requires writing a special aggregation function
Do you have the elastic search query handy?
k
Checking…
Here’s one that buckets by gender, sub-buckets by state, aggregate medicare payments. I’d have to poke around more to find one that limits the states to the top N.
Copy code
json{
“size”:0, “aggs”: { “by_gender”: { “terms”: { “field”: “nppes_provider_gender”, “size”: 20 }, “aggs”: { “by_state”: { “terms”: { “field”: “nppes_provider_state”, “size”: 20 }, “aggs”: { “total_payment”: { “sum”: { “field”: “medicare_payment_amt” } } } } } } } }
And as you might be able to tell, the JSON representation for aggregation queries in Elasticsearch is… not so clear.
k
reading
k
I took a quick look at https://docs.pinot.apache.org/developers/developers-and-contributors/extending-pinot/custom-aggregation-function. So assuming I implemented a custom function that supported hierarchical grouping with a limit on sub-buckets, what would the PQL query look like?
select sum(metric) from mytable group by mycustomfunction(gender, state, <max states returned>)
?
k
it would be more like this
select gender, myCustomFunction(metric, state, <max_states>) from mytable group by gender
but in sql language this is needs over (partition by) support which is not supported in Pinot today
can you file an github issue, this is an useful feature
k
An issue for limiting a sub-group, or support for OVER?
And when you say “in sql language…” do you mean it could work with PQL, or it’s a no-go currently no matter what?
@Kishore G - heading out for a bit, many thanks for the useful info. I’ll follow up with a GitHub issue once I hear back, thanks again!
k
issue for limiting sub-group
k
Got it. So to wrap up, is there currently any way to implement what I need (via a custom function), without adding OVER() support to Pinot?
k
yes, you can write a custom aggregation function
k
OK, now I understand that when you said “but in sql language this is needs over…“. You were talking about a non-UDF approach. Cool, thanks again, I’ll file that issue about limiting sub-group.
k
will be good to add a generic UDF here which takes a column(s) and another aggregation function
k
So something like
subGroup(function, limit, columns…)
?
k
yes
k
Any existing code that remotely resembles this?
k
you can see AvgAggregationFunction (simple) or ThetaSketch - a more complex version which might resemble the multi argument
Copy code
DistinctCountThetaSketchAggregationFunction