Hello Is there any performance difference between the follow Apache Pinot #troubleshooting

Yash Agarwal

03/17/2021, 4:29 PM

Hello, is there any performance difference between the following two queries in Pinot?

select distinct city from transactions limit 100000

select city from transactions group by city limit 100000

Ken Krugler

03/17/2021, 4:38 PM

My guess was no, as implementation-wise it’s a similar operation. But just for grins I tried it on a large dataset (1.7b records) and got similar performance. I’d guess that memory usage would also be similar.

Mayank

03/17/2021, 4:39 PM

Hey, distinct and group by use different engines internally, even though semantically they mean the same thing and might end up doing a similar amount of work.

Ken Krugler

03/18/2021, 3:07 AM

Just FYI I wound up getting the same first 10 results from both queries, which is why I thought the underlying implementation was the same, since there’s no ordering for the results. But based on what @Mayank said that’s not the case.

Mayank

03/18/2021, 3:07 AM

Yeah, the Operators for group-by and Distinct are implemented separately

Mayank

03/18/2021, 3:10 AM

@Ken Krugler for the second query, did you run select city or select count(*)?

Ken Krugler

03/18/2021, 3:10 AM

select city, though with my data set, so actually select advertiser

Mayank

03/18/2021, 3:11 AM

Ok, my comments assumed the second query was aggregation group-by (didn't carefully see the second query).

Mayank

03/18/2021, 3:12 AM

I don't recall that we re-write the second query as a distinct internally, but I can check that

Mayank

03/18/2021, 3:18 AM

@Ken Krugler @Yash Agarwal I stand corrected. The Calcite parser rewrites the second query as a distinct. In my previous comments, I thought the second query was an aggregation group-by query.

👍 1

Ken Krugler

03/18/2021, 3:19 AM

Thanks for checking, that explains why the results were so similar 🙂

Mayank

03/18/2021, 3:19 AM

Yep, that is what got me thinking as well.
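
For anyone who wants to see the semantic equivalence discussed in this thread, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in, not Pinot, so it says nothing about Pinot's operators, its query rewrite, or performance. It only shows that a group-by with no aggregation returns the same set of values as a distinct. The table contents and the helper name distinct_and_group_by are made up for illustration.

```python
import sqlite3


def distinct_and_group_by(cities):
    """Run both query shapes from the thread against an in-memory SQLite table.

    Hypothetical helper: SQLite stands in for Pinot here, so this only
    illustrates that the two queries return the same set of values.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (city TEXT)")
    conn.executemany(
        "INSERT INTO transactions (city) VALUES (?)",
        [(c,) for c in cities],
    )
    distinct_rows = conn.execute(
        "SELECT DISTINCT city FROM transactions LIMIT 100000"
    ).fetchall()
    group_by_rows = conn.execute(
        "SELECT city FROM transactions GROUP BY city LIMIT 100000"
    ).fetchall()
    conn.close()
    # Neither query has an ORDER BY, so compare as sets rather than lists.
    return {r[0] for r in distinct_rows}, {r[0] for r in group_by_rows}


# Made-up sample data with duplicates.
sample = ["Austin", "Boston", "Austin", "Chicago", "Boston"]
distinct_set, group_by_set = distinct_and_group_by(sample)
print(distinct_set == group_by_set)  # prints True
```

The results are compared as sets because neither query specifies an order, so matching the first few rows, as happened in the test above, is not something either query guarantees.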
