Ignacy Krasicki
03/21/2020, 3:32 PM
Kishore G
Kishore G
Dan Hill
03/22/2020, 8:55 PM
Dan Hill
03/22/2020, 8:56 PM
Dan Hill
03/22/2020, 8:56 PM
Kishore G
Dan Hill
03/22/2020, 8:59 PM
Dan Hill
03/22/2020, 8:59 PM
Dan Hill
03/22/2020, 9:10 PM
Neha Pawar
Kishore G
Kishore G
lsabi
03/27/2020, 9:10 PM
lsabi
03/27/2020, 9:10 PM
lsabi
03/27/2020, 9:10 PM
lsabi
03/27/2020, 9:10 PM
Xiang Fu
Joey Pereira
03/30/2020, 8:52 PM
"Results of aggregations with large amounts of group keys (>1M) are approximated"
I wasn't able to find any other details about the approximations referenced in the docs, code, or issues. Is there somewhere I can read up on further details about the approximations?
Kishore G
Kishore G
Sidd
03/30/2020, 9:21 PM
GROUP BY execution happens in 3 stages:
(1) At each Pinot server, we execute the query on each segment. By default we don't consider more than 100k unique groups per segment, to restrict memory usage and prevent OOMs.
(2) At each Pinot server, we combine/merge the results from multiple segments. This is where we make a best effort at accuracy by returning max(5 * topN, 5000) unique groups from each server to the broker.
(3) At the broker, we reduce the results from all servers, sort them, and return the top N.
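The three stages can be sketched in Python to show where groups get lost. This is a minimal model, not Pinot's implementation: it assumes results are ordered by a single SUM aggregate, and that after the per-segment cap is reached a segment still accumulates groups it has already seen but ignores new ones. The function names and SEGMENT_GROUP_LIMIT are hypothetical.

```python
from collections import defaultdict
import heapq

# Hypothetical constant standing in for the 100k per-segment default quoted above.
SEGMENT_GROUP_LIMIT = 100_000


def segment_group_by(rows, limit=SEGMENT_GROUP_LIMIT):
    """Stage (1): SUM values per group within one segment.

    Assumption: once `limit` distinct groups exist, rows for new groups are
    dropped, while groups already in the table keep accumulating.
    """
    groups = {}
    for key, value in rows:
        if key in groups:
            groups[key] += value
        elif len(groups) < limit:
            groups[key] = value
    return groups


def merge(results):
    """Add up {group: sum} dicts coming from several segments or servers."""
    merged = defaultdict(int)
    for result in results:
        for key, value in result.items():
            merged[key] += value
    return merged


def server_combine(segment_results, top_n):
    """Stage (2): merge a server's segments, then keep max(5 * top_n, 5000) groups."""
    keep = max(5 * top_n, 5000)
    merged = merge(segment_results)
    return dict(heapq.nlargest(keep, merged.items(), key=lambda kv: kv[1]))


def broker_reduce(server_results, top_n):
    """Stage (3): merge all servers' results, sort descending, return the top N."""
    merged = merge(server_results)
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Tiny demo with the segment limit lowered to 2 so the effect is visible.
seg_a = segment_group_by([("x", 5), ("y", 3), ("z", 1)], limit=2)  # "z" is dropped
seg_b = segment_group_by([("z", 10), ("x", 1)], limit=2)
print(broker_reduce([server_combine([seg_a, seg_b], top_n=1)], top_n=1))
# [('z', 10)]; the exact SUM for "z" over all rows would be 11
```

The demo returns the right group but an undercounted value, which is the kind of approximation the docs warn about.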
By the time the server-level merge begins in (2), it is very likely that some groups were not considered, for two reasons:
-- They appeared later in the scan, after the per-segment limit of 100k groups had already been exhausted.
-- Step (2) is multi-threaded: multiple threads (each handling one or more segments) combine the results across all segments into a single data structure, so which groups make it into the list depends on the execution order/scheduling of the threads.
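The order dependence in the second reason can be illustrated without real threads. This sketch assumes the combined table is bounded and stops accepting new groups once full (capped_merge and the capacity value are hypothetical); it models thread scheduling as the order in which segment results are merged and enumerates every such order.

```python
from itertools import permutations


def capped_merge(segment_results, capacity):
    """Merge {group: sum} dicts into one table that stops accepting new groups
    once it holds `capacity` groups (a hypothetical stand-in for a bounded
    combine table)."""
    table = {}
    for result in segment_results:
        for key, value in result.items():
            if key in table:
                table[key] += value
            elif len(table) < capacity:
                table[key] = value
    return table


segments = [{"a": 4}, {"b": 7}, {"c": 2}]

# Each permutation models one possible order in which threads finish and merge.
surviving = sorted(
    {tuple(sorted(capped_merge(order, capacity=2))) for order in permutations(segments)}
)
print(surviving)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Every pair of groups can survive depending on the order alone, so the same query over the same data can return different groups from run to run.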
Joey Pereira
03/30/2020, 10:26 PM
Dan Hill
04/01/2020, 3:33 AM
having clauses? When I enter a query with a having clause into the web Pinot Data Explorer, it seems like the having clause is ignored and the query is accepted.
select platform_id, sum(cost_usd_micros) from events_testing where platform_id = 1 group by platform_id having (sum(cost_usd_micros) < 10000)
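For reference, a correctly applied having clause should drop any group whose sum is 10000 or more. A plain-Python sketch of those standard SQL semantics, using made-up rows and a hypothetical function name; it only shows the WHERE, GROUP BY, HAVING order of evaluation and says nothing about how Pinot itself parses or executes the query.

```python
from collections import defaultdict


def sum_cost_with_having(rows, platform=1, max_total=10_000):
    """Evaluate, on in-memory (platform_id, cost_usd_micros) rows:
    SELECT platform_id, SUM(cost_usd_micros) ... WHERE platform_id = <platform>
    GROUP BY platform_id HAVING SUM(cost_usd_micros) < <max_total>
    """
    totals = defaultdict(int)
    for platform_id, cost in rows:
        if platform_id == platform:  # WHERE filters rows before grouping
            totals[platform_id] += cost
    # HAVING filters whole groups after aggregation
    return {pid: total for pid, total in totals.items() if total < max_total}


rows = [(1, 6_000), (1, 7_000), (2, 100)]  # made-up sample rows
print(sum_cost_with_having(rows))  # {} because platform 1 sums to 13000
```

With these rows, a result showing platform 1 with a sum of 13000 would indicate that the having clause was not applied.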
Dan Hill
04/01/2020, 3:34 AM
Mayank
Dan Hill
04/01/2020, 3:34 AM
Dan Hill
04/01/2020, 3:36 AM
Mayank