Apache Pinot #general

Ignacy Krasicki

03/21/2020, 3:32 PM

thanks, this might be useful in many use cases (it seems Druid also introduced "lastInt" and "lastString" aggregations). second question is regarding encryption at rest. we are currently using an RDBMS with transparent encryption, and our requirement would be similar functionality. I am more familiar with Druid: it has cold storage (s3 or hdfs, which can be encrypted) and a local cache which is not encrypted, so it does not meet our requirements. in Pinot I found pinotencrypter, but my question is: can everything that is written to disk be encrypted in Pinot?

Kishore G

03/21/2020, 3:36 PM

Pinot also decrypts it when the segment gets to local disk. Performance will be bad, if we decrypt it on the fly for each query.

Kishore G

03/21/2020, 3:38 PM

If you want to maintain a separate view for last point, you can do that in Pinot

Dan Hill

03/22/2020, 8:55 PM

I'm working on a system that uses Presto to query Pinot. I saw there is a gitbook page for Presto integration. Presto (prestodb's version) has a Pinot integration built in. Is there a difference between the two integration approaches?

Dan Hill

03/22/2020, 8:56 PM

Also, when I run a query using Presto, I can only aggregate one metric at a time. I filed a bug against prestodb.

Dan Hill

03/22/2020, 8:56 PM

https://github.com/prestodb/presto/issues/14277

Kishore G

03/22/2020, 8:59 PM

that's because it's using an older version of Pinot. there is a property allowMultipleAggregations in the presto-pinot-connector config. it's false by default; you can set it to true. @User will need your help to move to the new Pinot sql api here, which allows multiple aggregations

Dan Hill

03/22/2020, 8:59 PM

Ah, okay. Should I follow these instructions?

Dan Hill

03/22/2020, 8:59 PM

https://apache-pinot.gitbook.io/apache-pinot-cookbook/integrations/presto

Dan Hill

03/22/2020, 9:10 PM

Cool, I found the property.

Dan Hill

03/22/2020, 9:10 PM

https://github.com/prestodb/presto/blob/a3f9aa3566675f4b5fea33a96abc58fddbf56a21/presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotConfig.java

Neha Pawar

03/26/2020, 3:42 AM

would really appreciate if you can watch it, follow along and try it out

Kishore G

03/27/2020, 5:11 PM

any experts on readthedocs here?

Kishore G

03/27/2020, 5:12 PM

we need help adding a banner to the old docs https://readthedocs.org/projects/pinot/ and adding a reference to the new docs https://apache-pinot.gitbook.io/apache-pinot-docs/

lsabi

03/27/2020, 9:10 PM

What about copying it from these docs?

lsabi

03/27/2020, 9:10 PM

https://omnia-docs-g2.readthedocs.io/en/latest/blocks/banner/

lsabi

03/27/2020, 9:10 PM

Source code

lsabi

03/27/2020, 9:10 PM

https://raw.githubusercontent.com/preciofishbone/OmniaDocsG2/master/blocks/banner/index.rst

Xiang Fu

03/28/2020, 12:18 AM

@here Hello community,

We are pleased to announce that Apache Pinot (incubating) 0.3.0 is released!

Apache Pinot (incubating) is a distributed columnar storage engine that can ingest data in realtime and serve analytical queries at low latency.

The release can be downloaded at: https://pinot.apache.org/download
The release note is available at: https://docs.pinot.apache.org/releases/0.3.0

Additional resources
- Project website: https://pinot.apache.org
- Getting started: https://docs.pinot.apache.org/getting-started
- Mailing list: dev@pinot.apache.org
- Slack channel: https://communityinviter.com/apps/apache-pinot/apache-pinot
- Twitter: https://twitter.com/ApachePinot

Best Regards,
Apache Pinot (incubating) Team

🎉 12

👍 9

Joey Pereira

03/30/2020, 8:52 PM

👋 I had a random question about the query approximation, mentioned on https://pinot.readthedocs.io/en/latest/pql_examples.html

Results of aggregations with large amounts of group keys (>1M) are approximated

I wasn't able to find any other details about the approximations referenced in docs, code, or issues. Is there somewhere I can read up on further details about the approximations?

Kishore G

03/30/2020, 9:01 PM

I am editing the docs to add more details. But here is the gist:
• In every node, we keep a max limit on the hashmap <GroupByKey, Metric> for group by
• When we hit this limit, new keys will be dropped, but for existing keys the Metric will be updated

Kishore G

03/30/2020, 9:02 PM

this is for group by without ordering
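
The bounded hashmap described above can be sketched in a few lines of Python. This is only an illustration of the behaviour as explained in this thread, not Pinot's actual code; the function name bounded_group_by_sum and the max_groups parameter are made up for the example.

```python
def bounded_group_by_sum(rows, max_groups):
    """Sum values per group key, admitting at most max_groups distinct keys.

    Hypothetical helper: once the map is full, rows with unseen keys are
    dropped, while rows for keys already in the map keep updating the sum.
    """
    totals = {}
    for key, value in rows:
        if key in totals:
            totals[key] += value      # existing key: metric is still updated
        elif len(totals) < max_groups:
            totals[key] = value       # room left: admit the new key
        # else: the limit was hit, so this new key is dropped
    return totals


rows = [("a", 1), ("b", 2), ("c", 3), ("a", 10), ("c", 30)]
result = bounded_group_by_sum(rows, max_groups=2)
# "c" shows up after the map is full and never appears in the result,
# while "a" keeps accumulating.
```

With max_groups=2 the result holds only "a" and "b", which is why sums for the surviving keys stay correct but some keys can be missing entirely.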

Sidd

03/30/2020, 9:21 PM

@User I had recently put together this explanation for another similar question. I hope this will help as well.

GROUP BY execution happens in 3 stages:

(1) At each Pinot server, we execute the query on a segment -- here by default we don't consider more than 100k unique groups as an attempt to restrict memory usage and prevent OOMs.

(2) At each Pinot server, we combine/merge the results from multiple segments -- this is where we make a best effort at ensuring accuracy by returning max(5*topN, 5000) unique groups from each server to the broker.

(3) Reduce the results from all servers at the broker, sort them, return TOP N

By the time server level merge begins in (2), it is very likely that some groups were not considered because of two reasons:
-- They came later in the scan while the records were being iterated upon and we had already exhausted the limit of 100k per segment
-- Step 2 is multi-threaded: multiple threads (each handling one or more segments) combine the results across all segments into a single data structure. Here, what makes it into the list depends on the execution order/scheduling of the threads.
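
To make the three stages concrete, here is a rough single-threaded Python sketch for a SUM ... GROUP BY ... TOP N query. It is a simplification under stated assumptions, not Pinot's implementation: all function names are hypothetical, the server-level trim is assumed to keep the groups with the largest sums, and the thread-scheduling effect from the second bullet is not modelled.

```python
from collections import Counter


def aggregate_segment(rows, group_limit):
    """Stage 1: sum per group within one segment, capping the group count."""
    totals = {}
    for key, value in rows:
        if key in totals:
            totals[key] += value
        elif len(totals) < group_limit:
            totals[key] = value
        # Otherwise the group arrived after the cap was hit and is skipped.
    return totals


def merge_partials(partials):
    """Add up partial sums that share the same group key."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values for matching keys
    return merged


def run_group_by(servers, top_n, group_limit=100_000, min_trim=5000):
    """servers is a list of servers; each server is a list of segments (row lists)."""
    server_results = []
    for segments in servers:
        per_segment = [aggregate_segment(rows, group_limit) for rows in segments]
        # Stage 2: merge segments, send max(5*topN, 5000) groups to the broker.
        # Assumption: the groups kept are the ones with the largest sums.
        keep = max(5 * top_n, min_trim)
        server_results.append(dict(merge_partials(per_segment).most_common(keep)))
    # Stage 3: the broker reduces all server results and returns the top N.
    return merge_partials(server_results).most_common(top_n)


# One server with two segments; a tiny group_limit makes the effect visible.
servers = [[[("a", 5), ("b", 1), ("c", 9)], [("c", 4), ("a", 2)]]]
approx = run_group_by(servers, top_n=1, group_limit=2)
exact = run_group_by(servers, top_n=1)
```

In this toy input, group "c" is dropped from the first segment because it arrives after the cap of 2 is reached, so the capped run reports "a" as the top group while the uncapped run reports "c". That is the kind of inaccuracy the first bullet describes.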

Joey Pereira

03/30/2020, 10:26 PM

Ah, thanks for the clarification! At first my concern was about accuracy based on cardinality of keys pre-aggregate, but that makes a lot more sense (:

Dan Hill

04/01/2020, 3:33 AM

Does Pinot's PQL support having clauses? When I enter a query with a having clause into the web Pinot Data Explorer, it seems like having is ignored and the query is allowed.

select platform_id, sum(cost_usd_micros) from events_testing where platform_id = 1 group by platform_id having (sum(cost_usd_micros) < 10000)

Dan Hill

04/01/2020, 3:34 AM

Removing the having clause does not change the results. I still get rows that do not match it.

Mayank

04/01/2020, 3:34 AM

Not at the moment, we have a plan to add that support
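
Until that support lands, one possible stopgap is to apply the having condition on the client after the grouped rows come back. The Python sketch below assumes the result has already been fetched as (platform_id, sum) tuples; the helper name apply_having and the sample numbers are hypothetical and not taken from any real query result.

```python
def apply_having(rows, predicate):
    """Keep only the grouped rows that satisfy a having-style predicate."""
    return [row for row in rows if predicate(row)]


# Hypothetical grouped output: (platform_id, sum(cost_usd_micros)).
grouped_rows = [(1, 4_000), (2, 25_000), (3, 9_999)]

# Equivalent of: having (sum(cost_usd_micros) < 10000)
kept = apply_having(grouped_rows, lambda row: row[1] < 10_000)
```

This only filters groups that were actually returned, so it inherits whatever limits or approximations the group by itself applied.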

Dan Hill

04/01/2020, 3:34 AM

Ah, okay.

Dan Hill

04/01/2020, 3:36 AM

Yea, that seems very useful for my use case. I'd also want Presto to be able to forward the having clause to Pinot.

Mayank

04/01/2020, 3:42 AM

Ack