Apache Pinot #general

Taran Rishit

11/30/2020, 1:46 PM

Hello, I'm unable to start Pinot due to the error below:

Administrator@EC2AMAZ-6IDI4LG /cygdrive/c/users/Administrator/documents/apache-inot-incubating-0.6.0-bin/apache-pinot-incubating-0.6.0-bin
$ bin/pinot-admin.sh StartController -zkAddress localhost:2191 -controlerPort 9000
Unrecognized VM option 'PrintGCDateStamps'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

I'm stuck at the second step in this guide: https://docs.pinot.apache.org/basics/getting-started/advanced-pinot-setup Can somebody help me?
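
A possible cause, stated as an assumption rather than a confirmed diagnosis: the `PrintGCDateStamps` flag was removed from the JVM in Java 9, so a startup script that passes it fails on newer JVMs. The sketch below parses the version string printed by `java -version` and reports whether the legacy GC flag would still be accepted. The helper names are hypothetical and are not part of Pinot.

```python
import re


def java_major_version(version: str) -> int:
    """Return the major Java version from strings like '1.8.0_271' or '11.0.9'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized Java version string: {version!r}")
    first = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x; Java 9+ put the major version first.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def accepts_legacy_gc_flags(version: str) -> bool:
    """Legacy flags such as -XX:+PrintGCDateStamps were removed in Java 9."""
    return java_major_version(version) <= 8
```

If the installed JVM turns out to be newer than Java 8, running the 0.6.0 scripts under a Java 8 runtime is one thing that may be worth trying.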

Amit Chopra

11/30/2020, 2:53 PM

Hi, I saw there is an open ticket for supporting Kinesis (https://github.com/apache/incubator-pinot/issues/5648). I wanted to check when it is slated to be supported.

Joice Jacob

11/30/2020, 2:54 PM

Is there any configuration to increase the LIMIT from 10 to a higher number?

Joice Jacob

11/30/2020, 4:25 PM

I was working on star-tree indexing. While loading the data, I got the following issue. The table config and schema are attached to this thread. I was trying to load 30 records. One of the columns in the star-tree index is MSISDN, with cardinality 10, along with TARIFF_PLAN, with cardinality 5.

instant_table.json instant_schema.json

error.txt

Graham Plata

11/30/2020, 5:07 PM

Hello all, I am encountering an error when trying to run the sample batch job located here https://docs.pinot.apache.org/basics/getting-started/pushing-your-data-to-pinot in k8s after following this guide https://docs.pinot.apache.org/basics/data-import/pinot-file-system/import-from-gcp with the pinot-gcs plugin. I am assuming I have a config issue somewhere but could use some expertise.

job.yml controller-run.txt controller-logs.txt

Paul Baumgart

11/30/2020, 6:42 PM

Q: is it possible to query non-integer percentiles? I'm interested in calculating P99.9, but looking at the supported aggregations doc, it looks like those functions only support up to P99.
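
As an illustration of what a fractional percentile such as P99.9 means (not a statement about which aggregation functions Pinot supports), the nearest-rank definition can be sketched as follows. The function name `percentile_nearest_rank` is hypothetical, and exact fractions are used so that a value like 99.9 is not distorted by floating-point rounding.

```python
import math
from fractions import Fraction


def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    # Convert via str() so 99.9 becomes exactly 999/10 rather than a binary approximation.
    exact_p = Fraction(str(p))
    rank = math.ceil(exact_p * len(ordered) / 100)
    return ordered[rank - 1]


# Example: latencies 1..10000; P99.9 is the value at rank 9990.
latencies = list(range(1, 10001))
p999 = percentile_nearest_rank(latencies, 99.9)
```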

Ken Krugler

11/30/2020, 8:32 PM

Question about batch import jobs. When running a LaunchDataIngestionJob, I see that the S3-based files being ingested are first copied to a temp directory on my local machine. Assuming I’ve set up a k8s-based cluster via EKS, is there a way to ingest directly from S3? I seem to recall some option to do this, which would be much more efficient.

Joice Jacob

12/01/2020, 6:31 AM

I am working on a load test using JMeter against Pinot tables. Currently I am getting an exception related to connection pooling.

Response message: java.sql.SQLException Cannot create PoolableConnectionFactory (null)
java.sql.SQLException: Cannot create PoolableConnectionFactory (null)
at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:669) ~[commons-dbcp2-2.7.0.jar:2.7.0]
at org.apache.commons.dbcp2.BasicDataSource.createDataSource(BasicDataSource.java:544) ~[commons-dbcp2-2.7.0.jar:2.7.0]
at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:753) ~[commons-dbcp2-2.7.0.jar:2.7.0]
at org.apache.jmeter.protocol.jdbc.config.DataSourceElement.initPool(DataSourceElement.java:308) [ApacheJMeter_jdbc.jar:5.3]
at org.apache.jmeter.protocol.jdbc.config.DataSourceElement.testStarted(DataSourceElement.java:127) [ApacheJMeter_jdbc.jar:5.3]
at org.apache.jmeter.engine.StandardJMeterEngine.notifyTestListenersOfStart(StandardJMeterEngine.java:205) [ApacheJMeter_core.jar:5.3]
at org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:380) [ApacheJMeter_core.jar:5.3]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_271]

Guillaume Loetscher

12/01/2020, 11:02 AM

Hey everyone! I’m currently investigating Apache Pinot, and after reading a good chunk of the documentation, I have a couple of questions.
• On this page, it’s said that if you lose all your controllers, your cluster will still be able to answer read queries (but not write queries, obviously). Then, if a new controller is started, it says that the cluster will recover and will be available again for write queries. That supposes that all cluster state is stored somewhere. I suppose that “somewhere” is Zookeeper?
• Offline servers are responsible for hosting segments. Let’s say we have only one replica for a given segment, and the offline server hosting it dies. Will Helix discover that and ask another offline server to download the same segment, in order to make it available again to the brokers?
• Where can I find some information about the resource requirements (mainly CPU / memory) for controllers / brokers / realtime servers / offline servers?
Thanks for your help!

Sri Surya

12/01/2020, 2:36 PM

I tried to execute the Pinot start controller command and got the following error. Could you please help me with this?

Mahesh Yeole

12/01/2020, 6:48 PM

@here I am trying to run Pinot in Kubernetes but am seeing the following error. Any suggestions? helm install -n pinot-quickstart kafka incubator/kafka --set replicas=1 Error: failed to download "incubator/kafka" (hint: running

helm repo update

may help)

Matt

12/02/2020, 10:50 PM

Hi, I have a text index up and running and it looks like it is working. However, I noticed that some results are not correct, e.g. when I search for

40F916FD-F2A7-2255-FEFB-B43050D8A5EE

. I get results for

81753586-72E1-8DC1-FEFB-08DB16E6A793

40F916FD-F2A7-2255-FEFB-B43050D8A5E

. I am trying to understand why that is. Also, if I try to search for XML tags like

</ns1:requestControlID>

it throws an error. Is there any setting I can enable to make these searches work?
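
One possible explanation, offered as an assumption about the analyzer rather than a confirmed diagnosis: if the text index splits values on punctuation such as hyphens, and a multi-term query matches documents containing any one of the terms, then two different IDs that share a single group will both match. The sketch below imitates that behavior with hypothetical `tokenize` and `shared_terms` helpers; it is not Pinot or Lucene code.

```python
import re


def tokenize(text):
    """Split on any run of non-alphanumeric characters and lowercase the pieces."""
    return [piece.lower() for piece in re.split(r"[^0-9A-Za-z]+", text) if piece]


def shared_terms(query, document):
    """Return the terms that appear in both the query and the document."""
    return sorted(set(tokenize(query)) & set(tokenize(document)))


def matches_any_term(query, document):
    """OR semantics: a document matches if it shares at least one term with the query."""
    return bool(shared_terms(query, document))


query_id = "40F916FD-F2A7-2255-FEFB-B43050D8A5EE"
other_id = "81753586-72E1-8DC1-FEFB-08DB16E6A793"
# The two IDs differ, but both contain the group "FEFB", so an OR-of-terms query matches.
overlap = shared_terms(query_id, other_id)
```

Under this assumption, searching for the whole ID as a quoted phrase may be worth trying. For the XML tag, note that in Lucene's classic query syntax characters such as `/` and `:` are reserved and normally need escaping; whether that is the cause here depends on how the query is parsed.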

Dovydas Sabonis

12/04/2020, 8:55 PM

Hello! Does Pinot support importing gzipped data? We have gzipped JSON files in a GCS bucket. Can those be imported directly into Pinot, or do we have to serve uncompressed files in GCS?

Tan Huynh

12/07/2020, 7:27 PM

Hello, Is there any advice for Pinot schema design? Do I want to create multiple tables, one for each entity and metrics that I want to query, or should I define one big table with all dimensions and metrics?

Kishore G

12/07/2020, 8:57 PM

I'm excited to announce the last Pinot meetup for the year 2020! The Pinot community has grown from 100 to 800 members this year. We want to take this opportunity to thank the entire Pinot community and get your inputs on our 2021 roadmap. In this fireside chat, I will go over all the things we have accomplished together in 2020 and talk about all the fantastic indexing techniques available in Pinot. Afterward, we'll open up for questions and discussions about Pinot and its roadmap. We will share a link tomorrow for everyone to post their questions/topics in advance. We are looking forward to seeing you there! Sign up here - https://www.meetup.com/apache-pinot/events/274700293/

🎉 3

👍 6

🍷 6

Neer Shay

12/08/2020, 6:58 PM

Hi! I'm trying to get a little more information on ThirdEye and how it stacks up compared to Sherlock/Druid, so I have a few questions:
1. What is going on behind the scenes? Is there some sort of model running which trains on historical data and learns what an anomaly is?
2. How configurable is this? Can I specify which dimensions/multi-dimensions to run on, or does it automatically run on everything?
3. How often does it run? Is this configurable?
4. Where does it store its metadata?
Thanks in advance for the assistance!

Matt

12/08/2020, 10:53 PM

Is there any doc detailing comparison of Pinot with ElasticSearch somewhere?

Guillaume Loetscher

12/09/2020, 5:11 PM

Hey folks 👋 Got a quick question about Pinot + Docker: the documentation mentions Docker images several times. Are they supposed to be used in production or just for testing purposes?

Playsted

12/09/2020, 9:37 PM

Are there any guidelines / estimates for Pinot's resource requirements, particularly memory / local node storage? E.g. for a 10 TB table in deep storage, you need X TB locally on the nodes and Y memory for reasonable performance? Is it designed so that the entire table should be cached locally and MMAP'd, or can parts go cold and be pulled on demand from deep storage with the extra latency?

Priti Parkar

12/10/2020, 11:55 PM

I want to enable 'pre-aggregation' during realtime ingestion. https://docs.pinot.apache.org/basics/components/table#pre-aggregation
1. Can someone please point me to documentation (mainly looking for schema or configs)?
2. Also, where is the pre-aggregated data stored (in the same table/segment or a different one)?

eunbin lee

12/14/2020, 9:31 AM

Hi, I'm looking into Minion's GDPR support. I read in the documentation that the Minion framework can be used to meet GDPR requirements, but the detailed description says "coming soon," so I'm confused. Is the ability to use the Minion framework to delete records matching certain conditions in the background not yet available, or is it just the documentation that hasn't been written yet?

In addition, I have some questions about audit, authorization, and DR.
1. Audit at the query level. I need to know not only the table config and schema change log, but also who requested which queries and when (including target tables and conditions). Does Pinot offer auditing? Or is it possible to use Minion to monitor queries in the background and log them?
2. Is Pinot planning to provide authentication/authorization modules? Druid provides a built-in Kerberos authenticator and provides authorization through the Ranger extension. Does Pinot have any similar plans?
3. I want to configure replication between two data centers (not using cloud). Ideally, if data center 1 fails, we want to fail over to data center 2 and fail back when data center 1 is back to normal. Suppose I have configured deep storage (HDFS) and a Pinot cluster on k8s in each of the two data centers. Deep storage replication is possible, but what happens to real-time data? I understand that real-time data is held in memory and segments are periodically flushed to disk. If a cluster goes down, will real-time data that has not yet been flushed be lost? I'm not sure how to configure DR on Pinot. Is there a recommended way to do this?

I'm in the process of getting to know Pinot. Thanks in advance for the help.

Playsted

12/14/2020, 4:04 PM

How are text_match regexes performed? I'm looking for string-contains type queries (e.g. text_match(column, '/.*partial_term.*/')). Normally I would look for Lucene n-gram tokenization for this, but I see Pinot isn't using it. How are the partial-term regexes evaluated? Is this essentially a raw regex performed against all tokens?
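
To make the two strategies contrasted in this question concrete, here is a toy sketch. It is not Pinot's or Lucene's implementation, and all names are hypothetical. One function scans every distinct term with a regex; the other narrows the candidates through a trigram index and then verifies each one.

```python
import re
from collections import defaultdict


def regex_scan(terms, pattern):
    """Strategy 1: test the regex against every distinct term."""
    compiled = re.compile(pattern)
    return sorted(term for term in terms if compiled.fullmatch(term))


def build_trigram_index(terms):
    """Strategy 2, step 1: map each 3-character substring to the terms containing it."""
    index = defaultdict(set)
    for term in terms:
        for i in range(len(term) - 2):
            index[term[i:i + 3]].add(term)
    return index


def contains_via_trigrams(index, needle):
    """Strategy 2, step 2: intersect the trigram postings, then verify candidates."""
    if len(needle) < 3:
        raise ValueError("needle must have at least 3 characters")
    grams = [needle[i:i + 3] for i in range(len(needle) - 2)]
    candidates = set.intersection(*(index.get(gram, set()) for gram in grams))
    # Sharing all trigrams does not guarantee a contiguous match, so re-check each one.
    return sorted(term for term in candidates if needle in term)


terms = {"partial_term", "my_partial_term_x", "other", "term"}
by_scan = regex_scan(terms, r".*partial_term.*")
by_ngrams = contains_via_trigrams(build_trigram_index(terms), "partial_term")
```

The scan touches every term but needs no extra index, while the trigram route trades index size for fewer regex evaluations.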

Dharak Kharod

12/14/2020, 11:00 PM

Hi, while testing offline table ingestion on Pinot from GitHub, I found that the

overwrite

mode is called

refresh

now and got an error while using the

overwrite

as a segment push type. Is the

overwrite

keyword not valid anymore?

Darshan

12/16/2020, 7:27 PM

Thanks, Kishore and co, for the fantastic knowledge session :) Would it be possible to share the slides, performance stats, and design documents? I ask because these references help to bolster our internal notes/documents.

Karin Wolok

12/16/2020, 7:27 PM

This is a great blog post written by @User at Confluera! Very well structured and really walked through their problem / ideas / requirements and challenges. https://medium.com/confluera-engineering/real-time-security-insights-apache-pinot-at-confluera-a6e5f401ff02

👏 7

Will Briggs

12/18/2020, 3:22 PM

Has anyone configured realtime Kafka ingest with SASL / jaas auth (as in how Confluent handles auth for their managed clusters)?

dhurandar

12/18/2020, 5:18 PM

A question regarding Apache Pinot: what is the typical OLAP cube size one can host in Apache Pinot? We have a cube that is almost 50 TB. It has some dimensions with very high cardinality, but since the raw data is more than 5 petabytes, 50-100 TB is still a reasonable aggregation. We want interactive performance from our OLAP store, since it would power important dashboards and drill-downs. So we want to know how much data we can push into Apache Pinot.

Will Briggs

12/18/2020, 7:15 PM

Essentially, I can’t figure out how to specify a time literal that is compatible with my timestamp column for doing math on it

Amit Chopra

12/18/2020, 7:51 PM

Quick question - has anyone set up AWS Athena with Pinot, given that Athena is essentially Presto underneath?