Apache Pinot #general

Hello 👋 I am not sure if this is the right place to ask: does anyone know if the startree.ai managed solution comes with Presto (to enable full SQL syntax)? Is there a pricing calculator somewhere?

abhinav wagle

08/31/2022, 5:40 PM

Hello, do we have an example of how to create/manage tenants using Kubernetes pods via Helm or another option? https://docs.pinot.apache.org/basics/components/tenant#server-tenant

coco

09/01/2022, 10:10 AM

Hi Pinot Team. We are running a performance test. The latency of slow queries (2s to 7s) is much higher than the average response time (62.5ms). The graph looks like a warm-up: when a large number of requests are sent, the latency of the initial requests of the test is high, and the subsequent requests show an average response time. Here are two pinot-server query logs. I have noticed that the pinot-server is causing the delays.

Processed requestId=5111,table=table_poc_OFFLINE,segments(queried/processed/matched/consuming)=785/419/193/-1,...totalExecMS=4113...numDocsScanned=229,...,scanPostFilter=1832

Processed requestId=34700,table=table_poc_OFFLINE,segments(queried/processed/matched/consuming)=784/432/163/-1,...totalExecMS=141...numDocsScanned=177,...,scanPostFilter=1416

Is there any way to reduce this high latency difference? The table was created with a star-tree index (4 dimension columns, SUM__). Setup: 159 dimension columns, 47 metric columns, 2 date-time columns; 3.2T (3 replicas); 2,354 segments; 834,221,210 docs; 3 pinot-server, heap 16G; 3 pinot-broker, heap 8G (the same hardware as pinot-server).
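To compare the two log lines above field by field, a small parser can help. This is only a minimal sketch: it assumes the elided parts (`...`) hold nothing but further comma-separated `key=value` pairs, and the names `parse_query_log`, `SLOW`, and `FAST` are hypothetical helpers, not part of Pinot.

```python
import re

# Log lines copied from the message above; "..." marks parts elided in the original.
SLOW = ("Processed requestId=5111,table=table_poc_OFFLINE,"
        "segments(queried/processed/matched/consuming)=785/419/193/-1,"
        "...totalExecMS=4113...numDocsScanned=229,...,scanPostFilter=1832")
FAST = ("Processed requestId=34700,table=table_poc_OFFLINE,"
        "segments(queried/processed/matched/consuming)=784/432/163/-1,"
        "...totalExecMS=141...numDocsScanned=177,...,scanPostFilter=1416")

# A key, "=", then a value that runs until a comma, an elision ("..."),
# whitespace, or the end of the line.
PAIR = re.compile(r"([A-Za-z][\w()/]*)=([^,\s]+?)(?=,|\.\.\.|\s|$)")


def parse_query_log(line):
    """Return the key=value pairs of one query log line as a dict of strings."""
    return dict(PAIR.findall(line))


slow = parse_query_log(SLOW)
fast = parse_query_log(FAST)

for key in ("totalExecMS", "numDocsScanned", "scanPostFilter"):
    print(f"{key}: slow={slow[key]} fast={fast[key]}")

ratio = int(slow["totalExecMS"]) / int(fast["totalExecMS"])
print(f"totalExecMS ratio: {ratio:.1f}x")
```

Read this way, the two requests scan a similar number of documents (229 vs 177) while `totalExecMS` differs by roughly 29x, which is the gap the question is about.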

Tiger Zhao

09/01/2022, 6:06 PM

Hi, just wondering, when querying aggregations on a realtime table with data streaming in, is it guaranteed that every aggregation in one query will be computed on the same view? (so new rows that get ingested while the query is running are not included?)

Ankit Sultana

09/01/2022, 11:54 PM

Saw this warning on the star-tree index wiki page: https://docs.pinot.apache.org/basics/indexing/star-tree-index Can someone share the corresponding issue/PR?

Mohit Garg4628

09/03/2022, 6:28 AM

Hi, I was exploring Apache Pinot vs. Apache Druid for one of my use cases. Can you please help me find out which one is better? Thanks

John Peter S

09/05/2022, 3:05 AM

Hi Team, considering I am using replicaGroupStrategyConfig to get Partitioned Replica-Group Segment Assignment, and I give a column and the number of instances per partition, I have two questions: 1. What method is used to do this partitioning? 2. If I am partitioning based on a column and I want to partition a particular value of the column separately, how can this be achieved?
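For context on the first question above, here is a hedged sketch of how the pieces are usually wired together in a table config: `replicaGroupStrategyConfig` names the partition column, while the partition function itself is declared under `segmentPartitionConfig`. The column name `memberId`, the partition count, and the choice of the `Modulo` function are assumptions for illustration. `modulo_partition` only shows the idea for non-negative integer keys and is not Pinot's implementation.

```python
import json

NUM_PARTITIONS = 4  # hypothetical partition count

# Table config fragment (illustrative values only).
table_config_fragment = {
    "segmentsConfig": {
        "replicaGroupStrategyConfig": {
            "partitionColumn": "memberId",   # hypothetical column name
            "numInstancesPerPartition": 2,
        }
    },
    "tableIndexConfig": {
        "segmentPartitionConfig": {
            "columnPartitionMap": {
                "memberId": {
                    "functionName": "Modulo",
                    "numPartitions": NUM_PARTITIONS,
                }
            }
        }
    },
}


def modulo_partition(value, num_partitions):
    """Illustrative modulo partitioning for non-negative integer keys."""
    return value % num_partitions


print(json.dumps(table_config_fragment, indent=2))
print({v: modulo_partition(v, NUM_PARTITIONS) for v in (0, 1, 5, 10)})
```

With a function like this, every row with the same column value lands in the same partition, and the instances for each partition are then picked according to `numInstancesPerPartition`.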

Peter Pringle

09/05/2022, 6:42 AM

For the 0.11 RC, is there a binary download link?

Rangesh Gupta

09/06/2022, 2:47 AM

Hi Team, I am working on a complex query and need help finding the best way to achieve it. The DB table holds time-range-based usage records, and we have to create a usage time series from it. For example:

Sample table:

BEG_DT      END_DT      REGION
2020-01-01  2021-06-09  region_a
2020-06-29  2021-06-09  region_a
2020-01-01  2020-06-29  region_a
2020-01-01  2021-06-09  region_b
2020-01-01  2021-06-09  region_b
2020-01-01  2021-06-09  region_a
2020-01-01  2021-06-09  region_a
2020-07-08  2021-06-09  region_a
2020-01-01  2020-07-08  region_a
2021-05-10  2021-06-09  region_a
2020-01-01  2021-05-10  region_a
2020-01-01  2021-06-09  region_a
......
2020-01-01  2021-06-09  region_a

Result:

Date        Active count
2020-01-01  9000
2020-01-02  8940
......
2021-06-09  8067

What is the best way to write such a query? Solution 1: do a join query, but it will be very resource intensive. Solution 2: do application-level processing and run a select query in a for loop for each point in the time series. Which solution is better, 1 or 2? Or is there a better way to achieve this?
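One possible variant of Solution 2 avoids the per-date query loop: fetch the (BEG_DT, END_DT, REGION) rows once and compute all daily counts in a single sweep. The sketch below assumes a row counts as active on every date from BEG_DT through END_DT inclusive, which the message does not state. The function name `active_counts` and the tiny sample rows are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def active_counts(rows, region=None):
    """Return {day: number of active rows}, treating END_DT as inclusive (assumption)."""
    delta = Counter()
    for beg, end, row_region in rows:
        if region is not None and row_region != region:
            continue
        delta[beg] += 1                        # row becomes active on BEG_DT
        delta[end + timedelta(days=1)] -= 1    # and inactive the day after END_DT
    if not delta:
        return {}
    counts = {}
    running = 0
    day, last = min(delta), max(delta)
    while day < last:
        running += delta[day]                  # Counter yields 0 for days with no change
        counts[day] = running
        day += timedelta(days=1)
    return counts


# Hypothetical sample rows, much smaller than the table in the question.
rows = [
    (date(2020, 1, 1), date(2020, 1, 3), "region_a"),
    (date(2020, 1, 2), date(2020, 1, 3), "region_a"),
    (date(2020, 1, 1), date(2020, 1, 1), "region_b"),
]

for day, count in active_counts(rows).items():
    print(day.isoformat(), count)
```

The work is proportional to the number of rows plus the number of days in the range, instead of one query per day. Whether this beats a join depends on how many rows have to leave the database.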

Lars-Kristian Svenøy

09/06/2022, 9:00 AM

Hello team 👋 Is there a planned date for release 0.11?

Karin Wolok

09/06/2022, 12:58 PM

CC: @Yarden Rokach 💪 Great blog! @Subbu Subramaniam @Sajjad Moradi https://medium.com/apache-pinot-developer-blog/pause-stream-consumption-on-apache-pinot-772a971ef403

🍷 8

🤩 3

🔥 6

Mithun Vigneswar Gunasekaran

09/07/2022, 4:28 AM

Hi, we have a use case where we want to set up a hybrid table with upsert support for the realtime table of the hybrid setup. When I try to set up a RealtimeToOfflineTask, I get the following error:

Invalid table config: table_REALTIME with error: RealtimeToOfflineTask doesn't support upsert table!

Any recommendations on how to go about this setup? Our data looks like this: we have update use cases for data in the current and future quarters, while earlier quarters' data does not require upsert support. So we are considering moving the older quarters' data to the offline table while still supporting updates for the previous data. cc: @Mayank

Xiang Fu

09/08/2022, 1:11 AM

Hello Community, We are pleased to announce that Apache Pinot 0.11.0 is released! Apache Pinot is a realtime distributed OLAP datastore, designed to answer OLAP queries with low latency.

The release can be downloaded at https://pinot.apache.org/download

The release notes are available at https://docs.pinot.apache.org/basics/releases/0.11.0

Additional resources:

Project website: https://pinot.apache.org

Getting started: https://docs.pinot.apache.org/getting-started

Pinot developer blogs: https://medium.com/apache-pinot-developer-blog

What is Apache Pinot? (and User-Facing Analytics) Video: https://www.youtube.com/watch?v=_lqdfq2c9cQ

Intro to Pinot Video: https://www.youtube.com/watch?v=T70jTTYhYyM

Join the Pinot community:

Twitter: https://twitter.com/ApachePinot

Meetup: https://www.meetup.com/apache-pinot/

Slack channel: https://communityinviter.com/apps/apache-pinot/apache-pinot

Best Regards, Apache Pinot Team

❤️ 8

🙏 6

🍷 8

🎉 4

Rangesh Gupta

09/08/2022, 3:26 AM

Hello Team, I want to know more about the Groovy UDF security vulnerability. The documentation only mentions that "Allowing executable Groovy in queries can be a security vulnerability." What are the security vulnerabilities? Is there any safe way to use Groovy-based UDFs? Thanks

Peter Pringle

09/08/2022, 8:31 AM

Is the v2 query engine in release 0.11 out of beta?

Peter Pringle

09/08/2022, 8:32 AM

Also, what is this "cluster config": controller, server, broker, or something else?

Peter Pringle

09/08/2022, 8:32 AM

Please add the following configurations to your cluster config:

"pinot.multistage.engine.enabled": "true",

"pinot.server.instance.currentDataTableVersion": "4",

"pinot.query.server.port": "8421",

"pinot.query.runner.port": "8442"

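As a hedged illustration of the snippet quoted above: one reading is that "cluster config" means cluster-level settings managed through the controller's REST API rather than a controller, server, or broker properties file. The sketch below only builds the request. The endpoint path `/cluster/configs`, the controller address, and the helper name `build_cluster_config_request` are assumptions that are not confirmed in this thread.

```python
import json
import urllib.request

CONTROLLER_URL = "http://localhost:9000"  # assumption: local controller address

# The four settings quoted in the message above.
MULTISTAGE_CONFIGS = {
    "pinot.multistage.engine.enabled": "true",
    "pinot.server.instance.currentDataTableVersion": "4",
    "pinot.query.server.port": "8421",
    "pinot.query.runner.port": "8442",
}


def build_cluster_config_request(controller_url, configs):
    """Build (but do not send) a POST request carrying the configs as JSON."""
    body = json.dumps(configs).encode("utf-8")
    return urllib.request.Request(
        url=f"{controller_url}/cluster/configs",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_cluster_config_request(CONTROLLER_URL, MULTISTAGE_CONFIGS)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To apply it against a real controller: urllib.request.urlopen(request)
```

Building the request without sending it makes it easy to inspect the payload first. The quoted text says nothing about whether components must be restarted afterwards.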
Karin Wolok

09/08/2022, 10:37 AM

Posting this in general, hope it's ok! (sometimes jobs channel doesn't get as much traffic) We're hiring at StarTree for a Developer Advocate (amongst a bunch of roles). Developer Advocate is someone who is passionate about Apache Pinot, Real Time Analytics, and loves to teach / train. Speaking at conferences, writing blog posts, creating content, etc. If you're interested, apply online and/or ping me 🙂 https://www.startree.ai/careers

Yarden Rokach

09/08/2022, 2:41 PM

The nominations are now open for the StarTree All-Stars Class of 2023! ⚡📣 StarTree All-Stars are individuals that go above and beyond; contributing extensively to the Apache Pinot and StarTree communities through knowledge sharing, advocacy, and technical support. ♾️ Our All-Stars are provided with access to product discussions, and exclusive events, and will be the first to know about any major product developments, features, and updates! They also are provided with limited-edition Pinot and StarTree swag! 👕 😎 Do you have what it takes? Apply and learn more here>> Please feel free to message me with any questions, or wonders.. would be happy to discuss! <3

abhinav wagle

09/08/2022, 5:43 PM

Hello, do we have a doc that goes into detail on how to monitor Pinot metrics using Datadog with a Helm installation?

Atri Sharma

09/09/2022, 10:36 AM

@Eaugene Thomas Can you please ask your TLS questions here?

👍 1

Eaugene Thomas

09/09/2022, 10:38 AM

Hi, I was working on a POC for using encryption in transit in Pinot. In my case the Pinot nodes are distributed across systems, so if I want to use self-signed certificates for TLS, how does that work with Pinot? I got some answers in https://stackoverflow.com/questions/2893819/accept-servers-self-signed-ssl-certificate-in-java-client which say to modify the trust manager. Are there any alternative options for accepting self-signed certificates between Pinot nodes? re: https://apache-pinot.slack.com/archives/C01H1S9J5BJ/p1662539698828649

Tiger Zhao

09/09/2022, 2:35 PM

Hi, I'm looking to update the stream configs of an existing table. After updating the configs, would reloading the segments cause the config to update for existing (and future) consuming segments?

Stuart Coleman

09/09/2022, 4:11 PM

Hey, we have a use case which consists of events emitted over time where we have two dimension columns. One has a cardinality of approx 10 million and one has a cardinality of approx 100. The business use case is to compute aggregates (min, max, sum, count) against this data filtered by specific values of the cardinality-100 column and either grouped by or selecting a specific value of the 10-million-cardinality column. There is also a filter over a time range, so the above filters are applied in addition to a filter on the timestamp column which restricts the aggregates to the last day/month/quarter/year. This feels like something that would be well served by the star-tree index, but I'm struggling to understand how that interacts with the time filtering aspect of the query. The examples in the docs for star-tree all seem to refer to the whole table without a time filter. Does anyone have any tips on the best indexing strategy for this?
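One way to think about the time filter in the question above: according to the star-tree documentation, the index can serve a query only when every filter and group-by column is among its dimensions. A raw timestamp has very high cardinality, so a common approach is to filter on a coarser derived column instead. The sketch below assumes hypothetical column names (`category` for the cardinality-100 column, `entityId` for the 10-million one, `daysSinceEpoch` as a day bucket, `value` as the metric). `covered_by_star_tree` is an illustrative check, not Pinot's query planner.

```python
import json

# Illustrative star-tree config entry; column names are hypothetical.
star_tree_index_config = {
    "dimensionsSplitOrder": ["category", "daysSinceEpoch", "entityId"],
    "skipStarNodeCreationForDimensions": [],
    "functionColumnPairs": ["MIN__value", "MAX__value", "SUM__value", "COUNT__*"],
    "maxLeafRecords": 10000,
}


def covered_by_star_tree(config, filter_columns, group_by_columns):
    """True if every filter and group-by column is a star-tree dimension."""
    dims = set(config["dimensionsSplitOrder"])
    return set(filter_columns) | set(group_by_columns) <= dims


print(json.dumps({"starTreeIndexConfigs": [star_tree_index_config]}, indent=2))

# Filter on the day bucket: all columns are dimensions of the tree.
print(covered_by_star_tree(star_tree_index_config,
                           ["category", "daysSinceEpoch"], ["entityId"]))
# Filter on a raw timestamp column that is not a dimension.
print(covered_by_star_tree(star_tree_index_config,
                           ["category", "eventTimestampMillis"], ["entityId"]))
```

Under these assumptions a "last month" query would filter on `daysSinceEpoch` rather than on the raw timestamp. The split order and `maxLeafRecords` shown here are starting points to experiment with, not tuned recommendations.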

Kumar Ashish

09/10/2022, 6:16 AM

Hi all, we are exploring Pinot for a use case where we need to ingest data from multiple streams into one table. Are there any workarounds/extensions to achieve this?

aviv e

09/13/2022, 3:56 PM

Hi, a question regarding the JSON column type: how is it treated inside the segment? Is it stored exactly like a string?

Doris Zhang

09/13/2022, 5:44 PM

Hi, does Pinot support data encryption at rest? It seems encryption at rest is supported for a deep store on Amazon S3; how about using HDFS for the deep store? And is there support for encryption at rest for segments on the server? Thanks!

Saksham Gupta

09/14/2022, 12:41 PM

Hi, we are trying to upsert into a table in Pinot, but our table needs data ingested from multiple Kafka sources (basically a combination of multiple source tables from MySQL databases). Since Pinot does not support reading from multiple Kafka topics for the same table, we are exploring whether we can push all data as JSON to a single Kafka topic (with data from different source tables as different messages/events). Can we then do a partial upsert with conditions, such as updating only if a specific value is present at the source, and updating specific columns in Pinot based on these different conditions? If Pinot has no such support, is there a workaround that does not require a join, where these multiple events can update rows of the table in parallel?
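To make the single-topic idea above concrete, here is a minimal sketch of a partial-upsert table config fragment plus a small Python model of how events from different source tables could merge into one row per primary key. The column names, the strategies chosen, and `merge_event` are assumptions for illustration. The model treats a missing or null column in an event as "leave the stored value alone", which is a simplification and not Pinot's actual implementation.

```python
# Illustrative upsertConfig fragment; column names are hypothetical.
upsert_config = {
    "mode": "PARTIAL",
    "partialUpsertStrategies": {
        "order_status": "OVERWRITE",
        "payment_status": "OVERWRITE",
        "item_count": "INCREMENT",
    },
}


def merge_event(previous, event, strategies):
    """Merge one event into the stored row, column by column (simplified model)."""
    merged = dict(previous)
    for column, new_value in event.items():
        if new_value is None:
            continue  # this source table does not own the column: keep the old value
        old_value = merged.get(column)
        strategy = strategies.get(column, "OVERWRITE")
        if old_value is None or strategy == "OVERWRITE":
            merged[column] = new_value
        elif strategy == "INCREMENT":
            merged[column] = old_value + new_value
        else:
            raise ValueError(f"strategy not modelled in this sketch: {strategy}")
    return merged


# Events from different source tables share one topic and one primary key (order_id).
events = [
    {"order_id": 1, "order_status": "CREATED", "payment_status": None, "item_count": 2},
    {"order_id": 1, "order_status": None, "payment_status": "PAID", "item_count": None},
    {"order_id": 1, "order_status": "SHIPPED", "payment_status": None, "item_count": 1},
]

rows = {}
for event in events:
    key = event["order_id"]
    rows[key] = merge_event(rows.get(key, {}), event,
                            upsert_config["partialUpsertStrategies"])

print(rows[1])
```

In this model each source only fills the columns it owns, so no join is needed. Value-dependent conditions ("update only if the source has a specific value") are not covered by the per-column strategies shown here and would have to be handled before the event reaches the topic.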

👋 2

RachelP

09/14/2022, 10:49 PM

Hey you! Yeah.. you... Pinot user! Are you also a Presto user? If so, I am sure you have an excellent story to tell about what you are doing with the best data projects on the planet, Pinot and Presto! How about sharing it with the world?? The CFP is now OPEN for PrestoCon 2022! This year, live and in-person in Mountain View. Hit me up with any questions!! https://events.linuxfoundation.org/prestocon/program/cfp/

🤟 2