Apache Pinot #general

Yarden Rokach

09/15/2022, 6:05 PM

The Community is Growing💪🏻 3000+ Slack members! Another incredible milestone in the Apache Pinot community's growth! Thanks to all of our Slack members for keeping the ideas coming & conversations flowing! Cheers to more than 3000 members!

🍷 8

❤️ 12

🚀 9

amit mahadik

09/16/2022, 9:53 AM

Hi all, can someone please suggest an article or tutorial about minions?

Huaqiang He

09/19/2022, 1:10 AM

Hi team, regarding the merge rollup task, can I 1) specify multiple aggregationTypes on the same metric column? For example, I want sum, min, and max (and percentiles too) for the latency column after rollup. 2) Ignore some dimension columns during and after the rollup/aggregation? We need to ignore some dimension columns because, before rollup, the table stores raw signals that all have a uuid column. If the uuid column is retained, no actual rollup will happen.
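
A plain-Python model of the rollup described above (not Pinot's MergeRollupTask itself; the columns and values are hypothetical) shows why a uuid dimension defeats the rollup: grouping by a unique key leaves one group per input row. Percentiles are left out because they cannot be recomputed from per-bucket sum/min/max alone.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DAY_MS = 86_400_000  # one day in milliseconds

# Hypothetical raw signals: (timestamp_ms, uuid, region, latency_ms).
RAW: List[Tuple[int, str, str, float]] = [
    (1_663_545_600_000, "u1", "eu", 12.0),
    (1_663_545_660_000, "u2", "eu", 30.0),
    (1_663_545_720_000, "u3", "eu", 18.0),
    (1_663_545_780_000, "u4", "us", 25.0),
]

def rollup(rows, keep_uuid: bool) -> Dict[tuple, Dict[str, float]]:
    """Group rows by (day bucket, dimensions) and compute sum/min/max of latency."""
    groups: Dict[tuple, List[float]] = defaultdict(list)
    for ts, uuid, region, latency in rows:
        bucket = ts - ts % DAY_MS  # floor the timestamp to the start of its day
        key = (bucket, uuid, region) if keep_uuid else (bucket, region)
        groups[key].append(latency)
    return {
        key: {"sum": sum(vals), "min": min(vals), "max": max(vals)}
        for key, vals in groups.items()
    }

print(len(rollup(RAW, keep_uuid=True)))   # 4: every uuid is unique, nothing merges
print(len(rollup(RAW, keep_uuid=False)))  # 2: one row per (day, region)
```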

amit mahadik

09/19/2022, 5:05 PM

Hi all, what are granularity and bucketTimePeriod, and are they interlinked?

Corneliu Creanga

09/19/2022, 11:06 PM

Hello, we have a very large Kafka topic (it receives about 10-25 million rows/second) and we would like to use real-time ingestion to populate several tables. For each table we have a custom decoder that knows how to extract the relevant data from each message or skip the message. I'm curious how ingestion works: will Pinot stream the data independently for each table, or can we ingest once and apply each decoder to the ingested rows? Thanks a lot :)
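
To make the first option concrete, here is a toy model (plain Python, not Pinot code; the message format and decoders are hypothetical) in which every table reads the whole topic independently and applies its own decoder, which returns None to skip a message. The read counter shows the trade-off behind the question: with N tables the topic is read N times, which matters at 10-25 million rows/second. This sketch does not establish which of the two behaviors Pinot actually uses.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A decoder turns one raw message into a row, or returns None to skip it.
Decoder = Callable[[bytes], Optional[dict]]

def make_decoder(kind: str) -> Decoder:
    """Hypothetical decoder: keeps messages shaped like b'<kind>:<value>'."""
    def decode(msg: bytes) -> Optional[dict]:
        prefix, _, value = msg.decode("utf-8").partition(":")
        return {kind: value} if prefix == kind else None
    return decode

def consume_per_table(topic: List[bytes],
                      decoders: Dict[str, Decoder]) -> Tuple[Dict[str, List[dict]], int]:
    """Model where each table reads the whole topic on its own and applies its decoder."""
    rows: Dict[str, List[dict]] = {}
    reads = 0
    for table, decode in decoders.items():
        decoded = []
        for msg in topic:  # every table reads every message
            reads += 1
            row = decode(msg)
            if row is not None:
                decoded.append(row)
        rows[table] = decoded
    return rows, reads

topic = [b"click:a", b"view:b", b"other:c"]
rows, reads = consume_per_table(
    topic, {"clicks": make_decoder("click"), "views": make_decoder("view")})
print(rows)   # {'clicks': [{'click': 'a'}], 'views': [{'view': 'b'}]}
print(reads)  # 6: three messages, read once per table
```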

abhinav wagle

09/19/2022, 11:24 PM

Hello, is the open-source Pinot project an ASF (Apache Software Foundation) project? https://www.apache.org/licenses/contributor-agreements.html

Chengxuan Wang

09/20/2022, 2:57 AM

Hey, is there a way to do a UUID prefix search with TEXT_MATCH? This is what I tried:

TEXT_MATCH(order_id, '"ae006b22-b5f"')

but it doesn't return any data. If I try

TEXT_MATCH(order_id, 'ae006b22-b5f0*')

it returns more data than just the rows starting with ae006b22-b5f0. What is the correct way to do this? Thanks.
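
Assuming the text index tokenizes on hyphens (typical of Lucene's standard analyzer, but not verified here for Pinot's configuration), a UUID is indexed as separate tokens, so the phrase "ae006b22-b5f" looks for a whole token b5f that does not exist. The simplified model below reproduces that empty result; it does not model wildcard handling, so it does not explain the second query. The strict string-prefix check is the semantics the question is after; in Pinot SQL a regex filter such as REGEXP_LIKE(order_id, '^ae006b22-b5f0') may be a closer fit for that intent than a text index. The sample ids are hypothetical.

```python
import re
from typing import List

def tokenize(text: str) -> List[str]:
    """Simplified analyzer (assumption): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def phrase_match(doc: str, phrase: str) -> bool:
    """True if the phrase's tokens appear consecutively as whole tokens of the doc."""
    d, p = tokenize(doc), tokenize(phrase)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))

def strict_prefix(doc: str, prefix: str) -> bool:
    """The intended semantics: the whole order_id string starts with the prefix."""
    return doc.startswith(prefix)

ids = [
    "ae006b22-b5f0-4c1d-9a7e-0123456789ab",
    "ae006b22-c000-4c1d-9a7e-0123456789ab",
    "11111111-b5f0-4c1d-9a7e-0123456789ab",
]
print(tokenize(ids[0]))                       # ['ae006b22', 'b5f0', '4c1d', '9a7e', '0123456789ab']
print(phrase_match(ids[0], "ae006b22-b5f"))   # False: there is no whole token 'b5f'
print([i for i in ids if strict_prefix(i, "ae006b22-b5f0")])  # only the first id
```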

Prakhar Pande

09/20/2022, 2:29 PM

Hi everyone! I have very recently started exploring Pinot, and I am facing a few problems while ingesting data from a Kafka topic.

1. When I ingest data into a table with only default indexing, totalDocs as shown in the query console is around 123 million. However, when I ingest the same Kafka topic into a table with a star-tree index, totalDocs is only 70 million (after I stopped pushing more data into Kafka).
2. Is there a way to delete data from the Pinot cluster but keep it in the deep store?
3. If my cluster suddenly goes down, from which state will the deep store restore my data?

Thanks in advance.

Yarden Rokach

09/20/2022, 6:06 PM

9 days left to apply for the StarTree All-Stars program! ⚡📣 Our All-Stars get access to product discussions and exclusive events, and will be the first to know about any major product developments, features, and updates! They also receive limited-edition Pinot and StarTree swag! 👕 😎 Check out the full program >> Not sure if you should apply? Send me a message to discuss it!

Sukesh Boggavarapu

09/23/2022, 9:37 PM

Can Pinot infer a partitioning field from the input path? For example, if I have an S3 path like "s3://my-bucket/dt=2022-09-01" and a table with dt in its schema (but my actual data in S3 doesn't contain a dt column), and I run an ingestion job through Spark, can it infer that the partition is dt=2022-09-01, create a partition on that, and also populate the dt value?
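
What the question asks for amounts to parsing Hive-style key=value segments out of the input path and adding them to each record. A hypothetical preprocessing helper (not a Pinot feature; whether Pinot's Spark ingestion job can do this is not established here):

```python
import re
from typing import Dict

# A path segment like "dt=2022-09-01": identifier, '=', then a non-empty value.
_PARTITION_SEGMENT = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)=([^/]+)")

def partition_values(path: str) -> Dict[str, str]:
    """Extract key=value segments from a Hive-style path."""
    values = {}
    for segment in path.split("/"):
        match = _PARTITION_SEGMENT.fullmatch(segment)
        if match:
            values[match.group(1)] = match.group(2)
    return values

def enrich(record: dict, path: str) -> dict:
    """Return a copy of the record with partition columns added (record values win)."""
    return {**partition_values(path), **record}

path = "s3://my-bucket/dt=2022-09-01/part-0000.json"
print(partition_values(path))     # {'dt': '2022-09-01'}
print(enrich({"clicks": 3}, path))  # {'dt': '2022-09-01', 'clicks': 3}
```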

deepuak01

09/24/2022, 3:58 PM

👋 Hi everyone! I am new to Apache Pinot and am looking to use Pinot as an OLAP datastore for my organization.

🦜 1

deepuak01

09/24/2022, 3:59 PM

Can anyone suggest a link or an online document describing how to set up Apache Pinot on AWS?

Ehsan Irshad

09/27/2022, 8:54 AM

Hi folks, I don't see a channel for the spark-pinot connector. Is there a plan to support Spark 3 in the connector in upcoming releases?

coco

09/27/2022, 11:11 AM

How does Pinot's partitioning work when the number of partitions in a Kafka topic increases? Is there any problem? 'stream ingestion with upsert' https://docs.pinot.apache.org/basics/data-import/upsert#use-strictreplicagroup-for-routing 'routing partitioning' https://docs.pinot.apache.org/operators/operating-pinot/tuning/routing#partitioning
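
The concern behind this question can be shown with hash-mod partitioning: a key's partition is hash(key) % numPartitions, so increasing the partition count remaps many existing keys. That matters for upsert, where the linked docs describe partitioning the input stream by primary key so that every record for a given key lands in the same partition. The sketch uses zlib.crc32 as a stand-in hash (Kafka's default partitioner uses murmur2), and the key names are hypothetical.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Hash-mod partitioning. crc32 stands in for Kafka's murmur2-based default."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

keys = [f"order-{i}" for i in range(1000)]
# Keys whose partition differs once the topic grows from 4 to 6 partitions.
moved = [k for k in keys if partition_for(k, 4) != partition_for(k, 6)]
print(f"{len(moved)} of {len(keys)} keys change partition going from 4 to 6 partitions")
```

Records written before the change and records written after it can therefore sit in different partitions for the same key.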

Alex

09/27/2022, 1:56 PM

Hi everyone! Does anyone know of a good slice-and-dice UI for Pinot? I'm thinking of something like Imply (formerly Pivot) for Druid, but it should be open source. The idea is to give analysts an easy way to look at a single table (drill down, slice, …) without any SQL.

Machhindra

09/27/2022, 6:09 PM

Hi everyone! I am trying to store 'metrics' in time-series format in a Pinot real-time table. I am not sure how to design the table config to transform the incoming JSON from the Kafka topic into the Pinot table as shown in the picture. Basically, I need to map each 'label-name' to a Pinot column and insert the corresponding 'label-value' as that column's value, taken from a JSON array. I could have put all the labels into a single column, but I want to allow users to query like “select … from mytable where ZosSystem=‘Blah’“.
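
The picture is not included here, so the payload shape below is hypothetical. As a plain-Python sketch of the transformation being asked about (a pre-processing step, not a Pinot table-config transform), each entry of a labels array becomes its own column, which is what makes a filter like ZosSystem='Blah' possible:

```python
import json
from typing import Any, Dict

def flatten_labels(payload: str) -> Dict[str, Any]:
    """Turn {"labels": [{"name": n, "value": v}, ...], ...} into one flat row."""
    doc = json.loads(payload)
    row = {k: v for k, v in doc.items() if k != "labels"}  # keep non-label fields
    for label in doc.get("labels", []):
        row[label["name"]] = label["value"]  # label-name becomes the column name
    return row

payload = json.dumps({
    "timestamp": 1664300000000,
    "metric": "cpu_busy",
    "metric_value": 0.42,
    "labels": [{"name": "ZosSystem", "value": "Blah"}, {"name": "Lpar", "value": "SYS1"}],
})
row = flatten_labels(payload)
print(row)
# {'timestamp': 1664300000000, 'metric': 'cpu_busy', 'metric_value': 0.42, 'ZosSystem': 'Blah', 'Lpar': 'SYS1'}
```

Every label name that can appear would still need a matching column in the table schema.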

Yarden Rokach

09/28/2022, 10:52 AM

#RTASummit is happening TODAY! Make sure to register and join us for 3 hours of deep dives into top-tier companies' use cases, data flows, and real-time analytics! Jay Kreps (CEO of Confluent) will be there. Will you? https://www.linkedin.com/posts/startreedata_trailer-for-jay-kreps-confluent-at-real-t[…]316037263360-Zqw3?utm_source=share&utm_medium=member_desktop

🔥 3

Tim Berglund

09/28/2022, 3:09 PM

Yes! Today!

Tim Berglund

09/28/2022, 3:10 PM

rtasummit.com. Do what must be done. See you in 50 minutes. 🙂

🔥 4

Tim Berglund

09/28/2022, 3:42 PM

A super-secret view of the StarTree studios, where the Real-Time Analytics Summit is being broadcast.

🍷 11

Yarden Rokach

09/28/2022, 6:46 PM

Last call to submit your nomination for the StarTree All Stars! Nominations close tomorrow. 🌟 https://community.startree.ai/all-stars

Karin Wolok

09/29/2022, 3:30 PM

Just to add to what Yarden posted above: submissions for All Stars 2023 will not be considered after today, so please submit ASAP if you haven't already!

Edgaras Kryževičius

09/29/2022, 3:37 PM

Hey! When I run a Spark ingestion job on my local system, I can see that pinot-plugins-dir-x directories (where x is an integer) are being created. What are they? Where would they be created if I ran a spark-submit job on k8s? Would they be created on the executor pod?

Jinny Cho

10/04/2022, 2:17 PM

👋 Can I ask a question? I'm looking into making ZooKeeper more resilient. How would you prepare for the case where all of the ZooKeeper nodes are down? I'm considering some kind of backup for ZooKeeper and am curious whether there are any recommendations, especially for ZooKeeper in a Pinot environment.

Ashish Kumar

10/05/2022, 12:39 PM

Hi team, I was looking at the Pinot Go client https://docs.pinot.apache.org/users/clients/golang and it seems to connect via a ZooKeeper path. Is that good practice from a security perspective, since it requires exposing ZooKeeper to clients?
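
Pinot brokers also accept SQL over HTTP (a POST to /query/sql with a JSON body), which is one way for a client to query without any ZooKeeper access. A minimal sketch that builds such a request with the standard library; the broker host name and table name are assumptions, and 8099 is the broker's default query port:

```python
import json
import urllib.request

def broker_query_request(broker_url: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the broker's /query/sql endpoint."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{broker_url.rstrip('/')}/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = broker_query_request("http://pinot-broker:8099", "SELECT COUNT(*) FROM myTable")
print(req.get_method(), req.full_url)  # POST http://pinot-broker:8099/query/sql
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```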

piby

10/06/2022, 11:14 AM

Hi, I am just exploring this project and have a question on pinot-s3 data ingestion. At our company, new data arrives as JSON/CSV files every minute/hour. We currently use Postgres, which is hard to scale, so we are looking for a performant, horizontally scalable OLAP solution, ideally one that runs on Kubernetes. My question is whether it is possible to sync an S3 bucket with Pinot: if we add new CSV/JSON files to the bucket, Pinot should automatically ingest (only) the new files into its segment store without any duplicates. I expect this is doable using S3 events, but I couldn't find whether something like this is already in place. If not, we will have to cook up our own solution using S3 events or set up a Kafka cluster to stream data to Pinot. Thanks!
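
The "only new files, no duplicates" requirement boils down to bookkeeping of which object keys have already been processed. A minimal sketch of that bookkeeping, independent of Pinot; the object keys are hypothetical, and the listing could come from an S3 list call or S3 event notifications:

```python
from typing import Iterable, List, Set

def select_new_files(listing: Iterable[str], already_ingested: Set[str]) -> List[str]:
    """Return object keys that have not been ingested yet, in a stable order."""
    return sorted(set(listing) - already_ingested)

ingested: Set[str] = set()  # in practice this would live in durable storage
loaded: List[str] = []      # stand-in for handing each file to an ingestion job

for listing in (["2022/10/06/a.json", "2022/10/06/b.csv"],
                ["2022/10/06/a.json", "2022/10/06/b.csv", "2022/10/06/c.json"]):
    for key in select_new_files(listing, ingested):
        loaded.append(key)  # hypothetical ingest step
        ingested.add(key)   # mark only after the ingest step succeeds

print(loaded)  # ['2022/10/06/a.json', '2022/10/06/b.csv', '2022/10/06/c.json']
```

Marking a key only after ingestion means a crash in between causes a re-ingest rather than a lost file, so the ingest step should tolerate replays.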

Lab Nems

10/06/2022, 10:36 PM

Hi, I started working with Pinot, and in my tests I want to connect Tableau Server to Pinot via JDBC, but I've run into a problem. The connection to Pinot is established fine and I can see the Pinot tables, but I cannot see the contents of the tables, and there are no errors in the Pinot logs. I only hit this problem with the containerized version of Pinot. Is there an option to set in order to connect to Pinot via JDBC when Pinot is running under Docker? Thanks

Steven Hall

10/06/2022, 11:51 PM

Hi team! First, this is a cool project; excited to be looking into it and learning more. Newbie question: I have looked at the architecture, and I see some older docs that show the controller consisting of two components, ZooKeeper and Helix. If we choose a Kubernetes deployment, it seems Kubernetes does the same things that Helix does. Am I correct in assuming the Kubernetes deployment does not include Helix?

Matthew Kerian

10/07/2022, 8:08 PM

Hello. We were wondering what the preferred way to create tables/schemas is. Is there any reason not to just use the web page?
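
One common reason to script this instead of using the web page is reproducibility: the schema and table config live as files that can be reviewed and re-applied to another cluster. A sketch of that approach against the controller's REST API (POST /schemas, then POST /tables), with the controller address and the schema contents as assumptions:

```python
import json
import urllib.request

CONTROLLER = "http://localhost:9000"  # assumed controller address (9000 is the default port)

def post_json(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the controller."""
    return urllib.request.Request(
        url=f"{CONTROLLER}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical schema; in practice it would be loaded from a file in version control.
schema = {
    "schemaName": "events",
    "dimensionFieldSpecs": [{"name": "country", "dataType": "STRING"}],
    "metricFieldSpecs": [{"name": "clicks", "dataType": "LONG"}],
    "dateTimeFieldSpecs": [{
        "name": "ts",
        "dataType": "LONG",
        "format": "1:MILLISECONDS:EPOCH",
        "granularity": "1:MILLISECONDS",
    }],
}

req = post_json("/schemas", schema)
print(req.get_method(), req.full_url)  # POST http://localhost:9000/schemas
# urllib.request.urlopen(req) would submit it; a table config goes to /tables the same way.
```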

Michael Latta

10/08/2022, 6:36 AM

Is it possible, and a good idea, to use offline tables and create segments directly in Flink, or is it better to write the data from Flink to Kafka and use a real-time table? We generate the data in Flink, and given its size, writing directly to the segment store might have advantages. We could also use a shortish retention period.