Apache Pinot #general

Ken Krugler

06/02/2022, 7:29 PM

General question for people using Pinot in production - does anyone depend on Scala 2.11 support? Asking because other open source projects have dropped 2.11 (I think last patch release was in 2017), and that’s making it harder for Pinot to stay current with updates.

Diogo Baeder

06/02/2022, 9:46 PM

Hey guys! Which pipeline/workflow systems have you been using to manage your Pinot data, and are you happy with them? I'd like to evaluate a few different options to see which one would suit me best. So far I've only tried Airflow and, I'll be honest, didn't like it that much, so I'm looking at possible alternatives.

Fritz

06/03/2022, 7:00 AM

Hi Pinot community, is Pinot a good fit for exporting detailed data reports? The reports would have some level of aggregation, but the dimensions would still be fairly fine-grained. Does this kind of use case still fit Pinot? Thanks

Sowmya Gowda

06/03/2022, 12:42 PM

Hi Team, I have a scenario where I need to load local files from a particular folder into a Pinot offline table. New files arrive in that folder roughly every hour. How do I create segments for those files on an hourly basis? Is there any automatic process for creating segments every hour or so?
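One possible approach, sketched here under stated assumptions rather than as a built-in Pinot feature: poll the folder once an hour, pick up the files that have not been processed yet, and hand them to a batch ingestion step. The class name HourlyFolderPoller and the buildAndPushSegments hook are hypothetical; the hook only prints where a real segment build and push would go (for example a LaunchDataIngestionJob run with your own job spec), and the set of processed files is kept in memory, so it would be lost on restart.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class HourlyFolderPoller {
    // Names of files already turned into segments. Kept in memory in this
    // sketch; a real setup would persist it so restarts do not reprocess files.
    private final Set<String> processed = new HashSet<>();

    // Returns the files in `current` that are not in `processed`, sorted by name.
    static List<String> selectNewFiles(List<String> current, Set<String> processed) {
        return current.stream()
                .filter(name -> !processed.contains(name))
                .sorted()
                .collect(Collectors.toList());
    }

    // One polling pass: list the folder, pick the new files, hand them to the hook.
    void pollOnce(Path inputDir) throws IOException {
        List<String> current;
        try (Stream<Path> entries = Files.list(inputDir)) {
            current = entries.filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
        }
        List<String> fresh = selectNewFiles(current, processed);
        if (!fresh.isEmpty()) {
            buildAndPushSegments(inputDir, fresh);
            processed.addAll(fresh);
        }
    }

    // Hypothetical hook: launch your batch ingestion job here, e.g.
    // `pinot-admin.sh LaunchDataIngestionJob -jobSpecFile <your-job-spec>`.
    void buildAndPushSegments(Path inputDir, List<String> files) {
        System.out.println("Would build segments for " + files.size() + " file(s) in " + inputDir);
    }

    // Runs pollOnce immediately and then every hour on one background thread.
    ScheduledExecutorService startHourly(Path inputDir) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                pollOnce(inputDir);
            } catch (IOException e) {
                e.printStackTrace(); // keep the schedule alive on I/O errors
            }
        }, 0, 1, TimeUnit.HOURS);
        return scheduler;
    }
}
```

Catching the IOException inside the task matters: with scheduleAtFixedRate, an exception that escapes the task cancels all later runs.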

Facundo Bianco

06/03/2022, 4:30 PM

Hi Pinot Team, do you know whether the talk about Pinot & Trino real-time analytics was recorded, and where I can find it? It was at Trino Summit (Oct 22). Thank you.

Vuppala Suresh Kumar

06/06/2022, 7:37 AM

Hi Pinot Team, do you know how to add, rename, or drop columns in an existing table? It is always tedious to drop the existing table and create a new one just to change the column names.

Vuppala Suresh Kumar

06/06/2022, 10:08 AM

Hi Pinot community, can we copy the data of one table into a new table?

Alex Gartner

06/07/2022, 2:34 PM

Does anyone have any thoughts on Pinot/Superset versus Elasticsearch/Kibana? Does anyone have experience operating the two at scale, or can anyone offer their pros/cons?

Vibhor Jaiswal

06/07/2022, 4:05 PM

We have a particular use case where we want to ingest data into Pinot realtime tables using Kafka. However, we want to get the batch ID and count from Kafka and guarantee that the count matches the number of records in the Pinot table. If it matches, we want to update Kafka with a flag saying the given batch ID has been processed. Is there a smart way to do this? Our current, rather clunky approach is to run another Flink job that does the count and sends a message to Kafka, but if there is a better idea, we would welcome it.
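A minimal sketch of just the reconciliation step, assuming every record carries a batchId column in Pinot and that the expected per-batch counts have already been read from Kafka. The class BatchReconciler, its methods, and the batchId column name are hypothetical; the actual query execution and the Kafka write-back are left to whatever clients you already use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class BatchReconciler {
    // Builds a count query for one batch. Assumes a `batchId` column exists;
    // single quotes are doubled so the id cannot break out of the literal.
    static String countQuery(String table, String batchId) {
        return "SELECT COUNT(*) FROM " + table
                + " WHERE batchId = '" + batchId.replace("'", "''") + "'";
    }

    // expected: batch id -> record count announced on Kafka
    // actual:   batch id -> COUNT(*) observed in Pinot for that batch
    // Returns the ids whose counts match, i.e. the batches that could be
    // flagged as processed, in sorted order.
    static List<String> completedBatches(Map<String, Long> expected, Map<String, Long> actual) {
        List<String> done = new ArrayList<>();
        for (Map.Entry<String, Long> e : expected.entrySet()) {
            Long seen = actual.get(e.getKey());
            if (seen != null && seen.equals(e.getValue())) {
                done.add(e.getKey());
            }
        }
        Collections.sort(done);
        return done;
    }
}
```

Because realtime ingestion is asynchronous, a batch whose count does not match yet is not necessarily lost; a real job would re-check it later before raising an alert. If the table can receive duplicate records, a plain COUNT(*) may also overcount.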

Young Seok (Tony) Kim

06/07/2022, 9:22 PM

(I’ve asked this question in GitHub Issues, but I’m forwarding it here since this seems the more appropriate place.) Hi Pinot team, I have a question about star-tree indexing a JSON key. For example, take the following tableIndexConfig:

"tableIndexConfig": {
  "starTreeIndexConfigs": [{
    "dimensionsSplitOrder": [
      "Country",
      "Browser",
      "Locale"
    ],
    "skipStarNodeCreationForDimensions": [
    ],
    "functionColumnPairs": [
      "SUM__Impressions"
    ],
    "maxLeafRecords": 1
  }],
  ...
}

Is there a way to specify a JSON key within dimensionsSplitOrder? For example, if we have a column named person holding a JSON value such as { "number" : 112, "street" : "main st", "country" : "us" }, can we add "country" as one of the dimensionsSplitOrder entries?

Bin Wang

06/08/2022, 4:50 AM

Hey everyone, might I ask what's the best practice if I want to connect my database to Pinot?

Bin Wang

06/08/2022, 4:56 AM

From the documentation, the data mainly comes from offline batch ingestion (Spark/Hadoop) or from streaming with Kafka. What if my data is persisted in a database? Should I emit a Kafka event for every database write and have Pinot consume it? Or could Pinot read a daily snapshot dump of the database instead?

Vuppala Suresh Kumar

06/08/2022, 11:11 AM

Hi Pinot community, can we transfer/copy the values of one column into another column of the same data type (in the same table)?

Vuppala Suresh Kumar

06/08/2022, 11:26 AM

How can we replace mapped values, say, change every 'ROAD' value in a column to 'TRUCK'?

Thierry

06/08/2022, 3:10 PM

Hello Pinot community, I'm just getting started with Apache Pinot, which I find very interesting. I scanned the apachepinot/pinot:0.10.0 image with the grype tool (grype apachepinot/pinot:0.10.0 | grep Critical) and found quite a few Critical-level vulnerabilities. How do you go about mitigating or eliminating these vulnerabilities? Thank you.

Diogo Baeder

06/09/2022, 12:11 AM

Hi guys, let me ask a question about Replica-Group Instance Assignment; I'd like to better understand how it works (more in this thread).

Diogo Baeder

06/09/2022, 12:18 AM

Completely changing subjects: the other day I asked here about data pipeline solutions to pair with Pinot, and I've been using Argo Workflows. It's not in production yet, only running locally on my computer (using k3s as a "tiny k8s cluster" alternative), but it's been working amazingly well, and I really like it! I've been using it for a number of steps in our ETL, one of them being Pinot segment generation and pushing, and it's getting really smooth! I'll write up more about this in the future, when I have something more concrete.

Grace Walkuski

06/09/2022, 7:52 PM

Hi 👋 I’m wondering whether retentionTimeValue is inclusive or not. (If I were to write a query with an equivalent filter, would I use <=?)

Prateek Singhal

06/10/2022, 5:46 PM

Hi team, my hybrid table is set up to move data older than 5 days from realtime to offline, but I need to backfill data from 2 days ago. Can I delete the realtime segment and backfill the 2-day-old data directly as an offline segment? Will that cause any side effects or minion exceptions?

Jin Yi

06/10/2022, 10:08 PM

Does Pinot take advantage of Hudi internals, such as metadata and read-optimized tables, for Hudi-based data lakes?

sunny

06/13/2022, 2:23 AM

Hi all, I wonder whether it is possible to use multiple directories for a server instance, i.e. for these properties in conf/pinot-server.conf:

pinot.server.instance.dataDir
pinot.server.instance.segmentTarDir

Vuppala Suresh Kumar

06/13/2022, 7:10 AM

Hi all, any idea what is causing this session timeout error?

2022-06-13 11:58:16.182  INFO 42149 --- [onaws.com:2181)] org.apache.zookeeper.ClientCnxn          : Client session timed out, have not heard from server in 10011ms for sessionid 0x0, closing socket connection and attempting reconnect
2022-06-13 11:58:16.283  INFO 42149 --- [onaws.com:2181)] org.apache.zookeeper.ClientCnxn          : Opening socket connection to server ec2-54-214-97-234.us-west-2.compute.amazonaws.com/54.214.97.234:2181. Will not attempt to authenticate using SASL (unknown error)

I'm getting this error while connecting to ZooKeeper via the Java client:

import org.apache.pinot.client.Connection;
import org.apache.pinot.client.ConnectionFactory;

Connection connection = ConnectionFactory.fromZookeeper("<xxxxxxxxxxxxx>");

Alex Gartner

06/13/2022, 7:28 PM

Question for all: Pinot seems heavily focused on realtime/streaming tables. Does anyone use it for JUST batch/offline data?

Tim Berglund

06/13/2022, 9:07 PM

If you haven’t heard about the Apache Pinot® vulnerabilities announced last week by Doyensec, then let me be the first to tell you that they exist. Their blog post goes into detail about how the exploits work, but in short, there are three vulnerabilities:

1. Parsing of the OPTIONS() clause. This is a SQL injection vulnerability of medium severity that is in the process of being addressed right now.
2. The timeout bug. An attacker can cause Pinot Server CPU usage to spike, thus denying service to clients. This is a medium-severity vulnerability that will be fixed in a near-future release.
3. Groovy Remote Code Execution. This is a severe vulnerability. The change disabling Groovy by default is already merged, and will be in the 0.11.0 release. If you are running any current or previous release of Pinot, you should disable Groovy.

Now, I’m a Pinot community member, and not a committer to the project myself. I’m relaying a summary of conversations I’ve had with folks on the PMC to get a sense of the issues and what the community can expect going forward. Of course, ultimately this is for the Pinot release process to determine. You can stay apprised of releases on GitHub.

I do need to say that it’s unfortunate the community had to learn about this from a blog post rather than a responsible disclosure to the Pinot PMC. Security research is an enormously valuable (and, we can all agree, quite cool) endeavor, but for it to make our systems more secure rather than sowing the chaos of zero-days into the wild, responsible disclosure is key. The Apache Software Foundation has published guidelines for how to disclose vulnerabilities. If the health of the Pinot community and Pinot itself is important to you, I personally urge you to follow these and insist that others do. Good behavior arguably emerges from a combination of economic incentives and shared norms, and this is a norm I think most of us can agree to share.

Prashant Pandey

06/14/2022, 10:09 AM

Thanks, team, for upgrading to the Slack Pro plan. I was searching for my old threads the other day, and I can find them now 🙂

Robin Moffatt

06/14/2022, 1:23 PM

Hi! 👋 I'm looking for speakers to submit to the Call for Papers for Current 2022. This is a technical conference for everything data in motion, and will take place October 4-5 in Austin, Texas. Talks about Apache Pinot would be very welcome and would fit very well in several of the tracks planned, including Architectures You've Always Wondered About, Pipelines Done Right, and Real-Time Analytics.

The Call for Papers is open until June 26th. Read more about it in this blog, or DM me if you have any questions. Thanks 🙂

Laxman Ch

06/14/2022, 6:20 PM

Devs, I have a couple of questions around schema compatibility.
• We have some fields that were defined as dimensions. Can we move them to metrics without recreating the table?
• If the answer to the above is no, can we enable star-tree index aggregation for some dimension columns?
• Upgrading from 0.7.1 to 0.10.0 is causing compatibility issues with old Avro boolean fields that have default values. Are there any specific migration steps to follow?

Priyank Bagrecha

06/14/2022, 7:24 PM

What is the recommended approach for batch ingestion into Pinot from, say, S3 or Hive: minion-based ingestion or ingestion jobs? Are there any pros/cons between the two?

Jagannath Timma

06/14/2022, 8:18 PM

Hello guys, I am looking at the Pinot upsert/dedup documentation. From what I understand, when upsert is enabled (let's say the PK is a string and the latest ts column is used to determine order), Pinot returns the latest row at query time, but all the older rows are still stored by Pinot. Is that correct? Also, what is the difference between upsert and dedup? Is it that dedup actually discards the older row data when a PK conflict is detected?
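A conceptual model of the two behaviours, written as plain in-memory Java rather than Pinot's actual implementation, and assuming Pinot's documented semantics: with upsert, every row is stored and queries see only the row with the largest comparison value (here the ts field) per primary key; with dedup, an incoming row whose primary key has already been seen is dropped at ingestion, so the first row per key is the one kept. The class UpsertVsDedup and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Conceptual in-memory model of upsert vs dedup semantics; not Pinot's code.
public class UpsertVsDedup {

    // A row with a primary key, a comparison (time) value, and a payload.
    public static final class Row {
        public final String pk;
        public final long ts;
        public final String value;

        public Row(String pk, long ts, String value) {
            this.pk = pk;
            this.ts = ts;
            this.value = value;
        }
    }

    // Upsert: all rows stay in `ingested`; the query-visible view keeps, per
    // primary key, the row with the largest ts (ties go to the later row here).
    public static Map<String, Row> upsertView(List<Row> ingested) {
        Map<String, Row> latest = new HashMap<>();
        for (Row r : ingested) {
            Row current = latest.get(r.pk);
            if (current == null || r.ts >= current.ts) {
                latest.put(r.pk, r);
            }
        }
        return latest;
    }

    // Dedup: a row whose primary key was already seen is dropped on arrival,
    // so only the first row per key is ever stored.
    public static List<Row> dedupIngest(List<Row> incoming) {
        Set<String> seenKeys = new HashSet<>();
        List<Row> stored = new ArrayList<>();
        for (Row r : incoming) {
            if (seenKeys.add(r.pk)) {
                stored.add(r);
            }
        }
        return stored;
    }
}
```

Feeding the rows (k1, 1, ROAD), (k1, 3, TRUCK), (k2, 2, AIR) through both paths shows the difference: the upsert view answers TRUCK for k1 while all three rows remain stored, whereas dedup stores only ROAD for k1 and never keeps the TRUCK row.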

abhinav wagle

06/15/2022, 4:36 PM

Hello! Any recommendations from the community on how you have built CI/CD for Pinot schema/table config updates using a Helm-based approach? Any documentation/articles on it would be helpful. Thanks!