# general

    Ashish Kumar

    11/14/2022, 1:10 PM
    Hi Team, I have some data in a Hive table and I want to push it into Pinot. What's the best way to do it? Offline tables are fine for the use case, but I don't see any Hive table connector in the Pinot batch Spark jar.
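    Since there is no Hive connector in the batch ingestion plugins, the usual route is to point the Spark ingestion job at the files backing the Hive table (or at an export of it). A minimal job spec sketch, assuming the files are Parquet on HDFS: every path, table name and URI below is a placeholder, the runner class is the one I believe ships with the Spark 2 plugin (the Spark 3 plugin uses a slightly different package), and the pinotFSSpecs section for HDFS is omitted for brevity:
        executionFrameworkSpec:
          name: 'spark'
          segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
        jobType: SegmentCreationAndTarPush
        inputDirURI: 'hdfs:///user/hive/warehouse/mydb.db/my_table/'
        includeFileNamePattern: 'glob:**/*.parquet'
        outputDirURI: 'hdfs:///pinot/segments/my_table/'
        overwriteOutput: true
        recordReaderSpec:
          dataFormat: 'parquet'
          className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
        tableSpec:
          tableName: 'my_table'
        pinotClusterSpecs:
          - controllerURI: 'http://pinot-controller:9000'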

    vishal

    11/15/2022, 9:22 AM
    Hi team, I've created a realtime-to-offline flow and pushed 2k data points to the realtime table. 1800 data points have been moved to the offline table, but I can still see the same data in the realtime table as well. How do I remove that data from the realtime table?
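    For context, as I understand the realtime-to-offline flow: the task only copies rows into the offline table, the broker's time boundary keeps hybrid-table queries from double-counting the overlap, and the realtime copies are removed later by the retention manager rather than by the task itself. So the usual knob is the realtime table's retention, which should comfortably exceed bufferTimePeriod + bucketTimePeriod (the value below is a placeholder):
        "segmentsConfig": {
            "retentionTimeUnit": "DAYS",
            "retentionTimeValue": "3"
        }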

    Tim Berglund

    11/15/2022, 4:34 PM
    Pinot Community! The Real-Time Analytics Summit 2 (The SQL) is live in 25 minutes. rtasummit2.com to sign up for free and attend.

    Tim Berglund

    11/15/2022, 4:35 PM
    It’s not all Pinot content, but it’s all of interest if you’re a person who works with Pinot. Get in there! I hope to see you there.

    abhinav wagle

    11/15/2022, 11:20 PM
    Hello, is there any particular reason why this config and this one are not exposed as part of the server config? We see a significant volume of messages consumed via Kafka and are looking at ways to limit it.

    Ehsan Irshad

    11/16/2022, 5:14 AM
    Hi ... What is the recommended way of moving data from S3 / Parquet to Pinot? I see two options, Spark & Minion ...
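    If you'd rather not run Spark, the Minion-based ingestion task is the other route. A sketch of the offline table's task config, assuming Parquet files in S3 (bucket, path and schedule are placeholders, and the S3 filesystem plugin still has to be configured on the controller/minions):
        "task": {
            "taskTypeConfigsMap": {
                "SegmentGenerationAndPushTask": {
                    "schedule": "0 */10 * * * ?",
                    "inputDirURI": "s3://my-bucket/raw/events/",
                    "inputFormat": "parquet"
                }
            }
        }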

    Diogo Baeder

    11/16/2022, 10:57 AM
    Hey folks, just out of curiosity: has gRPC ever been considered as an option for adding a binary protocol for reading data from Pinot? I was wondering whether that could be an efficient option...

    Steven Hall

    11/17/2022, 1:25 AM
    Hi Everyone. I have a small stack running on localhost integrated with Kafka. Next I want to integrate Pinot with our production Confluent Kafka platform. The data is mostly in AVRO format, and we have customized Confluent Kafka to require an OKTA auth token. It appears that out of the box there is support for JSON payloads from Kafka and HTTP Basic Auth. I have searched for OKTA on this channel and found no hits. Assuming I wanted to make some changes to support AVRO payloads and OKTA auth when integrating with Kafka, can anyone point me at the classes that I should take a look at or study? Do you have any other advice I ought to know before I go down this path? Thanks, appreciate it.
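    For the AVRO part there is an out-of-the-box Confluent schema-registry decoder that replaces the JSON one, and most of the auth side is plain Kafka consumer properties passed through streamConfigs. A sketch, not a tested recipe: the topic, schema-registry URL and especially the OAUTHBEARER/OKTA callback handler class are placeholders you would have to supply, while the decoder and consumer-factory class names are the ones I believe ship with Pinot:
        "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.consumer.type": "lowlevel",
            "stream.kafka.topic.name": "my-topic",
            "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
            "stream.kafka.decoder.prop.schema.registry.rest.url": "https://schema-registry.example.com:8081",
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "OAUTHBEARER",
            "sasl.login.callback.handler.class": "com.example.OktaTokenCallbackHandler",
            "sasl.jaas.config": "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required;"
        }
    If something custom is still needed, the decoder interface to study is org.apache.pinot.spi.stream.StreamMessageDecoder; the JSON and Avro decoders are implementations of it.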

    vishal

    11/17/2022, 7:19 AM
    Hi Team, I am trying to implement de-duplication with the realtime-to-offline flow. The things I've added are below:
    Schema:
        "primaryKeyColumns": [
            "count"
        ]

    Realtime:
        "routing": {
            "instanceSelectorType": "strictReplicaGroup"
        },
        "task": {
            "taskTypeConfigsMap": {
                "RealtimeToOfflineSegmentsTask": {
                    "bufferTimePeriod": "3m",
                    "bucketTimePeriod": "5m",
                    "schedule": "0 */1 * * * ?",
                    "mergeType": "dedup",
                    "maxNumRecordsPerSegment": "10"
                }
            }
        },
    but it's not working! Do I need to add anything else, or am I doing something wrong? Thanks, Vishal
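    One thing that may be tripping this up (a sketch, not a confirmed diagnosis): "mergeType": "dedup" only de-duplicates identical rows while the RealtimeToOfflineSegmentsTask builds the offline segments, and as far as I know it does not look at "primaryKeyColumns" or drop duplicates at ingestion time. Ingestion-time de-duplication on the realtime table is a separate config, which also needs the primary key in the schema, strictReplicaGroup routing and the Kafka topic partitioned by that key:
        "dedupConfig": {
            "dedupEnabled": true,
            "hashFunction": "NONE"
        }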

    Abdelhakim Bendjabeur

    11/17/2022, 1:31 PM
    Hello, just confirming: does the Kafka low-level consumer guarantee exactly-once, or is it at-least-once?

    Nizar Hejazi

    11/17/2022, 3:54 PM
    Hey, what does the value of "topic.consumption.rate.limit" refer to? Is it the number of messages consumed per second?
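    If I remember the docs right, it is a limit for the whole topic, in messages per second, with the per-partition limit derived by dividing by the partition count. It sits alongside the other stream settings, e.g. (value is a placeholder):
        "streamConfigs": {
            "topic.consumption.rate.limit": "1000"
        }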

    javier

    11/17/2022, 4:28 PM
    Hi. I hope this is the right channel for this. At FOSDEM (the largest FLOSS conference in Europe) we will be hosting a "Fast and/or Streaming Data" track this year. The call for papers is now open and it would be great to receive submissions featuring Pinot. More info at https://javier.github.io/fast_and_streaming_data_devroom_cfp_fosdem_2023/

    Prabhav Singh

    11/18/2022, 5:53 AM
    Hi Team. We are planning to use a deployment of Pinot in Production with the following configurations:
    Brokers: 3 Replicas
    Controllers: 3 Replicas
    Servers: 5 Replicas
    Zookeepers: 3 Replicas
    However, we are facing an issue with our controller deployment. When we run queries through Presto, we do not get results consistently. Many times the query returns the following error:
    Unexpected response status: 500 for request to url http://pinot-controller.dataplatform.svc.cluster.local:9000/tables/<table-name>/instances, with headers {Accept=[application/json]}, full response {"code":500,"error":"Failed to get full list of /pinot/CONFIGS/PARTICIPANT"}
    We did an analysis of the error and found that it happens because only specific controllers are able to fetch /pinot/CONFIGS/PARTICIPANT for a specific table. For example, if we fire a curl request to controller replica 1 for table A and get a 200 response, we get an error from the other two controllers. On further analysis, we found that we only get a 200 response if the load balancer directs the query for a table to its lead controller. We were hoping for help on this issue. A temporary solution would be to reduce to a single controller, which is not recommended for production deployments. The community's help here would be greatly appreciated!

    Ashish Kumar

    11/18/2022, 9:53 AM
    Hi Team, I am running: spark-submit ..... pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar -jobSpecFile spark_job_spec.yaml. Can the pinot-batch-ingestion jar not read the jobSpecFile from S3/HDFS? Currently it works if the job spec file is present locally, but it errors out if I pass an S3 path.
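    If the jar only accepts a local job spec, one workaround sketch (assuming the AWS CLI is available on the submitting host; bucket and paths are placeholders) is to pull the spec down before submitting:
        aws s3 cp s3://my-bucket/specs/spark_job_spec.yaml /tmp/spark_job_spec.yaml
        spark-submit ..... pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar -jobSpecFile /tmp/spark_job_spec.yaml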

    Peter Pringle

    11/21/2022, 2:52 AM
    I see Pinot now has a /users endpoint for access control. Are there any docs on how this works? Is any special configuration needed?

    Peter Pringle

    11/21/2022, 2:53 AM
    https://docs.pinot.apache.org/operators/operating-pinot/access-control does mention the feature, and the javadoc has an example payload.

    coco

    11/22/2022, 2:00 AM
    Hi, Pinot team! https://docs.pinot.apache.org/basics/data-import/pinot-stream-ingestion/import-from-apache-kafka#extract-record-headers-as-pinot-table-columns I'm testing this feature. Is it included in release-0.11.0? In my test, the records are streamed from NiFi. The 'key' and 'metadata$offset' columns in the Pinot table are always null. Is there any setting I need to add to Pinot or Kafka? Table config:
    "tableIndexConfig": {
        "loadMode": "MMAP",
        "nullHandlingEnabled": true,
        "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "lowLevel",
          "stream.kafka.topic.name": "meetupRSVPEvents",
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.broker.list": "localhost:19092",
          "stream.kafka.consumer.prop.auto.offset.reset": "largest",
          "realtime.segment.flush.threshold.time": "12h",
          "realtime.segment.flush.threshold.size": "10K",
          "stream.kafka.metadata.populate": "true"
        }
      },
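    One thing worth checking (a guess, not a confirmed fix): the extra columns only get populated if they are also declared in the table schema under the exact names the linked docs page lists for your version. For the two columns mentioned above that would look roughly like the following, where the STRING data types are an assumption:
        "dimensionFieldSpecs": [
            { "name": "key", "dataType": "STRING" },
            { "name": "metadata$offset", "dataType": "STRING" }
        ]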

    Prabhav Singh

    11/22/2022, 5:14 AM
    Hi Pinot Team! I wanted to check whether there is an easy way to add a new column to Pinot Offline & Realtime Tables. I just want to create a copy of a column with a different name without having to backfill the entire data again.
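    A sketch of one way this is commonly done (column names are placeholders, and whether a segment reload back-fills the derived column for existing segments depends on the Pinot version, so please verify): add the new column to the schema, map it to the existing column with an ingestion transform, then reload the table's segments. Groovy transforms may need to be enabled on your cluster.
        "ingestionConfig": {
            "transformConfigs": [
                {
                    "columnName": "newCol",
                    "transformFunction": "Groovy({oldCol}, oldCol)"
                }
            ]
        }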

    Loïc Mathieu

    11/22/2022, 2:07 PM
    Hi, I just deployed a Pinot cluster on Kubernetes using the Helm chart and noticed it's still using Pinot 0.10. Is a new version of the chart with Pinot 0.11 planned?
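    Until the chart default catches up, overriding the image tag is one option; a sketch assuming the chart exposes the usual image.tag value (worth checking its values.yaml) and that the release and namespace names below match yours:
        helm upgrade --install pinot pinot/pinot -n pinot --set image.tag=release-0.11.0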

    Marco Ndoping

    11/22/2022, 8:44 PM
    Hi, I'm trying to run a few queries using Python's pinotdb client, version 0.4.5, but it doesn't seem to work when the query has an alias. Here are a few of the queries I've tried (among other cases): SELECT "tbl1"."x" FROM "y" "tbl1" and SELECT "tbl1"."x" FROM "y" AS "tbl1". Has anyone encountered this issue? Is this a known limitation?
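    Not a fix, but something that may help narrow down whether it's the client or the query engine: the single-stage engine has historically been picky about table aliases, so forms without them may behave differently (identifiers below are just the ones from the queries above):
        SELECT "x" FROM "y"
        SELECT "x" AS "x_alias" FROM "y"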

    Alexandre Estevam

    11/23/2022, 7:35 AM
    Hi Everyone, I'm trying to build a real-time leaderboard and found Apache Pinot, but I'm not really sure it can solve the entire problem. Any help would be appreciated! I'm really stuck on this problem 😞 Problem: I need to build a real-time leaderboard that ranks users by the average of scores from a "PostScore" table and can return statistics like: the number of users in the ranking, the position of each user in the ranking, filters by post categories and post skills (these two should affect the sum of scores as well; it's like having one ranking for each filter possibility), filters by user age and user country, and finally, computing only the scores given in the last 24 hours or all-time. Is this possible with Apache Pinot?
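    Most of that maps onto a plain Pinot aggregation; a sketch assuming a PostScore table with hypothetical userId, score, category, skill, userCountry, userAge and createdAtMs (epoch millis) columns, and that the ago() function is available in your version:
        SELECT userId,
               AVG(score) AS avgScore,
               COUNT(*) AS numPosts
        FROM PostScore
        WHERE category = 'design'
          AND skill = 'illustration'
          AND userCountry = 'BR'
          AND userAge BETWEEN 18 AND 35
          AND createdAtMs > ago('PT24H')
        GROUP BY userId
        ORDER BY avgScore DESC
        LIMIT 100
    Each user's exact position in the ranking is not computed server-side here, so you would either page through the ordered result or compute rank in the application (window functions only arrive with the newer multi-stage engine, if at all in your version).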

    vishal

    11/23/2022, 8:24 AM
    Hi, I am getting the log below for a realtime-to-offline table. Can somebody help me understand it?
    Not Lead, skip processing CronJob: table - tab6_REALTIME, task - RealtimeToOfflineSegmentsTask

    vishal

    11/28/2022, 6:55 AM
    Hi Team, I am pushing realtime data to an offline table and trying to implement upsert, but it returns an error saying that we cannot use upsert with a realtime-to-offline table.
    "error": "RealtimeToOfflineTask doesn't support upsert table!"
    Can't we use upsert with the realtime-to-offline flow?

    vishal

    11/28/2022, 11:36 AM
    Hi all, I am pushing data to an offline table from an S3 bucket. I want to update old data by primary key, the same as upsert. How can I do this for an offline table?

    Oscar perez

    11/29/2022, 10:52 AM
    Hi, does it make a difference in terms of performance whether the raw data is JSON or Parquet?

    Oscar perez

    11/29/2022, 10:52 AM
    I mean, does segment creation optimize the format anyway, or does the raw data format make a difference?

    vishal

    11/29/2022, 11:01 AM
    Hi Team, does pushing key-value pair data to a realtime table affect segment creation? Whenever I push data without key-value, segments complete with only 500 data points, but when I tried with key-value, segments are not completing. I've even pushed 20k data points but no segments have completed yet.

    Rostan TABET

    11/29/2022, 11:24 AM
    Hi Pinot team! Does Pinot 0.11 support the UNION operator? More generally, is there a way to find out which subset of SQL is currently supported?

    Lewis Yobs

    11/29/2022, 1:07 PM
    Re: the UNION operator: the issue https://github.com/apache/pinot/issues/9223 mentions UNION is "already supported by Pinot currently but not on the new engine" (new meaning the V2 engine: https://docs.pinot.apache.org/developers/advanced/v2-multi-stage-query-engine).

    Rostan TABET

    11/29/2022, 1:21 PM
    Thanks!