# troubleshooting
  • abhinav wagle

    07/18/2022, 6:39 PM
    Hello, I am trying to change the path here to my custom Pinot table path: https://github.com/apache/pinot/blob/master/kubernetes/helm/pinot/pinot-realtime-quickstart.yml#L450. I wanted to understand the significance of /var/pinot/examples/ and whether I can change it.
  • Ethan Yu

    07/18/2022, 8:06 PM
    Hey all, I'm trying to run Prometheus and Grafana with Pinot on a Kubernetes cluster, but when I follow the documentation my Pinot deployment completely crashes and none of the pods work. I isolated the issue, and the error seems to stem from the jvmOpts parameter in the Pinot YAML file. Specifically, I think the path to the JMX exporter is incorrect. For Pinot on a Kubernetes cluster, is there something else I need, or something I need to configure to find the correct paths? The jvmOpt in question is -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml
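    For comparison, a working form of that flag usually has colons between the agent jar, the port, and the exporter config file. A sketch of the relevant helm values, assuming the default image layout under /opt/pinot/etc (the exact paths vary by image version, so verify them inside the running container, e.g. with kubectl exec ... -- ls /opt/pinot/etc):

```yaml
# Sketch of jvmOpts in the pinot helm values.yaml; the jar and config paths
# are assumptions based on the default image layout, so verify them inside
# your pinot container before applying.
server:
  jvmOpts: >-
    -Xms512M -Xmx1G
    -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml
```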
  • abhinav wagle

    07/18/2022, 10:49 PM
    Hello, does the "Add REALTIME Table" UI action call curl -X POST <url> -H "accept: application/json" -H "Content-Type: application/json" -d <> internally?
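    For what it's worth, the controller UI's "Add REALTIME Table" action submits the table config through the same REST endpoint that the quoted curl would hit, POST /tables on the controller. A minimal, hypothetical payload sketch (table, topic, and broker names are placeholders):

```json
{
  "tableName": "myTable",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "schemaName": "myTable",
    "timeColumnName": "ts",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "my-topic",
      "stream.kafka.broker.list": "kafka:9092",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder"
    }
  },
  "metadata": {}
}
```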
  • Alice

    07/19/2022, 1:28 AM
    Hi team, I have a question about the segment build process with a star-tree index. If a segment build fails, Pinot stops consuming that Kafka partition; through the /consumingSegmentsInfo API, the segment shows NOT_CONSUMING. Consumption resumes after calling the /segments/{tableNameWithType}/{segmentName}/reset API, but I'm a little confused about the offset. I thought consumption would resume from the offset where the segment build failed, but that doesn't seem to be the case. In my test, the server log shows a segment build failed on Jul 17 and consumption stopped from that point. I reset the segment at 8:00 AM on Jul 18, and new data has been ingested since that time; data from before that time wasn't ingested.
  • Fernando Barbosa

    07/19/2022, 1:36 AM
    Cool question: https://stackoverflow.com/questions/72167026/groovy-function-to-use-conditional-operator-and-datetime-parsing
  • Saumya Upadhyay

    07/19/2022, 6:06 AM
    Hi all,
  • Saumya Upadhyay

    07/19/2022, 6:07 AM
    Facing a weird issue in Pinot. We debugged further and found a serialization exception in the Pinot logs with the message "offset 39777 in partition 2 with wrong schema id". After this exception, Pinot stopped consuming from partition 2, and in the property store I can see many segments created and stuck at this offset:
    {
      "id": "tSHistoricCalibration__2__220__20220715T1700Z",
      "simpleFields": {
        "segment.creation.time": "1657904457323",
        "segment.flush.threshold.size": "10000",
        "segment.realtime.numReplicas": "1",
        "segment.realtime.startOffset": "39777",
        "segment.realtime.status": "IN_PROGRESS"
      },
      "mapFields": {},
      "listFields": {}
    }
    We are using Confluent Kafka with Schema Registry, and the decoder in use is KafkaConfluentSchemaRegistryAvroMessageDecoder. Is there any configuration so that Pinot can ignore such errors and keep consuming further messages? @Mayank @Chad
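    For reference, that decoder is wired into the table's streamConfigs roughly as below (the topic name is taken from the segment id above; the registry URL is a placeholder). I'm not aware of a documented setting in this decoder to silently skip records with a bad schema id, so treat that part as an open question:

```json
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "tSHistoricCalibration",
  "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
  "stream.kafka.decoder.prop.schema.registry.rest.url": "http://schema-registry:8081"
}
```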
  • Eaugene Thomas

    07/19/2022, 7:26 AM
    https://docs.pinot.apache.org/developers/developers-and-contributors/extending-pinot/custom-aggregation-function -> Map Phase -> callback -> Group By Multi Value. The "Multi Value" link points to the Single Value section in the docs; I guess that's wrong.
  • Lars-Kristian Svenøy

    07/19/2022, 4:16 PM
    Hey everyone 👋 Can a star-tree index be used with a multi-dimensional column specified in the split order?
  • chandarasekaran m

    07/19/2022, 4:46 PM
    Hi team, I need help with the following scenario:
    • Table 'A' holds information about categories <category_id, category_name> and is populated by Kafka topic 1.
    • I am getting events from topic 2; the payload has only category_id, and I want to filter on category_name at ingestion time. So I want to look up/query Table 'A' to get category_name based on category_id. Is this possible in the current architecture? If yes, can you point me to a code sample?
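    A possible direction (an assumption, not a confirmed answer): Pinot does not join tables at ingestion time, but if Table 'A' is configured as a dimension table (isDimTable: true, with category_id as the primary key), the lookUp() function can resolve category_name at query time instead. A hedged SQL sketch using the column names above; the fact table name and filter value are placeholders:

```sql
-- Assumes Table 'A' is a Pinot dimension table replicated to all servers.
SELECT category_id,
       lookUp('A', 'category_name', 'category_id', category_id) AS category_name
FROM events_from_topic2
WHERE lookUp('A', 'category_name', 'category_id', category_id) = 'some_category'
```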
  • Abhijeet Kushe

    07/19/2022, 9:59 PM
    I increased the replicas in pinot-server from 1 to 3 (https://github.com/apache/pinot/blob/master/kubernetes/helm/pinot/values.yaml#L272). After I brought it back up, the table was no longer visible in the controller console. I did not see any exception in the logs; I see the message below:
    Instance events-pinot-k8s-controller-0.cdp-dl-pinot-k8s-controller-headless.events-pinot.svc.cluster.local_9000 is not leader of cluster events-pinot due to current session 20266278dd90003 does not match leader session
  • Abhijeet Kushe

    07/19/2022, 10:01 PM
    I tried adding the same schema and table again via the Swagger API. It was successful, but I still don't see the table.
  • Sukesh Boggavarapu

    07/19/2022, 11:03 PM
    @here How do I enable Groovy ingestion from the Pinot controller UI?
  • Sukesh Boggavarapu

    07/19/2022, 11:04 PM
    controller.disable.ingestion.groovy=true
  • Sukesh Boggavarapu

    07/19/2022, 11:04 PM
    How do we toggle this property? Can we do it from the Swagger APIs?
  • Sukesh Boggavarapu

    07/19/2022, 11:13 PM
    I am getting an error saying it is disabled.
  • Jackie

    07/19/2022, 11:21 PM
    @Sukesh Boggavarapu For security reasons, we don't allow admins to change this value through the REST API. To enable Groovy ingestion, you need to change the controller config and restart the controller.
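    Concretely, that means editing the controller's configuration file and restarting it, e.g.:

```properties
# In the controller config file (e.g. pinot-controller.conf).
# Takes effect only after a controller restart.
controller.disable.ingestion.groovy=false
```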
  • Prashant Pandey

    07/20/2022, 6:08 AM
    Hi team, we're facing a bit of a weird problem in our production setup. We call Pinot from another service (let's call it Query Service) using an ALB, which hits the broker NodePort service. Around 50% of the time, Query Service sees a 410 error code from the broker. However, we never get this error when we query the tables from Pinot's query console.
  • Prashant Pandey

    07/20/2022, 9:22 AM
    Hi team, it seems our brokers are running into problems connecting to Zookeeper and are going into a crash loop. PFA the logs:
    broker-log
  • Stuart Millholland

    07/20/2022, 2:54 PM
    We are getting this error: "There are less instances: %s in instance partitions: %s than the table replication: %s" (looks to be from here). There seems to be a relationship between the number of Kafka partitions a realtime table reads from and the table's replicasPerPartition setting, and that confuses us. Is that a bug? We tried replicasPerPartition of 2 with 1 Kafka partition and received this error.
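    For reference, replicasPerPartition lives in the table's segmentsConfig, and the quoted error fires when fewer server instances are assigned to the table than that replication, independent of the Kafka partition count. So replicasPerPartition of 2 would need at least 2 servers in the table's tenant even with a single Kafka partition. A sketch (values illustrative):

```json
"segmentsConfig": {
  "replicasPerPartition": "2"
}
```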
  • harry singh

    07/20/2022, 6:42 PM
    Hi, I am facing an error. When I run a query like select col1, count(*) from table where col2='val' group by col1 having count(*) > 1, it works on Pinot but fails when run via the Trino-Pinot connector. Error: Error from Query Engine
    ExtractedRaw
    Query select "internal_error_code", count("*") from sr_view_v1 where (("count(*)" > '1')) AND ("network" = 'Visa') group by "internal_error_code" limit 5000001 encountered exception org.apache.pinot.common.response.broker.QueryProcessingException@3e0dac0b with query "select "internal_error_code", count("*") from sr_view_v1 where (("count(*)" > '1')) AND ("network" = 'Visa') group by "internal_error_code" limit 5000001"
  • Lukas Bergengruen

    07/20/2022, 7:35 PM
    Hey there! I'm having an issue when trying to ingest data into a table I successfully created in a Pinot cluster.
  • Lukas Bergengruen

    07/20/2022, 7:43 PM
    Hey there! I'm facing an issue when trying to ingest data into a table I created successfully on a Pinot cluster using Spark. I'm getting the following error:
    java.lang.IllegalArgumentException: Too large frame: 5211883372140375593
    This is the command I’m using:
    spark-submit \
      --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
      --master spark://${CLUSTER_URL}:${PORT} \
      --deploy-mode cluster \
      --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" \
      --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-s3/pinot-s3-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-${PINOT_VERSION}-shaded.jar" \
      --conf "spark.executor.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-s3/pinot-s3-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-${PINOT_VERSION}-shaded.jar" \
      https://${CLUSTER_URL}:${PORT}://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
      -jobSpecFile ./batch_input/spark_job_spec.yaml
  • Alice

    07/21/2022, 1:33 AM
    Hi team, I have an issue using the timestamp index. Any idea how to solve it? I'm using Pinot 0.11 (master) and testing the timestamp index. The first batch of segments failed to build with the following error, and then the table stopped consuming stream data.
    2022/07/20 08:05:35.579 ERROR [LLRealtimeSegmentDataManager_table_name_stage__1__0__20220720T0805Z] [table_name_stage__1__0__20220720T0805Z] Could not build segment
    java.lang.NullPointerException: null
    	at org.apache.pinot.segment.spi.creator.ColumnIndexCreationInfo.getDistinctValueCount(ColumnIndexCreationInfo.java:67) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-4f717514068d54784f4636ca9e07dd00521e8d86]
    	at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.init(SegmentColumnarIndexCreator.java:201) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-4f717514068d54784f4636ca9e07dd00521e8d86]
    	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:216) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-4f717514068d54784f4636ca9e07dd00521e8d86]
    	at org.apache.pinot.segment.local.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:123) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-4f717514068d54784f4636ca9e07dd00521e8d86]
    	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:840) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-4f717514068d54784f4636ca9e07dd00521e8d86]
    	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:767) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-4f717514068d54784f4636ca9e07dd00521e8d86]
    	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:666) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-4f717514068d54784f4636ca9e07dd00521e8d86]
    	at java.lang.Thread.run(Thread.java:829) [?:?]
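    For context, the timestamp index being tested is declared through fieldConfigList, roughly as below (the column name and granularities are placeholders; syntax as in the 0.11-era docs):

```json
"fieldConfigList": [
  {
    "name": "ts",
    "encodingType": "DICTIONARY",
    "indexTypes": ["TIMESTAMP"],
    "timestampConfig": {
      "granularities": ["DAY", "WEEK", "MONTH"]
    }
  }
]
```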
  • Alice

    07/21/2022, 6:31 AM
    Hi team, if a segment fails to build, the segment status is NOT_CONSUMING but the table status is GOOD. In the Pinot server log I see the offset advancing while data keeps being written to Kafka. However, the data isn't actually ingested on the server: it can't be queried, and the table row count isn't increasing. This leads to data loss when using the segment reset API. My question: is there any way to avoid the data loss?
  • harnoor

    07/21/2022, 7:47 AM
    Hi all, I have a doubt. I am firing the query below:
    explain plan for (select max(duration_millis) from backend_entity_view where $segmentName='backend_entity_view__3__375__20220720T0955Z' and regexp_like(backend_name, 'perf'))
    The segment backend_entity_view__3__375__20220720T0955Z has backend_name.fst_index.size=497, i.e. an FST index is present. However, I couldn't see it in the explain plan:
    BROKER_REDUCE(limit:10)	0	-1
    COMBINE_AGGREGATE	1	0
    AGGREGATE(aggregations:max(duration_millis))	2	1
    TRANSFORM_PASSTHROUGH(duration_millis)	3	2
    PROJECT(duration_millis)	4	3
    FILTER_EMPTY	5	4
  • Prashant Pandey

    07/21/2022, 8:54 AM
    https://docs.pinot.apache.org/ is giving a 400 with the following message:
    The custom domain for this content has not been correctly configured. If you are the owner of this page, verify that your custom domain is correctly set up in GitBook.
    @Mayank
  • Athul T R

    07/21/2022, 1:46 PM
    Hi everyone! Is there a way to ingest Delta tables stored in S3 into Pinot using batch ingestion? Do I have to write a batch record reader plugin for it?
  • Sukesh Boggavarapu

    07/21/2022, 7:59 PM
    I have a realtime table, and sometimes, because an upstream process goes wrong, Pinot ingests the same data again, i.e. some duplicate data. If we know when the upstream went bad, is there a way to identify segments created after a certain point in time and delete them through the APIs?
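    One way to shortlist such segments without a dedicated API: low-level consumer segment names end in their creation timestamp (e.g. tSHistoricCalibration__2__220__20220715T1700Z earlier in this channel), so a small script can pick the segments created after the cutoff and then delete each one via the controller's DELETE /segments/{tableNameWithType}/{segmentName} endpoint. A sketch; the naming convention is an assumption you should verify against your own segment metadata before deleting anything:

```python
from datetime import datetime, timezone

def segments_after(segment_names, cutoff):
    """Return realtime segment names whose embedded creation time is after cutoff.

    Assumes the LLC naming convention table__partition__seq__yyyymmddTHHMMZ,
    as seen in segment metadata like tSHistoricCalibration__2__220__20220715T1700Z.
    """
    picked = []
    for name in segment_names:
        stamp = name.rsplit("__", 1)[-1]  # e.g. "20220715T1700Z"
        created = datetime.strptime(stamp, "%Y%m%dT%H%MZ").replace(tzinfo=timezone.utc)
        if created > cutoff:
            picked.append(name)
    return picked

# Example: everything created after the upstream went bad on Jul 18, 2022.
bad_since = datetime(2022, 7, 18, tzinfo=timezone.utc)
names = [
    "tSHistoricCalibration__2__219__20220714T0900Z",
    "tSHistoricCalibration__2__220__20220715T1700Z",
    "tSHistoricCalibration__2__221__20220719T0100Z",
]
print(segments_after(names, bad_since))  # only the Jul 19 segment
```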
  • Sukesh Boggavarapu

    07/21/2022, 8:39 PM
    I have a realtime table ingesting from Kafka. Do the offsets in a segment's metadata correspond to the Kafka stream offsets, or does Pinot have its own internal offsets?