# troubleshooting
  • p

    Priyank Bagrecha

    10/28/2022, 7:52 AM
    no presto / trino
  • x

    Xiang Fu

    10/28/2022, 8:33 AM
    Copy code
    schedulerWaitMs=501,reqDeserMs=1,totalExecMs=72,
    how many qps
  • p

    Priyank Bagrecha

    10/28/2022, 8:55 AM
trying to execute 1200 qps but it is nowhere close to that due to slow queries.
  • p

    Priyank Bagrecha

    10/28/2022, 8:56 AM
i have 3 brokers with 4 CPUs each and 16 GB memory, 15 GB heap. 12 servers with 8 CPUs each and 64 GB memory, with 8 GB for heap and 55 GB off-heap. each server has a 1 TB disk. table size is 120 GB with replication of 3 and a total of 100 segments.
  • p

    Priyank Bagrecha

    10/28/2022, 9:04 AM
    this is an offline table
  • x

    Xiang Fu

    10/28/2022, 9:04 AM
no wonder, try to increase the qps and see when this schedulerWaitMs will go up
  • x

    Xiang Fu

    10/28/2022, 9:04 AM
    also try to add range index for hour
  • p

    Priyank Bagrecha

    10/28/2022, 9:04 AM
    ok
  • x

    Xiang Fu

    10/28/2022, 9:04 AM
    or inverted index
  • x

    Xiang Fu

    10/28/2022, 9:05 AM
    I think hour only has 24 unique values right
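For reference, a minimal sketch of how the suggested range or inverted index on hour could be declared in the table config (the column name comes from this conversation; everything else here is illustrative, not taken from the thread):
"tableIndexConfig": {
  "invertedIndexColumns": ["hour"],
  "rangeIndexColumns": ["hour"]
}
With only 24 unique hour values, an inverted index stays very small, which is part of why it is suggested here.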
  • p

    Priyank Bagrecha

    10/28/2022, 9:05 AM
    hour has an inverted index
  • p

    Priyank Bagrecha

    10/28/2022, 9:05 AM
    yes
  • l

    Lee Wei Hern Jason

    10/28/2022, 10:04 AM
Hi Team, I am running on EC2 and trying out Pinot monitoring. I installed Prometheus and the Prometheus node_exporter, and configured my JAVA_OPTS (with the jar and pinot.yml):
    Copy code
    -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xms2G -Xmx2G -Dlog4j2.configurationFile=conf/log4j2.xml -Dpinot.admin.system.exit=true -Dplugins.dir=/opt/pinot/plugins
However, when I view the JMX metrics, I can't see any of the metrics stated here. Has anyone encountered this issue before?
  • p

    Priyank Bagrecha

    10/28/2022, 5:47 PM
query qps is 0 but the latency metric is still getting populated, and JVM memory used is going up and down for controller, broker and server. i am trying to understand what is going on.
  • p

    Priyank Bagrecha

    10/28/2022, 5:57 PM
    metric for qps
    Copy code
    sum by (table) (rate(pinot_broker_queries_Count[10m]))
  • p

    Priyank Bagrecha

    10/28/2022, 5:57 PM
    metric for latency
    Copy code
    avg by (table) (pinot_broker_queryExecution_50thPercentile)
    and p75, p95, p99 and p999 as well
  • x

    Xiang Fu

    10/28/2022, 6:11 PM
latency won't go down if there are no queries; the metrics keep the latest values
  • p

    Priyank Bagrecha

    10/28/2022, 6:46 PM
    ok
  • n

    Nickel Fang

    10/29/2022, 11:00 AM
Hi Team, I want to debug the Pinot server. I start some components with Docker and start the Pinot server with IntelliJ. I get this error when creating a table via the REST API:
    Copy code
    {
      "code": 500,
      "error": "org.apache.pinot.spi.stream.TransientConsumerException: org.apache.pinot.shaded.org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata"
    }
    the streamConfig is as below
    Copy code
    "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "lowlevel",
          "stream.kafka.topic.name": "test",
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
          "stream.kafka.decoder.prop.projectId": "1",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.broker.list": "localhost:9092",
          "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
          "realtime.segment.flush.threshold.time": "2h",
          "realtime.segment.flush.threshold.rows": "0",
          "realtime.segment.flush.threshold.segment.size": "300M",
          "realtime.segment.flush.autotune.initialRows": "10000"
        }
  • s

    shivam

    10/31/2022, 11:06 AM
hi team, one of the partitions is not getting consumed for some time and we are seeing a huge lag on that partition (~20 million). as of now we have tried these things:
- disabling and re-adding the realtime server pod
- running RealtimeSegmentValidationManager on the target table
- deleting that segment and retrying, but it gets stuck for that particular partition
- rotating the pods
- resetting the segment as well
  • l

    Lee Wei Hern Jason

    10/31/2022, 11:20 AM
[SOLVED] Just needed to change the port number. Hi Team, I am trying to view JVM metrics using jconsole. Running locally works (1st pic) but when I connect remotely, it doesn't work (2nd pic). I configured the JAVA_OPTS the same way for both. Does anyone know why this is happening?
    Copy code
    JAVA_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9002 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
  • l

    Lee Wei Hern Jason

    10/31/2022, 11:21 AM
    2nd pic:
  • h

    harnoor

    10/31/2022, 1:35 PM
Hi folks. I am unable to get the detailed output when I run explain plan for on my queries; instead I am getting ACQUIRE_RELEASE_COLUMNS_SEGMENT for all the queries. I am expecting the filter index to be picked up in the output as per the docs: https://docs.pinot.apache.org/users/user-guide-query/explain-plan
  • s

    Stuart Millholland

    10/31/2022, 7:46 PM
    Hi Pinot friends. We are trying out the Timestamp Index and it's working great except for one portion. The realtime to offline task is now failing with the following error:
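For context, the timestamp index being tried out is declared per column under fieldConfigList; a minimal sketch, assuming a column named ts and arbitrary granularities (neither is taken from this thread, and the realtime-to-offline error itself is not shown here):
"fieldConfigList": [
  {
    "name": "ts",
    "encodingType": "DICTIONARY",
    "indexTypes": ["TIMESTAMP"],
    "timestampConfig": {
      "granularities": ["DAY", "WEEK", "MONTH"]
    }
  }
]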
  • p

    Priyank Bagrecha

    10/31/2022, 11:48 PM
is the consistent push and rollback feature for offline tables available in release 0.11?
  • m

    Mamlesh

    11/01/2022, 3:17 AM
Hi All, I've been facing an issue in my 3-node cluster. A table with 1 hr retention ran fine, but after a new table with 24 hr retention was added to the cluster, 2 nodes stopped deleting segments for both tables. I've checked the controller logs: only the 1st node's controller is deleting its segments; the other 2 nodes' controllers do not even start the retention manager. Has anyone faced this kind of issue? Please let me know. Thank you in advance :)
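For reference, the per-table retention mentioned above is set in segmentsConfig; a minimal sketch for the 24 hr case (only the retention fields are shown, everything else is omitted):
"segmentsConfig": {
  "retentionTimeUnit": "HOURS",
  "retentionTimeValue": "24"
}
Deletion of expired segments is driven by the controller's RetentionManager periodic task, which is the component referred to in the message above.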
  • a

    Alice

    11/01/2022, 9:04 AM
Hi team, I have a question about the segment flush threshold. If only 20MB of data arrives from the stream in 6 hours, it seems better to have the following config. But does it really cost extra JVM memory if "realtime.segment.flush.threshold.segment.size" is set to 200MB instead?
    Copy code
    "realtime.segment.flush.threshold.time": "6h",
    "realtime.segment.flush.threshold.rows": "0",
    "realtime.segment.flush.threshold.segment.size": "20M",
  • s

    Sumit Khaitan

    11/01/2022, 12:23 PM
Hi team. I am new to Pinot and have a use case. We have minutely files landing in Azure Blob Storage and want to load those files into Pinot. Can Pinot directly read and ingest those minutely files from Azure Blob Storage, or does there have to be a Spark/ETL pipeline that ingests the data into Pinot?
  • a

    Abhishek Dubey

    11/01/2022, 2:18 PM
Hi Team, when I try to update the schema, I get this error:
    Copy code
    {
      "code": 400,
      "error": "Backward incompatible schema <name>. Only allow adding new columns"
    }
What is the way to make the schema backward compatible and allow schema updates?
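For illustration: the error above is raised when the update does anything other than adding new columns, so a backward-compatible update keeps all existing field specs unchanged and only appends columns; a hedged sketch where both column names are made up:
"dimensionFieldSpecs": [
  { "name": "existingCol", "dataType": "STRING" },
  { "name": "newCol", "dataType": "INT", "defaultNullValue": 0 }
]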
  • p

    Priyank Bagrecha

    11/01/2022, 9:25 PM
    from https://docs.pinot.apache.org/operators/operating-pinot/tuning/routing#querying-all-segments
    Copy code
    Currently, we have two different mechanisms to prune segments on the broker side to minimize the number of segment for processing before scatter-and-gather.
But I only see partitioning. What is the second mechanism?
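For reference, a hedged sketch of where broker-side segment pruning is configured in the table config. Listing "time" alongside "partition" reflects an assumption that time-based pruning is the second mechanism, which is not confirmed in this excerpt, and the memberId partition column is purely illustrative:
"routing": {
  "segmentPrunerTypes": ["partition", "time"]
},
"tableIndexConfig": {
  "segmentPartitionConfig": {
    "columnPartitionMap": {
      "memberId": {
        "functionName": "Murmur",
        "numPartitions": 32
      }
    }
  }
}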