# troubleshooting

    Pankaj Thakkar

    12/15/2021, 8:01 PM
    hey guys, we are ingesting data from Kafka; it seems Pinot is stuck in a loop; it complains about not finding the right offset and does not consume the data that is sitting in Kafka.

    Tiger Zhao

    12/15/2021, 8:15 PM
    Will pinot automatically delete segments from S3 when a table is deleted? Based on what I've seen, it looks like deleting a realtime table will eventually delete the segments associated with that table. But I'm not sure if this behavior is also true for offline tables where the segments are generated and uploaded with SegmentCreationAndMetadataPush?

    Priyank Bagrecha

    12/15/2021, 10:05 PM
    i started using v0.9.1 and keep running into these errors every couple of hours
    2021/12/15 19:09:59.854 ERROR [GroupCommit] [HelixTaskExecutor-message_handle_STATE_TRANSITION] Interrupted while committing change, key: /pinot-poc/INSTANCES/Server_10.220.12.85_8098/CURRENTSTATES/100000abfb404c3/km_mp_play_startree_REALTIME, record: km_mp_play_startree_REALTIME, {}{}{}
    java.lang.InterruptedException: null
            at java.lang.Object.wait(Native Method) ~[?:?]
            at org.apache.helix.GroupCommit.commit(GroupCommit.java:163) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
            at org.apache.helix.manager.zk.ZKHelixDataAccessor.updateProperty(ZKHelixDataAccessor.java:189) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
            at org.apache.helix.manager.zk.ZKHelixDataAccessor.updateProperty(ZKHelixDataAccessor.java:177) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
            at org.apache.helix.messaging.handling.HelixStateTransitionHandler.preHandleMessage(HelixStateTransitionHandler.java:164) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
            at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:330) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
            at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
            at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
            at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
            at java.lang.Thread.run(Thread.java:829) [?:?]

    Nicholas Yu

    12/16/2021, 2:17 AM
    i’m trying to run a batch ingestion job using AWS EMR. i’m running a spark-submit step like so
    spark-submit \
      --master yarn \
      --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
      s3://<bucket_name>/lib/pinot-all-0.9.0-jar-with-dependencies.jar -jobSpecFile s3://<bucket_name>/jobs/<table_name>/job.yaml
    but i’m getting a java.lang.NoSuchMethodException: org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main
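    For reference, the spark-submit form shown in Pinot's batch-ingestion docs passes the Pinot fat jar on the driver classpath and points plugins.dir at a local Pinot distribution. The sketch below follows that shape; /opt/pinot is an assumed location of an unpacked distribution on the EMR nodes and the master/deploy-mode would need adjusting to your setup:
    # Sketch only, following the documented spark-submit shape for Pinot batch ingestion.
    # /opt/pinot is an assumption; adjust paths and the Pinot version to your cluster.
    export PINOT_VERSION=0.9.0
    export PINOT_DIST=/opt/pinot
    spark-submit \
      --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
      --master yarn \
      --deploy-mode client \
      --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DIST}/plugins" \
      --conf "spark.driver.extraClassPath=${PINOT_DIST}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
      ${PINOT_DIST}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
      -jobSpecFile s3://<bucket_name>/jobs/<table_name>/job.yaml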

    Anish Nair

    12/16/2021, 8:02 AM
    Hi team, I have set the following flush config for a realtime table: "realtime.segment.flush.threshold.size": "0", "realtime.segment.flush.threshold.segment.size": "500M", "realtime.segment.flush.threshold.rows": "0", "realtime.segment.flush.threshold.time": "24h". But the consuming segment still shows threshold size = 100000. Is this expected?
    {
      "segment.creation.time": "1639641347711",
      "segment.flush.threshold.size": "100000",
      "segment.name": "max_reporting_aggregations__0__0__20211216T0755Z",
      "segment.realtime.numReplicas": "2",
      "segment.realtime.startOffset": "598962485",
      "segment.realtime.status": "IN_PROGRESS",
      "segment.table.name": "max_reporting_aggregations",
      "segment.type": "REALTIME"
    }
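    For context, these flush settings live under streamConfigs in the realtime table config, and when the row threshold is 0 the size-based flush logic picks its own per-segment row target, so the 100000 in the segment metadata looks like an auto-tuned starting value rather than your table config. A rough sketch of the relevant block; the autotune key name is an assumption to verify against the stream-configs docs:
    "streamConfigs": {
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.time": "24h",
      "realtime.segment.flush.threshold.segment.size": "500M",
      "realtime.segment.flush.autotune.initialRows": "500000"
    }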

    Ali Atıl

    12/16/2021, 10:59 AM
    Hi everyone, is there a way to indicate the actual data field for records read from Kafka? For example, if I had records in Kafka like the one below, would it be possible for Pinot to extract the actual records from the nested "employee" field?
    {
      "data": {
        "employee": {
          "name": "ali",
          "salary": 56000,
          "married": true,
          "messageTime": 1639652167
        }
      }
    }
    Example schema:
    {
      "schemaName": "employee",
      "dimensionFieldSpecs": [
        {
          "name": "name",
          "dataType": "STRING"
        },
        {
          "name": "salary",
          "dataType": "DOUBLE"
        },
        {
          "name": "married",
          "dataType": "BOOLEAN"
        }
      ],
      "metricFieldSpecs": [],
      "dateTimeFieldSpecs": [
        {
          "name": "messageTime",
          "dataType": "LONG",
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:MILLISECONDS"
        }
      ]
    }
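    For what it's worth, one common way to handle this shape is to leave the schema flat (as above) and pull the nested values out with JSON-path ingestion transforms in the table config. A minimal sketch, assuming the raw payload arrives as a field named data; the extraction for the boolean column in particular may need adjusting:
    "ingestionConfig": {
      "transformConfigs": [
        { "columnName": "name",        "transformFunction": "jsonPathString(data, '$.employee.name')" },
        { "columnName": "salary",      "transformFunction": "jsonPathDouble(data, '$.employee.salary')" },
        { "columnName": "married",     "transformFunction": "jsonPathString(data, '$.employee.married')" },
        { "columnName": "messageTime", "transformFunction": "jsonPathLong(data, '$.employee.messageTime')" }
      ]
    }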

    Map

    12/16/2021, 12:27 PM
    has this setting pinot.set.instance.id.to.hostname changed in recent versions? Doesn't seem to be working any more and IPs are used instead

    Jonathan Meyer

    12/16/2021, 4:33 PM
    Hello 👋 Pinot's documentation for upgrading (https://docs.pinot.apache.org/operators/operating-pinot/upgrading-pinot-cluster) indicates that the recommended upgrade order of components is: 1. Controller 2. Broker 3. Server 4. Minion. Using the official Pinot Helm chart, is there any way to enforce this order? It seems like all components try to upgrade at the same time, since there is only a single configuration tag / version in the values.
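    As a point of comparison while waiting for a chart-level answer: outside of Helm, a sequential roll can be done by hand with kubectl, one component at a time. This is only a sketch; the StatefulSet names below are assumptions based on the default chart naming and should be checked with kubectl get sts:
    # Roll components one at a time, waiting for each rollout to finish.
    NEW_IMAGE=apachepinot/pinot:0.9.1
    for component in controller broker server minion; do
      kubectl set image statefulset/pinot-${component} "*=${NEW_IMAGE}"
      kubectl rollout status statefulset/pinot-${component} --timeout=15m
    done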

    Tiger Zhao

    12/16/2021, 4:53 PM
    Is it possible to rename an existing table?

    Luis Fernandez

    12/16/2021, 9:44 PM
    I'm running into this issue where the servers are starting to ingest more and more records (a spike in ingestion), but when I look at the throughput from a Kafka Streams perspective I don't see any spikes. Then, weirdly enough, I see the servers restarting for some reason. While checking the logs I do see this:
    Caught exception while processing query: QueryContext{_tableName='ads_metrics_REALTIME', _selectExpressions=[listing_id, sum(click_count), sum(impression_count), sum(cost), sum(order_count), sum(revenue)], _aliasList=[null, null, null, null, null, null], _filter=(shop_id = '25746445' AND serve_time BETWEEN '1637125200' AND '1639717199'), _groupByExpressions=[listing_id], _havingFilter=null, _orderByExpressions=null, _limit=6000, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9999}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:ads_metrics_REALTIME), pinotQuery:PinotQuery(dataSource:DataSource(tableName:ads_metrics_REALTIME), selectList:[Expression(type:IDENTIFIER, identifier:Identifier(name:listing_id)), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:click_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:impression_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:cost))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:order_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:revenue))]))], filterExpression:Expression(type:FUNCTION, functionCall:Function(operator:AND, operands:[Expression(type:FUNCTION, functionCall:Function(operator:EQUALS, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:shop_id)), Expression(type:LITERAL, literal:<Literal longValue:25746445>)])), Expression(type:FUNCTION, functionCall:Function(operator:BETWEEN, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:serve_time)), Expression(type:LITERAL, literal:<Literal longValue:1637125200>), Expression(type:LITERAL, literal:<Literal longValue:1639717199>)]))])), groupByList:[Expression(type:IDENTIFIER, identifier:Identifier(name:listing_id))], orderByList:[], limit:6000, queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9999}))}
    java.lang.ArrayIndexOutOfBoundsException: null
    this started happening out of nowhere so I'm unsure as to what's going on. Has anyone gotten this kind of error? Sadly I also don't have the stack trace; I don't know why it isn't being logged 😞 Also, in general, our p99 response times have been impacted.

    Elon

    12/16/2021, 10:04 PM
    Does anyone here use pool based routing? Is there any advantage to using 3 pools with 4 replica groups vs using 2 pools with 6 replica groups? i.e. does it make sense to have more than 1 pool?

    Elon

    12/16/2021, 11:15 PM
    Question about split commit: we enabled the config on the controller, server and table (i.e. peerSegmentDownloadScheme = 'http'), and see "isSplitCommitType":true log messages in the server, but noticed that the controller still contains those segments in the temp directory /var/pinot/controller/data/untarredFileTemp - does that indicate we have something configured incorrectly?

    Zsolt Takacs

    12/17/2021, 2:26 PM
    We had to rush the upgrade to 0.9.1 because of the log4shell issue and ran into this change afterwards: https://github.com/apache/pinot/pull/7523 We are using the realtimeToOffline task with an upsert based table and using a rollup based on our business rules to get the same results in the offline segments. This change makes it impossible to keep the same config if we want to change the table config.

    Priyank Bagrecha

    12/17/2021, 6:41 PM
    exception_server_logs.txt

    Tao Hu

    12/17/2021, 6:44 PM
    Hi team, I tried to create a text index on the teamName column in the sample table dimBaseballTeams, but when I use a text match query, I get:
    {
      "message": "QueryExecutionError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.filter.TextMatchFilterOperator.getNextBlock(TextMatchFilterOperator.java:45)\n\tat org.apache.pinot.core.operator.filter.TextMatchFilterOperator.getNextBlock(TextMatchFilterOperator.java:30)\n\tat org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49)\n\tat org.apache.pinot.core.operator.DocIdSetOperator.getNextBlock(DocIdSetOperator.java:62)",
      "errorCode": 200
    }
    Here is my table config
    {
      "tableName": "dimBaseballTeams",
      "tableType": "OFFLINE",
      "isDimTable": true,
      "segmentsConfig": {
        "segmentPushType": "REFRESH",
        "replication": "1"
      },
      "tableIndexConfig": {
        "noDictionaryColumns": [
          "teamName"
        ]
      },
      "fieldConfigList": [
        {
          "name": "teamName",
          "encodingType": "RAW",
          "indexType": "TEXT"
        }
      ],
      "tenants": {},
      "metadata": {
        "customConfigs": {}
      }
    }
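    One hedged thing worth checking (not a confirmed fix for the NPE): if the text index was added to a table whose segments already existed, the segments usually need to be reloaded before TEXT_MATCH can use the index. A sketch of the reload call; the controller address is an assumption:
    curl -X POST "http://localhost:9000/segments/dimBaseballTeams_OFFLINE/reload"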

    Luis Fernandez

    12/17/2021, 7:15 PM
    hey just bumping this message in case anyone has any idea as to what’s happening https://apache-pinot.slack.com/archives/C011C9JHN7R/p1639691077070700

    Nicholas Yu

    12/20/2021, 12:40 AM
    hi team, has anyone successfully run a batch ingestion job from S3 into Pinot using AWS EMR (Spark)? looking for sample configurations/job specs/general help please and thanks 🙂

    eywek

    12/20/2021, 12:55 PM
    Hello, I'm trying to use the new LIKE operator, but it seems that NOT LIKE doesn't work, do you know why? i.e. I'm getting results with
    select * from datasource_61c064dc1c9900030074e5f3 where JSONEXTRACTSCALAR("labels", '$.locale', 'STRING') LIKE '%fr_FR%' limit 10
    but 0 results with
    select * from datasource_61c064dc1c9900030074e5f3 where JSONEXTRACTSCALAR("labels", '$.locale', 'STRING') NOT LIKE '%en_US%' limit 10
    where labels is
    {
      "locale": "fr_FR",
      "brand": "undiz"
    }
    Thank you

    Weixiang Sun

    12/20/2021, 7:14 PM
    I am trying to upload the realtime table schema with the following dateTimeFieldSpecs
    "dateTimeFieldSpecs": [
        {
          "name": "timestamp",
          "dataType": "LONG",
          "defaultNullValue": 0,
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:MILLISECONDS"
        },
        {
          "name": "timestamp_seconds",
          "dataType": "LONG",
          "defaultNullValue": 0,
          "transformFunction": "toEpochSecondsRounded(timestamp, 1)",
          "format": "1:SECONDS:EPOCH",
          "granularity": "1:SECONDS"
        }
      ]
    I got the following error:
    {
      "code": 400,
      "error": "Cannot add invalid schema: schema_name. Reason: Exception in getting arguments for transform function 'toEpochSecondsRounded(timestamp, 1)' for column 'timestamp_seconds'"
    }
    What is wrong?
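    A hedged sketch of an alternative placement that is sometimes easier to debug: declare timestamp_seconds without a transformFunction in the schema and define the transform in the table config's ingestionConfig instead. Note that timestamp is a reserved SQL keyword, so the quoting shown here (or renaming the source column) is an assumption that may need adjusting:
    "ingestionConfig": {
      "transformConfigs": [
        {
          "columnName": "timestamp_seconds",
          "transformFunction": "toEpochSecondsRounded(\"timestamp\", 1)"
        }
      ]
    }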

    Ayush Kumar Jha

    12/21/2021, 5:11 AM
    Hi everyone, I am trying to set up monitoring and for that I am using this:
    ALL_JAVA_OPTS="-javaagent:jmx_prometheus_javaagent-0.12.0.jar=8088:pinot.yml -Xms4G -Xmx4G -XX:MaxDirectMemorySize=30g -Dlog4j2.configurationFile=conf/pinot-admin-log4j2.xml -Dplugins.dir=$BASEDIR/plugins"
    sudo bin/pinot-admin.sh StartController -configFileName /home/centos/controller.conf
    But I could not access the metrics on port 8088.
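    Two hedged things the sketch below accounts for: environment variables are not passed through sudo by default, and relative paths for the agent jar and pinot.yml resolve against the working directory. Whether pinot-admin.sh picks ALL_JAVA_OPTS up from the environment is an assumption to verify for your version; the /opt/pinot paths are placeholders:
    export ALL_JAVA_OPTS="-javaagent:/opt/pinot/jmx_prometheus_javaagent-0.12.0.jar=8088:/opt/pinot/pinot.yml -Xms4G -Xmx4G -XX:MaxDirectMemorySize=30g -Dlog4j2.configurationFile=/opt/pinot/conf/pinot-admin-log4j2.xml -Dplugins.dir=/opt/pinot/plugins"
    sudo -E bin/pinot-admin.sh StartController -configFileName /home/centos/controller.conf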

    Diana Arnos

    12/21/2021, 3:23 PM
    👋 hello there. How can I set up the consumer group id when I have a realtime table ingesting from a Kafka topic? It creates the group with id 0. We would need a group id for monitoring purposes. I tried setting it up through the table config as stream.kafka.consumer.group.id and stream.kafka.group.id as per the Kafka docs, and it still creates the consumer group with id 0 😞

    Stav Gayer

    12/21/2021, 3:37 PM
    Hey, I've been testing Pinot with Trino for the last few days and I see something weird in the metrics (I'm running Pinot 0.9.1 and Trino 366; Trino is configured with
    pinot.max-rows-per-split-for-segment-queries=1000000
    pinot.request-timeout=1m
    ). As you can see, the "broker jvm used" graph continues to rise and fall even when there are no requests to the server at all, and this did not stop until I deleted the pods. The same thing happened to the server. In addition, Trino is almost unusable. A simple query like
    select * from bi_test_table where page_url like '%google%' and epoch_ts >= 1639440000
      AND epoch_ts < 1640044800
    gets a timeout after a minute and sometimes crashes the servers. Where am I going wrong?
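    For the query-latency part, one thing that may be worth trying (a sketch, not a confirmed fix) is the Trino Pinot connector's dynamic-table passthrough form, which hands the whole inner query to the Pinot broker instead of pulling segments split by split. The catalog and schema names (pinot.default) are assumptions:
    SELECT *
    FROM pinot.default."SELECT page_url, epoch_ts FROM bi_test_table WHERE page_url LIKE '%google%' AND epoch_ts >= 1639440000 AND epoch_ts < 1640044800 LIMIT 100"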

    Weixiang Sun

    12/21/2021, 6:11 PM
    I have a quick question about hybrid tables. Can we configure the red line, such as moving it from the start of Mar 25 to the start of Mar 24?

    Weixiang Sun

    12/22/2021, 3:04 AM
    Is there any document about how to create a hybrid table? I see that the offline table should have the same name as the realtime table. Why make such an assumption? What about the hybrid table name and the child table names? Should they be the same as well?

    Luis Fernandez

    12/22/2021, 2:30 PM
    hey friends, just reaching out with this error again as I haven’t been able to come up with a reason as to why it’s happening,
    Caught exception while processing query: QueryContext{_tableName='ads_metrics_REALTIME', _selectExpressions=[listing_id, sum(click_count), sum(impression_count), sum(cost), sum(order_count), sum(revenue)], _aliasList=[null, null, null, null, null, null], _filter=(shop_id = '25746445' AND serve_time BETWEEN '1637125200' AND '1639717199'), _groupByExpressions=[listing_id], _havingFilter=null, _orderByExpressions=null, _limit=6000, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9999}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:ads_metrics_REALTIME), pinotQuery:PinotQuery(dataSource:DataSource(tableName:ads_metrics_REALTIME), selectList:[Expression(type:IDENTIFIER, identifier:Identifier(name:listing_id)), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:click_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:impression_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:cost))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:order_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:revenue))]))], filterExpression:Expression(type:FUNCTION, functionCall:Function(operator:AND, operands:[Expression(type:FUNCTION, functionCall:Function(operator:EQUALS, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:shop_id)), Expression(type:LITERAL, literal:<Literal longValue:25746445>)])), Expression(type:FUNCTION, functionCall:Function(operator:BETWEEN, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:serve_time)), Expression(type:LITERAL, literal:<Literal longValue:1637125200>), Expression(type:LITERAL, literal:<Literal longValue:1639717199>)]))])), groupByList:[Expression(type:IDENTIFIER, identifier:Identifier(name:listing_id))], orderByList:[], limit:6000, queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9999}))}
    java.lang.ArrayIndexOutOfBoundsException: null
    More information: it seems to have something to do with my serve_time filter; the longer the time range I query (more than 3 days), the more likely I am to get this error.

    Diogo Baeder

    12/22/2021, 7:40 PM
    Hi folks! I seem to have found a bug, but I'd like to confirm here first, and if it does look like a bug I'll open a ticket on GitHub - I just want to check whether I'm doing anything wrong.

    Vibhor Jaiswal

    12/28/2021, 10:26 PM
    Hi all. We are a couple of new users from an NA-based investment bank. We are doing a heavy PoC on Pinot, hitting some roadblocks, and need some guidance.

    Priyank Bagrecha

    12/28/2021, 10:55 PM
    i added a sorted inverted index on one field, and now i am not able to add an inverted index on other fields. is that expected? i am using a derived field event_ts_5_min which represents the start of a 5-minute time bucket. would it even make sense to add an inverted index on event_ts_5_min? test_id is a filter in all queries. I did read "A sorted index performs much better than an inverted index, but it can only be applied to one column per table." in the documentation, but didn't think that meant i can't apply an inverted index on other fields.
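    For reference, the sorted column and the inverted-index columns are separate lists under tableIndexConfig, so a sorted column on one field should not by itself block inverted indexes on others. A minimal sketch using the field names from the question (which column is actually sorted is an assumption):
    "tableIndexConfig": {
      "sortedColumn": ["test_id"],
      "invertedIndexColumns": ["event_ts_5_min"]
    }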

    Tao Hu

    12/29/2021, 12:41 AM
    Hi team, it looks like the text_match query doesn't work as described in the docs when searching for multiple words. Based on the docs, the query should return documents that contain the entire phrase. In my use case, phrases such as "Boston Doves" and "Boston Rustlers" do not match "Boston red" but still show up in the results. Can someone take a look at it? I'm using 0.9.2. Thanks!
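    For comparison, this is the phrase-versus-term distinction as the TEXT_MATCH docs describe it, sketched against the earlier dimBaseballTeams example: double quotes inside the search string ask for the exact phrase, while bare terms combined with AND match the terms anywhere in the text.
    SELECT teamName FROM dimBaseballTeams WHERE TEXT_MATCH(teamName, '"Boston red"')
    SELECT teamName FROM dimBaseballTeams WHERE TEXT_MATCH(teamName, 'Boston AND red')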

    yelim yu

    12/30/2021, 2:10 AM
    Hi, this is Yelim from Korea. There are two questions I want to ask. 1. When there is a null value in a certain row and we want to overwrite it with a not-null value, why is it not possible? We ran into the limitation that "only not-null values can be overwritten in this version". 2. If we want to delete a certain row in a Pinot table, is there any solution to do this (just like ALTER TABLE or dropping certain rows in a normal DB)? Many thanks in advance!