Apache Pinot #troubleshooting

Diogo Baeder

11/18/2021, 1:31 PM

I don't really know, that part was not written by me, but I'll try to find out. By the way, your trick with the env vars works like a charm 🙂

Priyank Bagrecha

11/18/2021, 6:53 PM

Ha, that explains why I can't get 0.9.0 working via launcher scripts. Thank you. Can we please update the documentation for it?

Ali Atıl

11/18/2021, 9:51 PM

Hello everyone, is there any way to do a join operation on real-time tables?

Map

11/19/2021, 10:41 PM

Any idea how to troubleshoot this error message?

2021/11/19 22:36:05.496 INFO [CurrentStateComputationStage] [HelixController-pipeline-task-pinot-prod-(aa26cf97_TASK)] Event aa26cf97_TASK : Ignore a pending message ee7f9ef0-1de9-4737-b0b4-db4a4e1b9073 for a non-exist resource table0_REALTIME and partition table0__0__0__20211119T2150Z

Mahesh babu

11/22/2021, 12:15 PM

Hi Team, I'm trying to set up Pinot in Docker and load a table. I'm facing issues while loading data into the table:

ERROR: java.lang.RuntimeException: Failed to read from Schema URI - 'http://localhost:9000/tables/transcript/schema'

Can you please help me fix this issue? I'm using this yml file:

executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/tmp/pinot-quick-start/rawdata/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot-quick-start/segments/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'transcript'
  schemaURI: 'http://localhost:9000/tables/transcript/schema'
  tableConfigURI: 'http://localhost:9000/tables/transcript'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'

Mahesh babu

11/22/2021, 2:58 PM

docker run --rm -ti --network=pinot-demo -v /tmp/pinot-quick-start:/tmp/pinot-quick-start --name pinot-data-ingestion-job apachepinot/pinot:latest LaunchDataIngestionJob -jobSpecFile /tmp/pinot-quick-start/docker-job-spec.yml

Priyank Bagrecha

11/22/2021, 7:30 PM

I am running into this stack trace in the log within a second of adding a real-time table:

2021/11/20 00:18:41.296 ERROR [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_thread] Exception while executing a state transition task km_mp_play_startree__103__0__20211120T0018Z
java.lang.reflect.InvocationTargetException: null
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
        at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:175) ~[?:?]
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) ~[?:?]
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317) ~[?:?]
        at org.apache.pinot.segment.spi.memory.PinotByteBuffer.allocateDirect(PinotByteBuffer.java:38) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        at org.apache.pinot.segment.spi.memory.PinotDataBuffer.allocateDirect(PinotDataBuffer.java:115) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        at org.apache.pinot.segment.local.io.writer.impl.DirectMemoryManager.allocateInternal(DirectMemoryManager.java:53) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        at org.apache.pinot.segment.local.io.readerwriter.RealtimeIndexOffHeapMemoryManager.allocate(RealtimeIndexOffHeapMemoryManager.java:80) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        at org.apache.pinot.segment.local.realtime.impl.forward.FixedByteSVMutableForwardIndex.addBuffer(FixedByteSVMutableForwardIndex.java:208) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        at org.apache.pinot.segment.local.realtime.impl.forward.FixedByteSVMutableForwardIndex.<init>(FixedByteSVMutableForwardIndex.java:77) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl.<init>(MutableSegmentImpl.java:308) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.<init>(LLRealtimeSegmentDataManager.java:1364) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:344) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:162) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:164) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:86) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
        ... 12 more

I have tried increasing the heap size (currently 16G) and I am still running into this issue. I am using 5 servers to consume from a topic with 128 partitions, at an event rate of about 7M events per minute. I see 26 segments on 3 servers and 25 on 2 servers in Bad state.
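
The segment counts in this report can be sanity-checked with a little arithmetic. The sketch below assumes one consuming segment per partition and an even spread of partitions across servers (both are assumptions, and `partitions_per_server` is a hypothetical helper, not part of Pinot). Under those assumptions the 26/25 split accounts for all 128 partitions, which would suggest that every consuming segment is in Bad state rather than a subset.

```python
from typing import List


def partitions_per_server(num_partitions: int, num_servers: int) -> List[int]:
    """Spread partitions as evenly as possible (assumed assignment model)."""
    base, extra = divmod(num_partitions, num_servers)
    # The first `extra` servers each take one additional partition.
    return [base + 1 if i < extra else base for i in range(num_servers)]


counts = partitions_per_server(128, 5)
print(counts)       # [26, 26, 26, 25, 25]
print(sum(counts))  # 128
```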

Yeongju Kang

11/23/2021, 8:06 AM

Hi, I have some questions related to indices.
1. Is a forward index applied by default to all columns of a table that have no index specified?
2. Are there ways to see query execution plans, including index usage? I tried explain, explain (type distributed), and explain (type io) from Presto but failed to find useful information in the output.
3. Some index files don't seem to be purged after table deletion. Should I delete them myself if I have to create a table with the same name? (I haven't tried to reproduce the behavior.)
Thank you for your work on such nice software!

Yeongju Kang

11/23/2021, 10:31 AM

Hi, I have 5 servers (v0.9) in my cluster and one of them went into a dead state. The server process never printed a crash log, and the last lines of the server log look fine, but the table state in ZooKeeper turned to offline and I cannot see my node in liveinstances. I am running my server on EKS and the pod never restarted. Is there anything I can do before restarting the pod?

Ali Atıl

11/23/2021, 2:12 PM

Hello Everyone, is it possible to change H3 index resolution after table creation?

Deepak Mishra

11/23/2021, 6:57 PM

Hello everyone, I am not able to start ZooKeeper using pinot-0.9.0 with the command bin/pinot-admin.sh StartZookeeper. Please help.

Mahesh babu

11/24/2021, 5:17 AM

Hi Team, I'm trying to load data from MinIO into Pinot but I'm facing issues while running the yaml files. ERROR: expected '<document start>', but found BlockMappingStart in 'string', line 6, column 1: jobType: SegmentCreationAndMetad ...

Ayush Kumar Jha

11/24/2021, 6:20 PM

Hi everyone, I recently tried to upgrade Pinot from 0.7.1. I am ingesting files stored in Azure Blob, and I am getting this error

java.lang.IllegalStateException: Unable to extract out the relative path for input file file path "file path"

in 0.8.0 and 0.9.0 but it is working fine in 0.7.1

Vibhor Jain

11/25/2021, 6:18 AM

Hi Team, we are facing an issue with a query where one of our column values contains a single quote, and we are getting a CalciteSqlToPinotQuery exception. For example, one of the values is L'hôp Test. Since it contains a single quote, how can we handle this and query the data for this value?
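
A common way to handle this in SQL string literals is to double the embedded single quote, the same convention used in the JSON_MATCH example elsewhere in this channel. Here is a minimal sketch of building such a literal on the client side; the helper `sql_string_literal` and the names `myTable` and `myColumn` are hypothetical placeholders.

```python
def sql_string_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


value = "L'hôp Test"
# myTable and myColumn are placeholder names for illustration only.
query = f"SELECT * FROM myTable WHERE myColumn = {sql_string_literal(value)}"
print(query)  # SELECT * FROM myTable WHERE myColumn = 'L''hôp Test'
```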

Mahesh babu

11/25/2021, 11:33 AM

Hi Team, I'm not able to run the Pinot controller and server in Docker with a config file. I need to read data from S3, so I have to run the server and controller with config files. I'm trying to run them using commands like this:

sudo docker run --rm -ti \
  --network=pinot-demo \
  --name pinot-controller \
  -p 9010:9010 \
  -e JAVA_OPTS="-Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log" \
  -d apachepinot/pinot:latest StartController -configFileName "/home/mahesh/Documents/pinot/s3-pinot/controller.conf" \
  -zkAddress pinot-zookeeper:2181

Ali Atıl

11/25/2021, 2:39 PM

Hi, when upsert is enabled on a hybrid table, does it also update the records in the segments of the offline table?

Prashanth Rao

11/26/2021, 10:58 AM

Hi Team, we are using Apache Pinot for running OLAP queries, and right now one of the tables has been stuck in a consumer rebalance state (while pointing to a Kafka topic) for the last 12 hours. I tried restarting the Pinot servers, which didn't help. Can someone please suggest any steps here? These messages appeared repeatedly:

[Consumer clientId=consumer-2, groupId=event_template_mapping_REALTIME_1627646764788_0] Group coordinator <*> (id: 795314267 rack: null) is unavailable or invalid, will attempt rediscovery
[Consumer clientId=consumer-2, groupId=event_template_mapping_REALTIME_1627646764788_0] Discovered group coordinator <*> (id: 795314267 rack: null)
[Consumer clientId=consumer-2, groupId=event_template_mapping_REALTIME_1627646764788_0] (Re-)joining group

Finally, after 6-7 hours, we saw this message, in which no partitions were assigned:

[Consumer clientId=consumer-4, groupId=event_template_mapping_REALTIME_1627646764788_0] Successfully joined group with generation 6
[Consumer clientId=consumer-4, groupId=event_template_mapping_REALTIME_1627646764788_0] Setting newly assigned partitions []

Map

11/29/2021, 5:23 PM

We run Pinot 0.8.0. When ingesting a table in FULL upsert mode, we notice that the number of rows returned for the same query varies over time, although it is supposed to remain consistent. For example, there are 1000 unique values keyed on a single column, which we use as the primary key for the Pinot table table1. A query like select count(1) from table1 can return values such as 1567 or 789 in addition to 1000. In the case of 2000, you can find duplicated rows with different timestamps, such as:

| A | currenttime |
| - | ------------ |
| a | 1:00:00 |
| a | 1:00:01 |
| b | 1:00:00 |
| b | 1:00:03 |
...

In the case of 789, many rows are simply missing… We suspect this is related to the process of updating the index for the upserted table. Has anyone seen this before?
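
The consistency expected here can be illustrated with a simplified model of FULL upsert: only the latest record per primary key should stay visible, so a count should equal the number of distinct keys. This is a sketch of the expected outcome under that assumption, not Pinot's actual implementation; `latest_per_key` is a hypothetical helper and the rows are taken from the table above.

```python
def latest_per_key(rows):
    """Keep only the newest timestamp seen for each primary key."""
    latest = {}
    for key, ts in rows:
        # Same-width time strings compare correctly as plain strings here.
        if key not in latest or ts >= latest[key]:
            latest[key] = ts
    return latest


rows = [("a", "1:00:00"), ("a", "1:00:01"), ("b", "1:00:00"), ("b", "1:00:03")]
visible = latest_per_key(rows)
print(len(visible), visible)  # 2 {'a': '1:00:01', 'b': '1:00:03'}
```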

Anusha

11/30/2021, 3:01 AM

Hello Team, I see that the new version 0.9.0 has been released. I am trying to enable authentication but have not been able to. Could someone please guide me? Is there any documentation available for that? Thanks in advance.

yelim yu

11/30/2021, 5:08 AM

Hello team, my team wants to create a table that has two or three timestamp columns and a few other string columns. While building the table schema, we added two timestamp columns (Unix time in milliseconds) to the dimension spec, and after that the events could not be consumed from the topic. Could you please tell us why?

eywek

11/30/2021, 9:24 AM

Hello, I was wondering if it was planned to add the LIKE operator to JSON_MATCH? I'm currently using

REGEXP_LIKE(JSONEXTRACTSCALAR("labels", '$.demande_intention', 'STRING'), 'terminal')

but it's very slow, even with a small number of scanned documents (21). Maybe supporting it directly in JSON_MATCH could speed up this operation?

JSON_MATCH("labels", 'demande_intention LIKE ''terminal''')

Thank you

Anish Nair

11/30/2021, 3:20 PM

Hi Team, I was trying out the "comparisonColumn" setting of upsertConfig, and it seems the table config is not accepting it. After updating the table config, it still looks like this: "upsertConfig": { "mode": "FULL" }. I also tried pushing a record with an older transaction datetime into the real-time table, and the existing row was replaced by it, which shouldn't have happened. Can someone please help?

Mahesh babu

12/01/2021, 7:08 AM

Hi Team, I'm trying to connect to MinIO from Apache Pinot, and running my yml file fails with this error:

ERROR [LaunchDataIngestionJobCommand] [main] Got exception to kick off standalone data ingestion job - java.lang.RuntimeException: software.amazon.awssdk.core.exception.SdkClientException: Configured region (localhost%3A9010) resulted in an invalid URI: https://s3.localhost%3A9010.amazonaws.com Valid region examples:

I started the controller and server with controller.conf and server.conf, and my yml file is:

executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndUriPush
inputDirURI: 'http://localhost:39391/buckets/test/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: 'http://localhost:39391/buckets/pinot-output/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: http
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'localhost:9010'
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'transcript'
  schemaURI: 'http://localhost:9000/tables/transcript/schema'
  tableConfigURI: 'http://localhost:9000/tables/transcript/'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
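
The invalid URI in the error in this message can be reproduced from the `region` value in the job spec. The sketch below is only an illustration: the endpoint template is an assumption inferred from the error text, not the AWS SDK's actual code, and `regional_s3_endpoint` is a hypothetical helper. It shows that a host:port string placed in the region field ends up percent-encoded inside the hostname.

```python
from urllib.parse import quote


def regional_s3_endpoint(region: str) -> str:
    # Assumed template, inferred from the URI printed in the error message.
    return f"https://s3.{quote(region, safe='')}.amazonaws.com"


bad = regional_s3_endpoint("localhost:9010")
good = regional_s3_endpoint("us-west-2")
print(bad)   # https://s3.localhost%3A9010.amazonaws.com
print(good)  # https://s3.us-west-2.amazonaws.com
```

The first line matches the URI in the error, which suggests the value being passed as a region is really a host and port.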

Syed Akram

12/01/2021, 11:26 AM

Hi, is there any way to set a query timeout parameter from the JDBC/Java client, for cases where a query takes more than 10 seconds?

Map

12/01/2021, 10:25 PM

When running the realtimeProvisioningHelper, we got a bunch of NAs. Any idea on how to troubleshoot this?

RealtimeProvisioningHelper -tableConfigFile <tableConfig> -numPartitions 1 -pushFrequency null -numHosts 1,2,3,4 -numHours 1,2,3,4,56,12,18,24 -sampleCompletedSegmentDir <path-to-segment> -ingestionRate 1000 -maxUsableHostMemory 5120G -retentionHours 1
Note:

* Table retention and push frequency ignored for determining retentionHours since it is specified in command
* See <https://docs.pinot.apache.org/operators/operating-pinot/tuning/realtime>
Memory used per host (Active/Mapped)

numHosts --> 1 |2 |3 |4 |
numHours
1 --------> 8.1G/295.67G |4.05G/147.83G |4.05G/147.83G |4.05G/147.83G |
2 --------> NA |NA |NA |NA |
3 --------> NA |NA |NA |NA |
4 --------> NA |NA |NA |NA |
12 --------> NA |NA |NA |NA |
18 --------> NA |NA |NA |NA |
24 --------> NA |NA |NA |NA |
56 --------> NA |NA |NA |NA |

Optimal segment size

numHosts --> 1 |2 |3 |4 |
numHours
1 --------> 1.51G |1.51G |1.51G |1.51G |
2 --------> NA |NA |NA |NA |
3 --------> NA |NA |NA |NA |
4 --------> NA |NA |NA |NA |
12 --------> NA |NA |NA |NA |
18 --------> NA |NA |NA |NA |
24 --------> NA |NA |NA |NA |
56 --------> NA |NA |NA |NA |

Consuming memory

numHosts --> 1 |2 |3 |4 |
numHours
1 --------> 8.1G |4.05G |4.05G |4.05G |
2 --------> NA |NA |NA |NA |
3 --------> NA |NA |NA |NA |
4 --------> NA |NA |NA |NA |
12 --------> NA |NA |NA |NA |
18 --------> NA |NA |NA |NA |
24 --------> NA |NA |NA |NA |
56 --------> NA |NA |NA |NA |

Total number of segments queried per host (for all partitions)
numHosts --> 1 |2 |3 |4 |
numHours
1 --------> 2 |1 |1 |1 |
2 --------> NA |NA |NA |NA |
3 --------> NA |NA |NA |NA |
4 --------> NA |NA |NA |NA |
12 --------> NA |NA |NA |NA |
18 --------> NA |NA |NA |NA |
24 --------> NA |NA |NA |NA |
56 --------> NA |NA |NA |NA |

Map

12/02/2021, 3:20 AM

When querying Pinot via Trino (362), the avg() function doesn't seem to work correctly. It always returns no data…

Syed Akram

12/02/2021, 7:01 AM

Is it possible to create segment file names with dates in them instead of times in millis (long)? For example, changing testtable_OFFLINE_1637625600000_1637712000000_1469.tar.gz to testtable_OFFLINE_2021-11-01_2021-11-05_1.tar.gz
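
To make the request concrete, here is a small sketch that decodes the epoch-millisecond fields in the example file name into dates. It is plain string manipulation, not a Pinot feature; it assumes the `<table>_<start>_<end>_<sequence>.tar.gz` layout seen in the example and UTC dates, and both helper names are hypothetical.

```python
from datetime import datetime, timezone

SUFFIX = ".tar.gz"


def millis_to_date(millis: int) -> str:
    """Format epoch milliseconds as a UTC calendar date."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).strftime("%Y-%m-%d")


def date_based_name(segment_file: str) -> str:
    """Rewrite the start/end millis fields of a segment file name as dates."""
    stem = segment_file[: -len(SUFFIX)]
    # The last three underscore-separated fields are start, end, sequence.
    prefix, start, end, seq = stem.rsplit("_", 3)
    return f"{prefix}_{millis_to_date(int(start))}_{millis_to_date(int(end))}_{seq}{SUFFIX}"


renamed = date_based_name("testtable_OFFLINE_1637625600000_1637712000000_1469.tar.gz")
print(renamed)  # testtable_OFFLINE_2021-11-23_2021-11-24_1469.tar.gz
```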

Yeongju Kang

12/02/2021, 7:35 AM

Hello folks, I am struggling to get a hybrid table working. My configs are below. Only the offline table data is displayed, without the streaming data blended in. I could find logs of Kafka events being consumed, but couldn't find any broker, controller, or server errors. My hybrid table test is running on minikube with Pinot 0.9.0. The process I followed was creating the offline table, and then the realtime one.
• bin/pinot-admin.sh AddTable -tableConfigFile hybrid_realtime.json -schemaFile hybrid_schema.json -exec
• bin/pinot-admin.sh AddTable -tableConfigFile hybrid_offline.json -schemaFile hybrid_schema.json -exec

1. hybrid_schema.json

{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    {
      "name": "studentID",
      "dataType": "INT"
    },
    {
      "name": "firstName",
      "dataType": "STRING"
    },
    {
      "name": "lastName",
      "dataType": "STRING"
    },
    {
      "name": "gender",
      "dataType": "STRING"
    },
    {
      "name": "subject",
      "dataType": "STRING"
    },
    {
      "name": "doNotFailPlease",
      "dataType": "STRING",
      "defaultNullValue": ""
    },
    {
      "name": "ts2",
      "dataType": "TIMESTAMP"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "score",
      "dataType": "FLOAT"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "ts",
      "dataType": "TIMESTAMP",
      "format": "1:SECONDS:EPOCH",
      "granularity": "1:SECONDS"
    }
  ],
  "primaryKeyColumns": [
    "studentID"
  ]
}

2. hybrid_offline.json

{
    "tableName": "transcript_hybrid",
    "tableType": "OFFLINE",
    "segmentsConfig": {
        "replication": 1,
        "timeColumnName": "ts",
        "timeType": "SECONDS"
    },
    "tenants": {
        "broker": "DefaultTenant",
        "server": "DefaultTenant"
    },
    "tableIndexConfig": {
        "loadMode": "MMAP"
    },
    "metadata": {}
}

3. hybrid_realtime.json

{
  "tableName": "transcript_hybrid",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "ts",
    "timeType": "SECONDS",
    "schemaName": "transcript",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "transcript",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "kafka.local-pinot.svc.cluster.local:9092",
      "realtime.segment.flush.threshold.time": "6h"
    }
  },
  "metadata": {
    "customConfigs": {}
  },
  "routing": {
    "instanceSelectorType": "strictReplicaGroup"
  },
  "upsertConfig": {
    "mode": "FULL"
  }
}

Deepak Mishra

12/02/2021, 9:46 AM

Hello everyone, I am working on backfilling data using Spark batch ingestion. Can we handle duplicate data during the backfill so that it doesn't get duplicated in the OFFLINE table?

Elon

12/02/2021, 4:41 PM

Question about replica groups and pools for a realtime table: If we set the replicas per partition to 1 in the segment config, and in the instance config set numReplicaGroups to 1 but have 3 pools, do the segments in the table end up having 3 replicas? i.e. 1 per pool?