# troubleshooting

    Utkarsh

    05/23/2023, 10:41 AM
hi guys, are we allowed to create replica groups in realtime tables? For example, what should go in place of `OFFLINE` in the config below from the official docs for replica groups?
    // Table config
    {
      ...
      "instanceAssignmentConfigMap": {
        "OFFLINE": {
          ...
          "replicaGroupPartitionConfig": {
            "replicaGroupBased": true,
            "numReplicaGroups": 3,
            "numInstancesPerReplicaGroup": 4
          }
        }
      },
      ...
      "routing": {
        "instanceSelectorType": "replicaGroup"
      },
      ...
    }
https://docs.pinot.apache.org/operators/operating-pinot/tuning/routing#reduce-the-query-fanout-by-exploding-data-replication. Also, I am trying to add a partition config to my table, which requires the following in routing:
    "routing": {
        "segmentPrunerTypes": ["partition"]
      },
but I also have the replica-group routing setting shown above. Can I place both of them in routing, like below?
    "routing": {
        "segmentPrunerTypes": ["partition"],
        "instanceSelectorType": "replicaGroup"
      },
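For reference: on a realtime table the instance assignment map is keyed by `CONSUMING` (and optionally `COMPLETED`) instead of `OFFLINE`, and both routing options can live in the same `routing` block. A minimal sketch, assuming a hypothetical partition column `memberId` hashed with Murmur into 4 partitions; note that the `partition` pruner also needs a matching `segmentPartitionConfig` under `tableIndexConfig`:

```json
{
  "instanceAssignmentConfigMap": {
    "CONSUMING": {
      "replicaGroupPartitionConfig": {
        "replicaGroupBased": true,
        "numReplicaGroups": 3,
        "numInstancesPerReplicaGroup": 4
      }
    }
  },
  "tableIndexConfig": {
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "memberId": {
          "functionName": "Murmur",
          "numPartitions": 4
        }
      }
    }
  },
  "routing": {
    "segmentPrunerTypes": ["partition"],
    "instanceSelectorType": "replicaGroup"
  }
}
```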

    Deena Dhayalan

    05/23/2023, 11:52 AM
Hi, I'm building this with Java 11. Is there any issue in the build? https://github.com/apache/pinot.git

    abhinav wagle

    05/23/2023, 8:43 PM
Am I missing something obvious here? Running into this on pinot-server start:
    Failed to start a Pinot [SERVER] at 1.153 since launch
    java.lang.NumberFormatException: For input string: "[2, 4]"
    	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:?]
    	at java.lang.Integer.parseInt(Integer.java:652) ~[?:?]
    	at java.lang.Integer.valueOf(Integer.java:983) ~[?:?]
    	at org.apache.pinot.spi.env.PropertyConverter.convert(PropertyConverter.java:25) ~[pinot-all-0.12.1-jar-with-dependencies.jar:0.12.1-6e235a4ec2a16006337da04e118a435b5bb8f6d8]
    	at org.apache.pinot.spi.env.PinotConfiguration.getProperty(PinotConfiguration.java:376) ~[pinot-all-0.12.1-jar-with-dependencies.jar:0.12.1-6e235a4ec2a16006337da04e118a435b5bb8f6d8]
    	at org.apache.pinot.spi.env.PinotConfiguration.getProperty(PinotConfiguration.java:324) ~[pinot-all-0.12.1-jar-with-dependencies.jar:0.12.1-6e235a4ec2a16006337da04e118a435b5bb8f6d8]
    	at org.apache.pinot.server.starter.helix.BaseServerStarter.init(BaseServerStarter.java:190) ~[pinot-all-0.12.1-jar-with-dependencies.jar:0.12.1-6e235a4ec2a16006337da04e118a435b5bb8f6d8]
    	at org.apache.pinot.tools.service.PinotServiceManager.startServer(PinotServiceManager.java:166) ~[pinot-all-0.12.1-jar-with-dependencies.jar:0.12.1-6e235a4ec2a16006337da04e118a435b5bb8f6d8]
    	at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:97) ~[pinot-all-0.12.1-jar-with-dependencies.jar:0.12.1-6e235a4ec2a16006337da04e118a435b5bb8f6d8]
    	at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.lambda$run$0(StartServiceManagerCommand.java:278) ~[pinot-all-0.12.1-jar-with-dependencies.jar:0.12.1-6e235a4ec2a16006337da04e118a435b5bb8f6d8]
    	at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:304) [pinot-all-0.12.1-jar-with-dependencies.jar:0.12.1-6e235a4ec2a16006337da04e118a435b5bb8f6d8]
    	at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.run(StartServiceManagerCommand.java:278) [pinot-all-0.12.1-jar-with-dependencies.jar:0.12.1-6e235a4ec2a16006337da04e118a435b5bb8f6d8]

    Deena Dhayalan

    05/24/2023, 7:32 AM
I followed the steps, and when I use the command given in the image below I get this error. How do I fix this while setting up a cluster across different machines that can ping each other?

    Lee Wei Hern Jason

    05/24/2023, 7:58 AM
Hello team, I am facing this issue with my realtime-to-offline flow.
    Done executing RealtimeToOfflineSegmentsTask on table: transportSurgeMetric_REALTIME, input segments: transportSurgeMetric__0__2922__20230516T0552Z,transportSurgeMetric__0__3016__20230522T0555Z,transportSurgeMetric__0__3021__20230522T1333Z,transportSurgeMetric__1__2369__20230516T0543Z,transportSurgeMetric__1__2370__20230516T0731Z,transportSurgeMetric__1__2462__20230522T0609Z,transportSurgeMetric__1__2467__20230522T1347Z,transportSurgeMetric__2__2719__20230516T0711Z,transportSurgeMetric__2__2812__20230522T0541Z,transportSurgeMetric__2__2817__20230522T1319Z, output segments:
The logs show that the job completed successfully, but there are no segments in the offline table. As the log above shows, the output segment list is empty, and there aren't any error logs in the minion logs.
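For context, the RealtimeToOfflineSegmentsTask moves one time bucket per run, so a successful run with an empty output-segment list can simply mean no eligible rows fell inside the current bucket window. A minimal sketch of the per-table task config with illustrative window values (the key names are from the Pinot docs; the values here are assumptions):

```json
{
  "task": {
    "taskTypeConfigsMap": {
      "RealtimeToOfflineSegmentsTask": {
        "bucketTimePeriod": "6h",
        "bufferTimePeriod": "1d",
        "roundBucketTimePeriod": "1h",
        "mergeType": "rollup"
      }
    }
  }
}
```

Comparing the task's stored watermark against the table's earliest event time may show whether the window has drifted behind the retained data.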

    Sid

    05/24/2023, 8:05 AM
Hello team, I added an additional controller and ZooKeeper to the Pinot cluster, and since then all the tables are gone. However, I'm seeing all the segment files inside the controller data directory. What am I missing here?

    Deena Dhayalan

    05/24/2023, 8:32 AM
With the ZooKeeper server started separately, and having made sure that both are connected, I started the controller with `sh bin/pinot-admin.sh StartController -zkAddress localhost:5181 -clusterName PinotCluster -controllerPort 9000` and it throws the error below. Can anyone guide me on properly setting up a cluster across the nodes/machines that I'm using?

    Ehsan Irshad

    05/24/2023, 12:12 PM
Hi, may I know if there is an API to check whether server/broker rebalancing has completed? We are on version 0.11.0.

    Tanmay Varun

    05/24/2023, 3:42 PM
Hi team, I optimized the dictionary columns on the Pinot servers, but they are still going OOM; logs below:
    Allocating 8 bytes for: reporting_benchmark__23__7__20230524T0026Z:transactionDate.dict
    Consumed 13441 events from (rate:186.97661/s), currentOffset=11369166, numRowsConsumedSoFar=1048158, numRowsIndexedSoFar=1048158
    [3307.243s][warning][gc,alloc] reporting_benchmark__23__7__20230524T0026Z: Retried waiting for GCLocker too often allocating 131085 words
    Exception in thread "reporting_benchmark__23__7__20230524T0026Z" java.lang.OutOfMemoryError: Java heap space
    [3312.561s][warning][gc,alloc] req-rsp-timeout-task: Retried waiting for GCLocker too often allocating 332 words
    [3312.561s][warning][gc,alloc] Log4j2-TF-2-AsyncLoggerConfig-1: Retried waiting for GCLocker too often allocating 332 words
    [3312.959s][warning][gc,alloc] Service Thread: Retried waiting for GCLocker too often allocating 334 words
    [3313.783s][warning][gc,alloc] prometheus-http-1-4: Retried waiting for GCLocker too often allocating 256 words
    [3315.776s][warning][gc,alloc] OpChainSchedulerService: Retried waiting for GCLocker too often allocating 332 words
    [3317.375s][warning][gc,alloc] OpChainSchedulerService: Retried waiting for GCLocker too often allocating 332 words
    [3317.777s][warning][gc,alloc] Log4j2-TF-2-AsyncLoggerConfig-1: Retried waiting for GCLocker too often allocating 4 words
    [3319.731s][warning][gc,alloc] req-rsp-timeout-task: Retried waiting for GCLocker too often allocating 4 words
    [3320.909s][warning][gc,alloc] HttpServer-0: Retried waiting for GCLocker too often allocating 332 words
    [3320.909s][warning][gc,alloc] round-robin-scheduler-release-thread: Retried waiting for GCLocker too often allocating 331 words
    [3321.302s][warning][gc,alloc] Service Thread: Retried waiting for GCLocker too often allocating 6 words
    [3321.302s][warning][gc,alloc] metrics-meter-tick-thread-2: Retried waiting for GCLocker too often allocating 332 words
    [3321.302s][warning][gc,alloc] round-robin-scheduler-release-thread: Retried waiting for GCLocker too often allocating 331 words
    [3321.707s][warning][gc,alloc] Start a Pinot [SERVER]-SendThread(pinot-zookeeper:2181): Retried waiting for GCLocker too often allocating 333 words
    [3322.099s][warning][gc,alloc] metrics-meter-tick-thread-1: Retried waiting for GCLocker too often allocating 332 words
    [3322.494s][warning][gc,alloc] Log4j2-TF-2-AsyncLoggerConfig-1: Retried waiting for GCLocker too often allocating 3 words
    [3323.272s][warning][gc,alloc] prometheus-http-1-4: Retried waiting for GCLocker too often allocating 32 words
    Exception in thread "req-rsp-timeout-task" java.lang.OutOfMemoryError: Java heap space
    Exception in thread "prometheus-http-1-4" java.lang.OutOfMemoryError: Java heap space

    Hassan Ait Brik

    05/24/2023, 4:28 PM
Hi, I am facing serious performance problems using Pinot with the Python module pinotdb. The issue seems related to `Cursor.fetchall`. A pull request exists on this subject: https://github.com/python-pinot-dbapi/pinot-dbapi/pull/62. Can someone do something about this?

    Rekha S

    05/25/2023, 5:10 AM
Does the complex type handling support the Avro Map type? "It's common for ingested data to have a complex structure. For example, Avro schemas have records and arrays and JSON supports objects and arrays." I see that it works with the Avro Record and Array.Record types. Is it supposed to also work for the Avro Map type?

    Abhijeet Kushe

    05/25/2023, 1:53 PM
Hi team, we have a realtime table running in upsert mode, integrated with Kinesis. Twice we have seen an issue where a consuming segment on one of the shards did not get completed. I want to create an alert on this. Can someone tell me which metric from https://docs.pinot.apache.org/configuration-reference/monitoring-metrics can help me with this, or do I need to write a script to generate this alert?

    Yusuf Külah

    05/26/2023, 6:50 AM
Hey, what is a good way of monitoring the Kafka consumer-group lag of a realtime table consumer? I added `group.id` as well as `stream.kafka.consumer.prop.group.id`. On the Kafka brokers' end, I see the consumer group is created, but no consumers are registered to that group and the total lag of the consumer group keeps increasing. Apparently the pinot-servers are not committing offsets to Kafka.

    Lee Wei Hern Jason

    05/26/2023, 10:53 AM
Hi team, currently our minion logs contain the authToken. Is there a configuration to obfuscate the auth token in the logs? Is this only available in version 0.12? https://github.com/apache/pinot/issues/7232

    Raveendra Yerraguntla

    05/26/2023, 1:47 PM
Hello team - this is a performance question: I am trying Pinot for its sub-second responses. In my POC, for queries against the attached schema with 3 GB of table data on 6 servers (5-node n2-standard-8 on GCP), most queries issued from the Pinot SQL query console take 3 seconds. My use case is comparing seasonal performance for the attached queries, plus time-series charts. Without sub-second performance, Pinot will not be a candidate for consideration. Can anyone suggest what needs to be done to achieve sub-second performance? (Throwing more compute resources at it than what is used now is not cost-effective.) cc: @Mayank
(Attachments: schema.rtf, table.rtf, Pinot SQL Queries.rtf)

    Chris Han

    05/26/2023, 3:07 PM
My Pinot servers (running on k8s) are in a `DEAD` state due to OOM after running a query. The tables and segments are now stuck in an `UPDATING` state. What's the appropriate operation to get the tables back to a healthy status? Is it a rebalance? If so, how can I check the status of a rebalance in real time?

    Apoorv

    05/29/2023, 6:57 AM
Hello team, is there a way to update an existing table config with multiple comparison columns? Currently I have a table with no comparison column specified, and I want to update the table config with multiple comparison columns using two timestamp columns (epoch milliseconds). I am using Pinot 0.12.1, and after updating the comparison columns I am facing the error below. Can someone help me out with this? Note: it's a partial upsert table, and I took the config reference from the docs: https://docs.pinot.apache.org/v/release-0.12.1-2/basics/data-import/upsert#multiple-comparison-columns
java.lang.ClassCastException: class java.lang.Long cannot be cast to class org.apache.pinot.segment.local.upsert.ComparisonColumns (java.lang.Long is in module java.base of loader 'bootstrap'; org.apache.pinot.segment.local.upsert.ComparisonColumns is in unnamed module of loader 'app')
    pinot-server-12       | 	at org.apache.pinot.segment.local.upsert.ComparisonColumns.compareTo(ComparisonColumns.java:22) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-51bf75efa65cbe8bd8b497eb20e34869205d74e8]
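For reference, multiple comparison columns are declared in the upsertConfig; a minimal sketch, assuming two hypothetical epoch-millis columns `ts1` and `ts2`:

```json
{
  "upsertConfig": {
    "mode": "PARTIAL",
    "comparisonColumns": ["ts1", "ts2"],
    "hashFunction": "NONE"
  }
}
```

The ClassCastException above suggests the servers still hold plain Long comparison values recorded under the old single-column mode, so switching comparison modes on a live upsert table without rebuilding the upsert metadata appears to be unsafe.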

    Pratik Tibrewal

    05/29/2023, 11:54 AM
Hey team, we recently added a new multi-value STRING column to one of our Pinot tables and reloaded all the segments. Now, for the older segments, the value of this field comes back as `"NULL"`, but when I filter with `select * from tbl where newCol is not null`, the old rows are not filtered out and still return `"NULL"`. We are using Pinot 0.11.

    Eaugene Thomas

    05/29/2023, 5:00 PM
Hey team, I have realtime segment metadata like:
    "segment_A": {
        "segmentName": "segment_A",
        "schemaName": null,
        "crc": 2295412536,
        "creationTimeMillis": 1685351723281,
        "creationTimeReadable": "2023-05-29T09:15:23:281 UTC",
        "timeColumn": "lastUpdated",
        "timeUnit": "SECONDS",
        "timeGranularitySec": 1,
        "startTimeMillis": 1685091962000,
        "startTimeReadable": "2023-05-26T09:06:02.000Z",
        "endTimeMillis": 1685351702000,
        "endTimeReadable": "2023-05-29T09:15:02.000Z",
        "segmentVersion": "v3",
        "creatorName": null,
        "totalDocs": 2500000,
        "custom": {},
        "startOffset": "124103692911",
        "endOffset": "124106192911",
        "columns": [],
        "indexes": {},
        "star-tree-index": null
      },
Taking the difference between the end and start times (1685351702000 − 1685091962000 = 259,740,000 ms, about 3 days), it looks like the segment was consuming for 3 days before getting committed. However, that's not the case: the segment was consuming for less than one day (a few hours). We also have the segment flush time set to 86400000 (1 day) in segment.flush.threshold.time. Is there any way to find out how long a segment was actually consuming?

    Matthew Kerian

    05/30/2023, 5:54 PM
Hi team, quick question: if we're running a realtime table ingesting data from Kafka, is there any standard metric to track deduplication?

    Ronak

    05/31/2023, 6:09 AM
We are observing OOM for direct buffer memory while running a regexp_like query. We have increased the memory up to 8 GB; however, we are still seeing this issue. Looking at the stack trace, we encounter this code in `SVScanDocIdIterator.java`, as well as in this commit: https://github.com/apache/pinot/commit/2d53876f669e34d8536461c52d5fe344ad1d1439. So, could this be a memory leak issue that we are facing? Any suggestions or guidance? Stack trace:
    java.lang.OutOfMemoryError: Direct buffer memory
      at java.nio.Bits.reserveMemory(Unknown Source) ~[?:?]
      at java.nio.DirectByteBuffer.<init>(Unknown Source) ~[?:?]
      at java.nio.ByteBuffer.allocateDirect(Unknown Source) ~[?:?]
      at org.apache.pinot.segment.local.segment.index.readers.forward.ChunkReaderContext.<init>(ChunkReaderContext.java:43) ~[pinot-all-hypertrace-0.12.0-5-shaded.jar:0.12.0-06acc7c10dc7a30a35e713da60fd9516e7efd1be]
      at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.createContext(VarByteChunkSVForwardIndexReader.java:54) ~[pinot-all-hypertrace-0.12.0-5-shaded.jar:0.12.0-06acc7c10dc7a30a35e713da60fd9516e7efd1be]
      at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.createContext(VarByteChunkSVForwardIndexReader.java:37) ~[pinot-all-hypertrace-0.12.0-5-shaded.jar:0.12.0-06acc7c10dc7a30a35e713da60fd9516e7efd1be]
      at org.apache.pinot.core.operator.dociditerators.SVScanDocIdIterator.<init>(SVScanDocIdIterator.java:59) ~[pinot-all-hypertrace-0.12.0-5-shaded.jar:0.12.0-06acc7c10dc7a30a35e713da60fd9516e7efd1be]
      at org.apache.pinot.core.operator.docidsets.SVScanDocIdSet.<init>(SVScanDocIdSet.java:33) ~[pinot-all-hypertrace-0.12.0-5-shaded.jar:0.12.0-06acc7c10dc7a30a35e713da60fd9516e7efd1be]
      at org.apache.pinot.core.operator.filter.ScanBasedFilterOperator.getNextBlock(ScanBasedFilterOperator.java:56) ~[pinot-all-hypertrace-0.12.0-5-shaded.jar:0.12.0-06acc7c10dc7a30a35e713da60fd9516e7efd1be]
      at org.apache.pinot.core.operator.filter.ScanBasedFilterOperator.getNextBlock(ScanBasedFilterOperator.java:33) ~[pinot-all-hypertrace-0.12.0-5-shaded.jar:0.12.0-06acc7c10dc7a30a35e713da60fd9516e7efd1be]
      at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:43) ~[pinot-all-hypertrace-0.12.0-5-shaded.jar:0.12.0-06acc7c10dc7a30a35e713da60fd9516e7efd1be]
      at org.apache.pinot.core.operator.filter.AndFilterOperator.getNextBlock(AndFilterOperator.java:53) ~[pinot-all-hypertrace-0.12.0-5-shaded.jar:0.12.0-06acc7c10dc7a30a35e713da60fd9516e7efd1be]
      at org.apache.pinot.core.operator.filter.AndFilterOperator.getNextBlock(AndFilterOperator.java:33) ~[pinot-all-hypertrace-0.12.0-5-shaded.jar:0.12.0-06acc7c10dc7a30a35e713da60fd9516e7efd1be]

    Lvszn Peng

    05/31/2023, 3:21 PM
Hi team, while querying mall_all_hourly_stat_whalet, some non-existent segments were reported. Did the retention manager delete those segments? Table config:
    {
      "REALTIME": {
        "tableName": "mall_all_hourly_stat_whalet_REALTIME",
        "tableType": "REALTIME",
        "segmentsConfig": {
          "allowNullTimeValue": false,
          "retentionTimeUnit": "DAYS",
          "retentionTimeValue": "7",
          "replicasPerPartition": "1",
          "timeColumnName": "time_to_use",
          "timeType": "SECONDS",
          "schemaName": "mall_all_hourly_stat_whalet",
          "replication": "1"
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "businessServerTenant"
        },
        "tableIndexConfig": {
          "invertedIndexColumns": [
            "shop_id"
          ],
          "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.topic.name": "mall-all-hourly-stat",
            "stream.kafka.broker.list": "",
            "stream.kafka.consumer.type": "lowlevel",
            "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
            "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
            "realtime.segment.flush.threshold.time": "6h",
            "realtime.segment.flush.threshold.size": "0"
          },
          "loadMode": "MMAP",
          "enableDefaultStarTree": false,
          "enableDynamicStarTreeCreation": false,
          "aggregateMetrics": false,
          "nullHandlingEnabled": false,
          "rangeIndexVersion": 1,
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false
        },
        "metadata": {},
        "routing": {
          "instanceSelectorType": "strictReplicaGroup"
        },
        "upsertConfig": {
          "mode": "FULL",
          "hashFunction": "NONE"
        },
        "ingestionConfig": {},
        "isDimTable": false
      }
    }
Version 0.9.3. Stack trace:
    request: {"graph_id":1587696375422881792,"interval":{"start_time":1672502400000,"end_time":1685548799999}}
     error: query dataset: query: exception on broker [{235 ServerSegmentMissing:
    2 segments [mall_all_hourly_stat_whalet__1__741__20230524T1417Z, mall_all_hourly_stat_whalet__1__740__20230524T0817Z] missing on server: Server_pinot-server-0.pinot-server-headless.pinot.svc.cluster.local_8098} {200 QueryExecutionError:
    java.lang.RuntimeException: Caught exception while running CombinePlanNode.
    	at org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:146)
    	at org.apache.pinot.core.plan.InstanceResponsePlanNode.run(InstanceResponsePlanNode.java:41)
    	at org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:45)
    	at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:296)
    ...
    Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
    	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
    	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
    	at org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:135)
    	... 15 more
    ...
    Caused by: java.lang.NullPointerException} {235 ServerSegmentMissing:
    2 segments [mall_all_hourly_stat_whalet__0__802__20230524T0841Z, mall_all_hourly_stat_whalet__0__803__20230524T1441Z] missing on server: Server_pinot-server-5.pinot-server-headless.pinot.svc.cluster.local_8098} {235 ServerSegmentMissing:
    2 segments [mall_all_hourly_stat_whalet__2__775__20230524T1459Z, mall_all_hourly_stat_whalet__2__774__20230524T0859Z] missing on server: Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098}]

    Raveendra Yerraguntla

    05/31/2023, 3:23 PM
Hello, I have a headerless TSV data file and a separate header file with all the column names. Are there any Pinot readers available for ingesting this data? If not, any pointers for solving this use case?

    Satish Mittal

    05/31/2023, 4:55 PM
Hi all. I have a quick question regarding the native text index, as documented here. Some of our table columns contain arbitrary string payloads and we want to support regex search queries on them. Using `regexp_like()` was leading to OOM since it ends up scanning the whole table, so we decided to switch over to the native text index. However, after configuring the native text index, we observed that the pinot-servers' heap usage gradually increased, and they eventually OOMed. During the subsequent restart, the servers took forever and never became available (typically a server would come up in 2-3 minutes, but we waited for more than 1 hour). We tried increasing the heap size (6 -> 8 -> 10 GB) per server but had no luck, and eventually had to revert the text index. After a couple of iterations, we observed the following server logs:
    2023/05/31 07:58:30.470 INFO [TextIndexHandler] [HelixTaskExecutor-message_handle_thread_25] Need to create new text index for segment: XXView__8__6969__20230529T1428Z, column: YY
    2023/05/31 07:58:32.953 INFO [TextIndexHandler] [HelixTaskExecutor-message_handle_thread_5] Need to create new text index for segment: XXView__8__6962__20230529T0836Z, column: YY
    2023/05/31 07:58:33.876 INFO [TextIndexHandler] [HelixTaskExecutor-message_handle_thread_5] Creating new text index for column: YY in segment: XX__8__6962__20230529T0836Z, hasDictionary: false
    2023/05/31 07:58:35.594 INFO [TextIndexHandler] [HelixTaskExecutor-message_handle_thread_25] Creating new text index for column: YY in segment: XXView__8__6969__20230529T1428Z, hasDictionary: false
    2023/05/31 07:58:36.594 INFO [TextIndexHandler] [HelixTaskExecutor-message_handle_thread_0] Need to create new text index for segment: XXView__8__6999__20230530T1355Z, column: response_body
    2023/05/31 07:58:37.435 INFO [TextIndexHandler] [HelixTaskExecutor-message_handle_thread_0] Creating new text index for column: YY in segment: XXView__8__6999__20230530T1355Z, hasDictionary: false
    2023/05/31 07:58:37.803 INFO [TextIndexHandler] [HelixTaskExecutor-message_handle_thread_8] Need to create new text index for segment: XXView__8__6992__20230530T0943Z, column: response_body
    2023/05/31 07:58:38.262 INFO [TextIndexHandler] [HelixTaskExecutor-message_handle_thread_39] Need to create new text index for segment: XXView__8__6963__20230529T0927Z, column: response_body
    2023/05/31 07:58:39.009 INFO [TextIndexHandler] [HelixTaskExecutor-message_handle_thread_8] Creating new text index for column: YY in segment: XXView__8__6992__20230530T0943Z, hasDictionary: false
    2023/05/31 07:58:39.272 INFO [TextIndexHandler] [HelixTaskExecutor-message_handle_thread_39] Creating new text index for column: YY in segment: XXView__8__6963__20230529T0927Z, hasDictionary: false
Looking at the segment times (e.g. XXView__8__6963__20230529T0927Z), it turns out that:
1. The server is trying to create the text index for all the existing realtime segments as well, which explains the huge memory requirement. Can someone confirm that this is indeed the server behaviour (that it will create the text index on all existing segments)?
2. Our expectation is that any new index should be applied only to the newer segments created after the config change. Is there a way to configure this behaviour?
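For reference, the native text index is declared per column via fieldConfigList; a minimal sketch, assuming a raw-encoded column named YY (the column name is taken from the logs above):

```json
{
  "fieldConfigList": [
    {
      "name": "YY",
      "encodingType": "RAW",
      "indexTypes": ["TEXT"],
      "properties": {
        "fstType": "native"
      }
    }
  ]
}
```

On point 1: adding an index to the table config and reloading does build it for every existing segment, not just new ones, which is consistent with what the logs show.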

    Lee Wei Hern Jason

    06/01/2023, 4:08 AM
Hello community, is there a way to speed up the minion's realtime-to-offline task? My task takes about 20-40 minutes, and ideally we want it to run in around 10 minutes.

    Arjun Tyagi

    06/01/2023, 11:46 AM
Hello community, I am new to Apache Pinot and am having trouble running a query on a realtime table. I am getting a very weird SQL parsing exception for basic queries. PS: I recently moved to the latest version; in the previous version the query ran fine using PQL, but since PQL is deprecated in the new version, I am facing this issue. Attaching an image for reference.

    Shubham Kumar

    06/01/2023, 12:13 PM
Hello team, we are trying to connect Presto with Pinot. We have deployed Presto and Pinot using the helm charts in the apache/pinot git repo. presto-worker ip/port: 192.19.81.10:8080; pinot-broker ip/port: 192.19.83.69:8099. I am querying Presto and getting this error; any idea what the possible issue could be here?
    2023-06-01T11:34:14.363Z ERROR SplitRunner-2-180 com.facebook.presto.execution.executor.TaskExecutor Error processing Split 20230601_113343_00004_6axxb.1.0.0-0 PinotSplit{connectorId=pinot, splitType=BROKER, columnHandle=[PinotColumnHandle{columnName="device_id", dataType=varchar, type=REGULAR}, PinotColumnHandle{columnName="os", dataType=varchar, type=REGULAR}, PinotColumnHandle{columnName="session", dataType=varchar, type=REGULAR}, PinotColumnHandle{columnName="location_latitude", dataType=double, type=REGULAR}, PinotColumnHandle{columnName="advertising_id", dataType=varchar, type=REGULAR}, PinotColumnHandle{columnName="source", dataType=varchar, type=REGULAR}, PinotColumnHandle{columnName="manufacturer", dataType=varchar, type=REGULAR}, PinotColumnHandle{columnName="app_name", dataType=varchar, type=REGULAR}, PinotColumnHandle{columnName="event_name", dataType=varchar, type=REGULAR}, PinotColumnHandle{columnName="customer_id", dataType=varchar, type=REGULAR}, PinotColumnHandle{columnName="network_type", dataType=varchar, type=REGULAR}, PinotColumnHandle{columnName="location_longitude", dataType=double, type=REGULAR}, PinotColumnHandle{columnName="timestamp", dataType=timestamp, type=REGULAR}], segmentPinotQuery=Optional.empty, brokerPinotQuery=Optional[GeneratedPinotQuery{query=SELECT "device_id", "os", "session", "location_latitude", "advertising_id", "source", "manufacturer", "app_name", "event_name", "customer_id", "network_type", "location_longitude", "timestamp" FROM clickstream_janus_dev1 LIMIT 11, table=clickstream_janus_dev1, expectedColumnIndices=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], haveFilter=false, forBroker=true}], segments=[], segmentHost=Optional.empty} (start = 6933404.500563, wall = 30296 ms, cpu = 0 ms, wait = 1 ms, calls = 1)
    java.io.UncheckedIOException: java.io.EOFException: HttpConnectionOverHTTP@6646c5c1::DecryptedEndPoint@4bc06dff{pinot2-broker-1.pinot2-broker-headless.nms.svc.cluster.local/192.19.83.69:8099<->/192.19.81.10:45742,OPEN,fill=-,flush=P,to=30279/300000}
    at com.facebook.airlift.http.client.ResponseHandlerUtils.propagate(ResponseHandlerUtils.java:21)
    at com.facebook.airlift.http.client.StringResponseHandler.handleException(StringResponseHandler.java:51)
    at com.facebook.airlift.http.client.StringResponseHandler.handleException(StringResponseHandler.java:34)
    at com.facebook.airlift.http.client.jetty.JettyHttpClient.execute(JettyHttpClient.java:512)
    at com.facebook.presto.pinot.PinotClusterInfoFetcher.doHttpActionWithHeaders(PinotClusterInfoFetcher.java:183)
    at com.facebook.presto.pinot.PinotBrokerPageSource.lambda$issueQueryAndPopulate$1(PinotBrokerPageSource.java:350)
    at com.facebook.presto.pinot.PinotUtils.doWithRetries(PinotUtils.java:50)
    at com.facebook.presto.pinot.PinotBrokerPageSource.issueQueryAndPopulate(PinotBrokerPageSource.java:335)
    at com.facebook.presto.pinot.PinotBrokerPageSource.getNextPage(PinotBrokerPageSource.java:241)
    at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:266)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:426)
    at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:309)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:730)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:302)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1079)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:166)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:599)
    at com.facebook.presto.$gen.Presto_0_278_SNAPSHOT_cb986f6____20230601_101337_1.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
    Caused by: java.io.EOFException: HttpConnectionOverHTTP@6646c5c1::DecryptedEndPoint@4bc06dff{pinot2-broker-1.pinot2-broker-headless.nms.svc.cluster.local/192.19.83.69:8099<->/192.19.81.10:45742,OPEN,fill=-,flush=P,to=30279/300000}
    at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.earlyEOF(HttpReceiverOverHTTP.java:338)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1551)
    at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.shutdown(HttpReceiverOverHTTP.java:209)
    at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.process(HttpReceiverOverHTTP.java:147)
    at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.receive(HttpReceiverOverHTTP.java:73)
    at org.eclipse.jetty.client.http.HttpChannelOverHTTP.receive(HttpChannelOverHTTP.java:133)
    at org.eclipse.jetty.client.http.HttpConnectionOverHTTP.onFillable(HttpConnectionOverHTTP.java:155)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
    at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:411)
    at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:305)
    at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:159)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
    ... 1 more

    Sandeep R

    06/01/2023, 2:36 PM
Hi team, is there a way to define a Kafka consumer group in the Pinot table configuration, so we can monitor the consumer-group lag for that table?

    Sid

    06/02/2023, 11:28 AM
Hi team, we recently enabled deep storage for Pinot. However, we are seeing tmp files in S3 instead of the actual segment files. Could anyone suggest what we are missing here?

    Deena Dhayalan

    06/02/2023, 11:49 AM
Hi team, is it possible to ingest Presto ORC files via batch ingestion (offline data)?