# troubleshooting
  • r

    Rishika

    07/11/2025, 5:29 PM
Hello, I'm new to Apache Pinot. I'm trying to run it locally and ingest data from a Kafka topic into Pinot. I tried setting up the schema and table config, but when I run LaunchDataIngestionJob, it fails.
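(For context, LaunchDataIngestionJob is the batch ingestion entry point and expects a job spec YAML; a REALTIME Kafka table starts consuming as soon as the table config with streamConfigs is created, without running this job. A minimal standalone job spec sketch, assuming local CSV input and a controller on localhost:9000 — all paths and the table name are placeholders:)
Copy code
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/tmp/pinot/rawdata/'          # placeholder input directory
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot/segments/'        # placeholder output directory
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
tableSpec:
  tableName: 'myTable'                      # placeholder table name
  schemaURI: 'http://localhost:9000/tables/myTable/schema'
  tableConfigURI: 'http://localhost:9000/tables/myTable'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'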
    m
    a
    k
    • 4
    • 16
  • l

    Luis P Fernandes

    07/14/2025, 3:50 PM
Hi guys, we are trying to set up cold storage on our Pinot cluster backed by S3. To set up hot/cold storage for Pinot and use S3 for cold storage, the configurations for server.conf, controller.conf, and broker.conf are below, along with the schema and table config we used. We observed that the controller uses the S3 configuration as expected (controller.data.dir=s3://storage/controller) and stores data in the identified bucket. The server, however, created a local folder with the path /s3:/cold-storage/server/tiered_REALTIME. Any comments or help on how we can fix this would be appreciated, since we are unable to get the segments moving to S3.
Server:
Copy code
pinot.zk.server=localhost:2191
server.helix.cluster.name=PinotCluster
pinot.server.netty.port=18098
pinot.server.netty.host=localhost
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.endpoint=http://localhost:9000
pinot.server.storage.factory.s3.accessKey=minioadmin
pinot.server.storage.factory.s3.secretKey=minioadmin
pinot.server.storage.factory.s3.region=us-east-1
pinot.server.storage.factory.s3.enableS3A=false
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.instance.tierConfigs.tierNames=hotTier,coldTier
pinot.server.instance.segment.directory.loader=tierBased
pinot.server.instance.dataDir=/Shared/pinot_data/server
pinot.server.instance.tierConfigs.hotTier.dataDir=s3://hot-storage/server
pinot.server.instance.tierConfigs.coldTier.dataDir=s3://cold-storage/server
Controller:
Copy code
pinot.zk.server=localhost:2191
controller.helix.cluster.name=PinotCluster
controller.port=19000
controller.host=localhost
controller.tls.client.auth=false
controller.segment.relocator.frequencyPeriod=60s
controller.segmentRelocator.initialDelayInSeconds=10
controller.segmentRelocator.enableLocalTierMigration=true
controller.enable.split.commit=true
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.endpoint=http://localhost:9000
pinot.controller.storage.factory.s3.accessKey=minioadmin
pinot.controller.storage.factory.s3.secretKey=minioadmin
pinot.controller.storage.factory.s3.region=us-east-1
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.controller.segment.fetcher.protocols=file,http,s3
controller.data.dir=s3://storage/controller
controller.local.temp.dir=/Shared/pinot_data/controller/temp
Broker:
Copy code
pinot.zk.server=localhost:2191
broker.helix.cluster.name=PinotCluster
broker.helix.port=18099
pinot.broker.hostname=localhost
pinot.broker.client.queryPort=18099
table_config:
    Copy code
    {
      "tableName": "tiered",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "minimizeDataMovement": false,
        "timeColumnName": "timestamp",
        "timeType": "MILLISECONDS",
        "replicasPerPartition": "1",
        "schemaName": "tiered",
        "replication": "2"
      },
      "tenants": {
        "broker": "DefaultTenant",
        "server": "DefaultTenant",
        "tagOverrideConfig": {}
      },
      "tableIndexConfig": {
        "autoGeneratedInvertedIndex": false,
        "createInvertedIndexDuringSegmentGeneration": false,
        "loadMode": "MMAP",
        "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.topic.name": "tiered",
          "stream.kafka.broker.list": "localhost:19092",
          "stream.kafka.consumer.type": "lowlevel",
          "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
          "realtime.segment.flush.threshold.rows": "0",
          "realtime.segment.flush.threshold.segment.rows": "0",
          "realtime.segment.flush.threshold.time": "1m",
          "realtime.segment.flush.threshold.segment.size": "100M"
        },
        "enableDefaultStarTree": false,
        "enableDynamicStarTreeCreation": false,
        "aggregateMetrics": false,
        "nullHandlingEnabled": false,
        "columnMajorSegmentBuilderEnabled": true,
        "optimizeDictionary": false,
        "optimizeDictionaryForMetrics": false,
        "optimizeDictionaryType": false,
        "noDictionarySizeRatioThreshold": 0.85,
        "rangeIndexVersion": 2,
        "invertedIndexColumns": [],
        "noDictionaryColumns": [],
        "bloomFilterColumns": [],
        "onHeapDictionaryColumns": [],
        "rangeIndexColumns": [],
        "sortedColumn": [],
        "varLengthDictionaryColumns": []
      },
      "quota": {},
      "query": {},
      "ingestionConfig": {
        "continueOnError": false,
        "rowTimeValueCheck": false,
        "segmentTimeValueCheck": true
      },
      "tierConfigs": [
        {
          "name": "hotTier",
          "segmentSelectorType": "time",
          "segmentAge": "1m",
          "storageType": "pinot_server",
          "serverTag": "DefaultTenant_OFFLINE"
        },
        {
          "name": "coldTier",
          "segmentSelectorType": "time",
          "segmentAge": "10m",
          "storageType": "pinot_server",
          "serverTag": "DefaultTenant_OFFLINE"
        }
      ]
    }
Table_Schema:
Copy code
{
  "schemaName": "tiered",
  "enableColumnBasedNullHandling": true,
  "dimensionFieldSpecs": [
    { "name": "product_name", "dataType": "STRING", "notNull": true }
  ],
  "metricFieldSpecs": [
    { "name": "price", "dataType": "LONG", "notNull": false }
  ],
  "dateTimeFieldSpecs": [
    { "name": "timestamp", "dataType": "TIMESTAMP", "format": "1MILLISECONDSEPOCH", "granularity": "1:MILLISECONDS" }
  ]
}
    m
    k
    a
    • 4
    • 21
  • f

    Felipe

    07/16/2025, 9:48 AM
    Hi all, I'm seeing this message in some instances of my servers:
    [PerQueryCPUMemAccountantFactory$PerQueryCPUMemResourceUsageAccountant] [CPUMemThreadAccountant] Heap used bytes 6301800816 exceeds critical level 6184752768
Is there any configuration I can use to increase the heap size, or should this not be happening at all?
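(The critical level in that accountant log is derived from the JVM max heap, so the usual remedy is a larger -Xmx for the server JVM, or reducing query load. A sketch, assuming the server is started with pinot-admin.sh or the stock Docker image and that your launch path honors the JAVA_OPTS environment variable — the sizes are placeholders to tune:)
Copy code
# set before starting the server process
export JAVA_OPTS="-Xms8G -Xmx12G -XX:+UseG1GC"
bin/pinot-admin.sh StartServer -zkAddress localhost:2181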
  • f

    Felipe

    07/16/2025, 9:49 AM
    ah, found it 😄
  • m

    Monika reddy

    07/16/2025, 5:25 PM
Hello @Mayank @Kishore G, I wrote a simple Java class to connect to a Pinot cluster locally. I am able to get the table config and call the POST APIs to pause and resume tables, but while updating the table config using the PUT API I get java.util.concurrent.TimeoutException. Has anyone reported this behaviour? I raised a StarTree ticket.
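(For reference, a minimal sketch of issuing the table config update against the controller's PUT /tables/{tableName} endpoint with explicit timeouts, using the JDK HttpClient; the host, table name, file path, and timeout values are placeholders. If the same request succeeds from curl but times out from the client, the client-side read timeout is the usual suspect.)
Copy code
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

public class UpdateTableConfig {
  public static void main(String[] args) throws Exception {
    // Read the new table config JSON from disk (placeholder path).
    String tableConfigJson = Files.readString(Path.of("tableConfig.json"));

    HttpClient client = HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(10))
        .build();

    // PUT /tables/{tableName} updates the table config on the controller.
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:9000/tables/myTable"))
        .timeout(Duration.ofSeconds(60))   // request timeout; raise if the controller is slow
        .header("Content-Type", "application/json")
        .PUT(HttpRequest.BodyPublishers.ofString(tableConfigJson))
        .build();

    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}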
    m
    • 2
    • 11
  • k

    Kiril Kalchev

    07/16/2025, 7:41 PM
Hello guys, we are using Pinot 1.1 and we are currently investigating an issue that we have hit for the third time in the last two weeks. We are getting a lot of these error messages:
Copy code
INFO 2025-07-16T18:31:17.674888458Z [resource.labels.containerName: server] 2025/07/16 18:31:17.672 ERROR [KafkaPartitionLevelConnectionHandler] [auctionsStatsRedis__5__0__20250619T0844Z] Caught exception while creating Kafka consumer, giving up
ERROR [RealtimeSegmentDataManager_auctionsNew__6__4__20250614T0632Z] [auctionsNew__6__4__20250614T0632Z] Exception while in work
[NetworkClient] [auctionsStatsRedis__1__22__20250716T1027Z] [Consumer clientId=auctionsStatsRedis_REALTIME-auctionsStatsRedis-1, groupId=null] Error connecting to node events-prod-cluster-kafka-0.events-prod-cluster-kafka-brokers.kafka-prod.svc:9092 (id: 0 rack: null)
2025/07/16 18:31:37.710 ERROR [ServerSegmentCompletionProtocolHandler] [customKeys_2025_07_5d6254cd_c8e8_423d_b196_73f016e023cb__7__0__20250625T1419Z] Could not send request http://pinot-prod-controller-1.pinot-prod-controller-headless.pinot.svc.cluster.local:9000/segmentStoppedConsuming?reason=org.apache.pinot.shaded.org.apache.kafka.common.KafkaException&streamPartitionMsgOffset=0&instance=Server_pinot-prod-server-2.pinot-prod-server-headless.pinot.svc.cluster.local_8098&offset=-1&name=customKeys_2025_07_5d6254cd_c8e8_423d_b196_73f016e023cb__7__0__20250625T1419Z
And after that some tables are missing segments. Right now all our currently running tables have lost all their segments and can't be queried. Do you have any ideas what is going on and why?
    m
    • 2
    • 23
  • y

    Yeshwanth

    07/17/2025, 7:30 AM
    Hey Guys, Seeing this error during pinot server and broker startup
    Copy code
    Error occurred during initialization of VM
    agent library failed to init: instrument
    Error opening zip file or JAR manifest missing : /opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent.jar
I can see a similar issue was reported here - https://github.com/apache/pinot/issues/16283 I don't think the fix was applied to this tag -> https://hub.docker.com/layers/apachepinot/pinot/1.3.0/images/sha256-27d64d558cd8a90efdf2c15d92dfd713b173120606942fd6faef9b19d20ec2dd Can someone please look into this?
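(A possible workaround sketch until the image is fixed, assuming the 1.3.0 image ships the agent jar under a versioned file name rather than the unversioned path baked into the default JAVA_OPTS; the jar name below is hypothetical, so list the directory first and point -javaagent at whatever is actually there, or drop the agent flag entirely if you don't need Prometheus metrics:)
Copy code
# see what the image actually contains (the exact jar name may differ)
docker run --rm --entrypoint ls apachepinot/pinot:1.3.0 /opt/pinot/etc/jmx_prometheus_javaagent/

# then override JAVA_OPTS so the -javaagent path matches the jar that exists (hypothetical name shown)
export JAVA_OPTS="-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.20.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xmx4G"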
    m
    x
    r
    • 4
    • 12
  • r

    Ricardo Machado

    07/17/2025, 3:41 PM
Hi, we are trying to read a table from Pinot into Spark using the pinot-spark-connector (version 1.3.0), and we get an error when the number of columns to fetch is large (roughly around 175 - 180 columns in our tests). The exact number of columns at which the error appears differs depending on the table. Caused by: org.apache.pinot.connector.spark.common.HttpStatusCodeException: Got error status code '400' with reason 'Bad Request' Stack trace:
    Copy code
    An error occurred while calling o4276.count. : org.apache.pinot.connector.spark.common.PinotException: An error occurred while getting routing table for query, '<REDACTED' at org.apache.pinot.connector.spark.common.PinotClusterClient$.getRoutingTableForQuery(PinotClusterClient.scala:208) at org.apache.pinot.connector.spark.common.PinotClusterClient$.getRoutingTable(PinotClusterClient.scala:153) at org.apache.pinot.connector.spark.v3.datasource.PinotScan.planInputPartitions(PinotScan.scala:57) at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputPartitions$lzycompute(BatchScanExec.scala:63) at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputPartitions(BatchScanExec.scala:63) at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar(DataSourceV2ScanExecBase.scala:179) at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar$(DataSourceV2ScanExecBase.scala:175) at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.supportsColumnar(BatchScanExec.scala:39) at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy.apply(DataSourceV2Strategy.scala:184) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199) at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192) at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199) at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192) at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199) at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192) at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199) at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192) at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74) at org.apache.spark.sql.execution.QueryExecution$.createSparkPlan(QueryExecution.scala:658) at org.apache.spark.sql.execution.QueryExecution.$anonfun$getSparkPlan$1(QueryExecution.scala:195) at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:219) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:277) at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:714) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:277) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901) at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:276) at org.apache.spark.sql.execution.QueryExecution.getSparkPlan(QueryExecution.scala:195) at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:187) at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:187) at org.apache.spark.sql.execution.QueryExecution.$anonfun$getExecutedPlan$1(QueryExecution.scala:211) at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:219) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:277) at 
org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:714) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:277) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901) at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:276) at org.apache.spark.sql.execution.QueryExecution.getExecutedPlan(QueryExecution.scala:208) at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:203) at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:203) at org.apache.spark.sql.execution.QueryExecution.$anonfun$writeProcessedPlans$10(QueryExecution.scala:417) at org.apache.spark.sql.catalyst.plans.QueryPlan$.append(QueryPlan.scala:747) at org.apache.spark.sql.execution.QueryExecution.writeProcessedPlans(QueryExecution.scala:417) at org.apache.spark.sql.execution.QueryExecution.writePlans(QueryExecution.scala:393) at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:432) at <http://org.apache.spark.sql.execution.QueryExecution.org|org.apache.spark.sql.execution.QueryExecution.org>$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:333) at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:311) at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:146) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$10(SQLExecution.scala:220) at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108) at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:384) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:220) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:405) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:219) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:83) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4390) at org.apache.spark.sql.Dataset.count(Dataset.scala:3661) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:569) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.base/java.lang.Thread.run(Thread.java:840) Caused by: org.apache.pinot.connector.spark.common.HttpStatusCodeException: Got error status code '400' with reason 'Bad Request' at org.apache.pinot.connector.spark.common.HttpUtils$.executeRequest(HttpUtils.scala:66) at org.apache.pinot.connector.spark.common.HttpUtils$.sendGetRequest(HttpUtils.scala:50) at 
org.apache.pinot.connector.spark.common.PinotClusterClient$.$anonfun$getRoutingTableForQuery$1(PinotClusterClient.scala:199) at scala.util.Try$.apply(Try.scala:213) at org.apache.pinot.connector.spark.common.PinotClusterClient$.getRoutingTableForQuery(PinotClusterClient.scala:196)
    m
    l
    • 3
    • 3
  • v

    Victor Bivolaru

    07/18/2025, 1:40 PM
Hello everyone, I am new here. I've just started a Pinot cluster locally - no Docker, just running the server, controller, etc. using the scripts in the bin directory. I am having trouble setting up Grafana and Prometheus to scrape metrics off the cluster. I can find nothing about observability except for the wiki page - the caveat is I won't be running it in k8s.
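(A sketch of the non-k8s setup, assuming you download the standalone jmx_prometheus_javaagent jar and reuse a Pinot JMX-to-Prometheus mapping config; all paths and ports are placeholders, and the agent is attached through JAVA_OPTS before launching each component from bin/:)
Copy code
# attach the JMX->Prometheus exporter to a component started from bin/
export JAVA_OPTS="-javaagent:/path/to/jmx_prometheus_javaagent.jar=8008:/path/to/pinot-jmx.yml"
bin/pinot-admin.sh StartServer -zkAddress localhost:2181

# prometheus.yml: scrape the exporter port of each component
scrape_configs:
  - job_name: 'pinot'
    static_configs:
      - targets: ['localhost:8008']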
    m
    • 2
    • 6
  • k

    Kiril Kalchev

    07/18/2025, 9:10 PM
Hi everyone. I believe I have an issue in the cluster, but I am not sure. I am seeing a few segments like this in ZooKeeper.
    Copy code
    "auctionsStats__6__13__20250703T1233Z": {
          "Server_pinot-prod-server-0.pinot-prod-server-headless.pinot.svc.cluster.local_8098": "OFFLINE",
          "Server_pinot-prod-server-1.pinot-prod-server-headless.pinot.svc.cluster.local_8098": "OFFLINE",
          "Server_pinot-prod-server-2.pinot-prod-server-headless.pinot.svc.cluster.local_8098": "OFFLINE"
        },
When I try to download the segments again, I get an error saying they are not in my deep store. However, queries seem to work normally. Is it expected for segments to be reported as OFFLINE and missing from the deep store? What exactly does OFFLINE mean as a segment status? Below are the latest messages for the above segment:
    Copy code
    INFO 2025-07-18T05:35:05.820609035Z [resource.labels.containerName: server] 2025/07/18 05:35:05.820 INFO [HttpClient] [auctionsStats__6__13__20250703T1233Z] Sending request: <http://pinot-prod-controller-1.pinot-prod-controller-headless.pinot.svc.cluster.local:9000/segmentStoppedConsuming?reason=org.apache.pinot.shaded.org.apache.kafka.common.KafkaException&streamPartitionMsgOffset=0&instance=Server_pinot-prod-server-2.pinot-prod-server-headless.pinot.svc.cluster.local_8098&offset=-1&name=auctionsStats__6__13__20250703T1233Z> to controller: pinot-prod-controller-1.pinot-prod-controller-headless.pinot.svc.cluster.local, version: Unknown
    INFO 2025-07-18T05:35:05.821542868Z [resource.labels.containerName: server] 2025/07/18 05:35:05.821 INFO [ServerSegmentCompletionProtocolHandler] [auctionsStats__6__13__20250703T1233Z] Controller response {"status":"PROCESSED","streamPartitionMsgOffset":null,"isSplitCommitType":true,"buildTimeSec":-1} for <http://pinot-prod-controller-1.pinot-prod-controller-headless.pinot.svc.cluster.local:9000/segmentStoppedConsuming?reason=org.apache.pinot.shaded.org.apache.kafka.common.KafkaException&streamPartitionMsgOffset=0&instance=Server_pinot-prod-server-2.pinot-prod-server-headless.pinot.svc.cluster.local_8098&offset=-1&name=auctionsStats__6__13__20250703T1233Z>
    INFO 2025-07-18T05:35:05.821571462Z [resource.labels.containerName: server] 2025/07/18 05:35:05.821 INFO [RealtimeSegmentDataManager_auctionsStats__6__13__20250703T1233Z] [auctionsStats__6__13__20250703T1233Z] Got response {"status":"PROCESSED","streamPartitionMsgOffset":null,"isSplitCommitType":true,"buildTimeSec":-1}
    INFO 2025-07-18T05:35:05.983729827Z [resource.labels.containerName: server] 2025/07/18 05:35:05.976 INFO [local_8098 - SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread_7] SegmentOnlineOfflineStateModel.onBecomeOfflineFromConsuming() : ZnRecord=cc787368-9a93-42f3-8588-ebefe88f2a07, {CREATE_TIMESTAMP=1752816905933, ClusterEventName=IdealStateChange, EXECUTE_START_TIMESTAMP=1752816905976, EXE_SESSION_ID=300627ec087008e, FROM_STATE=CONSUMING, MSG_ID=cc787368-9a93-42f3-8588-ebefe88f2a07, MSG_STATE=read, MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=auctionsStats__6__13__20250703T1233Z, READ_TIMESTAMP=1752816905959, RESOURCE_NAME=auctionsStats_REALTIME, RESOURCE_TAG=auctionsStats_REALTIME, RETRY_COUNT=3, SRC_NAME=pinot-prod-controller-2.pinot-prod-controller-headless.pinot.svc.cluster.local_9000, SRC_SESSION_ID=2006281fc800087, STATE_MODEL_DEF=SegmentOnlineOfflineStateModel, STATE_MODEL_FACTORY_NAME=DEFAULT, TGT_NAME=Server_pinot-prod-server-2.pinot-prod-server-headless.pinot.svc.cluster.local_8098, TGT_SESSION_ID=300627ec087008e, TO_STATE=OFFLINE}{}{}, Stat=Stat {_version=0, _creationTime=1752816905946, _modifiedTime=1752816905946, _ephemeralOwner=0}
    INFO 2025-07-18T05:35:05.984995178Z [resource.labels.containerName: server] 2025/07/18 05:35:05.983 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread_7] Removing segment: auctionsStats__6__13__20250703T1233Z from table: auctionsStats_REALTIME
    INFO 2025-07-18T05:35:05.985038958Z [resource.labels.containerName: server] 2025/07/18 05:35:05.983 INFO [auctionsStats_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_7] Removing segment: auctionsStats__6__13__20250703T1233Z from table: auctionsStats_REALTIME
    INFO 2025-07-18T05:35:05.985045952Z [resource.labels.containerName: server] 2025/07/18 05:35:05.983 INFO [auctionsStats_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_7] Closing segment: auctionsStats__6__13__20250703T1233Z of table: auctionsStats_REALTIME
    INFO 2025-07-18T05:35:05.985110098Z [resource.labels.containerName: server] 2025/07/18 05:35:05.984 INFO [MutableSegmentImpl_auctionsStats__6__13__20250703T1233Z_auctionsStats] [HelixTaskExecutor-message_handle_thread_7] Trying to close RealtimeSegmentImpl : auctionsStats__6__13__20250703T1233Z
    INFO 2025-07-18T05:35:05.985117081Z [resource.labels.containerName: server] 2025/07/18 05:35:05.984 INFO [auctionsStats_REALTIME-6-ConcurrentMapPartitionUpsertMetadataManager] [HelixTaskExecutor-message_handle_thread_7] Skip removing untracked (replaced or empty) segment: auctionsStats__6__13__20250703T1233Z
    INFO 2025-07-18T05:35:05.987557288Z [resource.labels.containerName: server] 2025/07/18 05:35:05.987 INFO [MmapMemoryManager] [HelixTaskExecutor-message_handle_thread_7] Deleted file /var/pinot/server/data/index/auctionsStats_REALTIME/consumers/auctionsStats__6__13__20250703T1233Z.0
    INFO 2025-07-18T05:35:05.990545309Z [resource.labels.containerName: server] 2025/07/18 05:35:05.990 INFO [auctionsStats_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_7] Closed segment: auctionsStats__6__13__20250703T1233Z of table: auctionsStats_REALTIME
    INFO 2025-07-18T05:35:05.990570191Z [resource.labels.containerName: server] 2025/07/18 05:35:05.990 INFO [auctionsStats_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_7] Removed segment: auctionsStats__6__13__20250703T1233Z from table: auctionsStats_REALTIME
    INFO 2025-07-18T05:35:05.990578459Z [resource.labels.containerName: server] 2025/07/18 05:35:05.990 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread_7] Removed segment: auctionsStats__6__13__20250703T1233Z from table: auctionsStats_REALTIME
    INFO 2025-07-18T06:15:57.880369560Z [resource.labels.containerName: controller] 2025/07/18 06:15:57.880 INFO [PinotLLCRealtimeSegmentManager] [pool-10-thread-7] Repairing segment: auctionsStats__6__13__20250703T1233Z which is OFFLINE for all instances in IdealState
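(If a segment stays OFFLINE in the external view, one thing that can be tried - a sketch rather than a confirmed fix - is the controller's segment reset endpoint, which re-triggers the state transitions for that segment; the table and segment names below are taken from the log above, and the controller host is a placeholder:)
Copy code
curl -X POST "http://<controller-host>:9000/segments/auctionsStats_REALTIME/auctionsStats__6__13__20250703T1233Z/reset"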
    m
    • 2
    • 18
  • m

    madhulika

    07/21/2025, 4:19 PM
    👋 Hello, team!
  • m

    madhulika

    07/21/2025, 4:21 PM
Hi @Mayank, I am trying to connect a Pinot production environment to Tableau Desktop and I am getting a failed-connection error. I was able to connect to my local Pinot environment, but when I use the StarTree connector it asks for certificates, and I am not sure how to pass them using the StarTree or another JDBC connector. Can you share valid documentation for such use cases?
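(For the plain JDBC route, a minimal smoke-test sketch using Pinot's JDBC driver from Java; the driver class and the jdbc:pinot:// URL format are the documented ones, but the controller host/port, table name, and any TLS keystore settings your production cluster needs are placeholders to fill in:)
Copy code
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PinotJdbcSmokeTest {
  public static void main(String[] args) throws Exception {
    // Pinot JDBC driver (shipped in the pinot-jdbc-client artifact).
    Class.forName("org.apache.pinot.client.PinotDriver");
    // Point at the controller; brokers are discovered from there.
    try (Connection conn = DriverManager.getConnection("jdbc:pinot://<controller-host>:9000");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT * FROM myTable LIMIT 1")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}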
    m
    m
    k
    • 4
    • 26
  • k

    Krupa

    07/24/2025, 5:20 PM
Hello guys, this is my table config:
Copy code
{
  "REALTIME": {
    "tableName": "transaction_REALTIME",
    "tableType": "REALTIME",
    "segmentsConfig": { "minimizeDataMovement": false, "timeColumnName": "updated_at", "schemaName": "transaction", "replication": "1" },
    "tenants": { "broker": "cp_frm", "server": "cp_frm" },
    "tableIndexConfig": {
      "rangeIndexVersion": 2,
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "loadMode": "MMAP",
      "streamConfigs": {
        "streamType": "pulsar",
        "stream.pulsar.topic.name": "topic name",
        "stream.pulsar.bootstrap.servers": pulsar server
        "stream.pulsar.consumer.prop.auto.offset.reset": "smallest",
        "stream.pulsar.consumer.type": "lowlevel",
        "stream.pulsar.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
        "stream.pulsar.consumer.factory.class.name": "schema"
      },
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": true,
      "columnMajorSegmentBuilderEnabled": true,
      "optimizeDictionary": false,
      "optimizeDictionaryForMetrics": false,
      "optimizeDictionaryType": false,
      "noDictionarySizeRatioThreshold": 0.85
    },
    "metadata": {},
    "routing": { "instanceSelectorType": "strictReplicaGroup" },
    "query": { "timeoutMs": 3000 },
    "instanceAssignmentConfigMap": {
      "COMPLETED": {
        "tagPoolConfig": { "tag": "cp_frm_REALTIME", "poolBased": false, "numPools": 0 },
        "replicaGroupPartitionConfig": { "replicaGroupBased": true, "numInstances": 0, "numReplicaGroups": 1, "numInstancesPerReplicaGroup": 0, "numPartitions": 2, "numInstancesPerPartition": 1, "minimizeDataMovement": false, "partitionColumn": "id" },
        "partitionSelector": "INSTANCE_REPLICA_GROUP_PARTITION_SELECTOR",
        "minimizeDataMovement": false
      },
      "CONSUMING": {
        "tagPoolConfig": { "tag": "cp_frm_REALTIME", "poolBased": false, "numPools": 0 },
        "replicaGroupPartitionConfig": { "replicaGroupBased": true, "numInstances": 0, "numReplicaGroups": 1, "numInstancesPerReplicaGroup": 0, "numPartitions": 2, "numInstancesPerPartition": 1, "minimizeDataMovement": false, "partitionColumn": "id" },
        "partitionSelector": "INSTANCE_REPLICA_GROUP_PARTITION_SELECTOR",
        "minimizeDataMovement": false
      }
    },
    "upsertConfig": {
      "hashFunction": "NONE",
      "defaultPartialUpsertStrategy": "OVERWRITE",
      "upsertViewRefreshIntervalMs": 3000,
      "partialUpsertStrategies": {},
      "enableSnapshot": true,
      "metadataTTL": 0,
      "deletedKeysTTL": 0,
      "enablePreload": true,
      "consistencyMode": "SNAPSHOT",
      "newSegmentTrackingTimeMs": 10000,
      "dropOutOfOrderRecord": false,
      "enableDeletedKeysCompactionConsistency": false,
      "allowPartialUpsertConsumptionDuringCommit": false,
      "mode": "PARTIAL"
    },
    "ingestionConfig": {
      "continueOnError": false,
      "rowTimeValueCheck": false,
      "transformConfigs": [ { "columnName": "updated_at", "transformFunction": "NOW()" } ],
      "segmentTimeValueCheck": true
    },
    "isDimTable": false
  }
}
When data gets inserted during consumption, the data is visible. Once the segment gets committed, the data is visible only when I set SET skipUpsert=true or when I reload the segments. Can someone help with why this is happening and how to fix it? Pinot version - 1.2.0 @Mayank
    m
    c
    • 3
    • 7
  • r

    robert zych

    07/24/2025, 10:14 PM
    SegmentGenerationAndPushTask
appears to hang when configured against an S3 bucket with many (>80K) files. Besides reducing the number of files in the bucket, what can be done to handle buckets with this many files? @Xiaobing
  • m

    Mayank

    07/25/2025, 12:20 AM
    @Manish ^^
    thanks 1
  • m

    Monika reddy

    07/28/2025, 4:12 PM
Need some help in understanding. I have two observations: if my Kafka datatype is LONG and it maps to a Pinot LONG, the conversion happens from UTC to UTC. But if my Pinot data type is TIMESTAMP, the Kafka column shows the timezone in UTC while Pinot shows it in EST. Question: for LONG to TIMESTAMP, is Pinot converting the timezone as well?
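(For what it's worth, Pinot's TIMESTAMP type stores epoch milliseconds internally, so any timezone you see is applied at render time by the query console or client rather than at ingestion. A quick way to compare, sketched with Pinot's ToDateTime function and an explicit zone, assuming a millisecond-epoch column named event_time and that your version supports the three-argument form:)
Copy code
SELECT
  event_time,
  ToDateTime(event_time, 'yyyy-MM-dd HH:mm:ss') AS utc_time,
  ToDateTime(event_time, 'yyyy-MM-dd HH:mm:ss', 'America/New_York') AS est_time
FROM myTable
LIMIT 10;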
    m
    • 2
    • 1
  • j

    Jonathan Baxter

    07/28/2025, 9:05 PM
Hi all, I have some Parquet part files in GCS that I want to import into an OFFLINE table that already uses GCS (same bucket) as its deep store. I think I'm configuring the job spec correctly, but I keep getting this error related to Kerberos and Hadoop, which I didn't think were supposed to be involved in this process at all. I'm running 1.2.0 and I'll share more details in 🧵
    KerberosAuthException: failure to login: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
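(A hedged observation: that UserGroupInformation/Kerberos path usually comes in through the Hadoop-based Parquet reader, and the "invalid null input: name" login failure is typically the JVM running as a user with no resolvable name, so setting a user for the process (e.g. the USER/HADOOP_USER_NAME env vars or -Duser.name) is one common workaround. Pinot also ships a non-Hadoop Parquet reader, so switching the recordReaderSpec is worth a try - treat the class name below as something to verify against your version:)
Copy code
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader'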
    m
    • 2
    • 14
  • v

    Venkat Sai Ram

    07/30/2025, 9:00 AM
Hi, I need help with accessing JSON fields in my table. I've added JSON indexes to these fields but I'm unable to use JSON_EXTRACT or JSON_MATCH on them. I also have a column which is an array of objects. We get these events from BigQuery and want to store and query them in Pinot. With JSON_MATCH, I'm getting no records found even though the value is present.
    Copy code
    SELECT
      user_id,
      event_name,
      geo
    FROM events_intraday_20250719
    WHERE JSON_MATCH(
      geo,
      '"$.country"=''India'''
    )
    LIMIT 10;
I got no records found. A sample geo value:
    Copy code
    "{'city': 'Delhi', 'country': 'India', 'continent': 'Asia', 'region': 'Delhi', 'sub_continent': 'Southern Asia', 'metro': '(not set)'}"
    Copy code
    SELECT
      json_extract_scalar(app_info, '$.id',      'STRING', 'null')  AS app_id,
      json_extract_scalar(app_info, '$.version', 'STRING', 'null')  AS app_version
    FROM events_intraday_20250719
    LIMIT 10;
All nulls is the result I got. A sample app_info value:
    Copy code
    "{'id': 'com.aadhan.hixic', 'version': '5.7.6', 'install_store': None, 'firebase_app_id': '1:700940617518:android:4c5cd93d642b6868', 'install_source': 'manual_install'}"
If I remove the default null, I get:
    Copy code
    Error Code: 200 (QueryExecutionError)
    Caught exception while doing operator: class org.apache.pinot.core.operator.query.SelectionOnlyOperator on segment events_intraday_20250719_OFFLINE_1752863401870004_1752949768919000_12: Cannot resolve JSON path on some records. Consider setting a default value.
    Table Config
    Copy code
    {
      "OFFLINE": {
        "tableName": "events_intraday_20250719_OFFLINE",
        "tableType": "OFFLINE",
        "segmentsConfig": {
          "replication": "1",
          "timeColumnName": "event_timestamp",
          "minimizeDataMovement": false
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant"
        },
        "tableIndexConfig": {
          "aggregateMetrics": false,
          "invertedIndexColumns": [
            "event_name",
            "platform",
            "session_traffic_source_last_click",
            "traffic_source"
          ],
          "nullHandlingEnabled": false,
          "enableDefaultStarTree": false,
          "enableDynamicStarTreeCreation": false,
          "columnMajorSegmentBuilderEnabled": true,
          "skipSegmentPreprocess": false,
          "optimizeDictionary": false,
          "optimizeDictionaryForMetrics": false,
          "optimizeDictionaryType": false,
          "noDictionarySizeRatioThreshold": 0.85,
          "rangeIndexVersion": 2,
          "jsonIndexColumns": [
            "app_info",
            "device",
            "event_params",
            "geo",
            "privacy_info",
            "user_properties"
          ],
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false,
          "loadMode": "MMAP"
        },
        "metadata": {},
        "ingestionConfig": {
          "batchIngestionConfig": {
            "segmentIngestionType": "APPEND",
            "segmentIngestionFrequency": "DAILY",
            "consistentDataPush": false
          },
          "continueOnError": false,
          "retryOnSegmentBuildPrecheckFailure": false,
          "rowTimeValueCheck": false,
          "segmentTimeValueCheck": true
        },
        "isDimTable": false
      }
    }
Can you help me with this? I'm happy to provide additional information.
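(A hedged observation based on the sample values above: the stored strings use single quotes and Python's None, which is not valid JSON, so the JSON index and JSON_EXTRACT_SCALAR cannot parse them; that alone would explain both the empty JSON_MATCH result and the all-null extraction. Once the values are stored as proper double-quoted JSON, the original queries should behave as expected, for example:)
Copy code
-- works against values stored as valid JSON, e.g.
-- {"city": "Delhi", "country": "India", "continent": "Asia", "region": "Delhi"}
SELECT user_id, event_name, geo
FROM events_intraday_20250719
WHERE JSON_MATCH(geo, '"$.country" = ''India''')
LIMIT 10;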
    b
    m
    • 3
    • 6
  • k

    Kavya

    07/30/2025, 12:57 PM
Hi team @Mayank, I am trying to connect Pinot (in EKS) to Metabase via Trino. Trino is in EC2 and its version is 425. The Trino catalog settings are:
Copy code
connector.name=pinot
# If your Pinot controller's REST port is 8098:
pinot.controller-urls=http://pinot-eks.shiprocket.in/
# force all queries through the broker
pinot.prefer-broker-queries=true
But I am getting an error while querying: Query failed (#20250730_115432_01201_3uisv): Failed communicating with server: http://pinot-pinot-broker-1.pinot-pinot-broker-headless.pinot.svc.cluster.local:8099/query/sql I think this error is because svc.cluster.local is accessible only within EKS. Please let me know how I can resolve this error without upgrading Trino.
  • n

    Nicolas Thiessen

    07/30/2025, 1:56 PM
Is there a way to configure Pinot to delete segments from the deep store after a certain amount of time, but only if they are no longer referenced in the actual table? E.g. "delete from the deep store every segment file that is more than 1 week old and no longer referenced in the table". We don't want to set a retention period on segments since our data doesn't expire, but our deep store is constantly growing in size as segments are replaced but not deleted from the deep store. Wondering if the deep store has to be cleaned up separately.
    m
    • 2
    • 2
  • e

    Emerson Lesage

    07/30/2025, 3:48 PM
Hello, we have our deep store set up with S3 and were wondering if calling the delete segments endpoint deletes the segment from the table as well as from S3. Thanks!
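(For reference, and hedged since behaviour can differ by version: deleting a segment removes it from the table, and rather than deleting the deep-store copy immediately it is typically moved under a Deleted_Segments folder in the controller data dir, which is cleaned up after a retention window controlled by a controller property along these lines:)
Copy code
# controller.conf - retention for soft-deleted segment copies in the deep store
# (property name and default are an assumption to verify for your version)
controller.deleted.segments.retentionInDays=7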
    m
    f
    • 3
    • 26
  • p

    Praneeth G

    07/31/2025, 6:03 AM
Hi team, what is the time unit (SECONDS, DAYS, ...) of metadataTTL in upsertConfig? According to the documentation example it is seconds, but it is also mentioned:
    Copy code
    Since the metadata TTL is applied on the first comparison column, the time unit of upsert TTL is the same as the first comparison column.
With the below config, ingestion is failing.
    Copy code
    "segmentsConfig": {
          "schemaName": "agent_task_dimension_v1",
          "retentionTimeUnit": "DAYS",
          "retentionTimeValue": "180",
          "replicasPerPartition": "2",
          "timeType": "DAYS",
          "timeColumnName": "task_created_date" .... }
    
     "upsertConfig": {
          "mode": "FULL",
          "metadataTTL": 648000
    .. }
    Copy code
    java.lang.ClassCastException: null
    pinot-server.log:2025/07/30 23:53:32.481 ERROR [RealtimeSegmentDataManager_agent_task_dimension_v1__16__0__20250730T1823Z] [agent_task_dimension_v1__16__0__20250730T1823Z] Caught exception while indexing the record at offset: 52
I tried a bunch of combinations and it is not due to null fields; it seems to be due to a mismatch between the time unit and the metadataTTL value.
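(Given the quoted doc line, the TTL is interpreted in the unit of the first comparison column, which defaults to the time column; with task_created_date in DAYS, a TTL meant as 180 days would be written as 180 rather than 648000 seconds. A sketch of that assumption to validate, not a confirmed fix:)
Copy code
"upsertConfig": {
  "mode": "FULL",
  "metadataTTL": 180
}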
    p
    • 2
    • 9
  • r

    Rajat

    07/31/2025, 6:58 AM
@Mayank @Xiang Fu How do I fix a bad segment in a REALTIME table?
    x
    m
    • 3
    • 8
  • f

    francoisa

    07/31/2025, 3:16 PM
Hi 😉 Quick question about migration from version 0.12 to 1.0 (or later). We are using only REALTIME tables. Plugins work well for both minion and custom UDFs 👍 Schema and ingestion seem okay for all our tables 💪 But I want to check retro-compatibility with committed non-consuming segments. I've tried to use /segment or /v2/segments to upload a segment downloaded from our dev server (v0.12) to the local 1.0 one, but the controller keeps complaining about LLC with:
    New uploaded LLC segment must have start/end offset in the segment metadata
even after cheating the metadata file with
    Copy code
    segment.total.docs = 2
    segment.kafka.topic.name : ressources 
    segment.kafka.partition.id : 0 
    segment.start.offset : 0
    segment.end.offset : 2
and repacking, I face the same issue. Any idea? 😇 Or will nothing work like that (maybe I'm a dreamer 😄)? Thanks by the way for the amazing work 😉
    m
    • 2
    • 5
  • m

    madhulika

    08/05/2025, 2:00 PM
Hi @Mayank, we changed the topic to one with fewer partitions for a partial upsert table, which resulted in duplicate entries in the table even though a primary key was defined.
    m
    • 2
    • 13
  • a

    Apoorv Upadhyay

    08/06/2025, 9:33 AM
Hi team, facing an issue with a realtime Pinot table on version 1.3.0 (we recently upgraded from 1.0.0). I have many tables in the prod cluster (~100) which are working fine, creating segments and ingesting properly. For this particular table, once onboarded, ingestion stops after ingesting, say, X records, and even if I do
    forceCommit
segments are not getting committed to the deep store. I could see the log line
    _segmentLogger.error("Could not build segment for {}", _segmentNameStr);
but this also stops appearing when I re-onboard the table, and there are no error logs related to creating or pushing segments. Attaching the table config and schema config. Please suggest how I can debug this further.
(attachments: slack_pino_sch, slack_pino_tbl)
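(A couple of hedged debugging probes against the controller REST API that can help narrow this down; the host and table name are placeholders, and the endpoint paths are the ones I'd expect on 1.3.0, so verify them in the controller's Swagger UI:)
Copy code
# where is consumption stuck, and why (per-partition offsets and consumer state)
curl "http://<controller-host>:9000/tables/<tableName>/consumingSegmentsInfo"

# force the consuming segments to commit, then watch server/controller logs for the build/upload steps
curl -X POST "http://<controller-host>:9000/tables/<tableName>/forceCommit"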
    m
    • 2
    • 4
  • r

    Rajat

    08/06/2025, 12:13 PM
Hi, after setting up the deep store, why are new records not coming in after this limit of 800000? Please, can anyone help? I am running this with the Helm chart in EKS. @Mayank @Xiang Fu please connect, I need to get this resolved. Also, I am not able to see the segments on S3.
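(When segments never show up in S3 and consumption stalls around the flush threshold, the usual suspects are the controller/server deep-store settings; a sketch of the S3 pieces, mirroring the property names used earlier in this channel, with bucket, region, and credentials as placeholders:)
Copy code
# controller.conf
controller.data.dir=s3://<bucket>/controller
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=<region>
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

# server.conf
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=<region>
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher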
    m
    r
    x
    • 4
    • 19
  • e

    Emerson Lesage

    08/06/2025, 2:15 PM
Hello, is using the startReplaceSegments and endReplaceSegments endpoints the correct and most robust approach for doing atomic transactions for offline tables? For example, if my table currently contains segments s1, s2, and s3, and I need to do the following in an all-or-nothing transaction:
• Insert new segments s4 and s5.
• Replace existing segment s3 with a new segment s6.
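(For reference, a sketch of the segment replacement protocol as I understand it, hedged since the exact parameter names should be checked against your controller's Swagger UI: startReplaceSegments opens a lineage entry declaring the segments being replaced and their replacements, the new segments are then uploaded, and endReplaceSegments atomically flips the query view; the host, table name, and id are placeholders:)
Copy code
# 1) open the replacement, declaring old -> new segments; the response returns a lineage entry id
curl -X POST "http://<controller>:9000/segments/myTable/startReplaceSegments?type=OFFLINE" \
  -H "Content-Type: application/json" \
  -d '{"segmentsFrom": ["s3"], "segmentsTo": ["s4", "s5", "s6"]}'

# 2) upload/push the new segments s4, s5, s6 as usual

# 3) commit the switch atomically using the id returned in step 1
curl -X POST "http://<controller>:9000/segments/myTable/endReplaceSegments?type=OFFLINE&segmentLineageEntryId=<id>"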
    m
    m
    • 3
    • 8
  • r

    Rohini Choudhary

    08/07/2025, 5:45 AM
Hello team, we want to use pool-based instance assignment in our Pinot cluster. Since it requires adding a Helix InstanceConfig entry to each server, is there any way we can populate the Helix InstanceConfig via some config at server startup? The reason behind this ask is that we want to avoid any manual steps in our prod cluster when we scale servers or replace existing ones. We could only find the below ways to add the Helix InstanceConfig:
• Via the controller UI (manual)
• Via pinot-admin.sh (needs to be run after the server is up)
• REST API (needs to be run after the server is up)
We could not find any method that can be given as a server configuration and gets picked up automatically when the server boots. Please suggest. Also, we have deployed Pinot in k8s using the official Helm chart, and we want to maintain 2 or 3 pools of servers. What would be your recommendation: should we manage each server pool as a separate StatefulSet, or should all of them be part of a single StatefulSet?
    m
    y
    • 3
    • 7
  • s

    Shrusti Patel

    08/07/2025, 4:53 PM
Hi community, I'm trying to integrate Apache Pinot (running on Amazon EKS) with Amazon MSK (IAM auth enabled). My goal is to securely consume Kafka topics using IAM-based authentication. Here's what I've tried so far:
• Stored my MSK authentication credentials (secret) as Kubernetes Secrets in EKS.
• Mounted these secrets into my Pinot pods as environment variables.
• Updated my table config to use port 9096 (SCRAM) and set the following Kafka consumer properties.
Despite this, Pinot is unable to connect to the MSK broker and table creation fails when I use broker port 9096, but table creation succeeds when I use port 9092. I would really appreciate any guidance: has anyone successfully set up MSK IAM authentication with Pinot? Thanks in advance for your help!
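(Worth noting that port 9096 on MSK is SASL/SCRAM rather than IAM, which uses 9098, so the consumer needs matching SASL properties; a hedged sketch of the streamConfigs entries for the SCRAM path, passing standard Kafka client properties through Pinot's stream.kafka.consumer.prop. prefix, with broker list and credentials as placeholders. True IAM auth on 9098 would additionally need the aws-msk-iam-auth jar on the Pinot classpath plus sasl.mechanism=AWS_MSK_IAM and its callback handler, which is worth confirming your image actually ships.)
Copy code
"stream.kafka.broker.list": "<msk-broker-1>:9096,<msk-broker-2>:9096",
"stream.kafka.consumer.prop.security.protocol": "SASL_SSL",
"stream.kafka.consumer.prop.sasl.mechanism": "SCRAM-SHA-512",
"stream.kafka.consumer.prop.sasl.jaas.config": "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"<user>\" password=\"<password>\";"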
    m
    • 2
    • 2