# troubleshooting
  • p

    prasanna

    06/04/2025, 8:15 AM
    hi Team, can someone provide me some context on my query below related to how S3 is involved/behaves when used as a deepstore?
    My current setup: Pinot is configured to bypass the controller and use S3 as the deepstore, so servers push and pull data directly from the deepstore. I believe there is a config that can be defined (an age, I think) in the realtime table so that realtime segments are still kept on disk, but we are not using it.
    Recent problem faced: we recently had an outage and lost connectivity to S3 from Pinot, which caused ingestion to stop. This seems to be our mistake: along with bypassing the controller, we need to configure the realtime table to keep ingesting and keep segments locally while S3 comes back up.
    My queries: we also currently have intermittent HTTP timeouts to S3. Because of this, segments' external view state goes to ERROR and the table goes into a bad state. While we work on the network side of the issue, I need to understand the following:
    1. What is the involvement of S3? Is it just a deepstore for backup, and even with our config does Pinot still keep segments on disk?
    2. When a query is executed, does it always go to S3, or does it query the data on disk provided Pinot keeps it, and only go to S3 for segments not on disk? I am trying to understand how and when S3 is accessed: is it used for upload, query, or both?
    Below is the error we currently face. I need to understand the behavior to set up Pinot better; if there is any document about this it would help.
    Caught exception while fetching segment from: s3://xyz/abc/abc__31__73__20250604T0211Z to: /var/pinot/server/data/index/abc_REALTIME/tmp/tmp-abc__31__73__20250604T0211Z-7b64a679-f65a-4325-abc8-956b372fea55/abc__31__73__20250604T0211Z.tar.gz org.apache.pinot.shaded.software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Connect to xyz:9020 [xyz/0.0.0.0]
    m
    • 2
    • 4
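    To ground the discussion, a minimal sketch of the controller-bypass / S3 deep-store setup being described above; property and field names are taken from the public docs and should be double-checked for your version. In this mode servers serve queries from segments in their local data dir; the deep store is only used for segment upload and for downloading segments a server does not have locally (e.g. after a restart or rebalance), and queries are not served from S3 directly.
    Copy code
    # server conf (sketch): servers commit/upload completed segments to the deep store themselves
    pinot.server.instance.segment.store.uri=s3://<bucket>/<path>
    pinot.server.instance.enable.split.commit=true
    pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.server.segment.fetcher.protocols=file,http,s3
    pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    # realtime table config (sketch): lets a replica fetch a committed segment from a peer server
    # over HTTP when the deep store is unreachable
    "segmentsConfig": { "peerSegmentDownloadScheme": "http" }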
  • c

    coco

    06/05/2025, 6:34 AM
    Hi Pinot team. Have you ever experienced a phenomenon in Pinot 1.2 where one replica of a consuming segment remains in CONSUMING? The previous consuming segment has 2 ONLINE and 1 CONSUMING replicas, and the newly created consuming segment becomes under-replicated because only 2 CONSUMING replicas are created. For example:
    segment__9__2027__timestampZ { "server1": "ONLINE", "server2": "ONLINE", "server3": "CONSUMING" }
    segment__9__2028__timestampZ { "server1": "CONSUMING", "server2": "CONSUMING" }
    I've been analyzing this phenomenon for a while, but I can't find the cause.
    x
    m
    +2
    • 5
    • 32
  • r

    Rajat

    06/05/2025, 6:45 AM
    Hi Team, there's a doubt I had: I made a change to a transformation function in the table config of my realtime table, but the change doesn't reflect on the table. The transform function is a UDF and it works when I recreate the table, but I want the existing table to be refreshed with the new settings and configs. How can I do that? @Xiang Fu @Jackie I cannot point queries to the new table, so how can I get the existing table refreshed?
    x
    m
    • 3
    • 21
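    For reference, a hedged sketch of the usual flow for applying a table-config change to an existing table: update the config, then trigger a segment reload through the controller REST API. Note that ingestion transforms are applied at ingestion time, so rows that were already ingested are generally not rewritten; host and table names below are placeholders.
    Copy code
    # update the existing table config in place
    curl -X PUT "http://<controller>:9000/tables/myTable_REALTIME" \
      -H "Content-Type: application/json" -d @table-config.json
    # ask servers to reload the table's segments so the updated config takes effect
    curl -X POST "http://<controller>:9000/segments/myTable_REALTIME/reload"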
  • g

    Gaurav

    06/06/2025, 9:31 AM
    Hi Team, we have deployed Pinot in our Kubernetes infra. Is there a way we can expose the Pinot controller UI under a sub-path instead of /? We are using Kong to expose the Pinot controller UI.
    x
    • 2
    • 1
  • g

    Georgi Varbanov

    06/06/2025, 9:45 AM
    Hello, I have the following case; can you tell me what will happen? We have a realtime table without upsert that uses time and partition pruning. During consumption there was a problem with a single record from the application's point of view (the record was skipped and not published), and other records with newer timestamps have since been published. Can we backfill this lost record with a new Kafka timestamp but an old, out-of-order creation time (the column used for time pruning)? Will that somehow break the pruning process, or will it manage to fix itself?
    m
    • 2
    • 4
  • v

    Vipin Rohilla

    06/09/2025, 5:14 PM
    Hi team, any plans to support multiple data dirs on a single Pinot server instance? (Currently the server conf is limited to a single data dir.)
    m
    • 2
    • 2
  • e

    Eddie Simeon

    06/09/2025, 7:06 PM
    Hi Team, 1. I have a Pinot server receiving requests for segments it does not own. The server is reporting query-processing exceptions and slow query processing times. 2. Is it normal, in a 3-broker setup, for only 2 of the 3 brokers to be sending query requests?
    m
    • 2
    • 1
  • n

    Naz Karnasevych

    06/09/2025, 10:50 PM
    Hello, thought I'd post here before continuing further debugging. We're upgrading our Pinot installation and it's been a while, so we have to move to the newer bitnami Zookeeper chart. I did the upgrade in our sandbox env and everything seemed to go smoothly: no data loss, and components recovered just fine. However, when proceeding to our dev env, after the restart and recreation of pods we seem to have lost important Zookeeper data, specifically table and schema configs. I see snapshots in the
    /bitnami/zookeeper/data
    dir, so clearly some things were not lost, but I'm curious whether migrating charts requires some extra steps in the config. Also, a couple of follow-up questions: • are there ways to recover the tables/schemas if Zookeeper lost them on restart? • how can this happen in one env but not in another? The only difference I can think of is the addition of basic auth to the controller UI. • ways to prevent this in the future? We want to update prod as well, but we're not sure how to prevent this scenario. Backing up tables/schemas manually is one thing, but is there other important data whose loss can prevent a healthy recovery of Pinot?
  • a

    Anish Nair

    06/11/2025, 1:10 PM
    Hi team, we have a realtime table in a Pinot cluster and want to migrate the realtime data to another Pinot cluster. Is it possible to load the COMPLETED segments into the new cluster somehow? Pinot version: 1.0
    m
    x
    • 3
    • 10
  • d

    Dong Zhou

    06/12/2025, 8:12 AM
    Hi team, does Kafka ingestion support zstd decompression?
    m
    • 2
    • 2
  • r

    Richa Kumari

    06/13/2025, 4:46 PM
    Hi team, I am facing issues when trying to enable the multistage engine in the Pinot UI, and the error message is not very clear (attaching a snippet); looking forward to help diagnosing the issue. It says the table does not exist, but it does.
    m
    g
    y
    • 4
    • 18
  • r

    Ross Morrow

    06/14/2025, 7:59 PM
    I'm experimenting with loading data into a realtime table that is very "tall", with only about 10 columns but hundreds of millions of rows. I'm seeing server restarts:
    Copy code
    pod/pinot-server-0                           1/1     Running     0               43h
    pod/pinot-server-1                           1/1     Running     0               43h
    pod/pinot-server-2                           1/1     Running     1 (26s ago)     43h
    pod/pinot-server-3                           1/1     Running     0               43h
    pod/pinot-server-4                           1/1     Running     0               43h
    pod/pinot-server-5                           0/1     OOMKilled   3 (3m56s ago)   43h
    pod/pinot-server-6                           1/1     Running     1 (8m50s ago)   81m
    pod/pinot-server-7                           1/1     Running     1 (7m44s ago)   81m
    The table data itself is not that large, pretty small in fact (at 10GB currently), there are 8 servers with O(60GB) memory each, 100GB PVCs, and a total of maybe 450 segments over 4 tables. But this table is (IIUC) trying to do some pretty high cardinality upserts, easily into the many tens of millions of keys. Could this be the cause of the OOMs? Are there specific settings I can review or actions I can take while I'm learning besides larger instances?
    m
    • 2
    • 14
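    If the OOMs come from upsert metadata (Pinot keeps a primary-key-to-document map on the JVM heap for every upsert partition a server hosts), tens of millions of keys can dominate memory even when the segment data itself is small. A hedged sketch of the upsertConfig knobs commonly reviewed for this, with illustrative values; whether some fields are available depends on the Pinot version:
    Copy code
    "upsertConfig": {
      "mode": "FULL",
      "hashFunction": "MURMUR3",
      "enableSnapshot": true,
      "metadataTTL": 86400
    }
    hashFunction stores hashed keys instead of raw key bytes on heap, enableSnapshot persists validDocIds snapshots to speed up restarts, and metadataTTL (where supported) drops metadata entries whose comparison-column value falls outside the TTL window.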
  • r

    Rajat

    06/15/2025, 2:56 PM
    Hi team, I want to know a few things: what will happen if we increase the number of Pinot replicas from 2 to 3 with this values.yaml?
    Copy code
    image:
     repository: 615177075440.dkr.ecr.ap-south-1.amazonaws.com/pinot
     tag: 1.0.1
     pullPolicy: IfNotPresent
    
    cluster:
     name: pinot-prod
    
    # ----------------------------------------------------------------------------
    # ZOOKEEPER: 3 replicas
    # ----------------------------------------------------------------------------
    zookeeper:
     name: pinot-zookeeper
     replicaCount: 3
     persistence:
      enabled:   true
      storageClass: gp3   # ← GP3 EBS
      size:     10Gi
     resources:
      requests:
       cpu:  100m
       memory: 256Mi
      limits:
       cpu:  300m
       memory: 512Mi
    
     port: 2181
    
    # ----------------------------------------------------------------------------
    # CONTROLLER: 2 replicas, internal LB
    # ----------------------------------------------------------------------------
    controller:
     name: pinot-controller
     replicaCount: 2
     startCommand: "StartController"
     # podManagementPolicy: Parallel
     resources:
      requests:
       cpu:  100m
       memory: 1Gi
      limits:
       cpu:  300m
       memory: 3Gi
      
     jvmOpts: "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent.jar=9010:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -XX:ActiveProcessorCount=2 -Xms512M -Xmx2G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xlog:gc*:file=/opt/pinot/gc-pinot-controller.log -Djute.maxbuffer=4000000"
       
     # Persist controller metadata
     persistence:
      enabled:   true
      accessMode:  ReadWriteOnce
      storageClass: gp3
      size:     50Gi
      mountPath:  /var/pinot/controller/data
    
     service:
      name: controller
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: /metrics
        prometheus.io/port: "9010"
      # labels:
      #  prometheus-monitor: "true"
      extraPorts:
       - name: controller-prom
        protocol: TCP
        containerPort: 9010
       
       
     podAnnotations:
       prometheus.io/scrape: "true"
       prometheus.io/path: /metrics
       prometheus.io/port: "9010"
    
     # Expose via Kubernetes Ingress on port 9000
     # ingress:
     #  v1:
     #   enabled: true
     #   ingressClassName: nginx     # or your ingress controller
     #   annotations:
     #    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
     #    nginx.ingress.kubernetes.io/rewrite-target: /
     #   hosts: [pinot-eks.sr-bi-internal.in]
     #   path: /controller
     #   tls: []
     external:
      enabled: false
    
    # ----------------------------------------------------------------------------
    # BROKER: 2 replicas, internal LB
    # ----------------------------------------------------------------------------
    broker:
     name: pinot-broker
     startCommand: "StartBroker"
     # podManagementPolicy: Parallel
     replicaCount: 2
    
     resources:
      requests:
       cpu:  200m
       memory: 1Gi
      limits:
       cpu:  500m
       memory: 3Gi
    
     jvmOpts: "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent.jar=9020:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -XX:ActiveProcessorCount=2 -Xms512M -Xmx2G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xlog:gc*:file=/opt/pinot/gc-pinot-broker.log -Djute.maxbuffer=4000000"
    
     service:
      name: broker
      # type: LoadBalancer
      annotations:
       # service.beta.kubernetes.io/aws-load-balancer-internal: "true"
       prometheus.io/scrape: "true"
       prometheus.io/path: /metrics
       prometheus.io/port: "9020"
      # labels:
      #  prometheus-monitor: "true"
      extraPorts:
       - name: broker-prom
        protocol: TCP
        containerPort: 9020
       
     podAnnotations:
       prometheus.io/scrape: "true"
       prometheus.io/path: /metrics
       prometheus.io/port: "9020"
    
     # Expose via Kubernetes Ingress on port 8099
     # ingress:
     #  v1:
     #   enabled: true
     #   ingressClassName: nginx     # or your ingress controller
     #   annotations:
     #    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
     #    nginx.ingress.kubernetes.io/use-regex: "true"
     #    nginx.ingress.kubernetes.io/rewrite-target: /$2
     #   hosts: [pinot-eks.sr-bi-internal.in]
     #   path: /broker(/|$)(.*)
     #   pathType: ImplementationSpecific
     #   tls: []
     external:
      enabled: false
    
    
    # ----------------------------------------------------------------------------
    # PINOT SERVER: 2 replicas, each with 100 Gi gp3 PVC
    # ----------------------------------------------------------------------------
    server:
     name: pinot-server
     startCommand: "StartServer"
     # podManagementPolicy: Parallel
     replicaCount: 2
      
     resources:
      requests:
       cpu:  2000m  # 2 vCPU
       memory: 5Gi
      limits:
       cpu:  4000m  # 4 vCPU
       memory: 10Gi
    
     persistence:
      enabled: true
      accessMode: ReadWriteOnce
      size: 100G
      mountPath: /var/pinot/server/data
      storageClass: gp3
      
     jvmOpts: "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent.jar=9030:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xms4G -Xmx8G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xlog:gc*:file=/opt/pinot/gc-pinot-server.log -Djute.maxbuffer=4000000"
    
     service:
      name: server
      # type: LoadBalancer
      annotations:
       # service.beta.kubernetes.io/aws-load-balancer-internal: "true"
       prometheus.io/scrape: "true"
       prometheus.io/path: /metrics
       prometheus.io/port: "9030"
      # labels:
      #  prometheus-monitor: "true"
      extraPorts:
       - name: server-prom
        protocol: TCP
        containerPort: 9030
       
     podAnnotations:
       prometheus.io/scrape: "true"
       prometheus.io/path: /metrics
       prometheus.io/port: "9030"
    
    
    # ----------------------------------------------------------------------------
    # MINION: 1 replica (for background tasks / retention, compaction, etc.)
    # ----------------------------------------------------------------------------
    minion:
     enabled: true      # run the minion pod
     name: pinot-minion
     startCommand: "StartMinion"
     # podManagementPolicy: Parallel
     replicaCount: 1     # scale up if you have heavy compaction/merge workloads
    
     resources:
      requests:
       cpu:  100m
       memory: 512Mi
      limits:
       cpu:  200m
       memory: 1Gi
    
     jvmOpts: "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -XX:ActiveProcessorCount=2 -Xms256M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xlog:gc*:file=/opt/pinot/gc-pinot-minion.log -Djute.maxbuffer=4000000"
    I am running Pinot on EKS with 3 nodes but utilizing only 2, since the replica count is two. I want to use all 3 nodes for better performance. What will happen if I increase the number of pods to 3, and how should I do it without losing the existing data? @Xiang Fu @Mayank
  • m

    Mayank

    06/15/2025, 2:58 PM
    You need to tag the new instances and run rebalance
    r
    • 2
    • 5
  • m

    Mayank

    06/15/2025, 2:59 PM
    https://docs.pinot.apache.org/operators/operating-pinot/rebalance
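    A hedged sketch of that flow via the controller REST API (instance, tenant, and table names are placeholders):
    Copy code
    # tag the new server instances for the tenant
    curl -X PUT "http://<controller>:9000/instances/<Server_instance_id>/updateTags?tags=DefaultTenant_OFFLINE,DefaultTenant_REALTIME"
    # then rebalance each table so segments get assigned to the new instances
    curl -X POST "http://<controller>:9000/tables/<tableName>/rebalance?type=REALTIME&dryRun=false"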
  • g

    Georgi Varbanov

    06/16/2025, 11:29 AM
    Hello, during query performance testing we found that most of our queries spend a lot of time in steps that don't really make sense to us. Can you shed some light on whether it is possible to optimize those steps, or whether this is normal behavior? Our queries use the V1 engine (without multistage) and are simple sum, avg, min, max aggregations per customer_id (we have partitioning on it, so Pinot scans relatively small amounts of data), returning between 1 and 10 rows per query; most of the queries also have a simple group by on one column, without ordering after that. If you need more info let me know.
    m
    • 2
    • 41
  • r

    Rajat

    06/17/2025, 9:16 AM
    Hello, I want to know a few architectural things about how Pinot segments transition from realtime to offline. How does it work, and which config is used for that?
    m
    • 2
    • 3
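    For reference, the realtime-to-offline movement is done by the minion-based RealtimeToOfflineSegmentsTask configured on the realtime table (an offline table with the same name must also exist). A hedged sketch with illustrative periods:
    Copy code
    "task": {
      "taskTypeConfigsMap": {
        "RealtimeToOfflineSegmentsTask": {
          "bucketTimePeriod": "1d",
          "bufferTimePeriod": "2d",
          "schedule": "0 0 * * * ?"
        }
      }
    }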
  • p

    Pratik Bhadane

    06/17/2025, 12:42 PM
    Hello Team, we are currently using Apache Pinot on AWS EKS and are in the process of deploying a multi-tenant setup. As part of this, we've added 2 servers and 2 brokers and tagged them appropriately to reflect a new tenant. We were able to successfully: 1. create a Pinot table assigned to the new tenant, 2. see all table segments in GOOD status, 3. view the new tenant's brokers and servers correctly listed in the Pinot Web UI after tagging the tenant. However, we're encountering an issue while querying the table. The query fails with the following error: {"requestId":"33806233000000000","brokerId":"Broker_pinot-sr-broker-0.pinot-sr-broker.pinot.svc.cluster.local_8099","exceptions":[{"errorCode":410,"message":"BrokerResourceMissingError"}],"numServersQueried":0,"numServersResponded":0,"numSegmentsQueried":0,"numSegmentsProcessed":0,"numSegmentsMatched":0,"numConsumingSegmentsQueried":0,"numConsumingSegmentsProcessed":0,"numConsumingSegmentsMatched":0,"numDocsScanned":0,"numEntriesScannedInFilter":0...} In the Controller UI we get the error message below: Error Code: 450 InternalError: java.net.UnknownHostException: pinot-sr-broker-1.pinot-sr-broker.pinot.svc.cluster.local at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:229) at java.base/java.net.Socket.connect(Socket.java:609) at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:182) Attaching the deployment files used:
    deployment-broker-sr.yamldeployment-server-sr.yaml
    m
    • 2
    • 4
  • r

    Rajat

    06/19/2025, 8:05 AM
    Hi team, is the size shown here the size of each replica, or the actual size of the data?
    n
    m
    • 3
    • 11
  • f

    francoisa

    06/23/2025, 9:19 AM
    Hi 😉 I'm still on an old Pinot version, 0.12 (migration is planned, but I need a bit more robustness first). The first thing to look at is S3 as deepstore; I've followed the doc here https://docs.pinot.apache.org/release-0.12.0/users/tutorials/use-s3-as-deep-store-for-pinot. From Swagger the controller seems able to download the segment when I hit the download API, but on the server side I see lots of: Failed to download segment absencesreport__1__5__20240527T2122Z from deep store: Download segment absencesreport__1__5__20240527T2122Z from deepstore uri s3://bucketName/segments/absencesreport_REALTIME/absencesreport__1__5__20240527T2122Z failed. Caught exception in state transition from OFFLINE -> ONLINE for resource: absencesreport_REALTIME, partition: absencesreport__1__5__20240527T2122Z Any ideas?
    Copy code
    getting logs like software.amazon.awssdk.services.s3.model.S3Exception: The authorization header is malformed; the region is wrong; expecting 'eu-west-1'. (Service: S3, Status Code: 400, Request ID: 184BA1F70B628FA6, Extended Request ID: 82b9e6b1548ad0837abe6ff674d1d3e982a2038442a1059f595d95962627f827)
    here is my server conf for the S3 part
    Copy code
    # Pinot Server Data Directory
    pinot.server.instance.dataDir=/var/lib/pinot_data/server/index
    # Pinot Server Temporary Segment Tar Directory
    pinot.server.instance.segmentTarDir=/var/lib/pinot_data/server/segmentTar
    #S3
    pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.server.storage.factory.s3.region=us-west-1
    pinot.server.segment.fetcher.protocols=file,http,s3
    pinot.server.storage.factory.s3.bucket.name=bucketName
    pinot.server.storage.factory.s3.endpoint=URL_OF_MY_S3_ENDOINT
    pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    pinot.server.segment.fetcher.s3.pathStyleAccess=true
    Any ideas welcome 🙂
    a
    • 2
    • 4
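    One thing that stands out: the S3Exception says the region should be 'eu-west-1', while the server conf above sets us-west-1, so the region property likely just needs to match the bucket's region (and the same S3 settings usually need to be present on the controller as well). A minimal sketch of the change:
    Copy code
    pinot.server.storage.factory.s3.region=eu-west-1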
  • k

    Kiril Kalchev

    06/23/2025, 11:37 AM
    I have a highly aggregated real-time table that I’m using to query and chart statistics. Although I’ve added around 10 billion events, they’re aggregated(upserted) into about 500,000 rows. Despite this, the table currently takes up around 200 GB of storage. However, if I export the entire table using
    SELECT * FROM table
    and then re-import it using a simple tool, the size drops to just 15 MB. I only need the aggregated data — I don’t need per-event details. Is there a way to merge the old segments and significantly reduce table size and improve query speed using Pinot tasks?
    m
    a
    t
    • 4
    • 17
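    Since this is an upsert table, the task usually pointed at for reclaiming space taken by overwritten records is the minion UpsertCompactionTask (it generally requires enableSnapshot to be set in upsertConfig). A hedged sketch with illustrative thresholds:
    Copy code
    "task": {
      "taskTypeConfigsMap": {
        "UpsertCompactionTask": {
          "schedule": "0 */10 * ? * *",
          "bufferTimePeriod": "7d",
          "invalidRecordsThresholdPercent": "30",
          "invalidRecordsThresholdCount": "100000"
        }
      }
    }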
  • y

    Yeshwanth

    06/24/2025, 9:13 AM
    Hi Team, I'm on Pinot 1.3 and trying out multi-topic ingestion into a single Pinot table. I've configured my table as shown below:
    Copy code
    "streamIngestionConfig": {
            "streamConfigMaps": [
              {
                "streamType": "kafka",
                "stream.kafka.topic.name": "flattened_spans2",
                "stream.kafka.broker.list": "kafka:9092",
                "stream.kafka.consumer.type": "lowlevel",
                "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
                "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
                "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
                "realtime.segment.flush.threshold.rows": "0",
                "realtime.segment.flush.threshold.time": "30m",
                "realtime.segment.flush.threshold.segment.size": "300M"
              },
              {
                "streamType": "kafka",
                "stream.kafka.topic.name": "flattened_spans3",
                "stream.kafka.broker.list": "kafka.pinot-0-nfr-setup.svc.cluster.local:9092",
                "stream.kafka.consumer.type": "lowlevel",
                "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
                "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
                "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
                "realtime.segment.flush.threshold.rows": "0",
                "realtime.segment.flush.threshold.time": "30m",
                "realtime.segment.flush.threshold.segment.size": "300M"
              }
            ]
          }
    But I am running into this issue:
    Copy code
    First kafka
    2025/06/20 13:21:17.528 INFO [KafkaConsumer] [otel_spans__1__0__20250620T1321Z] [Consumer clientId=otel_spans_REALTIME-flattened_spans2-1, groupId=null] Seeking to offset 0 for partition flattened_spans2-1
    
    Second kafka
    025/06/20 13:22:08.659 INFO [KafkaConsumer] [otel_spans__10001__0__20250620T1321Z] [Consumer clientId=otel_spans_REALTIME-flattened_spans3-1, groupId=null] Seeking to offset 0 for partition flattened_spans3-10001
    2025/06/20 13:22:08.659 INFO [KafkaConsumer] [otel_spans__10000__0__20250620T1321Z] [Consumer clientId=otel_spans_REALTIME-flattened_spans3-0, groupId=null] Seeking to offset 0 for partition flattened_spans3-10000
    flattened_spans3 has only partitions 1-3, but the Pinot server is seeking partition number 10000 for some reason. Can someone please guide me on where I'm going wrong with my config?
  • b

    baarath

    06/25/2025, 7:15 AM
    Hi Team, the Pinot server went down; when checked, it had failed with the error in the following screenshot. Is it because of a memory issue? Will I lose data if I restart the server with the following command?
    Copy code
    bin/pinot-admin.sh StartServer -configFileName conf/pinot-server.conf
    x
    b
    y
    • 4
    • 8
  • a

    Aman Satya

    06/25/2025, 8:54 AM
    Hi team, I'm trying to run a
    MergeRollupTask
    on the
    sales_OFFLINE
    table, but it fails with a
    StringIndexOutOfBoundsException
    . It looks like the error comes from this line:
    MergeRollupTaskUtils.getLevelToConfigMap()
    Here is the config that I am using.
    Copy code
    "taskTypeConfigsMap": {
      "MergeRollupTask": {
        "mergeType": "rollup",
        "bucketTimePeriod": "1d",
        "bufferTimePeriod": "3d",
        "revenue.aggregationType": "sum",
        "quantity.aggregationType": "sum"
      }
    }
    And here's the relevant part of the error:
    Copy code
    java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 9
    at ...MergeRollupTaskUtils.getLevelToConfigMap(MergeRollupTaskUtils.java:64)
    b
    m
    • 3
    • 5
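    The exception is consistent with the per-level keys missing their merge-level prefix: getLevelToConfigMap appears to expect keys of the form "<level>.<config>" (e.g. "1day.mergeType"), and a bare "mergeType" (length 9) produces the "begin 0, end -1" substring error. A hedged rewrite of the config above, under that assumption:
    Copy code
    "taskTypeConfigsMap": {
      "MergeRollupTask": {
        "1day.mergeType": "rollup",
        "1day.bucketTimePeriod": "1d",
        "1day.bufferTimePeriod": "3d",
        "revenue.aggregationType": "sum",
        "quantity.aggregationType": "sum"
      }
    }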
  • m

    mathew

    06/26/2025, 8:15 AM
    Hi Team, does Pinot support ADLS Gen2 (wasbs), or only abfss? I am writing parquet files to the Azure container using the wasbs method, then I use this ingestionConfig to ingest them into Pinot using minions: "ingestionConfig": { "batchIngestionConfig": { "segmentIngestionType": "APPEND", "segmentIngestionFrequency": "DAILY", "consistentDataPush": False, "batchConfigMaps": [ { "input.fs.className": "org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS", "input.fs.prop.authenticationType": "ACCESS_KEY", "input.fs.prop.accountName": "wzanalyticsdatastoreprod", "input.fs.prop.accessKey": "xxxxxxxxxxx", "input.fs.prop.fileSystemName": tenant_id, "inputDirURI": f"wasbs://{tenant_id}@wzanalyticsdatastoreprod.blob.core.windows.net/pinot", "includeFileNamePattern": "glob:**/*.parquet", "excludeFileNamePattern": "glob:**/*.tmp", "inputFormat": "parquet" } ] } But I think Pinot is not able to look into the specified blob. I can't use abfss on this container because it doesn't support BlobStorageEvents or SoftDelete. In my DEV container I was writing the parquet files with the abfss method, and that still works. Is something wrong in my ingestionConfig when using wasbs? Can someone please help!
    m
    • 2
    • 5
  • j

    Jan

    06/26/2025, 10:34 AM
    Hi team, I'm trying to download segments from one Pinot table and use them in a different table that has the same schema but a different retention configuration. Currently, I'm encountering an issue where the metadata doesn't match because the tables have different names.
    m
    • 2
    • 4
  • i

    Isaac Ñuflo

    06/30/2025, 2:58 PM
    Hi team, first time here. I have an issue when trying to update a table via API. The update is not being applied.
    g
    m
    l
    • 4
    • 13
  • l

    Luis Pessoa

    06/30/2025, 7:11 PM
    hi guys, has anyone faced this recurring message in your logs? We have been seeing it for some time in our pre-prod envs despite following the configuration settings described in the documentation:
    Copy code
    The configuration 'stream.kafka.isolation.level' was supplied but isn't a known config.
    h
    • 2
    • 1
  • i

    Idlan Amran

    07/01/2025, 8:46 AM
    https://docs.pinot.apache.org/manage-data/data-import/batch-ingestion/dim-table I'm looking into dimension tables currently and noticed that they require
    primaryKeyColumns
    . Based on my experience with upsert tables, primary key metadata (with TTL) is stored on the heap, and as the number of records and primary keys grows, memory/RAM usage grows too. Should I expect the same kind of situation with a dimension table? And can a dimension table be a realtime table rather than offline, so I can push the data through Kafka? Our app architecture is kind of complex right now: we need a table that stores product activity logs, a kind of product tracking, with events such as
    stock increase
    ,
    price increment
    and so on. In some cases the ingestion is duplicated, e.g. the same record is pushed to Kafka more than once in the same day, causing duplicates. Strictly speaking we do not need the full product data; we just need the changes (like the examples I shared) and the ids of the changes, so we can check historically what was changed for a particular product. I tested using upsert since it is the closest match to my use case, but the memory usage was huge and our Pinot EC2 server went down from time to time because of the upsert table, due to out-of-memory errors. I would really appreciate it if any of you could share any config that worked for you, or whatever I can do to tune my config / improve our ingestion into Pinot.
    b
    • 2
    • 2
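    For what it's worth, a hedged sketch of a dimension table config: dimension tables are OFFLINE tables flagged with isDimTable (so they are batch-pushed rather than consumed from Kafka), the schema must declare primaryKeyColumns, and the full table is replicated to every server of the tenant, so memory is bounded by the table's total size rather than by per-key upsert metadata. Names below are placeholders:
    Copy code
    {
      "tableName": "productChanges",
      "tableType": "OFFLINE",
      "isDimTable": true,
      "segmentsConfig": {
        "segmentPushType": "REFRESH",
        "replication": "1"
      },
      "tenants": {},
      "tableIndexConfig": { "loadMode": "MMAP" },
      "metadata": {}
    }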
  • v

    Vipin Rohilla

    07/02/2025, 8:13 AM
    Hi all, I have run into an issue with the Pinot minion and kerberized HDFS, where minion upsert tasks fail with the error below:
    Copy code
    UI:
    org.apache.pinot.spi.utils.retry.AttemptsExceededException: Operation failed after 3 attempts
    	at org.apache.pinot.spi.utils.retry.BaseRetryPolicy.attempt(BaseRetryPolicy.java:65)
    	at org.apache.pinot.common.utils.fetcher.BaseSegmentFetcher.fetchSegmentToLocal(BaseSegmentFetcher.java:74)
    	at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchSegmentToLocal(SegmentFetcherFactory.java:124)
    	at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchSegmentToLocal(SegmentFetcherFactory.java:132)
    	at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchAndDecryptSegmentToLocal(SegmentFetcherFactory.java:165)
    	at org.apache.pinot.plugin.minion.tasks.BaseTaskExecutor.downloadSegmentToLocal(BaseTaskExecutor.java:121)
    	at org.apache.pinot.plugin.minion.tasks.BaseSingleSegmentConversionExecutor.executeTask(BaseSingleSegmentConversionExecutor.java:105)
    	at org.apache.pinot.plugin.minion.tasks.BaseSingleSegmentConversionExecutor.executeTask(BaseSingleSegmentConv
    
    Minion log shows:
    2025/07/02 13:23:21.309 WARN [PinotFSSegmentFetcher] [TaskStateModelFactory-task_thread-3] Caught exception while fetching segment from: <hdfs://xxxxxxx/controller_data/xxxxxxx/xxxxxxx__4__648__20250528T1621Z> to: /tmp/PinotMinion/data/UpsertCompactionTask/tmp-9727a6d3-cc2d-44d0-9666-34939abbc356/tarredSegment
    org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
            at jdk.internal.reflect.GeneratedConstructorAccessor43.newInstance(Unknown Source) ~[?:?]
            at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
            at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490) ~[?:?]
            at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121) ~[pinot-parquet-1.2.0-shaded.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
            at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88) ~[pinot-parquet-1.2.0-shaded.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
            at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1741) ~[pinot-orc-1.2.0-shaded.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
            at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1829) ~[pinot-orc-1.2.0-shaded.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
            at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1826) ~[pinot-orc-1.2.0-shaded.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
            at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[pinot-parquet-1.2.0-shaded.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
            at org.apache.hadoop.hdfs.DistributedFil
    Pinot Server, Broker, and Controller are all able to read/write segments from HDFS using the configured keytab. I recently added 3 Pinot Minions on new nodes, configured with the same keytab, principal, and Hadoop config path. However, when a Minion runs tasks like UpsertCompaction, it fails with the above error. The Minion runs under the pinot user (systemd), kinit is successful, and the Kerberos ticket is visible via klist.
    pinot.minion.segment.fetcher.hdfs.hadoop.kerberos.principal=xxxxxx@xxxxxx
    pinot.minion.segment.fetcher.hdfs.hadoop.kerberos.keytab=/etc/security/keytabs/pinot.keytab
    pinot.minion.storage.factory.hdfs.hadoop.conf.path=/usr/hdp/xxxxx/hadoop/conf
    Is there anything else Pinot Minion needs to perform Kerberos login internally? Does it require JAAS config explicitly even with keytab/principal settings?
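    Not an authoritative answer, but one thing worth checking: the "SIMPLE authentication is not enabled" error usually means the Hadoop client on the minion was initialized without Kerberos settings, and the storage-factory (HadoopPinotFS) properties are separate from the segment-fetcher ones shown above. A sketch of the extra minion properties, assuming the same naming pattern used for controller/server applies to the minion prefix (exact key spelling, e.g. principal vs. principle, should be verified against the HDFS deep-store docs for your version):
    Copy code
    pinot.minion.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
    pinot.minion.storage.factory.hdfs.hadoop.conf.path=/usr/hdp/xxxxx/hadoop/conf
    pinot.minion.storage.factory.hdfs.hadoop.kerberos.principle=xxxxxx@xxxxxx
    pinot.minion.storage.factory.hdfs.hadoop.kerberos.keytab=/etc/security/keytabs/pinot.keytab
    pinot.minion.segment.fetcher.protocols=file,http,hdfs
    pinot.minion.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher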