# troubleshooting
  • u

    Utsav Jain

    10/29/2025, 5:15 AM
    Hi Team, we are using realtime tables with upserts enabled and a TTL window of 12 hrs. To purge stale records arriving after the TTL window, and to avoid running queries with DISTINCT, we enabled the segment compaction job. But we are seeing accuracy issues when running it: a few of the older segments are never considered for compaction, which results in inaccurate numbers at query time. Can you please help us understand the causes of such cases?
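    For reference, a hedged sketch of the UpsertCompactionTask knobs involved here (schedule and threshold values are placeholders, not the actual settings). One thing worth checking: a completed segment is only picked for compaction once it is older than bufferTimePeriod and its invalid-record count/percent crosses the thresholds, which can make some older segments look like they are never compacted:
    Copy code
    "task": {
      "taskTypeConfigsMap": {
        "UpsertCompactionTask": {
          "schedule": "0 0 * ? * *",
          "bufferTimePeriod": "12h",
          "invalidRecordsThresholdPercent": "30",
          "invalidRecordsThresholdCount": "100000",
          "validDocIdsType": "SNAPSHOT"
        }
      }
    }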
  • r

    Rajat

    10/29/2025, 10:16 AM
    Hi Team, is there any known bug in Pinot? When I check for duplicates in the data by running:
    Copy code
    SELECT s_id, count(*)
    FROM shipmentMerged_final
    GROUP BY s_id
    HAVING COUNT(*) > 1
    Sometimes it shows no records, but sometimes it returns rows with a count of 2.
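    A hedged way to narrow this down (not from the thread): compare the default upsert view with the raw rows by setting the skipUpsert query option. If duplicates only show up with skipUpsert=true, ingestion produced them and the upsert view is hiding them as expected; duplicates appearing intermittently in the default view point at the upsert metadata or routing instead:
    Copy code
    SET skipUpsert=true;
    SELECT s_id, COUNT(*)
    FROM shipmentMerged_final
    GROUP BY s_id
    HAVING COUNT(*) > 1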
  • r

    Rajat

    10/29/2025, 10:49 AM
    another issue:
    Copy code
    SELECT COUNT(*) AS aggregate,
    s_id
    FROM shipmentMerged_final
    WHERE o_company_id = 2449226
      AND o_created_at BETWEEN TIMESTAMP '2025-10-10 00:00:00' AND TIMESTAMP '2025-10-26 23:59:59'
      AND o_shipping_method IN ('SR', 'SRE', 'AC')
      AND o_is_return = 0
      AND o_state = 0
    group by 2
    limit 1500
    The above query returns 1150 total records, but when running:
    Copy code
    SELECT COUNT(*) AS aggregate
    FROM shipmentMerged_final
    WHERE o_company_id = 2449226
      AND o_created_at BETWEEN TIMESTAMP '2025-10-10 00:00:00' AND TIMESTAMP '2025-10-26 23:59:59'
      AND o_shipping_method IN ('SR', 'SRE', 'AC')
      AND o_is_return = 0
      AND o_state = 0
    the count comes back as 1162.
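    A hedged sanity check (not from the thread): if the gap between 1150 and 1162 comes from some s_id values occurring more than once, comparing COUNT(*) with DISTINCTCOUNT(s_id) under the same filters should make that visible in a single query:
    Copy code
    SELECT COUNT(*) AS total_rows,
           DISTINCTCOUNT(s_id) AS distinct_ids
    FROM shipmentMerged_final
    WHERE o_company_id = 2449226
      AND o_created_at BETWEEN TIMESTAMP '2025-10-10 00:00:00' AND TIMESTAMP '2025-10-26 23:59:59'
      AND o_shipping_method IN ('SR', 'SRE', 'AC')
      AND o_is_return = 0
      AND o_state = 0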
  • r

    Rajat

    10/29/2025, 10:49 AM
    @Xiang Fu @Mayank
  • r

    Rashpal Singh

    10/29/2025, 11:24 PM
    Hi All, I am using Pinot 1.1 and I want to store null for my DOUBLE column. For that I have used the below configs:
    nullHandlingEnabled=true at table config level
    "enableColumnBasedNullHandling": true at schema level
    Copy code
    {
      "name": "notNullColumn",
      "dataType": "DOUBLE",
      "notNull": false
    }
    Still when I am querying, I am getting "0" instead of null. How can I fix this so that I see null (the original value) instead of 0 in the query response, without adding "SET enableNullHandling=true" to my query?
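    For reference, a hedged sketch of where the column-based null handling pieces live: both the schema-level flag and the nullable column spec go in the schema (shown here as a metric field spec; "mySchema" is a placeholder, and the exact query-time behaviour on 1.1 is worth verifying against the docs):
    Copy code
    {
      "schemaName": "mySchema",
      "enableColumnBasedNullHandling": true,
      "metricFieldSpecs": [
        {
          "name": "notNullColumn",
          "dataType": "DOUBLE",
          "notNull": false
        }
      ]
    }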
  • r

    Rahul Sharma

    10/30/2025, 4:23 AM
    Hi team, I am creating an autoscaler for minion-based batch ingestions. To scale up and down, I need the number of tasks that are waiting and the number of tasks that are running. I checked the Pinot metrics and found these two: pinot_controller_numMinionSubtasksWaiting_Value and pinot_controller_numMinionSubtasksRunning_Value. However, for each task type, they always show a value of 0 even when tasks are running. Am I using the wrong metrics? Which metrics should I use to build a custom autoscaler for minions?
  • f

    francoisa

    10/30/2025, 8:49 AM
    Hi team 😉 Quick question about some messages I see in my monitoring: "Recreating stream consumer for topic partition *, reason: Total idle time: 183647 ms exceeded idle timeout: 180000 ms". What is the behaviour behind that? Does it reset the consumer to the last committed offset and re-ingest, or just recreate the consumer at its last consumed offset?
  • b

    Badhusha Muhammed

    10/30/2025, 4:17 PM
    Hello Team, we are encountering an issue where our Pinot servers time out when attempting to establish a session with Zookeeper, which causes the Pinot servers to crash (or go down). Although the server attempts to iteratively establish a new connection, the process continues to time out until we manually restart the server instance. A similar scenario can be found in the following GitHub issue: https://github.com/apache/pinot/issues/4686.
    1. The initial issue between the Pinot Server and Zookeeper was session expiration.
    2. Regardless of the underlying issue (e.g., Zookeeper latency, GC pauses blocking the main thread), Pinot should be capable of automatically re-establishing the connection once the problem is resolved. Instead, we are forced to manually restart the server to restore a healthy Zookeeper session.
    As a result, the server is removed from the LIVE_INSTANCE metadata and registered as DEAD.
  • v

    Victor Bivolaru

    10/31/2025, 1:31 PM
    Hello, I have a question about how the controller handles rebalancing segments. I am mostly interested in the following aspect: is there any downtime while moving a segment from one server to another? I see that in the manual rebalance job you can specify downtime=false only if you have replication. Is the mechanism behind controller rebalancing the same?
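    For context, a hedged example of the rebalance REST call with the relevant knobs (host, table name and values are placeholders); downtime=false only works when replication > 1, so one replica can keep serving while another is being moved:
    Copy code
    curl -X POST "http://<controller>:9000/tables/myTable/rebalance?type=REALTIME&dryRun=false&downtime=false&minAvailableReplicas=1"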
  • m

    Mannoj

    11/03/2025, 4:39 PM
    Hi Team, I was checking whether Pinot really logs audit information, as in who did what, how, from which source and at what time. It seems the code base logs only the response and its type, not the request. It would be great if the request were also logged, so the audit info is fully available. In the code base, ControllerResponseFilter.java has:
    LOGGER.info("Handled request from {} {} {}, content-type {} status code {} {}", srcIpAddr, method, uri, contentType, respStatus, reasonPhrase);
    If the requestContext were also added here, I believe it would log the request details with the payload originally sent by the user; or, if it is disabled on purpose, would you mind giving that control to log4j so the end user can choose whether to enable it? I'm no developer 🥺, I'm trying to make sense of the code and see if it can be added. Where I'm coming from: I just added a user via the controller to grant read/write permissions for a particular user on all tables. All I get is below.
    Copy code
    2025/11/03 20:30:59.922 INFO [ControllerResponseFilter] [grizzly-http-server-15] Handled request from 192.168.13.1 PUT http://test-phaseroundtoaudit11.ori.com:9000/users/dedactid_rw?component=BROKER&passwordChanged=false, content-type text/plain;charset=UTF-8 status code 200 OK
    2025/11/03 20:30:59.957 INFO [ControllerResponseFilter] [grizzly-http-server-14] Handled request from 192.168.13.1 GET http://test-phaseroundtoaudit11.ori.com:9000/tables, content-type null status code 200 OK
    2025/11/03 20:30:59.980 INFO [ControllerResponseFilter] [grizzly-http-server-12] Handled request from 192.168.13.1 GET http://test-phaseroundtoaudit11.ori.com:9000/users, content-type null status code 200 OK
    But it's missing the fact that read/write access was granted by my admin user to ALL or a particular table. There is further granularity missing, which I believe is crucial. Let me know your views. Thanks!!
  • a

    Alexander Maniates

    11/03/2025, 7:10 PM
    QQ: Is there a certain task we can run to force a server to re-upload its segment to the deep store (in our case S3)? We have a situation where a realtime server failed to upload to S3, and then the segment was offloaded to offline servers. The offline servers were able to fetch the segment from their online peers and load it successfully, but the segment is still in a weird state where it is missing from the deep store/S3. Should some periodic task be running to check on this, or can we run some manual controller task to "heal" the situation?
  • r

    Rahul Sharma

    11/04/2025, 10:02 AM
    Hi Team, Context: We want to use Apache Pinot for real-time analytics query use cases in our microservices. Since realtime Pinot tables ingest directly from Kafka, ingestion delays/lag can occur. Our requirement is: whenever a document (row) in Pinot is updated, we want to push an event to Kafka with the primary key that changed. This would allow downstream microservices to consume that event, know that a specific record has been updated in Pinot, then trigger real-time analytics queries and perform required downstream actions. Question: Is there any existing feature or recommended workaround in Pinot to detect when a row is updated in a realtime table and trigger an event (e.g., send a Kafka message) so downstream services can be notified?
  • m

    Mariusz

    11/04/2025, 2:42 PM
    Hi Team, recently I was trying to enable OOM protection (https://docs.pinot.apache.org/operators/operating-pinot/oom-protection-using-automatic-query-killing). I have added the below configurations in both broker and server config files.
    Copy code
    pinot.broker.instance.enableThreadCpuTimeMeasurement=true
    pinot.broker.instance.enableThreadAllocatedBytesMeasurement=true
    pinot.server.instance.enableThreadAllocatedBytesMeasurement=true
    pinot.server.instance.enableThreadCpuTimeMeasurement=true
    pinot.query.scheduler.accounting.enable.thread.memory.sampling=true
    pinot.query.scheduler.accounting.enable.thread.cpu.sampling=true
    
    
    pinot.query.scheduler.accounting.oom.enable.killing.query=true
    pinot.query.scheduler.accounting.query.killed.metric.enabled=true
    
    pinot.query.scheduler.accounting.oom.critical.heap.usage.ratio=0.3
    pinot.query.scheduler.accounting.oom.panic.heap.usage.ratio=0.3
    pinot.query.scheduler.accounting.sleep.ms=30
    pinot.query.scheduler.accounting.oom.alarming.usage.ratio=0.3
    pinot.query.scheduler.accounting.sleep.time.denominator=3
    pinot.query.scheduler.accounting.min.memory.footprint.to.kill.ratio=0.01
    
    pinot.query.scheduler.accounting.factory.name=org.apache.pinot.core.accounting.PerQueryCPUMemAccountantFactory
    pinot.query.scheduler.accounting.cpu.time.based.killing.enabled=true
    pinot.query.scheduler.accounting.publishing.jvm.heap.usage=true
    pinot.query.scheduler.accounting.cpu.time.based.killing.threshold.ms=1000
    I have run some heavy queries to test the OOM killing feature, but I don't see any killed queries in the broker/server metrics.
    Copy code
    SELECT accountId,countryCode,direction,day,hour,msgType,currency,topic,finalStatus,year,month,
      SUM(CASE WHEN finalStatus = 'Failed' THEN 1 ELSE 0 END) AS failed_count,
      SUM(CASE WHEN finalStatus = 'Delivered' THEN 1 ELSE 0 END) AS success_count,
      COUNT(*) AS total_records,
      COUNT(DISTINCT udrId) AS unique_udrs,
      SUM(price) AS total_revenue,
      AVG(price) AS avg_price,
      MAX(price) AS max_price,
      MIN(price) AS min_price,
      SUM(CASE WHEN errorCode > 0 THEN 1 ELSE 0 END) AS error_count,
      SUM(price * (CASE WHEN direction = 'Unknown' THEN 1 ELSE -1 END)) AS net_revenue
    FROM
      dummy_table
    GROUP BY
      accountId,countryCode,direction,msgType,currency,topic,finalStatus,year,month,day,hour
    ORDER BY
      total_revenue DESC,
      avg_price DESC
    LIMIT 1000000
    Whenever I run this query, the server goes down, but no queries are terminated automatically. Can you please help me understand if I am missing any configurations or steps to enable this feature? I tested on apachepinot/pinot:1.5.0-SNAPSHOT-9d32f376d8-20251016, with a heap of -Xms2G -Xmx2G for both server and broker.
  • n

    Naveen

    11/05/2025, 9:20 AM
    Hi Team, I'm getting this error continuously even though my servers are running properly and the tables are in a good state. Please help me resolve the issue.
    Copy code
    kubectl get pod -n dp-1-346
    NAME                                      READY   STATUS    RESTARTS   AGE
    pinot-broker-0                            1/1     Running   0          13h
    pinot-controller-0                        1/1     Running   0          6m26s
    pinot-minion-stateless-84fc6899f9-2shqp   1/1     Running   0          13h
    pinot-server-0                            1/1     Running   0          6m32s
    pinot-server-1                            1/1     Running   0          6m39s
    presto-coordinator-0                      1/1     Running   0          25h
    presto-worker-0                           1/1     Running   0          25h
    zookeeper-0                               1/1     Running   0          13h
  • r

    Rajasekharan A P

    11/06/2025, 7:04 AM
    Hi, In my Pinot cluster, I initially had 4 servers (A, B, C, D) with segments distributed across them. I wanted to consolidate all segments onto a single server, so I removed the tags from servers B, C, and D, and then ran a rebalance operation to allocate all segments to the remaining server (A). After rebalancing, all segments were assigned to the single server. However, the segments that were originally on the other servers appeared in ERROR state in the external view, even though their ideal state in ZooKeeper showed them as ONLINE. For example: • Ideal State:
    Copy code
    "load_chat_messages_core_1756318894786_1758914214102_1758919671601": {
        "Server_172.18.0.6_8098": "ONLINE"
    }
    • External View:
    Copy code
    "load_chat_messages_core_1756318894786_1758914214102_1758919671601": {
        "Server_172.18.0.6_8098": "ERROR"
    }
    To resolve this, I performed a reload and reset operation on the affected segments. After the reset, the segment state transitioned from ERROR to OFFLINE, allowing it to be properly reloaded. Setup details:
    • Running Pinot in Docker
    • Using local storage for segment files
    • Segment data is volume-mounted
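    For reference, a hedged example of the reset call used for this kind of recovery (host and table-name-with-type are placeholders; the segment name is the one from the example above):
    Copy code
    curl -X POST "http://<controller>:9000/segments/<tableNameWithType>/load_chat_messages_core_1756318894786_1758914214102_1758919671601/reset"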
  • f

    francoisa

    11/06/2025, 10:41 AM
    Hi, a question about strange load across VMs. Has anyone faced this kind of issue before? We've got 4 servers; 2 of them show a lot of CPU/RAM usage while the other two look normal/chill. Data is properly balanced (32 tables / 2 partitions (non spooled) with replication of 2), so each server is used correctly when queried; this can be seen in the network traffic, which is close to equal on each server. I tried to rebalance the tables but they are all already balanced. I've reworked the JMX to grab only relevant data and I do not see anything. Same query rate and segments processed per server. Any clues?
  • v

    Victor Bivolaru

    11/07/2025, 1:09 PM
    I am trying to debug a strange issue regarding segment generation from a realtime table. Its config is set up like this:
    Copy code
    "realtime.segment.flush.threshold.rows": "0",
    "realtime.segment.flush.threshold.segment.size": "500M",
    "realtime.segment.flush.threshold.time": "4h"
    However, when inspecting the metadata of any of the realtime segments we can see for example:
    Copy code
    "segment.realtime.endOffset": "67399447",
    "segment.start.time": "1762424217000",
    "segment.time.unit": "MILLISECONDS",
    "segment.flush.threshold.size": "100000",
    "segment.realtime.startOffset": "66512835",
    "segment.size.in.bytes": "14018213",  <====== 14MB instead of 500M    
    "segment.end.time": "1762426143000",  <====== subtracting segment.start.time from this we get roughly 35 min 
    "segment.total.docs": "100000",
    "segment.realtime.numReplicas": "1",
    "segment.creation.time": "1762511599197",
    "segment.index.version": "v3",
    "segment.crc": "3704033136",
    "segment.realtime.status": "DONE",
  • r

    Rajasekharan A P

    11/10/2025, 4:44 AM
    Hello Everyone, I am facing some issues in production with the Pinot setup. Could anyone help me?🙂
  • r

    Rajasekharan A P

    11/11/2025, 12:32 PM
    Where should I add the following configuration for enabling or disabling multi-tenant instance isolation in Apache Pinot?
    Copy code
    cluster.tenant.isolation.enable=false
    Should it go inside controller.conf or pinot-controller.conf? I'm running Pinot in Docker, and for the controller service, my command looks like this:
    Copy code
    command: "StartController -zkAddress pinot-zookeeper:2181 -configFileName /opt/pinot/conf/pinot-controller.conf"
    I added the configuration in pinot-controller.conf, but the controller container is failing to start.
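    For reference, a hedged sketch of a controller config file with the flag added; the flag belongs in whichever file is passed via -configFileName (here pinot-controller.conf), and the other property values are placeholders:
    Copy code
    controller.helix.cluster.name=PinotCluster
    controller.zk.str=pinot-zookeeper:2181
    controller.host=pinot-controller
    controller.port=9000
    controller.data.dir=/tmp/pinot/controller-data
    cluster.tenant.isolation.enable=false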
  • m

    mathew

    11/12/2025, 4:55 AM
    Can you please help me resolve a doubt? In Azure, does Pinot support the wasbs scheme? When I try to ingest files using wasbs, Pinot is not picking them up. In our dev setup I'm using abfss and it works fine; Pinot picks up the files.
  • s

    Satya Mahesh

    11/12/2025, 10:34 AM
    Hi team, please help with a solution; this is a blocker for my work, and production-level suggestions are welcome. I added the upsert configuration to the existing setup, but it didn't work initially. After deleting the old configuration and re-adding the same setup with upserts, it started working. However, after some time, the segments began failing.
    Copy code
    {
      "REALTIME": {
        "tableName": "views_REALTIME",
        "tableType": "REALTIME",
        "segmentsConfig": { "schemaName": "views", "replication": "1", "retentionTimeUnit": "DAYS", "retentionTimeValue": "90", "replicasPerPartition": "1", "timeColumnName": "view_end", "minimizeDataMovement": false },
        "tenants": { "broker": "DefaultTenant", "server": "DefaultTenant", "tagOverrideConfig": {} },
        "tableIndexConfig": { "aggregateMetrics": false, "starTreeIndexConfigs": [], "enableDefaultStarTree": false, "nullHandlingEnabled": false, "noDictionaryColumns": [ "events" ], "invertedIndexColumns": [ "workspace_id", "country_code", "fp_playback_id", "browser_name", "is_final" ], "bloomFilterColumns": [], "onHeapDictionaryColumns": [], "rangeIndexColumns": [ "view_end", "view_start", "created_at" ], "sortedColumn": [ "view_end", "quality_of_experience_score", "playback_score", "render_quality_score", "stability_score", "startup_score" ], "varLengthDictionaryColumns": [], "rangeIndexVersion": 2, "optimizeDictionaryForMetrics": false, "optimizeDictionary": false, "autoGeneratedInvertedIndex": false, "createInvertedIndexDuringSegmentGeneration": false, "loadMode": "MMAP", "enableDynamicStarTreeCreation": true, "columnMajorSegmentBuilderEnabled": true, "noDictionarySizeRatioThreshold": 0.85 },
        "metadata": {},
        "quota": {},
        "task": { "taskTypeConfigsMap": { "UpsertCompactionTask": { "schedule": "0 0 * ? * *", "bufferTimePeriod": "1h", "invalidRecordsThresholdPercent": "30", "invalidRecordsThresholdCount": "100000", "tableMaxNumTasks": "10", "validDocIdsType": "SNAPSHOT" } } },
        "routing": { "segmentPrunerTypes": [ "partition" ], "instanceSelectorType": "strictReplicaGroup" },
        "query": {},
        "upsertConfig": { "enableSnapshot": true, "deletedKeysTTL": 0, "mode": "FULL", "comparisonColumns": [ "view_end" ], "metadataTTL": 0, "dropOutOfOrderRecord": false, "hashFunction": "NONE", "defaultPartialUpsertStrategy": "OVERWRITE", "enablePreload": true, "consistencyMode": "NONE", "upsertViewRefreshIntervalMs": 3000, "allowPartialUpsertConsumptionDuringCommit": false },
        "ingestionConfig": { "transformConfigs": [ { "columnName": "created_at", "transformFunction": "Now()" } ], "streamIngestionConfig": { "streamConfigMaps": [ { "streamType": "kafka", "stream.kafka.topic.name": "fp-data-processed-views-v1", "stream.kafka.consumer.prop.group.id": "pinot-views", "stream.kafka.broker.list": "kafka-cluster-broker-0.kafka-cluster-kafka-brokers.prod-kafka.svc.cluster.local:9092,kafka-cluster-broker-1.kafka-cluster-kafka-brokers.prod-kafka.svc.cluster.local:9092,kafka-cluster-broker-2.kafka-cluster-kafka-brokers.prod-kafka.svc.cluster.local:9092", "stream.kafka.consumer.type": "lowlevel", "stream.kafka.consumer.prop.auto.offset.reset": "largest", "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory", "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder", "sasl.mechanism": "SCRAM-SHA-512", "security.protocol": "SASL_PLAINTEXT", "sasl.jaas.config": "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"hfjcnfrjnrc\" password=\"njffhirjfkriviruuir\";", "realtime.segment.flush.threshold.rows": "0", "realtime.segment.flush.threshold.segment.size": "200M", "realtime.segment.flush.threshold.time": "24h" } ], "columnMajorSegmentBuilderEnabled": true, "trackFilteredMessageOffsets": false }, "continueOnError": false, "rowTimeValueCheck": false, "segmentTimeValueCheck": true },
        "isDimTable": false
      }
    }
  • r

    Rashpal Singh

    11/12/2025, 5:26 PM
    Hi Team, I am checking out the release 1.4 code base and building it: https://github.com/apache/pinot/releases/tag/release-1.4.0. I have a doubt regarding the below PR: https://github.com/apache/pinot/pull/16624. Why is this PR not included in the 1.4 code base?
  • s

    Srinivasan Duraiswamy

    11/13/2025, 2:33 AM
    Hi Team, we are seeing a major performance issue in a Spring Boot application that queries Pinot through the Pinot JDBC driver. The API is expected to handle ~150 RPS with less than 1 sec response time. Do we need to implement HikariCP connection pooling to mitigate the performance issues? From the Pinot brokers we see that response times are less than 1 sec. Can you please help? CC: @Mayank
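    For reference, a minimal sketch of fronting the Pinot JDBC driver with a HikariCP pool, assuming the standard driver class and jdbc:pinot URL pointing at the controller; the host, pool sizing and query are placeholders, and whether the driver supports Hikari's default connection validation is worth verifying:
    Copy code
    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PinotJdbcPoolExample {
      public static void main(String[] args) throws Exception {
        // Pool configuration; driver class and URL format follow the documented Pinot JDBC defaults,
        // while the host and sizing values are placeholders to tune for the expected ~150 RPS.
        HikariConfig config = new HikariConfig();
        config.setDriverClassName("org.apache.pinot.client.PinotDriver");
        config.setJdbcUrl("jdbc:pinot://pinot-controller:9000");
        config.setMaximumPoolSize(20);       // reuse connections instead of opening one per request
        config.setConnectionTimeout(1000);   // fail fast rather than queueing callers indefinitely

        try (HikariDataSource ds = new HikariDataSource(config);
             Connection conn = ds.getConnection();
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM myTable")) {
          while (rs.next()) {
            System.out.println(rs.getLong(1));
          }
        }
      }
    }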
  • r

    Rajat

    11/13/2025, 6:10 AM
    Hi team, what are the steps for adding an index to a running hybrid table?
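    For reference, a hedged outline of the usual procedure (table and file names are placeholders): add the index to tableIndexConfig in both the OFFLINE and REALTIME table configs, then reload segments so existing segments build the new index; newly created segments pick it up automatically.
    Copy code
    # 1. Update tableIndexConfig (e.g. invertedIndexColumns) in both table configs
    curl -X PUT "http://<controller>:9000/tables/myTable" -H "Content-Type: application/json" -d @tableConfig.json

    # 2. Reload segments so existing segments build the new index
    curl -X POST "http://<controller>:9000/segments/myTable_OFFLINE/reload"
    curl -X POST "http://<controller>:9000/segments/myTable_REALTIME/reload"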
  • m

    Milind Chaudhary

    11/13/2025, 6:27 AM
    Hi Team, need help debugging this issue. The CPU for all servers in a tenant keeps spiking. There are no queries running, just Kafka ingestion. Pausing ingestion for the main table brought the CPU down, but after resuming it spiked back up again. Things I have tried:
    • Restarting servers
    • Reloading segments
    • Pausing ingestion
    None of these stabilised the CPU load. Can someone please help here?
  • a

    Aashiq PS

    11/13/2025, 8:02 AM
    Hi, I have enabled Pinot auth in the helm values for the controller and broker, but the server is now getting 401 Unauthorized errors. Can anybody guide me to a solution?
    Copy code
    pinotAuth:
      enabled: true
      controllerFactoryClass: org.apache.pinot.controller.api.access.BasicAuthAccessControlFactory
      brokerFactoryClass: org.apache.pinot.broker.broker.BasicAuthAccessControlFactory
      configs:
        - access.control.principals=admin
        - access.control.principals.admin.password=<password>
    error
    Copy code
    org.apache.pinot.common.exception.HttpErrorStatusException: Got error status code: 401 (Unauthorized) with reason: "HTTP 401 Unauthorized" while sending request: /segmentConsumed?reason=forceCommitMessageReceived&streamPartitionMsgOffset=214683&instance=Server_prod-pinot-server-0.prod-pinot-server-headless.prod-pinot.svc.cluster.local_8098&name=views__0__0__20251113T0646Z&rowCount=21&memoryUsedBytes=84567768 to controller: prod-pinot-controller-2.prod-pinot-controller-headless.prod-pinot.svc.cluster.local, version: Unknown
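    For reference, a hedged sketch of the server-side settings that usually cause this 401: with basic auth on the controller, the server also needs credentials for the segment-completion and segment-fetch calls it makes back to the controller. The keys below are from the basic-auth docs as I recall them and the token is a placeholder (base64 of user:password), so please verify against your Pinot version and pass them via the server config in the helm chart:
    Copy code
    pinot.server.instance.auth.token=Basic <base64(user:password)>
    pinot.server.segment.fetcher.auth.token=Basic <base64(user:password)>
    pinot.server.segment.uploader.auth.token=Basic <base64(user:password)>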
  • v

    Veerendra

    11/13/2025, 9:03 AM
    Hi Team, We initially had 3 controller nodes and recently added 3 more. Following this, we stopped the controller service on the 3 old nodes. However, after about a week, the ingestion process started failing because the Pinot server was attempting to connect to one of the stopped controller nodes. Interestingly, we did not encounter this issue in another cluster where the same approach was implemented. While I understand that we need to remove the stopped instances from Zookeeper using the Swagger API, ideally, the Pinot server should fail over and connect to an active/leader controller automatically. Am I missing something ?
    Copy code
    2025-11-12 16:11:54.373 ERROR [sample__121__287__20251105T1324Z] LLRealtimeSegmentDataManager_sample__66__339__20251105T1654Z - Could not send request http://pinot-controller-01.local:9000/segmentUpload?segmentSizeBytes=1073117101&buildTimeMillis=91591&streamPartitionMsgOffset=10967889843&instance=pinot-server-01.local_8098&offset=-1&name=sample__121__287__20251105T1324Z&rowCount=73431535&memoryUsedBytes=2015467107
    org.apache.pinot.shaded.org.apache.http.conn.HttpHostConnectException: Connect to pinot-controller-01.local:9000 [pinot-controller-01.local/10.10.10.56] failed: Connection refused (Connection refused)
      at org.apache.pinot.shaded.org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:156) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
      at org.apache.pinot.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
      at org.apache.pinot.shaded.org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
      at org.apache.pinot.shaded.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
      at org.apache.pinot.shaded.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
      at org.apache.pinot.shaded.org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
      at org.apache.pinot.shaded.org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
      at org.apache.pinot.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
      at org.apache.pinot.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
      at org.apache.pinot.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
  • v

    Victor Bivolaru

    11/14/2025, 10:20 AM
    Hi team, I have a question regarding the realtime-to-offline minion tasks. Our data must keep a certain column sorted throughout segments. In our current flow we do that by waiting for a consuming segment to reach a large enough size (hundreds of MB), then sealing it and letting the minion task run nightly to move data to offline. Re-reading the documentation, it seems that the RealtimeToOfflineSegmentsTask (as opposed to MergeRollupTask, where this is not mentioned) keeps the data sorted in the segments it builds. If that is the case, I would change the configs to seal smaller segments more frequently and let the minion bunch all these small segments up and create the larger, sorted segments in the offline table. My question is whether there is any way of validating that the data inside the offline segments is indeed sorted by that column.
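    A hedged way to check this (not from the thread): the segment metadata endpoint reports per-column metadata, including whether the column is sorted within that segment, so spot-checking a few offline segments should answer it. Host, table, segment and column names below are placeholders:
    Copy code
    curl "http://<controller>:9000/segments/myTable_OFFLINE/<segmentName>/metadata?columns=mySortedColumn"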
  • s

    suraj sheshadri

    11/20/2025, 2:19 AM
    Hello, do we know when we might have consistent data load for Spark execution? https://github.com/apache/pinot/issues/12941#event-12498082012 Currently we have a table that is pretty huge, and sometimes it takes time for all segments to be refreshed; if a user queries the table during this window we might show incorrect results because only some segments have been refreshed. cc: @Jackie
  • n

    Neeraja Sridharan

    11/20/2025, 4:49 AM
    ❓Looks like partition-based segment pruning in Pinot can be configured for multiple columns for offline tables. Appreciate any help in confirming whether the same applies to real-time tables as well 🙇‍♀️ Here is the associated reference for Kafka streams, but it doesn't explicitly mention whether partition-based segment pruning can be set up for multiple columns in the corresponding Pinot table config: https://docs.pinot.apache.org/basics/getting-started/frequent-questions/ingestion-faq#how-do-i-enable-partitioning-in-pinot-when-using-kafka-stream I guess the prerequisite is that the input Kafka stream needs to be configured with a custom partitioner to match the partition column(s), partition function and number of partitions set up in the Pinot table config.
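    For reference, a hedged sketch of the table-config side of partition-based pruning with more than one column (column names, function and partition counts are placeholders); the columnPartitionMap accepts multiple entries, and the open question above is whether the realtime ingestion path honours more than one of them:
    Copy code
    {
      "tableIndexConfig": {
        "segmentPartitionConfig": {
          "columnPartitionMap": {
            "memberId": { "functionName": "Murmur", "numPartitions": 32 },
            "regionId": { "functionName": "Murmur", "numPartitions": 32 }
          }
        }
      },
      "routing": {
        "segmentPrunerTypes": ["partition"]
      }
    }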