Apache Pinot #general

10/10/2025, 8:13 AM

Hi everyone, Any plans to move away from Bitnami zookeeper as a dependency in Pinot Helm chart?

Shubham Kumar

10/10/2025, 9:59 AM

Hi team, My current primary key count is around 100 million. Whenever I restart the server, the primary key count increases to around 260 million and then drops back to 100 million. Could you please help me understand why this behavior occurs?

Arnav

10/13/2025, 9:04 AM

Hi team, TOTAL_KEYS_MARKED_FOR_DELETION is a meter metric, means it will reset when server restarts and also its _Count gives cumulative value. So is there any way to get the exact keys marked for deletion for a time frame like last 2hrs or so?

RANJITH KUMAR

10/13/2025, 3:26 PM

Hi Team, What is API that we can use to get all tasks running associated to OFFLINE Table configured with minion job tasks. I am able to get the task list and able to get config details for task name but how can we do it with OFFLINE TABLE Name how to get task name list ? Also even after deleting the OFFLINE Table from UI pinot controller task segments are running in background and also not even able to stop and delete them , facing these errors Method Not Allowed and Server error '500 Internal Server Error' for url 'http://pinot-controller:9000/tasks/task/Task_SegmentGenerationAndPushTask_199db571-e077-4d74-86e6-f380de37ea51_1760096118944' For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500 respectively.

RANJITH KUMAR

10/14/2025, 1:58 PM

Hi Team, Need general recommendation how to make sure minion tasks gets completed quickly ideally within 15 mins lets assumes we have 10x GBs of data for 20+ tables. What is hardware we need to scale to make minion tasks complete fastly?

Paulc

10/21/2025, 8:57 AM

Shubham Kumar

10/22/2025, 4:19 AM

Hi Team, Could you please share a sample logical table configuration? I’m currently using the configuration below for my logical table, but it only fetches data from the realtime table the offline table data is not being fetched.

Copy code

{
  "tableName": "logicalTable",
  "physicalTableConfigMap": {
    "user_stream_REALTIME": {},
    "user_batch_OFFLINE": {}
  },
  "refOfflineTableName": "user_batch_OFFLINE",
  "refRealtimeTableName": "user_stream_REALTIME",
  "brokerTenant": "DefaultTenant",
  "timeBoundaryConfig": {
    "boundaryStrategy": "min",
    "parameters": {
      "function": "min"
    }
  }
}

Xiang Fu

10/23/2025, 12:13 AM

The next Pinot contributors call will happen tomorrow 8:30AM PDT.

🚀 2

Arnav

10/28/2025, 9:46 AM

Hi team, Anyone can please explain me, Query 1 taking around 120 secs whereas Query 2 taking 15-20 secs to give same result. Total Docs in table 6Billion and my table is RT table Is it because in Query2, all 3 queries are computed parallely? or in first query segments are loaded to memory and then other two query takes very less time hence overall less query time Query 1:

Copy code

SELECT * FROM table
  WHERE customer_id = 1234
    AND msisdn IN ( ..1000 msisdns)

Query 2:

Copy code

SELECT * FROM table
  WHERE customer_id = 1234
    AND msisdn IN ( ..350 msisdns)
  UNION ALL
  SELECT * FROM append_iot_session_events
  WHERE customer_id = 1234
    AND msisdn IN (..350 msisdns)
  UNION ALL
  SELECT * FROM append_iot_session_events
  WHERE customer_id = 1234
    AND msisdn IN (..300 msisdns)

robert zych

10/29/2025, 2:44 PM

The next Pinot Contributor call is scheduled for next Tuesday 8:30AM pacific https://www.meetup.com/apache-pinot/events/311759314/?slug=apache-pinot&eventId=311759314 Slack Conversation

Matt Nawara

10/30/2025, 12:26 PM

Hi all, we have a usecase where • we have a table with a metric that is sourced from an ingestion aggregation • we know we will have to add columns to it relatively dynamically (user request) unfortunately up until now I had not registered this requirement from the documentation:

All metrics must have aggregation configs.

I feel like it is at the heart of what we are seeing now; in essence, you can't update the schema with a new metric, as the API says:

Copy code

PUT schema response: {'code': 400, 'error': 'Invalid schema: staging_stream_st_mknaw_idle_worker_test14_sg_12. Reason: Schema is incompatible with tableConfig with name: staging_stream_st_mknaw_idle_worker_test14_sg_12_REALTIME and type: REALTIME'}

and, probably correctly, the other way around, trying to get the table update in before the schema update, also does not work

Copy code

PUT table response: {'code': 400, 'error': "Invalid table config: staging_stream_st_mknaw_idle_worker_test14_sg_12 with error: The destination column 'mtr_clicks_sum' of the aggregation function must be present in the schema"}

so... is the implication that a pinot schema/table pair that has ingestion aggregation can.. never evolve? this would be unfortunate.

Gerald Bonfiglio

10/30/2025, 5:08 PM

Hi Everyone, We have a use case where we want to write a Java Map object onto a pinot table. We have tried both writing it as JSON and flattening the map, creating a separate column for each map key. Using tables with a few million rows, when testing query performance, we noticed that using separate columns is much more performant than using a JSON column for the map, so we are proceeding with flatting out the map. As you would expect, since the map key maps to a specific table column which is of a specific data type, all values for the same key in different records have to be the same data type. However, we do have situation where the same key can be one of several data types. Was wondering if anyone else had a similar use case, and if they found a solution that works. One solution that comes to mind is to create columns based on both the key and the data type, so that if they key appeared as both a string and a long, there would be 2 columns, s_key and l_key. Seems pretty straight forward, but it complicates queries, in that we need to know what columns we have created and query against all of them (could be more than just these 2).

11/04/2025, 10:44 AM

Hi all, Any reasons for still using jdk11 to build Pinot? Do we have plans to got to jdk 21 or 25 ? We then get some performance gains eg. around GC and concurrency stablity. Do we have plans for that soon?

Arnav

11/11/2025, 4:25 AM

Hi team, i have 3 minion instances running and below are the configs:

Copy code

"task": {
      "taskTypeConfigsMap": {
        "UpsertCompactionTask": {
          "schedule": "0 0 */4 ? * *",
          "bufferTimePeriod": "1h",
          "invalidRecordsThresholdPercent": "0",
          "invalidRecordsThresholdCount": "1",
          "validDocIdsType": "SNAPSHOT"
        }
      }
    },

Its taking too much time how can i optimise it?

Satya Mahesh

11/12/2025, 10:22 AM

Hi team, pls help and tell the solution. this is the blocker to my work I added the upsert configuration to the existing setup, but it didn’t work initially. After deleting the old configuration and re-adding the same setup with upserts, it started working. However, after some time, the segments began failing. { "REALTIME": { "tableName": "views_REALTIME", "tableType": "REALTIME", "segmentsConfig": { "schemaName": "views", "replication": "1", "retentionTimeUnit": "DAYS", "retentionTimeValue": "90", "replicasPerPartition": "1", "timeColumnName": "view_end", "minimizeDataMovement": false }, "tenants": { "broker": "DefaultTenant", "server": "DefaultTenant", "tagOverrideConfig": {} }, "tableIndexConfig": { "aggregateMetrics": false, "starTreeIndexConfigs": [], "enableDefaultStarTree": false, "nullHandlingEnabled": false, "noDictionaryColumns": [ "events" ], "invertedIndexColumns": [ "workspace_id", "country_code", "fp_playback_id", "browser_name", "is_final" ], "bloomFilterColumns": [], "onHeapDictionaryColumns": [], "rangeIndexColumns": [ "view_end", "view_start", "created_at" ], "sortedColumn": [ "view_end", "quality_of_experience_score", "playback_score", "render_quality_score", "stability_score", "startup_score" ], "varLengthDictionaryColumns": [], "rangeIndexVersion": 2, "optimizeDictionaryForMetrics": false, "optimizeDictionary": false, "autoGeneratedInvertedIndex": false, "createInvertedIndexDuringSegmentGeneration": false, "loadMode": "MMAP", "enableDynamicStarTreeCreation": true, "columnMajorSegmentBuilderEnabled": true, "noDictionarySizeRatioThreshold": 0.85 }, "metadata": {}, "quota": {}, "task": { "taskTypeConfigsMap": { "UpsertCompactionTask": { "schedule": "0 0 * ? * *", "bufferTimePeriod": "1h", "invalidRecordsThresholdPercent": "30", "invalidRecordsThresholdCount": "100000", "tableMaxNumTasks": "10", "validDocIdsType": "SNAPSHOT" } } }, "routing": { "segmentPrunerTypes": [ "partition" ], "instanceSelectorType": "strictReplicaGroup" }, "query": {}, "upsertConfig": { "enableSnapshot": true, "deletedKeysTTL": 0, "mode": "FULL", "comparisonColumns": [ "view_end" ], "metadataTTL": 0, "dropOutOfOrderRecord": false, "hashFunction": "NONE", "defaultPartialUpsertStrategy": "OVERWRITE", "enablePreload": true, "consistencyMode": "NONE", "upsertViewRefreshIntervalMs": 3000, "allowPartialUpsertConsumptionDuringCommit": false }, "ingestionConfig": { "transformConfigs": [ { "columnName": "created_at", "transformFunction": "Now()" } ], "streamIngestionConfig": { "streamConfigMaps": [ { "streamType": "kafka", "stream.kafka.topic.name": "fp-data-processed-views-v1", "stream.kafka.consumer.prop.group.id": "pinot-views", "stream.kafka.broker.list": "kafka-cluster-broker-0.kafka-cluster-kafka-brokers.prod-kafka.svc.cluster.local9092,kafka cluster broker 1.kafka cluster kafka brokers.prod kafka.svc.cluster.local9092,kafka-cluster-broker-2.kafka-cluster-kafka-brokers.prod-kafka.svc.cluster.local:9092", "stream.kafka.consumer.type": "lowlevel", "stream.kafka.consumer.prop.auto.offset.reset": "largest", "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory", "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder", "sasl.mechanism": "SCRAM-SHA-512", "security.protocol": "SASL_PLAINTEXT", "sasl.jaas.config": "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"hfjcnfrjnrc\" password=\"njffhirjfkriviruuir\";", "realtime.segment.flush.threshold.rows": "0", "realtime.segment.flush.threshold.segment.size": "200M", "realtime.segment.flush.threshold.time": "24h" } ], "columnMajorSegmentBuilderEnabled": true, "trackFilteredMessageOffsets": false }, "continueOnError": false, "rowTimeValueCheck": false, "segmentTimeValueCheck": true }, "isDimTable": false } }

RANJITH KUMAR

11/14/2025, 11:05 AM

Hi Team, For Hybrid table I see only example with merge roll up example only-https://github.com/apache/pinot/tree/master/pinot-tools/src/main/resources/examples/minions/stream/githubEvents Will pinot support batch ingestion for OFFLINE table for HYBRID tables. When I start adding batchConfigMaps to load from blob storage its not working to create offline table with same name as realtime table.Can some help me with this !! Context: Want to do backfill for sales table we already have realtime sales table , but loading all history data is challenge via kafka so planning to have offline table with same name want to load that offline table with batch ingestion via minions seems its not supported to create table with batchConfigMaps in offline table for hybrid table

Suresh PERUML

11/14/2025, 3:57 PM

Hi All, I am using pinot libraries to load a segment and modify the value of a column in a table to add additional encrypted data for other workflow purpose. As part of this task, i am using pinot libraries and its API's. Once the column values are modified, recreate the segment as csv file format and convert it back to segments again. The updated segments, i would be using PINOT REST APIs to upload the same in PINOT DB. Below are the libraries used. implementation 'org.apache.pinotpinot common1.3.0' implementation 'org.apache.pinotpinot segment spi1.3.0' implementation 'org.apache.pinotpinot segment local1.3.0' implementation 'org.apache.pinotpinot core1.3.0' implementation 'org.apache.pinotpinot spi1.3.0' implementation("org.reflectionsreflections0.10.2") IndexSegment indexSegment = ImmutableSegmentLoader.load(segmentFileToLoad, ReadMode.mmap; / IndexSegment indexSegment = ImmutableSegmentLoader.load(segmentFileToLoad, ReadMode.heap); For the above API has been invoked from Spring 6, SPRING Boot Microservices Simple Test Program. The output of above method is not returning the indexSegment. Further debugged the pinot code, i am getting the error in below lines... pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/PinotDataBuffer.java. /** * Memory maps a file into a buffer. * pNOTE: If the file gets extended, the contents of the extended portion of the file are not defined. */ public static *PinotDataBuffer mapFile*(File file, boolean readOnly, long offset, long size, ByteOrder byteOrder, /** * Allocates a buffer using direct memory and loads a file into the buffer. */ public static PinotDataBuffer loadFile(File file, long offset, long size, ByteOrder byteOrder, @Nullable String description) throws IOException { PinotDataBuffer buffer; Above line is invoked from "*pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/store/SingleFileIndexDirectory.java*" Code is stuck in loadFile(...), mapFile(....) methods. private void mapAndSliceFile(SortedMapLong, IndexEntry startOffsets, ListLong offsetAccum, long endOffset) throws IOException { Preconditions.checkNotNull(startOffsets); Preconditions.checkNotNull(offsetAccum); Preconditions.checkArgument(!offsetAccum.isEmpty()); long fromFilePos = offsetAccum.get(0); long size = endOffset - fromFilePos; String context = allocationContext(_indexFile, "single_file_index.rw." + "." + String.valueOf(fromFilePos) + "." + String.valueOf(size)); // Backward-compatible: index file is always big-endian PinotDataBuffer buffer; if (_readMode == ReadMode.heap) { buffer = *PinotDataBuffer.loadFile(_*indexFile, fromFilePos, size, ByteOrder.BIG_ENDIAN, context); } else { buffer = *PinotDataBuffer.mapFile*(_indexFile, true, fromFilePos, size, ByteOrder.BIG_ENDIAN, context); } The same program works fine in Spring framework 3, 4 and only spring framework 6, spring 3 it is not working. Need some inputs on spring framework 6 the same jars with API's are not working? Added below add-opens JVM as well in spring framework 6.... --add-opens java.base/java.lang=ALL-UNNAMED \ --add-opens java.base/java.util=ALL-UNNAMED \ --add-opens java.base/java.nio=ALL-UNNAMED \ --add-opens java.base/java.io=ALL-UNNAMED \ --add-opens java.base/java.security=ALL-UNNAMED \ --add-opens java.base/sun.nio.ch=ALL-UNNAMED \ --add-opens java.base/java.lang.reflect=ALL-UNNAMED \ --add-exports java.base/jdk.internal.misc=ALL-UNNAMED

Xiang Fu

11/15/2025, 8:58 AM

Here is the new slack invite link if anyone want to use: https://inviter.co/apache-pinot The old communityInviter link is invalid.

Arnav

11/17/2025, 6:10 AM

Hi team, in 1.4.0 enableSnapshot and enablePreload are deprecated and replaced with snapshot and preload?

Qosimjon Mamatqulov

11/18/2025, 10:51 AM

👋 Hello, team!

👋 2

San Kumar

11/20/2025, 11:20 AM

Hello TEAM In our production we are using SET useMultistageEngine = True and because of that we are getting stability issue of cluster with below error HelixManager is not connected ZK session expired can you advice below query really required useMultistageEngine = True SELECT toEpochSeconds(DATETRUNC('hour', event_time)) AS event_time_hours, state, COUNT(1) as state_count FROM event_details WHERE event_time >= %(start_time) AND oevent_time < %(end_time) GROUP BY toEpochSeconds(DATETRUNC('hour', event_time)), state OPTION(timeoutMs=300000)

Eric Wohlstadter

11/20/2025, 9:43 PM

Hi all, I am interested in adding some User Defined Aggregation Functions (https://docs.pinot.apache.org/developers/developers-and-contributors/extending-pinot/custom-aggregation-function). It looks like you need to recompile from source, rather than drop them in as a jar. "As of today, this requires code change in Pinot but we plan to add the ability to plugin Functions without having to change Pinot code." Does anyone know if work is going on upstream to fix this? If anyone has a ticket they can point me to, that would be awesome. I didn't see anything.

Arnav

11/24/2025, 6:26 AM

Hi team, I am using below configs, i have 36 partitions and every 1-1.5hr consuming segments are flushed so around 800 new segments. Now initial partitions segments are properly merged and but later partitions are not picked by UpsertCompactMergeTask task. What changes should i do so that every partitions are picked?

Copy code

"task": {
      "taskTypeConfigsMap": {
        "UpsertCompactionTask": {
          "schedule": "0 */5 * ? * *",
          "bufferTimePeriod": "1h",
          "invalidRecordsThresholdCount": "1",
          "tableMaxNumTasks": "40",
          "validDocIdsType": "SNAPSHOT"
        },
        "UpsertCompactMergeTask": {
          "schedule": "0 0 */1 ? * *",
          "bufferTimePeriod": "1m",
          "maxNumSegmentsPerTask": "100",
          "maxNumRecordsPerSegment": "50000000"
        }
      }
    }

Prateek Garg

11/24/2025, 12:01 PM

Hi team, I need to add Pinot to Trino Catalog. I have a Pinot cluster running on bare metal machines such that there are multiple instances of Pinot Server running per node as systemd services, with different gRPC ports for each server instance. According to Trino's official docs - we can only define a single port in property

pinot.grpc.port

I'd appreciate clarification on the following points: • Does the gRPC port mentioned in Trino documentation refer to Pinot Server's gRPC port, or Broker's gRPC API port? • If it refers to Pinot Server's port, do we have any mechanism to make it work for our configuration, where there are multiple server instances per machine having different gRPC ports? Documents for Reference: https://trino.io/docs/current/connector/pinot.html#grpc-configuration-properties https://docs.pinot.apache.org/users/api/broker-grpc-api

RANJITH KUMAR

11/26/2025, 8:52 AM

How to capacity plan read QPS for pinot ? Is it Servers *cores = no of queries ? Can someone guide !!

RANJITH KUMAR

11/26/2025, 8:54 AM

@Xiang Fu

Arnav

12/01/2025, 5:59 AM

Hi team, in there any way to Query using Cursors with trino?

Arnav

12/01/2025, 9:19 AM

Hi team, i have thousands of users for which i need to apply rls, so is there any dynamic way , or i have to onboard the users and add them in broker.conf? pinot.broker.access.control.principals.alice.password=alice123 pinot.broker.access.control.principals.admin.password=admin pinot.broker.access.control.principals.alice.tables=user_data_REALTIME pinot.broker.access.control.principals.alice.user_data_REALTIME.rls=userId='102'

Senthil Kumar

12/01/2025, 12:45 PM

Hi Team I see DropwizardJmxReporter and YammerJmxReporter in apache pinot; Could you please let me know which is default one and how can we change to other one like DropwizardJmxReporter if YammerJmxReporter is default one.

RANJITH KUMAR

12/02/2025, 6:41 PM

Hi team, How can we overwrite OFFLINE tables?