# general
  • n

    Nicolas

    09/24/2025, 2:46 PM
    Hi everyone, I would like to know if it's possible to configure a real-time table that consumes from 2 different Kafka clusters?
  • m

    mg

    09/29/2025, 8:39 AM
    Hi all, The Pinot Controller UI shows all table configurations, including SSL configs. Is it possible to hide or mask sensitive info in the UI, such as Kafka truststore and keystore passwords?
    ...,
        "tableIndexConfig": {
          "streamConfigs": {
            "security.protocol": "SSL",
            "ssl.truststore.location": "/opt/pinot/kafka-cert-jks/truststore.jks",
            "ssl.truststore.password": "P6cz00RPASSWORDPLAINTEXT006OTF5",
            "ssl.truststore.type": "JKS",
            "ssl.keystore.location": "/opt/pinot/kafka-cert-jks/keystore.jks",
            "ssl.keystore.password": "P6cz00RPASSWORDPLAINTEXT006OTF5",
            "ssl.keystore.type": "JKS",
            "ssl.key.password": "P6cz00RPASSWORDPLAINTEXT006OTF5"
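A minimal sketch of the kind of masking being asked about, applied on the client side to a table config that has already been fetched as JSON. This is not a Pinot feature or API; `mask_secrets` and the rule "any key ending in `password`" are assumptions made for illustration only.

```python
def mask_secrets(config, mask="********"):
    """Return a copy of `config` with password-like string values replaced.

    Assumption: a key is treated as sensitive when its name ends with
    "password" (case-insensitive), e.g. "ssl.keystore.password".
    """
    if isinstance(config, dict):
        masked = {}
        for key, value in config.items():
            if isinstance(value, str) and key.lower().endswith("password"):
                masked[key] = mask
            else:
                # Recurse so nested sections such as streamConfigs are covered.
                masked[key] = mask_secrets(value, mask)
        return masked
    if isinstance(config, list):
        return [mask_secrets(item, mask) for item in config]
    return config


table_config = {
    "tableIndexConfig": {
        "streamConfigs": {
            "security.protocol": "SSL",
            "ssl.truststore.password": "example-secret",
            "ssl.key.password": "example-secret",
        }
    }
}
masked_config = mask_secrets(table_config)
```

Secrets embedded inside other values (for example a `sasl.jaas.config` string) are not caught by this key-based rule and would need separate handling.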
  • s

    Sankaranarayanan Viswanathan

    09/29/2025, 5:57 PM
    Hello Everyone, wondering if I can get some guidance on something I am working on. I am storing events in a pinot table and we have a modified retention manager to delete segments based on the min and max values of an expiry date column on this table that is populated at ingestion time. Each event row in the pinot table is also associated with some external objects stored in S3 and we use the pinot table as source of truth. When a pinot segment goes out of retention we would like to delete those related objects in S3. Are there patterns on how to accomplish this?
  • b

    Brook E

    09/30/2025, 3:29 PM
    Does anyone have any good strategies for how they automatically toggle data from real-time to offline?
  • m

    magax90515

    10/05/2025, 11:08 AM
    Will `org.apache.pinot:pinot-common:1.4.0` be published to Maven? `org.apache.pinot:pinot-java-client:1.4.0` has been published, but it depends on pinot-common, which has not been published.
  • y

    Yeshwanth

    10/07/2025, 7:10 AM
    Hi everyone, We are running a large-scale Pinot deployment and plan to store all our data in a single, large table. As our segment count grows into the hundreds of thousands, we are already hitting the ZooKeeper ZNode size limit (`jute.maxbuffer`) due to the large segment metadata. We have reviewed the official troubleshooting documentation, which suggests two primary solutions:
    1. Decrease the number of segments: We cannot use rollups or further merge segments, as our current segment size is already optimized at ~300MB, and we need to maintain data granularity for our query performance.
    2. Increase `jute.maxbuffer`: We view this as a last resort, as we are concerned about the potential downstream performance impacts on the ZooKeeper cluster.
    Given these constraints, we have a few questions:
    • What are the recommended strategies for managing ZNode size in a table with a very high segment count, beyond the two options mentioned above?
    • Is there a practical or theoretical upper limit on the number of segments a single Pinot table can efficiently handle before ZK performance degrades?
    • Are there alternative configurations or architectural approaches we should consider for this scenario?
  • g

    Gerald Bonfiglio

    10/07/2025, 6:59 PM
    Hey everyone, We want to use the JDBC gRPC client that was introduced in 1.4.0, but we are getting an error when building with Maven:
    Failed to collect dependencies at org.apache.pinot:pinot-jdbc-client:jar:1.4.0:
    Failed to read artifact descriptor for org.apache.pinot:pinot-jdbc-client:jar:1.4.0: The following artifacts could not be resolved: org.apache.pinot:pinot:pom:1.4.0 (absent)
    Checking in Maven Central, pinot-1.4.0 doesn't seem to be there. Are there plans for pushing the remaining 1.4.0 jars to Maven Central? Are we missing something else?
  • r

    robert zych

    10/09/2025, 4:02 PM
    https://www.meetup.com/apache-pinot/events/311444779/
    apache pinot crimson 1
    👍 1
  • m

    mg

    10/10/2025, 8:13 AM
    Hi everyone, Any plans to move away from Bitnami ZooKeeper as a dependency in the Pinot Helm chart?
  • s

    Shubham Kumar

    10/10/2025, 9:59 AM
    Hi team, My current primary key count is around 100 million. Whenever I restart the server, the primary key count increases to around 260 million and then drops back to 100 million. Could you please help me understand why this behavior occurs?
  • a

    Arnav

    10/13/2025, 9:04 AM
    Hi team, TOTAL_KEYS_MARKED_FOR_DELETION is a meter metric, which means it resets when the server restarts, and its _Count gives a cumulative value. Is there any way to get the exact number of keys marked for deletion over a time frame, such as the last 2 hours?
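To make the semantics above concrete, here is a minimal sketch of turning periodic samples of a cumulative counter into an increase over a time window, treating any drop in the value as a restart. It is a generic illustration, not a Pinot API: `increase_over_window` and the `(timestamp, count)` sample format are assumptions, and the samples would have to come from whatever system scrapes the metric. Monitoring systems such as Prometheus provide a comparable `increase()` function.

```python
def increase_over_window(samples, window_start, window_end):
    """Sum the growth of a cumulative counter between two timestamps.

    `samples` is a list of (timestamp, cumulative_count) pairs sorted by
    timestamp. A sample lower than its predecessor is assumed to mean the
    counter was reset (for example by a server restart) and started again
    from zero.
    """
    in_window = [(t, c) for t, c in samples if window_start <= t <= window_end]
    total = 0
    for (_, previous), (_, current) in zip(in_window, in_window[1:]):
        if current >= previous:
            total += current - previous
        else:
            # Reset detected: everything counted since the restart is new.
            total += current
    return total


# Hypothetical samples taken every 60 seconds, with a restart before t=120.
samples = [(0, 100), (60, 150), (120, 20), (180, 50)]
keys_marked = increase_over_window(samples, 0, 180)
```

Increments that happened between the last sample before a restart and the restart itself are lost, so the result is a lower bound rather than an exact count.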
  • r

    RANJITH KUMAR

    10/13/2025, 3:26 PM
    Hi Team, Which API can we use to get all running tasks associated with an OFFLINE table that is configured with minion tasks? I am able to get the task list, and to get the config details for a task name, but how can we get the list of task names for a given OFFLINE table name? Also, even after deleting the OFFLINE table from the Pinot controller UI, its task segments keep running in the background, and I am not able to stop or delete them. I am facing these errors, respectively: Method Not Allowed, and Server error '500 Internal Server Error' for url 'http://pinot-controller:9000/tasks/task/Task_SegmentGenerationAndPushTask_199db571-e077-4d74-86e6-f380de37ea51_1760096118944' For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
  • r

    RANJITH KUMAR

    10/14/2025, 1:58 PM
    Hi Team, We need a general recommendation on how to make sure minion tasks complete quickly, ideally within 15 minutes. Let's assume we have 10x GBs of data for 20+ tables. What hardware do we need to scale so that minion tasks complete faster?
  • p

    Paulc

    10/21/2025, 8:57 AM
    a
  • s

    Shubham Kumar

    10/22/2025, 4:19 AM
    Hi Team, Could you please share a sample logical table configuration? I’m currently using the configuration below for my logical table, but it only fetches data from the realtime table; the offline table data is not being fetched.
    {
      "tableName": "logicalTable",
      "physicalTableConfigMap": {
        "user_stream_REALTIME": {},
        "user_batch_OFFLINE": {}
      },
      "refOfflineTableName": "user_batch_OFFLINE",
      "refRealtimeTableName": "user_stream_REALTIME",
      "brokerTenant": "DefaultTenant",
      "timeBoundaryConfig": {
        "boundaryStrategy": "min",
        "parameters": {
          "function": "min"
        }
      }
    }
  • x

    Xiang Fu

    10/23/2025, 12:13 AM
    The next Pinot contributors call will happen tomorrow 8:30AM PDT.
    🚀 2
  • a

    Arnav

    10/28/2025, 9:46 AM
    Hi team, Can anyone please explain why Query 1 takes around 120 secs whereas Query 2 takes 15-20 secs to give the same result? The table has 6 billion total docs and is a REALTIME table. Is it because in Query 2 all 3 queries are computed in parallel? Or is it that the first query loads the segments into memory, so the other two queries take much less time and the overall query time is lower?
    Query 1:
    SELECT * FROM table
      WHERE customer_id = 1234
        AND msisdn IN ( ..1000 msisdns)
    Query 2:
    SELECT * FROM table
      WHERE customer_id = 1234
        AND msisdn IN ( ..350 msisdns)
      UNION ALL
      SELECT * FROM append_iot_session_events
      WHERE customer_id = 1234
        AND msisdn IN (..350 msisdns)
      UNION ALL
      SELECT * FROM append_iot_session_events
      WHERE customer_id = 1234
        AND msisdn IN (..300 msisdns)
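The splitting used in Query 2 can be sketched as a small helper that cuts a long IN list into chunks and joins one SELECT per chunk with UNION ALL. This only illustrates how such a query could be generated; `build_union_query`, the chunk size of 350, and quoting the msisdn values as strings are assumptions, and the values are assumed to be trusted (no escaping is done).

```python
def chunked(values, size):
    """Yield consecutive slices of `values` with at most `size` items each."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def build_union_query(table, customer_id, msisdns, chunk_size=350):
    """Build one SELECT per chunk of `msisdns`, joined with UNION ALL."""
    parts = []
    for chunk in chunked(msisdns, chunk_size):
        # Assumption: msisdn is a string column, so values are single-quoted.
        in_list = ", ".join(f"'{value}'" for value in chunk)
        parts.append(
            f"SELECT * FROM {table}\n"
            f"  WHERE customer_id = {int(customer_id)}\n"
            f"    AND msisdn IN ({in_list})"
        )
    return "\nUNION ALL\n".join(parts)


# 1000 hypothetical msisdns split into chunks of 350, 350 and 300.
example_msisdns = [str(1000000 + i) for i in range(1000)]
query = build_union_query("append_iot_session_events", 1234, example_msisdns)
```

Whether the split version is actually faster depends on how the engine plans and executes each branch, which the thread leaves open.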
  • r

    robert zych

    10/29/2025, 2:44 PM
    The next Pinot Contributor call is scheduled for next Tuesday at 8:30AM Pacific: https://www.meetup.com/apache-pinot/events/311759314/?slug=apache-pinot&eventId=311759314
  • m

    Matt Nawara

    10/30/2025, 12:26 PM
    Hi all, we have a use case where
    • we have a table with a metric that is sourced from an ingestion aggregation
    • we know we will have to add columns to it relatively dynamically (user request)
    Unfortunately, up until now I had not registered this requirement from the documentation:
    All metrics must have aggregation configs.
    I feel like it is at the heart of what we are seeing now; in essence, you can't update the schema with a new metric, as the API says:
    PUT schema response: {'code': 400, 'error': 'Invalid schema: staging_stream_st_mknaw_idle_worker_test14_sg_12. Reason: Schema is incompatible with tableConfig with name: staging_stream_st_mknaw_idle_worker_test14_sg_12_REALTIME and type: REALTIME'}
    and, probably correctly, the other way around, trying to get the table update in before the schema update, also does not work
    PUT table response: {'code': 400, 'error': "Invalid table config: staging_stream_st_mknaw_idle_worker_test14_sg_12 with error: The destination column 'mtr_clicks_sum' of the aggregation function must be present in the schema"}
    so... is the implication that a pinot schema/table pair that has ingestion aggregation can.. never evolve? this would be unfortunate.
  • g

    Gerald Bonfiglio

    10/30/2025, 5:08 PM
    Hi Everyone, We have a use case where we want to write a Java Map object onto a Pinot table. We have tried both writing it as JSON and flattening the map, creating a separate column for each map key. Using tables with a few million rows, when testing query performance, we noticed that using separate columns is much more performant than using a JSON column for the map, so we are proceeding with flattening out the map. As you would expect, since the map key maps to a specific table column which is of a specific data type, all values for the same key in different records have to be the same data type. However, we do have situations where the same key can be one of several data types. I was wondering if anyone else had a similar use case, and if they found a solution that works. One solution that comes to mind is to create columns based on both the key and the data type, so that if the key appeared as both a string and a long, there would be 2 columns, s_key and l_key. Seems pretty straightforward, but it complicates queries, in that we need to know what columns we have created and query against all of them (could be more than just these 2).
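A minimal sketch of the naming scheme proposed above, where each map entry is written to a column whose name combines a type prefix with the key. Only the `s_` (string) and `l_` (long) prefixes come from the message; the `d_` and `b_` prefixes and the `flatten_typed` helper are hypothetical additions for illustration.

```python
# Prefix per value type. "s" and "l" follow the message; "d" and "b" are
# assumed extensions of the same idea.
TYPE_PREFIXES = {str: "s", int: "l", float: "d", bool: "b"}


def flatten_typed(record):
    """Map each key to a type-prefixed column name, e.g. key -> s_key or l_key."""
    flat = {}
    for key, value in record.items():
        # Look up the exact type so that bool is not mistaken for int.
        prefix = TYPE_PREFIXES.get(type(value))
        if prefix is None:
            raise TypeError(f"unsupported value type for key {key!r}: {type(value).__name__}")
        flat[f"{prefix}_{key}"] = value
    return flat


row_with_string = flatten_typed({"key": "abc"})
row_with_long = flatten_typed({"key": 42})
```

Recording which prefixed columns exist for each key (for example in a small lookup table) is one way to let a query layer expand a logical key into all of its typed columns.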
  • m

    mg

    11/04/2025, 10:44 AM
    Hi all, Are there any reasons for still using JDK 11 to build Pinot? Do we have plans to go to JDK 21 or 25? We would then get some performance gains, e.g. around GC and concurrency stability. Do we have plans for that soon?
  • a

    Arnav

    11/11/2025, 4:25 AM
    Hi team, I have 3 minion instances running, and below are the configs:
    "task": {
          "taskTypeConfigsMap": {
            "UpsertCompactionTask": {
              "schedule": "0 0 */4 ? * *",
              "bufferTimePeriod": "1h",
              "invalidRecordsThresholdPercent": "0",
              "invalidRecordsThresholdCount": "1",
              "validDocIdsType": "SNAPSHOT"
            }
          }
        },
    It's taking too much time; how can I optimise it?
  • s

    Satya Mahesh

    11/12/2025, 10:22 AM
    Hi team, please help with a solution; this is a blocker for my work. I added the upsert configuration to the existing setup, but it didn’t work initially. After deleting the old configuration and re-adding the same setup with upserts, it started working. However, after some time, the segments began failing.
    {
      "REALTIME": {
        "tableName": "views_REALTIME",
        "tableType": "REALTIME",
        "segmentsConfig": {
          "schemaName": "views",
          "replication": "1",
          "retentionTimeUnit": "DAYS",
          "retentionTimeValue": "90",
          "replicasPerPartition": "1",
          "timeColumnName": "view_end",
          "minimizeDataMovement": false
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant",
          "tagOverrideConfig": {}
        },
        "tableIndexConfig": {
          "aggregateMetrics": false,
          "starTreeIndexConfigs": [],
          "enableDefaultStarTree": false,
          "nullHandlingEnabled": false,
          "noDictionaryColumns": ["events"],
          "invertedIndexColumns": ["workspace_id", "country_code", "fp_playback_id", "browser_name", "is_final"],
          "bloomFilterColumns": [],
          "onHeapDictionaryColumns": [],
          "rangeIndexColumns": ["view_end", "view_start", "created_at"],
          "sortedColumn": ["view_end", "quality_of_experience_score", "playback_score", "render_quality_score", "stability_score", "startup_score"],
          "varLengthDictionaryColumns": [],
          "rangeIndexVersion": 2,
          "optimizeDictionaryForMetrics": false,
          "optimizeDictionary": false,
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false,
          "loadMode": "MMAP",
          "enableDynamicStarTreeCreation": true,
          "columnMajorSegmentBuilderEnabled": true,
          "noDictionarySizeRatioThreshold": 0.85
        },
        "metadata": {},
        "quota": {},
        "task": {
          "taskTypeConfigsMap": {
            "UpsertCompactionTask": {
              "schedule": "0 0 * ? * *",
              "bufferTimePeriod": "1h",
              "invalidRecordsThresholdPercent": "30",
              "invalidRecordsThresholdCount": "100000",
              "tableMaxNumTasks": "10",
              "validDocIdsType": "SNAPSHOT"
            }
          }
        },
        "routing": {
          "segmentPrunerTypes": ["partition"],
          "instanceSelectorType": "strictReplicaGroup"
        },
        "query": {},
        "upsertConfig": {
          "enableSnapshot": true,
          "deletedKeysTTL": 0,
          "mode": "FULL",
          "comparisonColumns": ["view_end"],
          "metadataTTL": 0,
          "dropOutOfOrderRecord": false,
          "hashFunction": "NONE",
          "defaultPartialUpsertStrategy": "OVERWRITE",
          "enablePreload": true,
          "consistencyMode": "NONE",
          "upsertViewRefreshIntervalMs": 3000,
          "allowPartialUpsertConsumptionDuringCommit": false
        },
        "ingestionConfig": {
          "transformConfigs": [
            { "columnName": "created_at", "transformFunction": "Now()" }
          ],
          "streamIngestionConfig": {
            "streamConfigMaps": [
              {
                "streamType": "kafka",
                "stream.kafka.topic.name": "fp-data-processed-views-v1",
                "stream.kafka.consumer.prop.group.id": "pinot-views",
                "stream.kafka.broker.list": "kafka-cluster-broker-0.kafka-cluster-kafka-brokers.prod-kafka.svc.cluster.local:9092,kafka-cluster-broker-1.kafka-cluster-kafka-brokers.prod-kafka.svc.cluster.local:9092,kafka-cluster-broker-2.kafka-cluster-kafka-brokers.prod-kafka.svc.cluster.local:9092",
                "stream.kafka.consumer.type": "lowlevel",
                "stream.kafka.consumer.prop.auto.offset.reset": "largest",
                "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
                "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
                "sasl.mechanism": "SCRAM-SHA-512",
                "security.protocol": "SASL_PLAINTEXT",
                "sasl.jaas.config": "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"hfjcnfrjnrc\" password=\"njffhirjfkriviruuir\";",
                "realtime.segment.flush.threshold.rows": "0",
                "realtime.segment.flush.threshold.segment.size": "200M",
                "realtime.segment.flush.threshold.time": "24h"
              }
            ],
            "columnMajorSegmentBuilderEnabled": true,
            "trackFilteredMessageOffsets": false
          },
          "continueOnError": false,
          "rowTimeValueCheck": false,
          "segmentTimeValueCheck": true
        },
        "isDimTable": false
      }
    }
  • r

    RANJITH KUMAR

    11/14/2025, 11:05 AM
    Hi Team, For hybrid tables I only see an example with merge rollup: https://github.com/apache/pinot/tree/master/pinot-tools/src/main/resources/examples/minions/stream/githubEvents Does Pinot support batch ingestion into the OFFLINE table of a hybrid table? When I add batchConfigMaps to load from blob storage, creating an offline table with the same name as the realtime table does not work. Can someone help me with this? Context: we want to backfill a sales table. We already have a realtime sales table, but loading all the historical data via Kafka is a challenge, so we are planning to have an offline table with the same name and load it with batch ingestion via minions. It seems that creating a hybrid table's offline table with batchConfigMaps is not supported.
  • s

    Suresh PERUML

    11/14/2025, 3:57 PM
    Hi All, I am using the Pinot libraries and their APIs to load a segment and modify the values of a column in a table, adding encrypted data for another workflow. Once the column values are modified, I recreate the segment data in CSV format and convert it back into segments. I then use the Pinot REST APIs to upload the updated segments to Pinot. Below are the libraries used:
    implementation 'org.apache.pinot:pinot-common:1.3.0'
    implementation 'org.apache.pinot:pinot-segment-spi:1.3.0'
    implementation 'org.apache.pinot:pinot-segment-local:1.3.0'
    implementation 'org.apache.pinot:pinot-core:1.3.0'
    implementation 'org.apache.pinot:pinot-spi:1.3.0'
    implementation("org.reflections:reflections:0.10.2")
    The segment is loaded with one of the following calls:
    IndexSegment indexSegment = ImmutableSegmentLoader.load(segmentFileToLoad, ReadMode.mmap);
    IndexSegment indexSegment = ImmutableSegmentLoader.load(segmentFileToLoad, ReadMode.heap);
    The API above is invoked from a simple test program in a Spring 6, Spring Boot microservice. The call does not return the indexSegment. Debugging further into the Pinot code, I am getting the error at the lines below in pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/PinotDataBuffer.java:
    /**
     * Memory maps a file into a buffer.
     * <p>NOTE: If the file gets extended, the contents of the extended portion of the file are not defined.
     */
    public static PinotDataBuffer mapFile(File file, boolean readOnly, long offset, long size, ByteOrder byteOrder,
    /**
     * Allocates a buffer using direct memory and loads a file into the buffer.
     */
    public static PinotDataBuffer loadFile(File file, long offset, long size, ByteOrder byteOrder, @Nullable String description) throws IOException {
      PinotDataBuffer buffer;
    The lines above are invoked from pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/store/SingleFileIndexDirectory.java. The code is stuck in the loadFile(...) and mapFile(...) methods.
    private void mapAndSliceFile(SortedMap<Long, IndexEntry> startOffsets, List<Long> offsetAccum, long endOffset) throws IOException {
      Preconditions.checkNotNull(startOffsets);
      Preconditions.checkNotNull(offsetAccum);
      Preconditions.checkArgument(!offsetAccum.isEmpty());
      long fromFilePos = offsetAccum.get(0);
      long size = endOffset - fromFilePos;
      String context = allocationContext(_indexFile, "single_file_index.rw." + "." + String.valueOf(fromFilePos) + "." + String.valueOf(size));
      // Backward-compatible: index file is always big-endian
      PinotDataBuffer buffer;
      if (_readMode == ReadMode.heap) {
        buffer = PinotDataBuffer.loadFile(_indexFile, fromFilePos, size, ByteOrder.BIG_ENDIAN, context);
      } else {
        buffer = PinotDataBuffer.mapFile(_indexFile, true, fromFilePos, size, ByteOrder.BIG_ENDIAN, context);
      }
    The same program works fine with Spring Framework 3 and 4; only with Spring Framework 6 (Spring Boot 3) does it not work. I need some input on why the same jars and APIs are not working with Spring Framework 6. I have also added the JVM options below for Spring Framework 6:
    --add-opens java.base/java.lang=ALL-UNNAMED \
    --add-opens java.base/java.util=ALL-UNNAMED \
    --add-opens java.base/java.nio=ALL-UNNAMED \
    --add-opens java.base/java.io=ALL-UNNAMED \
    --add-opens java.base/java.security=ALL-UNNAMED \
    --add-opens java.base/sun.nio.ch=ALL-UNNAMED \
    --add-opens java.base/java.lang.reflect=ALL-UNNAMED \
    --add-exports java.base/jdk.internal.misc=ALL-UNNAMED
  • x

    Xiang Fu

    11/15/2025, 8:58 AM
    Here is the new Slack invite link if anyone wants to use it: https://inviter.co/apache-pinot The old communityInviter link is invalid.
  • a

    Arnav

    11/17/2025, 6:10 AM
    Hi team, in 1.4.0, are enableSnapshot and enablePreload deprecated and replaced with snapshot and preload?
  • q

    Qosimjon Mamatqulov

    11/18/2025, 10:51 AM
    👋 Hello, team!
    👋 1
  • s

    San Kumar

    11/20/2025, 11:20 AM
    Hello Team, In production we are using SET useMultistageEngine = True, and because of that we are seeing cluster stability issues with the error below:
    HelixManager is not connected ZK session expired
    Can you advise whether the query below really requires useMultistageEngine = True?
    SELECT toEpochSeconds(DATETRUNC('hour', event_time)) AS event_time_hours, state, COUNT(1) as state_count
    FROM event_details
    WHERE event_time >= %(start_time) AND event_time < %(end_time)
    GROUP BY toEpochSeconds(DATETRUNC('hour', event_time)), state
    OPTION(timeoutMs=300000)
  • e

    Eric Wohlstadter

    11/20/2025, 9:43 PM
    Hi all, I am interested in adding some User Defined Aggregation Functions (https://docs.pinot.apache.org/developers/developers-and-contributors/extending-pinot/custom-aggregation-function). It looks like you need to recompile from source, rather than drop them in as a jar. "As of today, this requires code change in Pinot but we plan to add the ability to plugin Functions without having to change Pinot code." Does anyone know if work is going on upstream to fix this? If anyone has a ticket they can point me to, that would be awesome. I didn't see anything.