# general
  • a

    Abdulaziz Alqahtani

    09/01/2025, 7:17 PM
    Hi team, we have a multi-tenant hybrid table where each row has a
    tenant_id
    (ULID). The column is low cardinality, and most queries include a
    tenant_id
    predicate. What’s the best way to index this column?
    m
    • 2
    • 12
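    A common approach for a low-cardinality column that appears in most filter predicates is an inverted index, or making it the table's sorted column so the sorted index serves the predicate. A minimal tableIndexConfig sketch, with the column name taken from the question and everything else illustrative (in practice you would pick one of the two, since a sorted column does not need a separate inverted index):
    Copy code
    "tableIndexConfig": {
      "invertedIndexColumns": ["tenant_id"],
      "sortedColumn": ["tenant_id"]
    }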
  • c

    cesho

    09/04/2025, 2:16 PM
    Can someone explain how Apache Pinot integrates with Confluent Schema Registry during Kafka stream ingestion? Specifically:
    1. Does Pinot use Schema Registry only for deserialization of Avro/Protobuf messages, or can it automatically generate Pinot table schemas from the registered schemas?
    2. If auto-generation is supported, what are the limitations or required configurations?
    3. How does Pinot handle schema evolution in Schema Registry (e.g., backward/forward compatibility) during ingestion?
    4. Are there any best practices for defining Pinot schemas when using Schema Registry to avoid data type mismatches?
    Context: I’m setting up real-time ingestion from Kafka topics with Avro schemas stored in Schema Registry and want to minimize manual schema mapping work.
    m
    m
    • 3
    • 3
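    For reference, the Schema Registry integration is wired through the stream decoder in streamConfigs; a minimal sketch for Avro, where the broker, topic, and registry URL are placeholders and the Pinot table schema is still defined separately:
    Copy code
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "my-topic",
      "stream.kafka.broker.list": "kafka:9092",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
      "stream.kafka.decoder.prop.schema.registry.rest.url": "http://schema-registry:8081"
    }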
  • a

    Abdulaziz Alqahtani

    09/07/2025, 8:34 PM
    Hi team, What’s the recommended approach for one-off batch ingestion of data from S3 into Pinot: Minion-based ingestion vs standalone ingestion? For context:
    • I currently have a real-time table.
    • I want to import historical data into a separate offline table.
    • My source data is in PostgreSQL, and I can export and chunk it into S3 first.
    m
    • 2
    • 1
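    For the Minion-based route, the gist is to enable SegmentGenerationAndPushTask on the offline table and point its batch config at the S3 prefix; a rough sketch, with bucket, region, format, and schedule all illustrative:
    Copy code
    "task": {
      "taskTypeConfigsMap": {
        "SegmentGenerationAndPushTask": { "schedule": "0 */10 * * * ?" }
      }
    },
    "ingestionConfig": {
      "batchIngestionConfig": {
        "segmentIngestionType": "APPEND",
        "segmentIngestionFrequency": "DAILY",
        "batchConfigMaps": [
          {
            "input.fs.className": "org.apache.pinot.plugin.filesystem.S3PinotFS",
            "input.fs.prop.region": "us-east-1",
            "inputDirURI": "s3://my-bucket/history/",
            "includeFileNamePattern": "glob:**/*.csv",
            "inputFormat": "csv"
          }
        ]
      }
    }
    The standalone alternative is LaunchDataIngestionJob with a job spec file, run from any machine that can reach S3 and the controller; for a true one-off load, either path works.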
  • m

    mg

    09/08/2025, 8:09 PM
    Hi Team, I'm running into an issue with the Pinot Controller UI and the Swagger REST API when using an NGINX Ingress with a subpath. I'm hoping someone has encountered this and can help. Here's the problem summary: I've configured my ingress to expose the Pinot Controller at
    <https://example.com/pinot/>
    . The main UI works fine and most links are routed correctly; the links that work open on
    <https://example.com/pinot/#/>...
    However, the Swagger REST API link is not. When I click the Swagger API button, it tries to access
    <https://example.com/help>
    instead of
    <https://example.com/pinot/help>
    , resulting in a 404 Not Found error. I don't see an obvious way to force the Swagger link onto a subpath other than (/). I am using Helm, and I have been looking at the different options in https://github.com/apache/pinot/blob/master/helm/pinot/README.md, but nothing worked. Thanks in advance.
    m
    x
    • 3
    • 3
  • s

    Soon

    09/11/2025, 5:19 PM
    Hello team! Quick question: if the query plan shows
    FILTER_SORTED_INDEX
    , would that be equivalent to
    FILTER_INVERTED_INDEX
    , i.e. the sorted column acting like an inverted index?
    m
    r
    x
    • 4
    • 6
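    For background, FILTER_SORTED_INDEX shows up when the filter column is the segment's sorted column, so the predicate is resolved by binary search over the sorted values; for equality filters it broadly serves the same role as an inverted index on that column. A minimal sketch of declaring a sorted column (column name illustrative):
    Copy code
    "tableIndexConfig": {
      "sortedColumn": ["memberId"]
    }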
  • i

    Indira Vashisth

    09/15/2025, 9:57 AM
    Hi team, I triggered a server rebalance in my Pinot cluster with 3 servers, but the segment reassignment shows only one server as the target for all the segments. How can I make it assign the data across all 3 servers?
    m
    y
    • 3
    • 4
  • i

    Indira Vashisth

    09/15/2025, 10:02 AM
    Also, what is the recommended amount of data we should be storing per server? We will need to store more than 150 TB of data and hit it with complex queries including DISTINCT, JSON match, and sorting.
  • t

    Trust Okoroego

    09/17/2025, 4:49 PM
    Possible Bug in Pinot LAG window function. (1.2.0)
    Copy code
    select
      ORDER_ID,
      ORDER_NUMBER,
      CUSTORDER_ID,
      ORDER_VALIDATION_CODE,
      POD_CODE,
      DELIVERY_FROM_DAT,
      DELIVERY_TO_DAT,
      CTL_CRE_TS,
      CTL_MOD_TS,
      ORDER_STATUS_CD,
      SAREA_ID,
      LAG(ON_HOLD_ORDER_AND_LOCKED_FLAG, 1, 0) OVER (PARTITION BY ORDER_ID ORDER BY CTL_MOD_TS) AS prev_is_active
    from
      Orders
    If the default is not set, the result returns correctly, with prev_is_active coming back NULL for the row that has no preceding row in its partition. However, setting the default to 0 throws a seemingly unrelated timestamp error. Could this be related to NULL handling?
    Copy code
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
    	at org.apache.pinot.query.service.server.QueryServer.submit(QueryServer.java:156)
    	at org.apache.pinot.common.proto.PinotQueryWorkerGrpc$MethodHandlers.invoke(PinotQueryWorkerGrpc.java:284)
    ...
    Caused by: java.lang.RuntimeException: Caught exception while submitting request: 1473823763000000159, stage: 2
    	at org.apache.pinot.query.service.server.QueryServer.lambda$submit$1(QueryServer.java:144)
    	at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
    	... 3 more
    Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Failed to instantiate WindowFunction for function: LAG
    	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
    	at org.apache.pinot.query.service.server.QueryServer.lambda$submit$1(QueryServer.java:141)
    	... 4 more
    ...
    Caused by: java.lang.RuntimeException: Failed to instantiate WindowFunction for function: LAG
    	at org.apache.pinot.query.runtime.operator.window.WindowFunctionFactory.construnctWindowFunction(WindowFunctionFactory.java:56)
    	at org.apache.pinot.query.runtime.operator.WindowAggregateOperator.<init>(WindowAggregateOperator.java:145)
    	at org.apache.pinot.query.runtime.plan.PhysicalPlanVisitor.visitWindow(PhysicalPlanVisitor.java:107)
    	at org.apache.pinot.query.runtime.plan.PhysicalPlanVisitor.visitWindow(PhysicalPlanVisitor.java:65)
    ...
    Caused by: java.lang.reflect.InvocationTargetException
    	at jdk.internal.reflect.GeneratedConstructorAccessor151.newInstance(Unknown Source)
    	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
    	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
    ...
    Caused by: java.lang.UnsupportedOperationException: Cannot convert value from INTEGER to TIMESTAMP
    	at org.apache.pinot.common.utils.PinotDataType$5.toTimestamp(PinotDataType.java:300)
    	at org.apache.pinot.common.utils.PinotDataType$10.convert(PinotDataType.java:593)
    	at org.apache.pinot.common.utils.PinotDataType$10.convert(PinotDataType.java:545)
    	at org.apache.pinot.query.runtime.operator.window.value.LagValueWindowFunction.<init>(LagValueWindowFunction.java:63)
    org.apache.pinot.query.service.dispatch.QueryDispatcher.submit(QueryDispatcher.java:198)
    org.apache.pinot.query.service.dispatch.QueryDispatcher.submitAndReduce(QueryDispatcher.java:95)
    org.apache.pinot.broker.requesthandler.MultiStageBrokerRequestHandler.handleRequest(MultiStageBrokerRequestHandler.java:219)
    org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleRequest(BaseBrokerRequestHandler.java:133)
    m
    g
    • 3
    • 9
  • m

    mg

    09/18/2025, 10:16 AM
    Hi Team, I am trying to perform a filesystem ingestion into a GCS bucket using the minion SegmentGenerationAndPushTask. The example in the docs (https://docs.pinot.apache.org/basics/concepts/components/cluster/minion#segmentgenerationandpushtask) describes the config when fetching files from S3:
    Copy code
    "ingestionConfig": {
        "batchIngestionConfig": {
          "batchConfigMaps": [
            {
              "input.fs.className": "org.apache.pinot.plugin.filesystem.S3PinotFS",
              "input.fs.prop.region": "us-west-2",
              "input.fs.prop.secretKey": "....",
              "input.fs.prop.accessKey": "....",
              "inputDirURI": "<s3://my.s3.bucket/batch/airlineStats/rawdata/>",
              ...
    we have updated className to:
    org.apache.pinot.plugin.filesystem.GcsPinotFS
    , but we cannot figure out how to set the
    gcpKey
    instead of
    secretKey
    and
    accessKey
    properties. We probably need to set the GCP
    projectId
    as well.
    x
    • 2
    • 6
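    Assuming the same input.fs.prop.* convention carries over to GcsPinotFS (worth verifying against your Pinot version; the GCS filesystem takes projectId and gcpKey configs), the batch config map would look roughly like this sketch, with project, key path, and bucket illustrative:
    Copy code
    "batchConfigMaps": [
      {
        "input.fs.className": "org.apache.pinot.plugin.filesystem.GcsPinotFS",
        "input.fs.prop.projectId": "my-gcp-project",
        "input.fs.prop.gcpKey": "/path/to/service-account-key.json",
        "inputDirURI": "gs://my-bucket/batch/rawdata/",
        "inputFormat": "csv"
      }
    ]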
  • m

    mg

    09/23/2025, 9:38 AM
    Hi Team, Can I add a sidecar container to pinot controller and broker pods when deploying using the helm chart?
    x
    • 2
    • 6
  • n

    Nicolas

    09/24/2025, 2:46 PM
    Hi everyone, I would like to know if it's possible to configure a real-time table that consumes from 2 different Kafka clusters?
    r
    q
    • 3
    • 14
  • m

    mg

    09/29/2025, 8:39 AM
    Hi all, The Pinot Controller UI shows all table configurations, including SSL configs. Is it possible to hide or mask sensitive info in the UI, such as Kafka truststore and keystore passwords?
    Copy code
    ...,
        "tableIndexConfig": {
          "streamConfigs": {
            "security.protocol": "SSL",
            "ssl.truststore.location": "/opt/pinot/kafka-cert-jks/truststore.jks",
            "ssl.truststore.password": "P6cz00RPASSWORDPLAINTEXT006OTF5",
            "ssl.truststore.type": "JKS",
            "ssl.keystore.location": "/opt/pinot/kafka-cert-jks/keystore.jks",
            "ssl.keystore.password": "P6cz00RPASSWORDPLAINTEXT006OTF5",
            "ssl.keystore.type": "JKS",
            "ssl.key.password": "P6cz00RPASSWORDPLAINTEXT006OTF5"
  • s

    Sankaranarayanan Viswanathan

    09/29/2025, 5:57 PM
    Hello Everyone, wondering if I can get some guidance on something I am working on. I am storing events in a pinot table and we have a modified retention manager to delete segments based on the min and max values of an expiry date column on this table that is populated at ingestion time. Each event row in the pinot table is also associated with some external objects stored in S3 and we use the pinot table as source of truth. When a pinot segment goes out of retention we would like to delete those related objects in S3. Are there patterns on how to accomplish this?
    m
    r
    • 3
    • 8
  • b

    Brook E

    09/30/2025, 3:29 PM
    Does anyone have any good strategies for how they automatically toggle data from real-time to offline?
    r
    m
    m
    • 4
    • 14
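    One built-in mechanism for this is the RealtimeToOfflineSegmentsTask, configured on the real-time table with a matching OFFLINE table created for the same name; a minimal sketch with illustrative time windows and schedule:
    Copy code
    "task": {
      "taskTypeConfigsMap": {
        "RealtimeToOfflineSegmentsTask": {
          "bucketTimePeriod": "1d",
          "bufferTimePeriod": "2d",
          "schedule": "0 0 * * * ?"
        }
      }
    }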
  • m

    magax90515

    10/05/2025, 11:08 AM
    Will
    org.apache.pinot:pinot-common:1.4.0
    be published to maven?
    org.apache.pinot:pinot-java-client:1.4.0
    has been published, but it depends on pinot-common which has not been published.
  • y

    Yeshwanth

    10/07/2025, 7:10 AM
    Hi everyone, We are running a large-scale Pinot deployment and plan to store all our data in a single, large table. As our segment count grows into the hundreds of thousands, we are already hitting the ZooKeeper ZNode size limit (
    jute.maxbuffer
    ) due to the large segment metadata. We have reviewed the official troubleshooting documentation, which suggests two primary solutions:
    1. Decrease the number of segments: We cannot use rollups or further merge segments, as our current segment size is already optimized at ~300MB, and we need to maintain data granularity for our query performance.
    2. Increase `jute.maxbuffer`: We view this as a last resort, as we are concerned about the potential downstream performance impacts on the ZooKeeper cluster.
    Given these constraints, we have a few questions:
    • What are the recommended strategies for managing ZNode size in a table with a very high segment count, beyond the two options mentioned above?
    • Is there a practical or theoretical upper limit on the number of segments a single Pinot table can efficiently handle before ZK performance degrades?
    • Are there alternative configurations or architectural approaches we should consider for this scenario?
    m
    • 2
    • 1
  • g

    Gerald Bonfiglio

    10/07/2025, 6:59 PM
    Hey everyone, We want to use the JDBC gRPC client that was introduced in 1.4.0, but we are getting an error when building from Maven:
    Copy code
    Failed to collect dependencies at org.apache.pinot:pinot-jdbc-client:jar:1.4.0:
    Failed to read artifact descriptor for org.apache.pinot:pinot-jdbc-client:jar:1.4.0: The following artifacts could not be resolved: org.apache.pinot:pinot:pom:1.4.0 (absent)
    Checking in Maven Central, pinot-1.4.0 doesn't seem to be there. Are there plans for pushing the remaining 1.4.0 jars to Maven Central? Are we missing something else?
    y
    q
    +2
    • 5
    • 19
  • r

    robert zych

    10/09/2025, 4:02 PM
    https://www.meetup.com/apache-pinot/events/311444779/
    👍 1
  • m

    mg

    10/10/2025, 8:13 AM
    Hi everyone, Any plans to move away from Bitnami zookeeper as a dependency in Pinot Helm chart?
    y
    • 2
    • 3
  • s

    Shubham Kumar

    10/10/2025, 9:59 AM
    Hi team, My current primary key count is around 100 million. Whenever I restart the server, the primary key count increases to around 260 million and then drops back to 100 million. Could you please help me understand why this behavior occurs?
    m
    k
    • 3
    • 21
  • a

    Arnav

    10/13/2025, 9:04 AM
    Hi team, TOTAL_KEYS_MARKED_FOR_DELETION is a meter metric, which means it resets when the server restarts, and its _Count gives a cumulative value. Is there any way to get the exact number of keys marked for deletion over a time frame, like the last 2 hours?
  • r

    RANJITH KUMAR

    10/13/2025, 3:26 PM
    Hi Team, What API can we use to get all running tasks associated with an OFFLINE table configured with minion job tasks? I am able to get the task list and the config details for a given task name, but how do we get the list of task names for an OFFLINE table name? Also, even after deleting the OFFLINE table from the UI, Pinot controller task segments keep running in the background, and I am not able to stop or delete them. I am facing these errors respectively: Method Not Allowed, and Server error '500 Internal Server Error' for url 'http://pinot-controller:9000/tasks/task/Task_SegmentGenerationAndPushTask_199db571-e077-4d74-86e6-f380de37ea51_1760096118944' (For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500).
    m
    s
    • 3
    • 5
  • r

    RANJITH KUMAR

    10/14/2025, 1:58 PM
    Hi Team, I need a general recommendation on how to make sure minion tasks complete quickly, ideally within 15 minutes. Let's assume we have tens of GBs of data across 20+ tables. What hardware do we need, and how should we scale it, to make minion tasks complete faster?
    m
    m
    • 3
    • 2
  • p

    Paulc

    10/21/2025, 8:57 AM
    a
  • s

    Shubham Kumar

    10/22/2025, 4:19 AM
    Hi Team, Could you please share a sample logical table configuration? I’m currently using the configuration below for my logical table, but it only fetches data from the realtime table; the offline table data is not being fetched.
    Copy code
    {
      "tableName": "logicalTable",
      "physicalTableConfigMap": {
        "user_stream_REALTIME": {},
        "user_batch_OFFLINE": {}
      },
      "refOfflineTableName": "user_batch_OFFLINE",
      "refRealtimeTableName": "user_stream_REALTIME",
      "brokerTenant": "DefaultTenant",
      "timeBoundaryConfig": {
        "boundaryStrategy": "min",
        "parameters": {
          "function": "min"
        }
      }
    }
    m
    • 2
    • 14
  • x

    Xiang Fu

    10/23/2025, 12:13 AM
    The next Pinot contributors call will happen tomorrow 8:30AM PDT.
    🚀 2
  • a

    Arnav

    10/28/2025, 9:46 AM
    Hi team, Can anyone please explain why Query 1 takes around 120 secs while Query 2 takes 15-20 secs to give the same result? Total docs in the table: 6 billion, and it is a real-time table. Is it because in Query 2 all 3 subqueries are computed in parallel? Or is it that the first subquery loads the segments into memory, so the other two then take very little time, hence the lower overall query time? Query 1:
    Copy code
    SELECT * FROM table
      WHERE customer_id = 1234
        AND msisdn IN ( ..1000 msisdns)
    Query 2:
    Copy code
    SELECT * FROM table
      WHERE customer_id = 1234
        AND msisdn IN ( ..350 msisdns)
      UNION ALL
      SELECT * FROM append_iot_session_events
      WHERE customer_id = 1234
        AND msisdn IN (..350 msisdns)
      UNION ALL
      SELECT * FROM append_iot_session_events
      WHERE customer_id = 1234
        AND msisdn IN (..300 msisdns)
    m
    g
    y
    • 4
    • 29
  • r

    robert zych

    10/29/2025, 2:44 PM
    The next Pinot Contributor call is scheduled for next Tuesday at 8:30 AM Pacific: https://www.meetup.com/apache-pinot/events/311759314/?slug=apache-pinot&eventId=311759314
  • m

    Matt Nawara

    10/30/2025, 12:26 PM
    Hi all, we have a use case where
    • we have a table with a metric that is sourced from an ingestion aggregation
    • we know we will have to add columns to it relatively dynamically (on user request)
    Unfortunately, up until now I had not registered this requirement from the documentation:
    All metrics must have aggregation configs.
    I feel like it is at the heart of what we are seeing now; in essence, you can't update the schema with a new metric, as the API says:
    Copy code
    PUT schema response: {'code': 400, 'error': 'Invalid schema: staging_stream_st_mknaw_idle_worker_test14_sg_12. Reason: Schema is incompatible with tableConfig with name: staging_stream_st_mknaw_idle_worker_test14_sg_12_REALTIME and type: REALTIME'}
    and, probably correctly, the other way around, trying to get the table update in before the schema update, also does not work
    Copy code
    PUT table response: {'code': 400, 'error': "Invalid table config: staging_stream_st_mknaw_idle_worker_test14_sg_12 with error: The destination column 'mtr_clicks_sum' of the aggregation function must be present in the schema"}
    so... is the implication that a Pinot schema/table pair that uses ingestion aggregation can never evolve? That would be unfortunate.
  • g

    Gerald Bonfiglio

    10/30/2025, 5:08 PM
    Hi Everyone, We have a use case where we want to write a Java Map object into a Pinot table. We have tried both writing it as JSON and flattening the map, creating a separate column for each map key. Using tables with a few million rows, when testing query performance, we noticed that using separate columns is much more performant than using a JSON column for the map, so we are proceeding with flattening out the map. As you would expect, since each map key maps to a specific table column of a specific data type, all values for the same key in different records have to be of the same data type. However, we do have situations where the same key can be one of several data types. I was wondering if anyone else had a similar use case, and whether they found a solution that works. One solution that comes to mind is to create columns based on both the key and the data type, so that if the key appeared as both a string and a long, there would be 2 columns, s_key and l_key. Seems pretty straightforward, but it complicates queries, in that we need to know which columns we have created and query against all of them (could be more than just these 2).
    m
    • 2
    • 3
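    For what it's worth, the key-plus-type column idea maps directly onto the Pinot schema, one column per (key, type) pair; a minimal sketch with hypothetical field names:
    Copy code
    "dimensionFieldSpecs": [
      { "name": "s_status", "dataType": "STRING" },
      { "name": "l_status", "dataType": "LONG" }
    ]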