# general
  • v

    Vatsal Agrawal

    08/29/2025, 5:43 AM
    Hi Team, We are facing an issue with MergeRollupTask in our Pinot cluster. After the task runs, the original segments are not getting deleted, and we end up with both the original and the merged segments in the table. Retention properties: left as default. Any guidance on what we might be missing would be super helpful. Adding task, table and segments related details in the thread.
  • a

    Arnav

    08/29/2025, 5:52 PM
    Hi team, is there a way to parse the Kafka event below and ingest it into a realtime Pinot table?
    {
      "start_time_new": {
        "long": 1756489188000
      },
      "event_time_new": {
        "long": 1756489188000
      }
    }
    I tried the configuration below, but it's not parsing:
    "ingestionConfig": {
      "transformConfigs": [
        {
          "columnName": "start_time_new",
          "transformFunction": "jsonPathLong(__raw__start_time_new, '$.long', 0)"
        },
        {
          "columnName": "event_time_new",
          "transformFunction": "jsonPathLong(__raw__event_time_new, '$.long', 0)"
        }
      ],
      "continueOnError": false,
      "rowTimeValueCheck": false,
      "segmentTimeValueCheck": true
    }
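For anyone reproducing this outside Pinot, the result the two transforms are meant to produce can be sketched in plain Python. This assumes the payload is Avro-style JSON in which each value is wrapped as {"long": ...}; `unwrap_union_longs` is a hypothetical helper for illustration, not a Pinot function.

```python
import json


def unwrap_union_longs(event: dict) -> dict:
    """Replace {"long": value} wrappers with the bare value; keep other fields as-is."""
    flat = {}
    for name, value in event.items():
        if isinstance(value, dict) and set(value) == {"long"}:
            flat[name] = value["long"]
        else:
            flat[name] = value
    return flat


raw = '{"start_time_new": {"long": 1756489188000}, "event_time_new": {"long": 1756489188000}}'
flat_event = unwrap_union_longs(json.loads(raw))
print(flat_event)
```

Comparing this flattened shape with what actually lands in the table may help narrow down whether the problem is in the decoder output or in the transform expression.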
  • r

    Rajkumar

    08/30/2025, 6:23 PM
    Hi All, I'm very impressed with what Apache Pinot can do and am considering it for a critical use case. We are not Java experts on our team. Would Java be a key skill for adopting Pinot successfully? An additional question: will a join between two realtime tables work? Information online seems to suggest that joins between two realtime tables are not recommended for production. Just checking if anyone here has experience with this. Thanks.
  • a

    Arnav

    09/01/2025, 7:07 AM
    Hi team, I enabled "stream.kafka.metadata.populate": "true" to get the fields below, which I have also added to the schema: __key, __metadata$offset, __metadata$partition, __metadata$recordTimestamp. When querying the table, __metadata$offset, __metadata$partition and __metadata$recordTimestamp are populated properly, but __key comes back blank. My Kafka event and key are both Avro-encoded, so I used the following config:
    "stream.kafka.decoder.prop.format": "AVRO",
    "stream.kafka.decoder.prop.schema.registry.schema.name": "schema-name",
    "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
    "stream.kafka.decoder.prop.schema.registry.rest.url": "schema-url",
    "stream.kafka.decoder.prop.key.format": "AVRO",
    "stream.kafka.decoder.prop.key.schema.registry.schema.name": "schema-name-key",
    "stream.kafka.decoder.prop.key.schema.registry.rest.url": "schema-url",
    The data is deserialised properly; only __key is blank. My guess is that the key configs I added (below) are not able to deserialise it. Is there any other way to deserialise the key?
    "stream.kafka.decoder.prop.key.format": "AVRO",
    "stream.kafka.decoder.prop.key.schema.registry.schema.name": "schema-name-key",
    "stream.kafka.decoder.prop.key.schema.registry.rest.url": "schema-url",
  • a

    Abdulaziz Alqahtani

    09/01/2025, 7:17 PM
    Hi team, we have a multi-tenant hybrid table where each row has a `tenant_id` (ULID). The column is low cardinality, and most queries include a `tenant_id` predicate. What’s the best way to index this column?
  • c

    cesho

    09/04/2025, 2:16 PM
    Can someone explain how Apache Pinot integrates with Confluent Schema Registry during Kafka stream ingestion? Specifically:
    1. Does Pinot use Schema Registry only for deserialization of Avro/Protobuf messages, or can it automatically generate Pinot table schemas from the registered schemas?
    2. If auto-generation is supported, what are the limitations or required configurations?
    3. How does Pinot handle schema evolution in Schema Registry (e.g., backward/forward compatibility) during ingestion?
    4. Are there any best practices for defining Pinot schemas when using Schema Registry to avoid data type mismatches?
    Context: I’m setting up real-time ingestion from Kafka topics with Avro schemas stored in Schema Registry and want to minimize manual schema mapping work.
  • a

    Abdulaziz Alqahtani

    09/07/2025, 8:34 PM
    Hi team, what’s the recommended approach for a one-off batch ingestion of data from S3 into Pinot: Minion-based ingestion or standalone ingestion? For context:
    • I currently have a real-time table.
    • I want to import historical data into a separate offline table.
    • My source data is in PostgreSQL, and I can export and chunk it into S3 first.
  • m

    mg

    09/08/2025, 8:09 PM
    Hi Team, I'm running into an issue with the Pinot Controller UI and the Swagger REST API when using an NGINX Ingress with a subpath. I'm hoping someone has encountered this and can help. Problem summary: I've configured my ingress to expose the Pinot Controller at https://example.com/pinot/. The main UI works fine and most links are routed correctly; the ones that work open under https://example.com/pinot/#/... However, the Swagger REST API link does not. When I click the Swagger API button, it tries to access https://example.com/help instead of https://example.com/pinot/help, resulting in a 404 Not Found error. I don't see an obvious way to force the Swagger link onto a subpath other than /. I am using Helm and have been looking through the options in https://github.com/apache/pinot/blob/master/helm/pinot/README.md, but nothing worked. Thanks in advance.
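The symptom matches how URL resolution treats a leading slash. The sketch below uses Python's standard `urljoin` to show the difference. It is an assumption, not something confirmed in this thread, that the Swagger button points at the absolute path `/help`.

```python
from urllib.parse import urljoin

base = "https://example.com/pinot/"

# A link with a leading slash resolves against the host root and drops the subpath.
absolute_link = urljoin(base, "/help")

# A link without the leading slash resolves relative to the subpath.
relative_link = urljoin(base, "help")

print(absolute_link)
print(relative_link)
```

If the link really is absolute, the subpath has to be restored somewhere outside the page, for example by a rewrite or redirect rule at the ingress.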
  • s

    Soon

    09/11/2025, 5:19 PM
    Hello team! I had a quick question: if the query plan shows `FILTER_SORTED_INDEX`, would it be the same as using `FILTER_INVERTED_INDEX`, like a sorted inverted index?
  • i

    Indira Vashisth

    09/15/2025, 9:57 AM
    Hi team, I triggered a server rebalance in my Pinot cluster with 3 servers, but the segment reassignment shows only one server as the target for all the segments. How can I make it assign the data to all 3 servers?
  • i

    Indira Vashisth

    09/15/2025, 10:02 AM
    Also, what is the recommended amount of data to store per server? We will need to store more than 150 TB of data and run complex queries against it, including distinct, JSON match and sorting.
  • t

    Trust Okoroego

    09/17/2025, 4:49 PM
    Possible bug in the Pinot LAG window function (1.2.0).
    SELECT
      ORDER_ID,
      ORDER_NUMBER,
      CUSTORDER_ID,
      ORDER_VALIDATION_CODE,
      POD_CODE,
      DELIVERY_FROM_DAT,
      DELIVERY_TO_DAT,
      CTL_CRE_TS,
      CTL_MOD_TS,
      ORDER_STATUS_CD,
      SAREA_ID,
      LAG(ON_HOLD_ORDER_AND_LOCKED_FLAG, 1, 0) OVER (PARTITION BY ORDER_ID ORDER BY CTL_MOD_TS) AS prev_is_active
    FROM Orders
    If the default is not set, the result is returned correctly, with the last row returning NULL for prev_is_active since there is no row before it. However, setting the default to 0 throws an unrelated timestamp error. Could this be related to NULL handling?
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
    	at org.apache.pinot.query.service.server.QueryServer.submit(QueryServer.java:156)
    	at org.apache.pinot.common.proto.PinotQueryWorkerGrpc$MethodHandlers.invoke(PinotQueryWorkerGrpc.java:284)
    ...
    Caused by: java.lang.RuntimeException: Caught exception while submitting request: 1473823763000000159, stage: 2
    	at org.apache.pinot.query.service.server.QueryServer.lambda$submit$1(QueryServer.java:144)
    	at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
    	... 3 more
    Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Failed to instantiate WindowFunction for function: LAG
    	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
    	at org.apache.pinot.query.service.server.QueryServer.lambda$submit$1(QueryServer.java:141)
    	... 4 more
    ...
    Caused by: java.lang.RuntimeException: Failed to instantiate WindowFunction for function: LAG
    	at org.apache.pinot.query.runtime.operator.window.WindowFunctionFactory.construnctWindowFunction(WindowFunctionFactory.java:56)
    	at org.apache.pinot.query.runtime.operator.WindowAggregateOperator.<init>(WindowAggregateOperator.java:145)
    	at org.apache.pinot.query.runtime.plan.PhysicalPlanVisitor.visitWindow(PhysicalPlanVisitor.java:107)
    	at org.apache.pinot.query.runtime.plan.PhysicalPlanVisitor.visitWindow(PhysicalPlanVisitor.java:65)
    ...
    Caused by: java.lang.reflect.InvocationTargetException
    	at jdk.internal.reflect.GeneratedConstructorAccessor151.newInstance(Unknown Source)
    	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
    	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
    ...
    Caused by: java.lang.UnsupportedOperationException: Cannot convert value from INTEGER to TIMESTAMP
    	at org.apache.pinot.common.utils.PinotDataType$5.toTimestamp(PinotDataType.java:300)
    	at org.apache.pinot.common.utils.PinotDataType$10.convert(PinotDataType.java:593)
    	at org.apache.pinot.common.utils.PinotDataType$10.convert(PinotDataType.java:545)
    	at org.apache.pinot.query.runtime.operator.window.value.LagValueWindowFunction.<init>(LagValueWindowFunction.java:63)
    org.apache.pinot.query.service.dispatch.QueryDispatcher.submit(QueryDispatcher.java:198)
    org.apache.pinot.query.service.dispatch.QueryDispatcher.submitAndReduce(QueryDispatcher.java:95)
    org.apache.pinot.broker.requesthandler.MultiStageBrokerRequestHandler.handleRequest(MultiStageBrokerRequestHandler.java:219)
    org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleRequest(BaseBrokerRequestHandler.java:133)
  • m

    mg

    09/18/2025, 10:16 AM
    Hi Team, I am trying to perform a filesystem ingestion from a GCS bucket using the minion SegmentGenerationAndPushTask. The example in the docs (https://docs.pinot.apache.org/basics/concepts/components/cluster/minion#segmentgenerationandpushtask) describes the config for fetching files from S3:
    "ingestionConfig": {
        "batchIngestionConfig": {
          "batchConfigMaps": [
            {
              "input.fs.className": "org.apache.pinot.plugin.filesystem.S3PinotFS",
              "input.fs.prop.region": "us-west-2",
              "input.fs.prop.secretKey": "....",
              "input.fs.prop.accessKey": "....",
              "inputDirURI": "s3://my.s3.bucket/batch/airlineStats/rawdata/",
              ...
    We have updated className to `org.apache.pinot.plugin.filesystem.GcsPinotFS`, but we cannot figure out how to set the `gcpKey` instead of the `secretKey` and `accessKey` properties. We probably need to set the GCP `projectId` as well.
  • m

    mg

    09/23/2025, 9:38 AM
    Hi Team, Can I add a sidecar container to pinot controller and broker pods when deploying using the helm chart?
  • n

    Nicolas

    09/24/2025, 2:46 PM
    Hi everyone, I would like to know if it's possible to configure a real-time table that consumes from 2 different Kafka clusters?
  • m

    mg

    09/29/2025, 8:39 AM
    Hi all, the Pinot Controller UI shows all table configurations, including SSL configs. Is it possible to hide or mask sensitive info in the UI, such as Kafka truststore and keystore passwords?
    ...,
        "tableIndexConfig": {
          "streamConfigs": {
            "security.protocol": "SSL",
            "ssl.truststore.location": "/opt/pinot/kafka-cert-jks/truststore.jks",
            "ssl.truststore.password": "P6cz00RPASSWORDPLAINTEXT006OTF5",
            "ssl.truststore.type": "JKS",
            "ssl.keystore.location": "/opt/pinot/kafka-cert-jks/keystore.jks",
            "ssl.keystore.password": "P6cz00RPASSWORDPLAINTEXT006OTF5",
            "ssl.keystore.type": "JKS",
            "ssl.key.password": "P6cz00RPASSWORDPLAINTEXT006OTF5"
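Separately from whatever the UI supports, configs that are exported, logged or shared can be masked on the client side. The sketch below is plain Python and not a Pinot feature. `mask_sensitive` and the key markers it looks for are assumptions for illustration, and the password values are placeholders.

```python
SENSITIVE_MARKERS = ("password", "secret")  # assumed naming convention for sensitive keys


def mask_sensitive(config, mask="********"):
    """Return a copy of a nested config with string values of sensitive-looking keys masked."""
    if isinstance(config, dict):
        masked = {}
        for key, value in config.items():
            is_sensitive = any(marker in key.lower() for marker in SENSITIVE_MARKERS)
            if is_sensitive and isinstance(value, str):
                masked[key] = mask
            else:
                masked[key] = mask_sensitive(value, mask)
        return masked
    if isinstance(config, list):
        return [mask_sensitive(item, mask) for item in config]
    return config


table_config = {
    "tableIndexConfig": {
        "streamConfigs": {
            "security.protocol": "SSL",
            "ssl.truststore.location": "/opt/pinot/kafka-cert-jks/truststore.jks",
            "ssl.truststore.password": "placeholder-password",
            "ssl.key.password": "placeholder-password",
        }
    }
}
masked_config = mask_sensitive(table_config)
print(masked_config)
```

This only protects copies that pass through the helper; the stored table config is unchanged.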
  • s

    Sankaranarayanan Viswanathan

    09/29/2025, 5:57 PM
    Hello Everyone, wondering if I can get some guidance on something I am working on. I am storing events in a pinot table and we have a modified retention manager to delete segments based on the min and max values of an expiry date column on this table that is populated at ingestion time. Each event row in the pinot table is also associated with some external objects stored in S3 and we use the pinot table as source of truth. When a pinot segment goes out of retention we would like to delete those related objects in S3. Are there patterns on how to accomplish this?
  • b

    Brook E

    09/30/2025, 3:29 PM
    Does anyone have any good strategies for how they automatically toggle data from real-time to offline?
  • m

    magax90515

    10/05/2025, 11:08 AM
    Will `org.apache.pinot:pinot-common:1.4.0` be published to Maven? `org.apache.pinot:pinot-java-client:1.4.0` has been published, but it depends on pinot-common, which has not been published.
  • y

    Yeshwanth

    10/07/2025, 7:10 AM
    Hi everyone, we are running a large-scale Pinot deployment and plan to store all our data in a single, large table. As our segment count grows into the hundreds of thousands, we are already hitting the ZooKeeper ZNode size limit (`jute.maxbuffer`) due to the large segment metadata. We have reviewed the official troubleshooting documentation, which suggests two primary solutions:
    1. Decrease the number of segments: We cannot use rollups or further merge segments, as our current segment size is already optimized at ~300MB, and we need to maintain data granularity for our query performance.
    2. Increase `jute.maxbuffer`: We view this as a last resort, as we are concerned about the potential downstream performance impacts on the ZooKeeper cluster.
    Given these constraints, we have a few questions:
    • What are the recommended strategies for managing ZNode size in a table with a very high segment count, beyond the two options mentioned above?
    • Is there a practical or theoretical upper limit on the number of segments a single Pinot table can efficiently handle before ZK performance degrades?
    • Are there alternative configurations or architectural approaches we should consider for this scenario?
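To put rough numbers on the limit, here is a back-of-the-envelope sketch. The only hard fact in it is ZooKeeper's documented default for `jute.maxbuffer` (0xfffff bytes, just under 1 MiB). The segment count and the bytes-per-segment figure are assumptions chosen for illustration, and the estimate ignores any compression applied before the data is written.

```python
JUTE_MAXBUFFER_DEFAULT = 0xFFFFF  # ZooKeeper default, just under 1 MiB


def estimated_payload_bytes(segment_count: int, bytes_per_segment: int) -> int:
    """Rough uncompressed size of a payload that carries one entry per segment."""
    return segment_count * bytes_per_segment


def fits_in_default_buffer(segment_count: int, bytes_per_segment: int) -> bool:
    """True if the rough estimate stays within the default jute.maxbuffer."""
    return estimated_payload_bytes(segment_count, bytes_per_segment) <= JUTE_MAXBUFFER_DEFAULT


# Assumed figures: 300,000 segments at roughly 80 bytes of name/mapping data each.
payload = estimated_payload_bytes(300_000, 80)
print(payload, fits_in_default_buffer(300_000, 80))
```

Under these assumed figures the uncompressed payload is well over the default, so the per-segment size is the number worth measuring on a real cluster.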
  • g

    Gerald Bonfiglio

    10/07/2025, 6:59 PM
    Hey everyone, we want to use the JDBC gRPC client that was introduced in 1.4.0, but we are getting an error when building with Maven:
    Failed to collect dependencies at org.apache.pinot:pinot-jdbc-client:jar:1.4.0:
    Failed to read artifact descriptor for org.apache.pinot:pinot-jdbc-client:jar:1.4.0: The following artifacts could not be resolved: org.apache.pinot:pinot:pom:1.4.0 (absent)
    Checking in Maven Central, pinot-1.4.0 doesn't seem to be there. Are there plans to push the remaining 1.4.0 jars to Maven Central? Are we missing something else?
  • r

    robert zych

    10/09/2025, 4:02 PM
    https://www.meetup.com/apache-pinot/events/311444779/
  • m

    mg

    10/10/2025, 8:13 AM
    Hi everyone, Any plans to move away from Bitnami zookeeper as a dependency in Pinot Helm chart?
  • s

    Shubham Kumar

    10/10/2025, 9:59 AM
    Hi team, My current primary key count is around 100 million. Whenever I restart the server, the primary key count increases to around 260 million and then drops back to 100 million. Could you please help me understand why this behavior occurs?
  • a

    Arnav

    10/13/2025, 9:04 AM
    Hi team, TOTAL_KEYS_MARKED_FOR_DELETION is a meter metric, which means it resets when the server restarts, and its _Count gives a cumulative value. Is there any way to get the exact number of keys marked for deletion within a time frame, such as the last 2 hours?
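One common way to turn a cumulative counter into a per-window figure is to sample it periodically and sum the increases, treating any drop as a reset. The sketch below assumes the `_Count` value is scraped into some store as (timestamp, value) pairs. `increase_over_window` and the sample data are hypothetical, and the idea is similar in spirit to Prometheus's `increase()` function.

```python
from typing import List, Tuple


def increase_over_window(
    samples: List[Tuple[float, float]], window_start: float, window_end: float
) -> float:
    """Sum the increase of a cumulative counter within [window_start, window_end].

    samples: (unix_seconds, cumulative_count) pairs sorted by time.
    A drop in the count is treated as a reset (for example a server restart),
    so the value after the drop is counted as new increase from zero.
    """
    in_window = [(t, c) for t, c in samples if window_start <= t <= window_end]
    total = 0.0
    for (_, prev), (_, curr) in zip(in_window, in_window[1:]):
        total += (curr - prev) if curr >= prev else curr
    return total


# Samples every 30 minutes over 2 hours, with a restart before the fourth sample.
samples = [(0, 100), (1800, 130), (3600, 150), (5400, 20), (7200, 45)]
two_hour_increase = increase_over_window(samples, 0, 7200)
print(two_hour_increase)
```

Anything counted between the last sample before a restart and the restart itself is lost, so a shorter scrape interval gives a closer figure.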
  • r

    RANJITH KUMAR

    10/13/2025, 3:26 PM
    Hi Team, which API can we use to get all running tasks associated with an OFFLINE table that is configured with minion tasks? I am able to get the task list and the config details for a given task name, but how can we get the list of task names for a given OFFLINE table name? Also, even after deleting the OFFLINE table from the Pinot controller UI, the task segments keep running in the background, and we are not able to stop or delete them. We get these errors, respectively: Method Not Allowed, and Server error '500 Internal Server Error' for url 'http://pinot-controller:9000/tasks/task/Task_SegmentGenerationAndPushTask_199db571-e077-4d74-86e6-f380de37ea51_1760096118944' For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
  • r

    RANJITH KUMAR

    10/14/2025, 1:58 PM
    Hi Team, we need a general recommendation on how to make sure minion tasks complete quickly, ideally within 15 minutes. Let's assume we have 10x GBs of data for 20+ tables. What hardware do we need to scale so that minion tasks complete faster?
  • p

    Paulc

    10/21/2025, 8:57 AM
    a
  • s

    Shubham Kumar

    10/22/2025, 4:19 AM
    Hi Team, could you please share a sample logical table configuration? I’m currently using the configuration below for my logical table, but it only fetches data from the realtime table; the offline table data is not being fetched.
    {
      "tableName": "logicalTable",
      "physicalTableConfigMap": {
        "user_stream_REALTIME": {},
        "user_batch_OFFLINE": {}
      },
      "refOfflineTableName": "user_batch_OFFLINE",
      "refRealtimeTableName": "user_stream_REALTIME",
      "brokerTenant": "DefaultTenant",
      "timeBoundaryConfig": {
        "boundaryStrategy": "min",
        "parameters": {
          "function": "min"
        }
      }
    }
  • x

    Xiang Fu

    10/23/2025, 12:13 AM
    The next Pinot contributors call will happen tomorrow 8:30AM PDT.