# general
  • k

    Kartik Khare

    07/24/2025, 11:56 AM
    tableau
  • s

    San Kumar

    07/25/2025, 3:37 AM
Hello, has anyone faced the exception below when uploading a segment to an offline table? In my test environment we get the following error on segment upload:
    Copy code
    {"log_timestamp": "2025-07-25T031932.383+0000", "log_level": "ERROR", "process_id": 1, "process_name": "pinot-controller", "thread_id": 1, "thread_name": "jersey-server-managed-async-executor-0", "action_name": "org.apache.pinot.controller.api.upload.ZKOperator", "log_message": "Caught exception while calling assignTableSegment for adding segment: alarms_OFFLINE_1753394400000_1753394400000_0 to table: alarms_OFFLINE
    java.lang.RuntimeException: Caught exception while updating ideal state for resource: alarms_OFFLINE
    	at org.apache.pinot.common.utils.helix.IdealStateGroupCommit.updateIdealState(IdealStateGroupCommit.java:312)
    	at org.apache.pinot.common.utils.helix.IdealStateGroupCommit.commit(IdealStateGroupCommit.java:124)
    	at org.apache.pinot.common.utils.helix.HelixHelper.updateIdealState(HelixHelper.java:83)
    	at org.apache.pinot.controller.helix.core.PinotHelixResourceManager.assignTableSegment(PinotHelixResourceManager.java:2380)
    	at org.apache.pinot.controller.api.upload.ZKOperator.processNewSegment(ZKOperator.java:547)
    	at org.apache.pinot.controller.api.upload.ZKOperator.completeSegmentOperations(ZKOperator.java:98)
    	at org.apache.pinot.controller.api.resources.PinotSegmentUploadDownloadRestletResource.uploadSegment(PinotSegmentUploadDownloadRestletResource.java:412)
    	at org.apache.pinot.controller.api.resources.PinotSegmentUploadDownloadRestletResource.uploadSegmentAsMultiPartV2(PinotSegmentUploadDownloadRestletResource.java:862)
    	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
    	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:146)
    	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:189)
    	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159)
    	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)
    	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)
    	at org.glassfish.jersey.server.model.ResourceMethodInvoker.lambda$apply$0(ResourceMethodInvoker.java:390)
    	at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$2$1.run(ServerRuntime.java:830)
    	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
    	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
    	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
    	at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$2.run(ServerRuntime.java:825)
    	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    	at java.base/java.lang.Thread.run(Thread.java:1583)
    Caused by: org.apache.pinot.spi.utils.retry.AttemptsExceededException: Operation failed after 20 attempts
    	at org.apache.pinot.spi.utils.retry.BaseRetryPolicy.attempt(BaseRetryPolicy.java:65)
    	at org.apache.pinot.common.utils.helix.IdealStateGroupCommit.updateIdealState(IdealStateGroupCommit.java:208)
    	... 29 more"}
  • v

    Veerendra

    07/25/2025, 9:10 AM
    Hello everyone, I was recently reading Uber's article on "Operating Apache Pinot" where they discussed their implementation of a Pinot REST Proxy for load balancing. I'm curious to know if there are any plans within the Apache Pinot community to integrate a native load balancer directly into Pinot itself, or if the recommended approach will continue to be an external solution like the REST Proxy. Is Uber's Pinot REST Proxy, or a similar community-contributed REST Proxy solution, open-sourced and available for wider use? Any insights or discussions on this topic would be greatly appreciated! Article: https://www.uber.com/en-IN/blog/operating-apache-pinot/
    m
    • 2
    • 3
  • e

    Evan Galpin

    07/25/2025, 5:41 PM
@robert zych @Elon you both use and/or develop with Trino + Pinot, is that right? I’m playing around with that and, for ease of getting started, I’m trying to connect Trino to a Pinot cluster that has a self-signed certificate. Do either of you have advice on how to have Trino ignore insecure certs/skip validation for Pinot catalogs? (See the sketch below.)
    r
    • 2
    • 6
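    A possible workaround for the self-signed-cert question above, assuming you can't or don't want to disable validation in the Trino Pinot connector: import the controller's certificate into the truststore used by the Trino JVM. Host, port, and alias below are placeholders.
    Copy code
    # fetch the cert presented by the Pinot controller (host/port are placeholders)
    openssl s_client -connect pinot-controller:9443 -showcerts </dev/null 2>/dev/null \
      | openssl x509 -outform PEM > pinot.crt
    # import it into the default truststore of the JVM running Trino (JDK 9+)
    keytool -importcert -alias pinot-selfsigned -file pinot.crt \
      -cacerts -storepass changeit -noprompt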
  • s

    San Kumar

    07/27/2025, 11:52 AM
Hello Team, what is the maximum segment size I can choose for an offline table? We are using
    Copy code
    executionFrameworkSpec:
      name: 'standalone'
What is the maximum CSV file size that can be uploaded to a Pinot offline table? Can we upload a 100 GB file to an offline table using standalone mode? (See the sketch below.)
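    On the question above: the standalone execution framework runs the whole ingestion job in a single JVM on one machine, so rather than pushing one 100 GB CSV, the usual approach is to split the input into many smaller files so each becomes a reasonably sized segment. A minimal job-spec sketch, with hypothetical paths and table name:
    Copy code
    executionFrameworkSpec:
      name: 'standalone'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
      segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
    jobType: SegmentCreationAndTarPush
    inputDirURI: '/data/input/'              # split the large CSV into many smaller files here
    includeFileNamePattern: 'glob:**/*.csv'
    outputDirURI: '/data/segments/'
    overwriteOutput: true
    pinotFSSpecs:
      - scheme: file
        className: org.apache.pinot.spi.filesystem.LocalPinotFS
    recordReaderSpec:
      dataFormat: 'csv'
      className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
    tableSpec:
      tableName: 'myTable'
    pinotClusterSpecs:
      - controllerURI: 'http://localhost:9000'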
  • j

    Jacek Skrzypacz

    07/28/2025, 12:26 PM
    Hello team, I have a question about Kafka and Pinot integration. Is there a way to store all Kafka partitions within a single Pinot segment? Also, could you explain the reasoning behind the current setup where it seems to be tied to: 1 Kafka partition -> 1 Pinot consumer -> 1 segment? What are the implications or benefits of this design? Thanks!
    m
    • 2
    • 3
  • m

    Muller Liu

    07/28/2025, 5:25 PM
Hi team, we want to add a doc for prefix, suffix & ngram UDFs. How can we get approval from the community for the pinot-docs repo? CC: @Qiaochu Liu https://github.com/pinot-contrib/pinot-docs/pull/438
    m
    • 2
    • 1
  • s

    San Kumar

    07/29/2025, 5:46 AM
Hello Team, I push messages to an offline table, reading from an ABC source table whose retention period is 1 hour. We recreate the segment file every 15 minutes, rounded to the nearest order15MINHOUR, and set the segment timestamp to order15MINHOUR, so each order hour's data is replaced and holds the latest status of that order. Now an order status arrived after 2 hours in the source table, so when we recompute and upload the data it has only 1 record. In this case we plan to either append that single record when updatetimestamp is greater than 1 hour, or replace that record somehow. How can I replace that segment, and how do I find the segment information for that order? (See the sketch below.)
    m
    • 2
    • 11
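    A sketch for the replace question above, using standard controller REST endpoints (table and segment names are placeholders). Re-uploading a segment with the same name replaces the existing one:
    Copy code
    # list segments for the table to find the one covering the order's time window
    curl -X GET "http://localhost:9000/segments/myTable_OFFLINE"
    # inspect a segment's metadata (start/end time, total docs, etc.)
    curl -X GET "http://localhost:9000/segments/myTable_OFFLINE/mySegment/metadata"
    # re-upload a regenerated segment with the same name to replace it
    curl -X POST -F segment=@mySegment.tar.gz \
      "http://localhost:9000/v2/segments?tableName=myTable&tableType=OFFLINE"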
  • s

    Shubham Kumar

    07/30/2025, 11:49 AM
Hello Team, I ingested around 10K rows into a real-time table from Kafka. After that, I marked is_deleted (configured as deleteRecordColumn) as true for soft deletion of 5K rows. However, after the deleteRecordTTL period, the primary keys are not being deleted. I can see that the documents are unqueryable, but they are still present in the primary key map. Is there a scheduler that handles this cleanup? Does it run automatically, or do we need to trigger it manually? Can we configure the frequency at which it runs? (See the config sketch below.)
    m
    s
    • 3
    • 8
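    For reference, a sketch of the relevant upsert section of the table config. deleteRecordColumn is as quoted in the question; recent releases name the TTL key deletedKeysTTL (the question calls it deleteRecordTTL), so treat these keys as assumptions to verify against your version's upsert docs:
    Copy code
    "upsertConfig": {
      "mode": "FULL",
      "deleteRecordColumn": "is_deleted",
      "deletedKeysTTL": 3600000,
      "enableSnapshot": true
    }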
  • u

    박민지

    07/30/2025, 11:55 AM
Hello Team! I tried to ingest data from multiple Kafka topics into a single table. Is this supported with Confluent Kafka? I got the error below when I followed this document:
    Copy code
    2025/07/30 13:27:36.981 ERROR [RealtimeSegmentDataManager_order_events_test__10001__0__20250730T1123Z] [order_events_test__10001__0__20250730T1123Z] Exception while in work
    org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
    	at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:823) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:665) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:646) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:626) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	at org.apache.pinot.plugin.stream.kafka20.KafkaPartitionLevelConnectionHandler.lambda$createConsumer$0(KafkaPartitionLevelConnectionHandler.java:86) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	at org.apache.pinot.plugin.stream.kafka20.KafkaPartitionLevelConnectionHandler.retry(KafkaPartitionLevelConnectionHandler.java:100) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	at org.apache.pinot.plugin.stream.kafka20.KafkaPartitionLevelConnectionHandler.createConsumer(KafkaPartitionLevelConnectionHandler.java:86) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	at org.apache.pinot.plugin.stream.kafka20.KafkaPartitionLevelConnectionHandler.<init>(KafkaPartitionLevelConnectionHandler.java:67) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	at org.apache.pinot.plugin.stream.kafka20.KafkaPartitionLevelConsumer.<init>(KafkaPartitionLevelConsumer.java:52) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	at org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory.createPartitionGroupConsumer(KafkaConsumerFactory.java:51) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager.recreateStreamConsumer(RealtimeSegmentDataManager.java:1830) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager.consumeLoop(RealtimeSegmentDataManager.java:529) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager$PartitionConsumer.run(RealtimeSegmentDataManager.java:765) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
    Caused by: org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers
    	at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:89) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:48) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:731) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    	... 13 more
Since I was able to ingest data from the first topic, I don't think the broker URL is the issue. Is there anything I might be missing? I set the table config like this:
    Copy code
    "streamConfigMaps": [
      {
        "streamType": "kafka",
        "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.broker.list": "{BROKER}:9092",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.protobuf.KafkaConfluentSchemaRegistryProtoBufMessageDecoder",
        "stream.kafka.decoder.prop.schema.registry.rest.url": "https://{REGISTRY}}",
        "stream.kafka.decoder.prop.basic.auth.credentials.source": "USER_INFO",
        "<http://stream.kafka.decoder.prop.schema.registry.basic.auth.user.info|stream.kafka.decoder.prop.schema.registry.basic.auth.user.info>": "{USER_INFO}",
        "stream.kafka.consumer.type": "LOWLEVEL",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": "{SASL}",
        "realtime.segment.flush.threshold.rows": "500000",
        "realtime.segment.flush.autotune.initialRows": "500000",
        "stream.kafka.topic.name": "{TOPIC_NAME1}"
      },
      {
        "streamType": "kafka",
        "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.broker.list": "{BROKER}:9092",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.protobuf.KafkaConfluentSchemaRegistryProtoBufMessageDecoder",
        "stream.kafka.decoder.prop.schema.registry.rest.url": "https://{REGISTRY}}",
        "stream.kafka.decoder.prop.basic.auth.credentials.source": "USER_INFO",
        "<http://stream.kafka.decoder.prop.schema.registry.basic.auth.user.info|stream.kafka.decoder.prop.schema.registry.basic.auth.user.info>": "{USER_INFO}",
        "stream.kafka.consumer.type": "LOWLEVEL",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": "{SASL}",
        "realtime.segment.flush.threshold.rows": "500000",
        "realtime.segment.flush.autotune.initialRows": "500000",
        "stream.kafka.topic.name": "{TOPIC_NAME2}"
      }
    ],
    m
    c
    +2
    • 5
    • 8
  • u

    박민지

    07/30/2025, 3:06 PM
Hi team, I have a quick question. Is there a way to send query results from Pinot to a Kafka topic, maybe using plugins? Could minions be used for this?
    m
    • 2
    • 10
  • r

    Raghavendra M

    07/31/2025, 4:46 AM
Hi Team, has anyone done a Pinot migration from one cluster A to another cluster B? Do we have docs for this migration? Basically we want to migrate HDFS data from one cluster to the other and make the segments available on the target cluster for querying.
  • a

    Aman Satya

    07/31/2025, 9:46 AM
    Hi team, Is it possible to introduce or configure tenants after the Pinot cluster has already been deployed? Also, since I’m deploying Pinot using Kubernetes with Helm, can I directly upgrade the cluster using Helm to add these tenants?
    m
    • 2
    • 1
  • s

    Shubham Kumar

    07/31/2025, 5:35 PM
Hi Team, could you please advise how I can convert the following files into a human-readable format, such as .txt?
    • columns.psf
    • creation.meta
    • validdocids.bitmap.snapshot
    • ttl.watermark.partition.0
    Additionally, I would appreciate it if you could explain the purpose of each of these files.
    m
    • 2
    • 8
  • s

    Shivam Sharma

    08/01/2025, 11:12 AM
Hi team, where can I find the release notes of Apache Pinot to get the details of the features added in new versions? Also, does the latest Docker image reference the latest stable version of Pinot? Is 1.3.0 the latest stable version? CC: @Mayank @Xiang Fu
    m
    x
    • 3
    • 12
  • s

    San Kumar

    08/02/2025, 3:43 AM
Hello, do joins work well in Pinot? We have a small table with a few rows. (See the sketch below.)
    m
    • 2
    • 1
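    A sketch for the join question above: the multi-stage query engine supports standard SQL joins, and a small table can also be modeled as a dimension table and joined via the lookUp UDF. Table and column names here are hypothetical:
    Copy code
    -- multi-stage engine: fact table joined to a small dimension table
    SELECT o.orderId, o.amount, c.countryName
    FROM orders o
    JOIN dimCountries c ON o.countryCode = c.countryCode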
  • x

    Xiang Fu

    08/05/2025, 11:19 AM
🚨 Reminder: [Apache Pinot Contributor Call #3] is happening today! 📅 Date: August 5, 2025 ⏰ Time: 8:30 AM – 9:30 AM PDT (11:30 AM – 12:30 PM EDT / 3:30 PM – 4:30 PM UTC / 9:00 PM – 10:00 PM IST) 👉 New Zoom Link (updated!): https://startreedata.zoom.us/j/89751791664?pwd=FpqfyztyKmf8TUPa4C8WhbsNYXGYHV.1&jst=2 🧭 Agenda: • 8:30 AM: Graceful Node Replacement — @X G • 9:00 AM: Timeseries Engine GA: New Features & Roadmap — @Shaurya Chaturvedi ⏱️ Please join promptly at 8:30 AM PDT. We’ll record the session and share it afterward in Slack. See you there! Slack Conversation
  • s

    San Kumar

    08/05/2025, 4:22 PM
Hello team, what tuning parameters are required to upload a segment to an offline Pinot table? As I see, Pinot creates a tar.gz before uploading the file; after gzip our file is around 8 GB. Can anyone suggest how to handle the upload quickly? (See the sketch below.)
    m
    • 2
    • 1
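    One common approach for the large-upload question above, assuming segments land in a deep store the servers can also read: use metadata push instead of tar push, so only segment metadata goes through the controller and the 8 GB tarball stays in deep store. A hedged job-spec fragment (the URI is a placeholder):
    Copy code
    jobType: SegmentCreationAndMetadataPush
    outputDirURI: 'hdfs://namenode/pinot/segments/myTable'
    pushJobSpec:
      copyToDeepStoreForMetadataPush: true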
  • x

    Xiang Fu

    08/05/2025, 5:31 PM
    🎬 Apache Pinot Contributor Call #3 Recording is Here! 🟢 Stream the full session on YouTube:

https://youtu.be/YniO1cXJEas

    🗓️ Recorded on August 5, 2025 → Featuring: • Graceful Node Replacement by Xin Gao • Timeseries Engine GA: Features & Roadmap by Shaurya Chaturvedi Why watch? 🚀 – Learn automation best practices for real-time Pinot clusters – Preview new features and roadmap direction – Hear from active contributors shaping the future of Pinot Learn more about Apache Pinot—a real-time, high‑throughput OLAP datastore powering companies like LinkedIn and Uber. Join our Slack community (5K+ members!) to ask questions, share feedback, or get involved. ✅ Watch now, join the conversation, and stay tuned for future calls! Slack Conversation
  • m

    Mohemmad Zaid

    08/06/2025, 6:30 AM
Why can't we use a multi-value column in the function-column pair of a star-tree index? I understand the limitation of using a multi-value column in the split order, but we should be able to use it in aggregation. E.g. spaces is a multi-value column.
    Copy code
    {
      "dimensionsSplitOrder": [
        "pdate"
      ],
      "functionColumnPairs": [
        "DISTINCTCOUNTHLLMV__spaces"
      ]
    }
https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java#L1309 IMO, we can avoid this check for aggregation columns.
    k
    • 2
    • 2
  • r

    Raghavendra M

    08/06/2025, 7:52 AM
Hello Team, do we have any class for a gzip record reader in the Record Reader Spec? https://docs.pinot.apache.org/configuration-reference/job-specification In the doc above I don't see a gzip record reader. I am trying to read gzip files and push to Pinot using a batch job. @Mayank any idea on this?
    x
    • 2
    • 1
  • s

    Shubham Kumar

    08/07/2025, 9:33 AM
Hello team, in batch ingestion I want to add data partitioning. According to the Pinot doc on table index config, is adding the segmentPartitionConfig property enough, or do I have to create segments according to segmentPartitionConfig during ingestion for partitioning to work? (See the sketch below.) https://docs.pinot.apache.org/configuration-reference/table#table-index-config
    m
    • 2
    • 1
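    A sketch for the partitioning question above. The segmentPartitionConfig declares how the data is partitioned; for offline batch ingestion the input data must actually be partitioned so that each segment contains a single partition of the column, otherwise brokers cannot prune segments. Column name and partition count are placeholders:
    Copy code
    "tableIndexConfig": {
      "segmentPartitionConfig": {
        "columnPartitionMap": {
          "memberId": {
            "functionName": "Murmur",
            "numPartitions": 4
          }
        }
      }
    }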
  • z

    Zaeem Arshad

    08/08/2025, 1:01 PM
    Hi folks, I am investigating if Pinot can be used as a realtime monitoring system. We have a federated Prometheus setup right now for tracking various metrics produced by our services. However, Prometheus has some serious drawbacks at scale, namely: • high cardinality kills performance • aggregation is slower the finer the resolution • overall performance issues when going over 500M timeseries I am exploring Pinot as a potential replacement for some parts of this system. The idea is to produce high cardinality metrics but have them ingested and queried from Pinot. It looks doable but I am looking for validation from the community. Also, can Pinot understand PromQL? I saw something about support being added but not sure what's the status.
    m
    r
    • 3
    • 5
  • p

    Prathamesh

    08/09/2025, 9:52 AM
Hello Team, we are exploring Apache Pinot to move away from a Postgres DB and leverage Pinot's capabilities. We use Hive data as the raw layer and Iceberg data at the final layer, which is consumed in Postgres using dbt-trino. Now we want to ingest the final data into Pinot and use it to drive the UI. Some questions: 1. Is Pinot capable of handling Iceberg data? 2. For now it is a batch upload and the table/schema structure needs to be built; is it feasible to use "batchIngestionConfig": { "segmentIngestionType": "REFRESH", "segmentIngestionFrequency": "DAILY" }? Happy to take suggestions as we are still in the exploratory phase. Thanks
    m
    • 2
    • 1
  • s

    San Kumar

    08/12/2025, 3:25 AM
Hello Team, we want to replace/create a segment named with a combination of dd-mm-yy-hh-<productid>-<country> in an offline table. Is it possible to do so, and can you help me with how to define the segment name? (See the sketch below.)
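    A sketch for the naming question above: batch ingestion job specs accept a segmentNameGeneratorSpec, so the segment name prefix can encode the product and country (values below are hypothetical; this controls naming only, not which rows land in which segment):
    Copy code
    segmentNameGeneratorSpec:
      type: normalizedDate
      configs:
        segment.name.prefix: 'orders_P123_IN'   # hypothetical <productid>-<country> prefix
        exclude.sequence.id: false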
  • z

    Zaeem Arshad

    08/12/2025, 3:47 AM
Are there any docs/videos exploring the architecture of Pinot, what makes it so performant, and what its scaling/performance boundaries are?
    m
    • 2
    • 7
  • a

    Arnav

    08/12/2025, 4:23 AM
    Hello team I’m currently aiming to keep all segments generated during batch ingestion close to 256MB in size. To achieve this, I’ve implemented a logic that sets a maximum document count per segment, which I adjust dynamically based on the characteristics of the data, so that the segment size stays approximately within the target. I’m wondering if there’s a more efficient or standardized approach to achieve this?
    m
    • 2
    • 1
  • a

    arnavshi

    08/12/2025, 7:05 AM
    Hi team, I’ve set up an EKS cluster for Pinot stack in our ArrowEverywhereCDK package. The cluster is already running, and I’m now trying to configure Deep Store for a Pinot table using this guide. While deploying the changes, I’m encountering the following error:
    Copy code
    Forbidden: updates to statefulset spec for fields other than \'replicas\', \'ordinals\', \'template\', \'updateStrategy\', \'persistentVolumeClaimRetentionPolicy\' and \'minReadySeconds\' are forbidden\n'
While I understand that this is a Kubernetes issue/limitation, I wanted your guidance on what can be done to resolve this. (See the sketch below.)
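    One common workaround for the StatefulSet error above, assuming the new spec itself is valid: delete the StatefulSet object without deleting its pods, then re-apply the new spec and roll the pods. Release, resource, and namespace names are placeholders:
    Copy code
    # delete only the StatefulSet object, leaving pods (and PVCs) running
    kubectl delete statefulset pinot-server --cascade=orphan -n pinot
    # re-create it with the new spec via helm, then restart pods to pick it up
    helm upgrade pinot pinot/pinot -n pinot -f values.yaml
    kubectl rollout restart statefulset pinot-server -n pinot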
  • s

    San Kumar

    08/12/2025, 11:09 AM
Hello team, how can I create a segment named 1hou_product_id? Can we create this segment and append to it when I get a product ID for the same hour?
  • a

    am_developer

    08/12/2025, 11:31 AM
Creating one big realtime table in Pinot for all analytics use cases. How big is too big for Pinot in terms of the number of columns in one table? In this case there are 250 columns.
    j
    m
    • 3
    • 2