# troubleshooting
  • d

    Daniel Wunderlich

    04/28/2025, 2:19 PM
    👋 Hi, I have a couple of questions about upgrading Pinot to the latest version. We've been running Pinot 1.0 in production for over a year, on a small EKS cluster with a single node (t3.xlarge) and EBS storage. Segments are also written to DeepStorage in S3. We kept most of the default Helm chart values, apart from a few tweaks like replica count, volume sizes, Xmx, etc. In terms of problems, we only had a single incident with the Zookeeper instance hitting the node limit (we generate a lot of small segments), which we solved by increasing jute.maxbuffer. So it's about time we reviewed and upgraded our setup, including setting up a merge rollup task to solve the small-segments situation. Main plan:
    • Upgrade Pinot to the latest version
      ◦ Is it best to start a fresh cluster/Helm installation and restore segments from S3, or can we just upgrade Helm? Are segments backwards-compatible?
    • Use 2+ replicas for each component, and at least 2 K8S nodes for HA.
      ◦ Any recommendations on instance types? I hear that M/T are generally good.
      ◦ I read somewhere that it's best to run Zookeeper on its own, and not use the "embedded" Helm chart. Is this correct?
    Any other suggestions/pitfalls? Really appreciate the help.
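    For the merge rollup piece, a minimal sketch of the task config that goes into the table config; the periods are placeholders to tune for your segment sizes, and it also requires Minion instances plus the controller's periodic task scheduling to be enabled:
    "task": {
      "taskTypeConfigsMap": {
        "MergeRollupTask": {
          "1day.mergeType": "concat",
          "1day.bucketTimePeriod": "1d",
          "1day.bufferTimePeriod": "2d"
        }
      }
    }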
  • a

    Aman Satya

    04/29/2025, 9:38 AM
    Hello everyone. I'm encountering an error while running a SQL query after enabling the "Multi-Stage Engine" in the SQL editor. It's unable to locate the table, which is of hybrid type.
    Error Code: 720 QueryPlanningError: Error composing query plan for: SELECT * FROM auw_subscriptions LIMIT 1
    org.apache.pinot.query.QueryEnvironment.planQuery(QueryEnvironment.java:190)
    org.apache.pinot.broker.requesthandler.MultiStageBrokerRequestHandler.handleRequest(MultiStageBrokerRequestHandler.java:177)
    org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleRequest(BaseBrokerRequestHandler.java:168)
    org.apache.pinot.broker.requesthandler.BrokerRequestHandlerDelegate.handleRequest(BrokerRequestHandlerDelegate.java:116)
    Table does not exist: 'auw_subscriptions'
    org.apache.pinot.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
    org.apache.pinot.query.catalog.PinotCatalog.getTable(PinotCatalog.java:71)
    org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:128)
    org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:283)
    The table does exist, and I am able to query it if I disable the "Multi-Stage Engine". I first got this error when I tried to do a self join on this table; later on I found out that the "Multi-Stage Engine" is not working properly. Got any tips for troubleshooting this? I'd really appreciate the help!
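    A quick check, not an authoritative diagnosis: the multi-stage engine resolves tables through a Calcite catalog, which has historically been stricter about exact table names (including case) than the v1 engine. It can help to confirm the name Pinot has registered via the controller REST API and make sure the query uses that exact spelling:
    curl http://<controller-host>:9000/tables
    curl http://<controller-host>:9000/tables/auw_subscriptions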
  • v

    Vipin Rohilla

    04/29/2025, 9:39 AM
    What are the best practices for adding fields/indexes to a schema? We added a field and had to reload 11k segments, which caused all of the Pinot servers to go into a dead state because of heap exhaustion.
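    For reference, a sketch of the usual flow rather than a definitive best practice: add the column with an explicit defaultNullValue in the schema, then trigger a reload. The reload materializes the new column in every segment on every server, so it is memory-intensive and is best done off-peak with heap headroom. Hypothetical column name below:
    {
      "name": "new_dimension",
      "dataType": "STRING",
      "defaultNullValue": ""
    }
    curl -X POST "http://<controller>:9000/segments/<tableName>/reload"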
  • v

    vinod kumar naidu

    04/29/2025, 11:36 AM
    Is there a way to enable communication between an SSL-enabled ZooKeeper and the Pinot processes? If yes, please let us know.
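    Not an authoritative answer, but ZooKeeper's client-side TLS is normally switched on through JVM system properties, so one sketch worth trying is passing these via the Pinot components' jvmOpts (paths and passwords adjusted); these are standard ZooKeeper client properties, though whether every Pinot version picks them up unmodified is worth verifying:
    -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
    -Dzookeeper.client.secure=true
    -Dzookeeper.ssl.keyStore.location=/path/to/keystore.jks
    -Dzookeeper.ssl.keyStore.password=<keystore-password>
    -Dzookeeper.ssl.trustStore.location=/path/to/truststore.jks
    -Dzookeeper.ssl.trustStore.password=<truststore-password>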
  • m

    Mannoj

    04/29/2025, 11:49 AM
    Team, I'm testing data being pulled from deep storage in the event a Pinot server goes down. Here is the test env:
    Pinot Cluster: 1.2
    Table: airlineStats
    Total Rows: 276
    Deep Storage: HDFS
    HDFS data: Available (hdfs://pinotperf/pinot/airlineStats/airlineStats_16082_16082_0)
    Pinot Segment: Resides on Pinot server008
    replication: "1"
    Now if I shut down server008: since the data lies only on 008, my understanding is that the cluster should try to fetch the data from deep storage automatically. But that doesn't happen. I see the status of the segment says "updating", but there is no data when we try to query. Is this the way it should work?
  • s

    SP

    04/29/2025, 10:00 PM
    Hi Team, I'm facing an issue with the Trino Pinot connector in a Kubernetes environment, and I could use some assistance.
    Setup:
    • Pinot Cluster: Deployed via Helm with ZooKeeper, initially with 3 broker pods, but I've scaled it down to 1 broker pod.
    • Trino Configuration: Using the Pinot connector with both pinot.controller-urls and pinot.broker-url set in the catalog properties.
    Problem: Although I've pointed the pinot.broker-url to the active Pinot service (which should resolve to the single active broker), Trino seems to ignore this setting and continues to try connecting to the brokers via the headless service, including the brokers that have been scaled down. Has anyone encountered a similar issue where Trino continues to use a headless service instead of the defined broker-url? Any advice on ensuring Trino only connects to the active broker after scaling down? Thanks in advance for your help!
    Configuration:
    pinot.controller-urls=pinot-controller.pinot.svc.cluster.local:9000
    pinot.broker-url=pinot-broker.pinot.svc.cluster.local:8000
    Error:
    trino> select * from pinot.default.airlinestats;
    Query 20250429_210757_00092_aich9 failed: Failed communicating with server: http://pinot-broker-2.pinot-broker-headless.pinot.svc.cluster.local:8000/debug/routingTable/airlineStats

    trino> select * from pinot.default.airlinestats;
    Query 20250429_210805_00094_aich9 failed: Failed communicating with server: http://pinot-broker-1.pinot-broker-headless.pinot.svc.cluster.local:8000/debug/routingTable/airlineStats
  • r

    Rajat

    04/30/2025, 6:50 AM
    @Xiang Fu @Mayank What is the Helm repo for Pinot? The existing one is not working.
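    For reference, the repo URL used in the Pinot Kubernetes quickstart docs is the raw GitHub path; assuming that is still current:
    helm repo add pinot https://raw.githubusercontent.com/apache/pinot/master/helm
    helm repo update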
  • r

    Rajat

    04/30/2025, 11:09 AM
    One more blocker: I have a retention of 3 days on the realtime table of my hybrid table. Suppose today is the 30th; after 3 days, the segments that arrived today will be deleted. But what I actually want is that no data with a created_at of April 30th is still around on May 4th.
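    Context that may help frame this, hedged: retention is evaluated against the table's primary time column, and a segment is only purged once its end time in that column falls outside the window. A sketch of the relevant segmentsConfig, assuming created_at is (or becomes) the time column:
    "segmentsConfig": {
      "timeColumnName": "created_at",
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "3"
    }
    If created_at is not the time column, retention follows whatever column is, which would explain data from the 30th outliving the 3-day window.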
  • r

    Rajat

    04/30/2025, 11:09 AM
    any suggestion @Xiang Fu @Mayank?
  • t

    Tsvetan

    04/30/2025, 12:39 PM
    Hi team. I have Pinot deployed on AWS EKS with the official Helm chart. I am facing authentication problems between Pinot components: when the Server tries to complete segments, it cannot call the Controller.
    2025/04/30 06:17:46.386 ERROR [ServerSegmentCompletionProtocolHandler] [player_sessions_active_minutes__9__0__20250430T0525Z] Could not send request http://pinot-controller-1.pinot-controller-headless.pinot.svc.cluster.local:9000/segmentConsumed?reason=rowLimit&streamPartitionMsgOffset=4446773&instance=Server_10.65.77.10_8098&name=player_sessions_active_minutes__9__0__20250430T0525Z&rowCount=100000&memoryUsedBytes=5248627
    org.apache.pinot.common.exception.HttpErrorStatusException: Got error status code: 401 (Unauthorized) with reason: "HTTP 401 Unauthorized" while sending request: /segmentConsumed?reason=rowLimit&streamPartitionMsgOffset=4446773&instance=Server_10.64.78.10_8098&name=player_sessions_active_minutes__9__0__20250430T0525Z&rowCount=100000&memoryUsedBytes=5248627 to controller: pinot-controller-1.pinot-controller-headless.pinot.svc.cluster.local, version: Unknown
        at org.apache.pinot.common.utils.http.HttpClient.wrapAndThrowHttpException(HttpClient.java:476) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
    I have enabled pinotAuth in my helm chart values override
    pinotAuth:
      enabled: true
      controllerFactoryClass: org.apache.pinot.controller.api.access.BasicAuthAccessControlFactory
      brokerFactoryClass: org.apache.pinot.broker.broker.BasicAuthAccessControlFactory
      configs:
        - access.control.principals=admin,user,viewer
        - access.control.principals.admin.password=${admin_pass}
        - access.control.principals.user.password=${user_pass}
        - access.control.principals.viewer.password=${viewer_pass}
        - access.control.principals.user.permissions=READ,WRITE
        - access.control.principals.viewer.permissions=READ
    However, I cannot understand where in the Helm chart I can configure basic auth access control for the Controller. My reference point is the documentation here -> https://docs.pinot.apache.org/operators/tutorials/authentication/basic-auth-access-control. I tried passing extra configs to the Helm chart like so:
    server:
      extra:
        configs: |-
          pinot.server.segment.fetcher.auth.token=Basic ${admin_pass}
          pinot.server.segment.uploader.auth.token=Basic ${admin_pass}
          pinot.server.instance.auth.token=Basic ${admin_pass}
    or in jvmOpts but neither worked. 🆘
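    One thing worth double-checking, hedged since the rendered values aren't visible here: Pinot's Basic auth tokens are "Basic " followed by base64(username:password), not the raw password, so the server-side override would look roughly like this:
    server:
      extra:
        configs: |-
          # base64 of "admin:verysecret" as an example: echo -n 'admin:verysecret' | base64
          pinot.server.segment.fetcher.auth.token=Basic YWRtaW46dmVyeXNlY3JldA==
          pinot.server.segment.uploader.auth.token=Basic YWRtaW46dmVyeXNlY3JldA==
          pinot.server.instance.auth.token=Basic YWRtaW46dmVyeXNlY3JldA==
    If ${admin_pass} expands to the plain password, the controller will keep answering 401.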
  • z

    Zhuangda Z

    04/30/2025, 4:14 PM
    Hi team, is there a document that helps explain what the following metrics mean?
    numDocsScanned:981376,
    numEntriesScannedInFilter:55072901,
    numEntriesScannedPostFilter:1962752,
    numSegmentsQueried:303,
    numSegmentsProcessed:45,
    numSegmentsMatched:28,
    For example, numEntriesScannedInFilter: does entries mean docs? And does InFilter mean that, after applying the relevant indices, there are still 55072901 entries that need to be scanned for filtering (non-indexed cols)? And what makes numEntriesScannedPostFilter != numDocsScanned?
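    A rough reading, not authoritative: numEntriesScannedInFilter counts column values read while evaluating predicates that could not be answered purely from indexes, and numEntriesScannedPostFilter counts column values read for the columns projected/aggregated after the filter, roughly numDocsScanned times the number of columns read post-filter. That fits the numbers above: 1,962,752 = 981,376 × 2, i.e. two columns read per matched doc.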
  • c

    Chao Cao

    05/01/2025, 12:46 AM
    Hi, I'd like some guidance on performing large joins. 1. I ran into some issues where I had to increase max rows in join from 1M to 10M. 2. Now I'm running into the max rows limit again, and raising it to 100M seems to be too big (my requests are timing out). Here is my query:
    SET maxRowsInJoin = 10000000;
    SELECT 
      price.amount,
      price.offerId,
      price.sellerId,
      price.itemId
    FROM price
    LEFT JOIN offer 
      ON price.sellerId = offer.sellerId
      AND price.offerId = offer.offerId
    WHERE 
        offer.internal_item_id = <valid_id>
        OR price.itemId = <valid_id>
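    A sketch of one alternative to raising maxRowsInJoin further: filter the build side before joining. Since only price columns are projected, this rewrite should return the same rows while keeping the right side of the join small, assuming internal_item_id is selective:
    SET maxRowsInJoin = 10000000;
    SELECT
      price.amount,
      price.offerId,
      price.sellerId,
      price.itemId
    FROM price
    LEFT JOIN (
      SELECT sellerId, offerId, internal_item_id
      FROM offer
      WHERE internal_item_id = <valid_id>
    ) o
      ON price.sellerId = o.sellerId
      AND price.offerId = o.offerId
    WHERE
      o.internal_item_id = <valid_id>
      OR price.itemId = <valid_id>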
  • i

    Ilam Kanniah

    05/01/2025, 4:34 PM
    Hi team 👋, I would like your thoughts on the approach here. We have existing tables whose dateTimeSpecField is defined as a LONG datatype, and we would now like to add a timestamp index to that field, but realized that the index can only be applied to the TIMESTAMP datatype. Changing the data type is not a backward-compatible change and fails the schema update, even though the underlying physical value stored is in long format for both. Is the only approach to add a new column and migrate to that column/table with all the data? I was wondering if the index can support a datetimespec LONG value, or if the schema update can be done in-place from LONG to TIMESTAMP. Let me know what you think. Thanks
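    In case it helps, a sketch of one backward-compatible pattern, with hypothetical column names and not verified against your version: keep the LONG column as-is, add a derived TIMESTAMP column at ingestion time, and put the timestamp index on the new column. This only affects newly ingested data unless existing segments are regenerated.
    Schema addition:
    {
      "name": "eventTimeTs",
      "dataType": "TIMESTAMP",
      "format": "1:MILLISECONDS:TIMESTAMP",
      "granularity": "1:MILLISECONDS"
    }
    Table config, assuming the existing LONG column holds epoch millis:
    "ingestionConfig": {
      "transformConfigs": [
        { "columnName": "eventTimeTs", "transformFunction": "existingLongTimeCol" }
      ]
    },
    "fieldConfigList": [
      {
        "name": "eventTimeTs",
        "indexTypes": ["TIMESTAMP"],
        "timestampConfig": { "granularities": ["DAY", "WEEK", "MONTH"] }
      }
    ]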
  • c

    Chao Cao

    05/01/2025, 5:22 PM
    Hey there Pinot Community, We're in the process of setting up our Pinot cluster and are looking for some guidance and best practices on monitoring and health checks. Specifically:
    1. Recommended Tools and Integrations: What are the best tools and integrations for monitoring Pinot clusters? Are there any specific tools you would recommend for metrics, logging, and alerting?
    2. Health Check Endpoints: What health check endpoints should we be monitoring? Are there any critical endpoints that we should pay special attention to? (I've already read the deployment and monitoring page)
    3. Best Practices: Are there any established best practices for setting up monitoring for Pinot clusters? Any tips on configuring alerts and thresholds to ensure optimal cluster performance?
    4. Common Pitfalls: What are some common pitfalls to avoid when setting up monitoring for a Pinot cluster?
    Any insights, resources, or examples from your experiences would be greatly appreciated. Thanks in advance for your help!
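    On the health-check side, a sketch of the endpoints typically probed (default ports shown, which may differ in your deployment); the usual metrics path is the bundled JMX-to-Prometheus exporter scraped into Prometheus/Grafana:
    curl http://<controller>:9000/health
    curl http://<broker>:8099/health
    curl http://<server>:8097/health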
  • p

    Puneet Singh

    05/02/2025, 12:33 PM
    Hi Team, I am facing the error below while trying to bootstrap data into a Pinot table. What could be the possible cause? Configs in the thread:
    2025/05/02 12:30:45.940 ERROR [HelixHelper] [jersey-server-managed-async-executor-13] Caught exception while updating ideal state for resource: captain_offers_kpi_REALTIME
    java.lang.IllegalStateException: Failed to find partition id for segment: captain_offers_kpi_REALTIME_1717632025246_1745973288376_27_e91caabf-bfae-4eb7-a68e-10726ec6e634 of table: captain_offers_kpi_REALTIME
    	at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:838) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
    	at org.apache.pinot.controller.helix.core.assignment.segment.StrictRealtimeSegmentAssignment.getPartitionId(StrictRealtimeSegmentAssignment.java:145) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
    	at org.apache.pinot.controller.helix.core.assignment.segment.StrictRealtimeSegmentAssignment.assignSegment(StrictRealtimeSegmentAssignment.java:81) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
    	at org.apache.pinot.controller.helix.core.PinotHelixResourceManager.lambda$assignTableSegment$16(PinotHelixResourceManager.java:2306) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
    	at org.apache.pinot.common.utils.helix.HelixHelper$1.call(HelixHelper.java:126) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
    	at org.apache.pinot.common.utils.helix.HelixHelper$1.call(HelixHelper.java:112) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
    	at org.apache.pinot.spi.utils.retry.BaseRetryPolicy.attempt(BaseRetryPolicy.java:58) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
    	at org.apache.pinot.common.utils.helix.HelixHelper.updateIdealState(HelixHelper.java:112) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
    	at org.apache.pinot.common.utils.helix.HelixHelper.updateIdealState(HelixHelper.java:240) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
    	at org.apache.pinot.controller.helix.core.PinotHelixResourceManager.assignTableSegment(PinotHelixResourceManager.java:2298) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
    --
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:292) [pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:274) [pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:244) [pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
    	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
    	at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$2.run(ServerRuntime.java:825) [pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
    	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    	at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
  • s

    Starysn

    05/05/2025, 9:34 AM
    Hi folks, when transforming the timestamp data, there is some data that doesn't make sense; for example, the date became `+200000-07-09 000000:00`. Is there a way to make the data null only for the broken datetime? I want it to be 2025-05-02 only. Below is my schema and table configuration.
    Schema:
    "enableColumnBasedNullHandling": true,
    "dateTimeFieldSpecs": [
    {
          "name": "datepromised",
          "dataType": "TIMESTAMP",
          "format": "TIMESTAMP",
          "granularity": "1:DAYS"
    }
    ]
    Table:
    "ingestionConfig": {
        "transformConfigs": [
          {
            "columnName": "datepromised",
            // "transformFunction": "FromDateTime(SUBSTR(JSONPATHSTRING(_airbyte_data, '$.datepromised'), 0, 10), 'yyyy-MM-dd')"
            "transformFunction": "FromDateTime(SUBSTR(REPLACE(JSONPATHSTRING(_airbyte_data, '$.datepromised'), '+', ''), 0, 10), 'yyyy-MM-dd')"
          }
        ]
    }
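    A rough sketch of one option, assuming Groovy ingestion transforms are enabled on your cluster (they are disabled by default in recent releases) and column-based null handling is on: do the parsing in Groovy and return null for obviously broken values so the column ends up null instead of a garbage timestamp. Not verified against your version:
    {
      "columnName": "datepromised",
      "transformFunction": "Groovy({def v = new groovy.json.JsonSlurper().parseText(_airbyte_data)?.datepromised; (v == null || v.startsWith('+')) ? null : new java.text.SimpleDateFormat('yyyy-MM-dd').parse(v.substring(0, 10)).getTime()}, _airbyte_data)"
    }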
  • j

    Jose Luis

    05/05/2025, 2:38 PM
    Hi y'all, happy Monday. I'm following the Configure TLS doc, disabling mixed mode and moving to 1-way TLS throughout. I've confirmed the controller / broker are both listening on https ports and responding to queries; however, it seems the Broker <-> Server connection is broken. If I query with Multi-Stage, I get back 200 (I'm assuming non-TLS here - not sure). For single stage, I'm getting the error below on the broker:
    [org.pinot.core.transport.QueryRouter] Caught exception while sending request to server IP_R, marking query failed
    Caused by ... connection refused IP:8098
    2 servers [IP1_R, IP2_R] not responded
    Digging into the QueryRouter code, how are the ServerRoutingInstance tables generated, and how is the server port decided? The servers have the nettyTlsPort instance config, but the broker is still deciding to use the HTTP Netty port, which is disabled.
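    Hedged suggestion, going from memory of the TLS configuration reference rather than certainty: the broker only dials the servers' TLS port when its own netty TLS client side is enabled, so the missing piece may be something like the following alongside the keystore/truststore settings you already have (the port value is only an example):
    # server: disable plaintext netty, enable TLS netty
    pinot.server.netty.enabled=false
    pinot.server.nettytls.enabled=true
    pinot.server.nettytls.port=8091
    # broker: use TLS when connecting to servers
    pinot.broker.nettytls.enabled=true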
  • r

    ramesh.samineedi

    05/06/2025, 7:45 AM
    Can I check if the local deployment of Apache ThirdEye supports data-science-based detection algorithms?
  • p

    Prijo Pauly

    05/06/2025, 8:49 AM
    Hi Team, Pinot tables are stuck in UPDATING status. While checking the Ideal State and External View for all these tables (through the Pinot UI), server-4 appears to be causing the issue, but I could not see any error messages in the server-4 logs. Some tables have replication 1, and queries against those tables fail because the segments on server-4 are missing. Could you please help me identify the root cause of this issue?
  • p

    Preethi Evelyn Sadanandan

    05/07/2025, 8:21 AM
    Hi all, my team at work is currently working on migrating a few Databricks jobs to be Unity Catalog compatible. We need some help with 2 things: 1. How do we import data from Blob storage into a Pinot instance in Azure? Is there any documentation or help you can provide here? 2. How do we import the class org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS on a UC job cluster?
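    For question 1, a sketch of the controller-side properties from the ADLS Gen2 deep-store setup (the same storage-factory settings are repeated on the servers; values are placeholders):
    pinot.controller.storage.factory.class.adl2=org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
    pinot.controller.storage.factory.adl2.accountName=<storage-account>
    pinot.controller.storage.factory.adl2.accessKey=<access-key>
    pinot.controller.storage.factory.adl2.fileSystemName=<container>
    pinot.controller.segment.fetcher.protocols=file,http,adl2
    pinot.controller.segment.fetcher.adl2.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    For question 2, the class ships in the pinot-adls plugin jar, so that jar and its dependencies would need to be on the job cluster's classpath; the exact artifact coordinates are worth confirming against your Pinot version.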
  • v

    Vipin Rohilla

    05/07/2025, 9:01 AM
    Hi All, is Pinot 1.3 compatible with HDFS "hdfs-3.2.3.3.2.2.0-1"? I'm getting an error while starting the controller:
    2025-05-07 14:21:58.978 INFO  [main] PinotFSFactory - Did not find any fs classes in the configuration
    2025-05-07 14:21:58.979 INFO  [main] PinotFSFactory - Got scheme hdfs, initializing class org.apache.pinot.plugin.filesystem.HadoopPinotFS
    2025-05-07 14:21:58.979 INFO  [main] PinotFSFactory - Initializing PinotFS for scheme hdfs, classname org.apache.pinot.plugin.filesystem.HadoopPinotFS
    2025-05-07 14:21:59.098 INFO  [zk-disconnector-1-thread-1] ZooKeeper - Session: 0x30048f6f3aa0010 closed
    2025-05-07 14:21:59.100 INFO  [main-EventThread] ClientCnxn - EventThread shut down for session: 0x30048f6f3aa0010
    2025-05-07 14:21:59.357 WARN  [main] NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2025-05-07 14:21:59.826 ERROR [main] StartServiceManagerCommand - Failed to start a Pinot [CONTROLLER] at 7.837 since launch
    java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.protocol.proto.ErasureCodingProtos$GetECTopologyResultForPoliciesRequestProto tried to access method 'org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList.emptyList()' (org.apache.hadoop.hdfs.protocol.proto.ErasureCodingProtos$GetECTopologyResultForPoliciesRequestProto and org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList are in unnamed module of loader 'app')
            at org.apache.hadoop.hdfs.protocol.proto.ErasureCodingProtos$GetECTopologyResultForPoliciesRequestProto.<init>(ErasureCodingProtos.java:10445) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
            at org.apache.hadoop.hdfs.protocol.proto.ErasureCodingProtos$GetECTopologyResultForPoliciesRequestProto.<clinit>(ErasureCodingProtos.java:10948) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
            at java.base/java.lang.Class.forName0(Native Method) ~[?:?]
  • r

    Rajat

    05/07/2025, 9:25 AM
    Hi team, I have a parquet file in which one record is like this
    [{"A_applied_weight_amount_double":94.4,"A_awb_code":"**********","A_charge_weight_amount_double":null,"A_id":605966337,"A_shipment_id":820771786,"Ar_awb_id":605966337,"Ar_zone":"z_d","Is_deleted":false,"Merged_topic_ts_ms":"2025-05-07T03:57:59.695Z","O_created_at":"2025-05-02T09:53:31Z","O_customer_city":"K.V.Rangareddy","O_customer_pincode":"500079","O_customer_state":"Telangana","O_id":824402352,"O_net_total_double":1930,"O_payment_method":"prepaid","O_shipping_method":"SR","O_sla":48,"O_total_double":1930,"Op":"s","S_awb":"**********","S_awb_assign_date":"2025-05-02T09:53:32Z","S_company_id":4613330,"S_courier":"Delhivery Surface 2 Kgs","S_created_at":"2025-05-02T09:53:31Z","S_etd":"2025-05-06T10:56:41Z","S_id":820771786,"S_order_id":824402352,"S_rto_delivered_date":"1969-12-31T18:30:00Z","S_rto_initiated_date":"1969-12-31T18:30:00Z","S_sr_courier_id":44,"S_status":7,"S_updated_at":"2025-05-05T11:59:44Z","Ts_ms_kafka":"2025-05-07T03:57:59.695Z"}]
    In this record, the s_created_at is
    "S_created_at":"2025-05-02T09:53:31Z"
    but when ingesting it into Pinot via LaunchDataIngestionSpec, the same record shows the timestamp with 5:30 added. Why?
    {
      "columns": [
        "s_id",
        "s_created_at"
      ],
      "records": [
        [
          820771786,
          "2025-05-02 15:23:31.0"
        ]
      ]
    }
    Here's a snap from Pinot for the same id. Can anyone help? @Xiang Fu @Mayank
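    One observation: 09:53:31Z rendered as 15:23:31 is exactly a +5:30 shift, which usually points to the JVM default timezone (IST) being applied somewhere in the ingestion job or in how the value is displayed. A common workaround, sketched here assuming the job is launched via the pinot-admin scripts, is to pin the JVM to UTC:
    export JAVA_OPTS="$JAVA_OPTS -Duser.timezone=UTC"
    # then launch the ingestion job as usual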
  • n

    Nithinjith Pushpakaran

    05/07/2025, 9:40 AM
    Apache Pinot + Azure Event Hub Integration: Metadata Fetch Timeout
    Hey team, I'm integrating Apache Pinot (Dockerized) with Azure Event Hub (Kafka-compatible endpoint) for real-time ingestion. The connection works perfectly from a standalone Python Kafka client using confluent_kafka, but Pinot fails with: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
    ✅ Confirmed working:
    • Pinot Server can curl and complete a TLS handshake with icpeventhubns.servicebus.windows.net:9093
    • Python consumer successfully consumes data from the ICPtrafficdataeventhub topic
    • Using the correct sasl.jaas.config format with $ConnectionString and EntityPath
    • streamConfigs use stream.kafka.consumer.prop.* correctly
    ❌ Still seeing: Pinot throws a metadata fetch timeout when creating the real-time table, even though connectivity and credentials are confirmed.
    Stream config, for your reference:
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "icptrafficdataeventhub",
      "stream.kafka.broker.list": "icpeventhubns.servicebus.windows.net:9093",
      "stream.kafka.consumer.type": "simple",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.prop.bootstrap.servers": "XXXns.servicebus.windows.net:9093",
      "stream.kafka.consumer.prop.security.protocol": "SASL_SSL",
      "stream.kafka.consumer.prop.sasl.mechanism": "PLAIN",
      "stream.kafka.consumer.prop.ssl.endpoint.identification.algorithm": "",
      "stream.kafka.consumer.prop.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\\\"$ConnectionString\\\" password=\\\"Endpoint=sb://XXXns.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<My acess key>=;EntityPath=icptrafficdataeventhub\\\";",
      "realtime.segment.flush.threshold.rows": "50000",
      "realtime.segment.flush.threshold.time": "24h"
    }
  • g

    Georgi Varbanov

    05/07/2025, 1:55 PM
    Hi team, can you help me understand why there is a difference between using the Multi-Stage Engine and not using it? I have 1 table with numOfDocs (350m) and totalDocs (1b, due to unnesting collections). When I query without the multi-stage engine, I get a smaller number of segments processed than with the multi-stage engine. There is only 1 realtime table, which has 1224 segments (100 of which are consuming), with no retention period. What could be the reason for such differences?
  • g

    Georgi Varbanov

    05/07/2025, 3:10 PM
    How should I approach ID/string columns that are only going to be returned from queries and not used in any filtering/aggregations? For example, I have TeamId and TeamName; I have an invertedIndex on TeamId, but TeamName will only be used as a result of some queries and I don't really need an index on it. In which tableIndexConfig -> columns array should I put it: noDictionaryColumns or varLengthDictionaryColumns?
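    A sketch based on your example columns, not a definitive answer: for a string that is only projected, a raw (no-dictionary) forward index is usually the choice, so it would go in noDictionaryColumns; if TeamName is low-cardinality, a dictionary may still end up smaller, in which case varLengthDictionaryColumns is reasonable.
    "tableIndexConfig": {
      "invertedIndexColumns": ["TeamId"],
      "noDictionaryColumns": ["TeamName"]
    }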
  • p

    Prasad V

    05/08/2025, 4:08 AM
    Hi All, the table status shows as BAD during high real-time ingestion. On the server side, no errors are reported. Is there any other way to check the root cause of this issue?
  • p

    Prasad V

    05/08/2025, 4:35 AM
    I see there is an error in the broker:
    2025/05/08 04:26:30.919 WARN [BaseInstanceSelector] [ClusterChangeHandlingThread] Failed to find servers hosting old segment: table1_1746041400000_1746045000000_437_5b4527a0-5d78-4a9b-8c0b-d7b663ec0fee for table: table1 (all candidate instances: [] are disabled, counting segment as unavailable)
  • t

    telugu bharadwaj

    05/08/2025, 9:08 AM
    Hi everyone!, I'm trying to configure Pinot with Deep Store using HDFS within a Docker setup. I haven't been able to find specific documentation for this. Has anyone successfully done this before or have any pointers on the necessary configurations? Any help would be greatly appreciated!
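    Not official docs, but a sketch of the controller-side properties usually used for an HDFS deep store; the same storage-factory settings also go on the servers, and the Hadoop config and client jars must be visible inside the containers:
    pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
    pinot.controller.storage.factory.hdfs.hadoop.conf.path=/opt/hadoop/conf
    pinot.controller.segment.fetcher.protocols=file,http,hdfs
    pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    controller.data.dir=hdfs://namenode:8020/pinot/segments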
  • m

    Monika reddy

    05/08/2025, 1:47 PM
    @Kishore G For a non-upsert table, if I need to change the topic key, do we need to restart the pinot-server? I know for upsert tables we should, but just confirming. If a restart is needed, please let me know why.
  • g

    Georgi Varbanov

    05/08/2025, 2:02 PM
    Hello, what is the average consuming latency that you see in your use cases? No matter what I do, I see around 3-4 seconds of latency, which is not ideal. Can you tell me what data you would need in order to help me determine whether we can bring it down to under 1 second?