# troubleshooting
  • r

    Raunak Binani

    05/21/2025, 1:20 PM
    Hi all, I am using Pinot offline ingestion (via Spark) to upload data from HDFS cluster A into Pinot backed by HDFS cluster B. The Hadoop config is read from the cluster the Spark job runs on. Is there a way to pass the HDFS configs of both clusters together, or is there another approach you would suggest?
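    One possible direction (a rough sketch, not verified against this setup): point HadoopPinotFS at a Hadoop conf directory that contains the nameservice definitions of both clusters, and use fully qualified hdfs:// URIs for input and output. Nameservice names and paths below are placeholders.
    executionFrameworkSpec:
      name: 'spark'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
    jobType: SegmentCreationAndTarPush
    inputDirURI: 'hdfs://clusterA/path/to/input/'
    outputDirURI: 'hdfs://clusterB/path/to/output/'
    pinotFSSpecs:
      - scheme: hdfs
        className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
        configs:
          # conf dir holding core-site.xml / hdfs-site.xml entries for BOTH clusters
          hadoop.conf.path: '/path/to/merged-hadoop-conf/'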
  • v

    Vipin Rohilla

    05/22/2025, 6:36 AM
    Hi all, I have configured basic auth using the doc. The admin user is able to log in to the controller UI successfully, but a non-admin user cannot log in (it says invalid username/password), even though curl with the same credentials returns results. Is this expected?
    Copy code
    curl -v -u user:secret http://stg-xxxxxxxxxx:9000/tables
    *   Trying 10.xx.xx.xxx:9000...
    * TCP_NODELAY set
    * Connected to stg-xxxxxxxx (10.57.46.223) port 9000 (#0)
    * Server auth using Basic with user 'user'
    > GET /tables HTTP/1.1
    > Host: stg-xxxxxxxx:9000
    > Authorization: Basic dXNlcjpzZWNyZXQ=
    > User-Agent: curl/7.68.0
    > Accept: */*
    >
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < Pinot-Controller-Host: stg-xxxxxxx
    < Pinot-Controller-Version: Unknown
    < Access-Control-Allow-Origin: *
    < Access-Control-Allow-Methods: GET, POST, PUT, OPTIONS, DELETE
    < Access-Control-Allow-Headers: *
    < Content-Type: application/json
    < Content-Length: 135
    <
    * Connection #0 to host xxxxxxx left intact
    {"tables":["cmsCaseActivity","cmsCaseComment","cmsCaseLifecycleLog","cmsCaseSnapshot","cmsInvestigationNote","payment_payment","test"]}
  • s

    Starysn

    05/22/2025, 6:37 AM
    Hi all, can I make Pinot a mirror of Postgres? The catch is that data in Postgres gets updated, so is this really feasible? My approach is to move the data Postgres -> Airbyte -> Kafka -> Pinot, so there is a possibility that data ends up duplicated in Pinot whenever there is an update in Postgres.
  • j

    Jovan Vuković

    05/22/2025, 6:09 PM
    How do I populate my 2 tables in Apache Pinot? Do I need to do batch imports, or is there an easier way?
  • j

    Jovan Vuković

    05/22/2025, 6:29 PM
    When I try to run batch ingestion I get this:
    docker exec pinot-controller ./bin/pinot-admin.sh \
    >   LaunchDataIngestionJob \
    >   -jobSpecFile /config/orders/orders_job_spec.json
    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:/opt/pinot/lib/pinot-all-1.1.0-jar-with-dependencies.jar) to method java.lang.Object.finalize()
    WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release
    2025/05/22 18:28:38.417 ERROR [LaunchDataIngestionJobCommand] [main] Got exception to generate IngestionJobSpec for data ingestion job -
    org.yaml.snakeyaml.constructor.ConstructorException: Cannot create property=recordReaderSpec for JavaBean=org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec@3f9270ed
    in 'string', line 1, column 1:
    {
    ^
    Cannot create property=inputFormat for JavaBean=org.apache.pinot.spi.ingestion.batch.spec.RecordReaderSpec@40e60ece
    in 'string', line 16, column 25:
    "recordReaderSpec": {
    ^
    Unable to find property 'inputFormat' on class: org.apache.pinot.spi.ingestion.batch.spec.RecordReaderSpec
    in 'string', line 17, column 22:
    "inputFormat": "JSON",
    ^
    in 'string', line 16, column 25:
    "recordReaderSpec": {
    ^
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:283) ~[pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:169) ~[pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:320) ~[pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructObjectNoCheck(BaseConstructor.java:264) ~[pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:247) ~[pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:201) ~[pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:185) ~[pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:493) ~[pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at org.yaml.snakeyaml.Yaml.loadAs(Yaml.java:473) ~[pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.getSegmentGenerationJobSpec(IngestionJobLauncher.java:100) ~[pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:112) [pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at org.apache.pinot.tools.Command.call(Command.java:33) [pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at org.apache.pinot.tools.Command.call(Command.java:29) [pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at picocli.CommandLine.executeUserObject(CommandLine.java:1953) [pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at picocli.CommandLine.access$1300(CommandLine.java:145) [pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352) [pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2346) [pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2311) [pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179) [pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at picocli.CommandLine.execute(CommandLine.java:2078) [pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:171) [pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:202) [pinot-all-1.1.0-jar-with-dependencies.jar:1.1.0-c2606742bbc4b15cff857eb0ffe7ec878ff181bb]
    Caused by: org.yaml.snakeyaml.constructor.ConstructorException: Cannot create property=inputFormat for JavaBean=org.apache.pinot.spi.ingestion.batch.spec.RecordReaderSpec@40e60ece
    in 'string', line 16, column 25:
    "recordReaderSpec": {
    ^
    Unable to find property 'inputFormat' on class: org.apache.pinot.spi.ingestion.batch.spec.RecordReaderSpec
    in 'string', line 17, column 22:
    "inputFormat": "JSON",
    ^
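    The stack trace says RecordReaderSpec has no property named inputFormat; in the ingestion job spec that field is called dataFormat. A hedged sketch of the recordReaderSpec section for JSON input (class name from the Pinot JSON input-format plugin):
    "recordReaderSpec": {
      "dataFormat": "json",
      "className": "org.apache.pinot.plugin.inputformat.json.JSONRecordReader"
    }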
  • u

    전이섭

    05/23/2025, 1:19 PM
    Hi, I'm testing writing to a Pinot offline table using the Spark connector. When I run my code like this:
    Copy code
    df.write()
    .format("pinot")
    .option("controller", "localhost:9000")
    .option("table", "transcript")
    .mode(SaveMode.Append)
    .save("/tmp/pinot-segments")
    The segment files are created correctly in the /tmp/pinot-segments directory, but they are not uploaded to the actual Pinot cluster. Does the Spark Pinot connector not support writing directly to Pinot? It seems like it only creates the files locally. Thanks.
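    If the connector run only produces segment directories locally, one workaround is to push them to the controller afterwards with the pinot-admin UploadSegment command. A hedged sketch (flag names as in pinot-admin; host and paths are placeholders):
    bin/pinot-admin.sh UploadSegment \
      -controllerHost localhost \
      -controllerPort 9000 \
      -segmentDir /tmp/pinot-segments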
  • j

    Jovan Vuković

    05/23/2025, 4:26 PM
    Can anyone help me modify my Docker setup so that whenever I restart Docker, everything that was stored in Pinot (the tables and their data) is persisted? Inside my project I have a pinot folder containing a config folder, which in turn has two folders: the orders table and the order_items_enriched table. How can I preserve them across a Docker restart and avoid re-running these commands every time?
    docker exec pinot-controller ./bin/pinot-admin.sh \
      AddTable \
      -tableConfigFile /config/orders/table.json \
      -schemaFile /config/orders/schema.json \
      -exec

    docker exec pinot-controller ./bin/pinot-admin.sh \
      AddTable \
      -tableConfigFile /config/order_items_enriched/table.json \
      -schemaFile /config/order_items_enriched/schema.json \
      -exec
    Here is the docker-compose file:
    version: "3.8"
    services:
      mysql:
        image: mysql/mysql-server:8.0.27
        hostname: mysql
        container_name: mysql
        ports:
          - "3306:3306"
        environment:
          - MYSQL_ROOT_PASSWORD=debezium
          - MYSQL_USER=mysqluser
          - MYSQL_PASSWORD=mysqlpw
        volumes:
          - ./mysql/mysql.cnf:/etc/mysql/conf.d
          - ./mysql/mysql_bootstrap.sql:/docker-entrypoint-initdb.d/mysql_bootstrap.sql
          - ./mysql/data:/var/lib/mysql-files/data
      zookeeper:
        image: confluentinc/cp-zookeeper:7.6.0
        hostname: zookeeper
        container_name: zookeeper
        ports:
          - "2181:2181"
        environment:
          ZOOKEEPER_CLIENT_PORT: 2181
          ZOOKEEPER_TICK_TIME: 2000
        healthcheck: { test: echo srvr | nc localhost 2181 }
      kafka:
        image: confluentinc/cp-kafka:7.6.0
        hostname: kafka
        container_name: kafka
        depends_on: [ zookeeper ]
        ports:
          - "29092:29092"
          - "9092:9092"
          - "9101:9101"
        environment:
          KAFKA_BROKER_ID: 1
          KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
          KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
          KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
          KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
          KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
          KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
          KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
          KAFKA_TOOLS_LOG4J_LOGLEVEL: ERROR
          KAFKA_JMX_PORT: 9101
          KAFKA_JMX_HOSTNAME: localhost
        healthcheck: { test: nc -z localhost 9092, interval: 1s }
      console:
        hostname: console
        container_name: console
        image: docker.redpanda.com/redpandadata/console:latest
        restart: on-failure
        entrypoint: /bin/sh
        command: -c "echo \"$$CONSOLE_CONFIG_FILE\" > /tmp/config.yml; /app/console"
        environment:
          CONFIG_FILEPATH: /tmp/config.yml
          CONSOLE_CONFIG_FILE: |
            server:
              listenPort: 9080
            kafka:
              brokers: ["kafka:9092"]
              schemaRegistry:
                enabled: false
                urls: ["http://schema-registry:8081"]
            connect:
              enabled: false
        ports:
          - "9080:9080"
        depends_on:
          - kafka
      enrichment:
        build: enrichment-kafka-streams
        restart: unless-stopped
        container_name: enrichment-kafka-streams
        environment:
          - QUARKUS_KAFKA_STREAMS_BOOTSTRAP_SERVERS=kafka:9092
          - ORDERS_TOPIC=orders
          - PRODUCTS_TOPIC=products
          - ENRICHED_ORDERS_TOPIC=enriched-order-items
        depends_on:
          - kafka
      pinot-controller:
        image: apachepinot/pinot:1.1.0
        command: "StartController -zkAddress zookeeper:2181"
        container_name: "pinot-controller"
        restart: unless-stopped
        ports:
          - "9000:9000"
        depends_on:
          - zookeeper
        healthcheck:
          test: [ "CMD-SHELL", "curl -f http://localhost:9000/health || exit 1" ]
          interval: 30s
          timeout: 10s
          retries: 3
          start_period: 10s
        volumes:
          - ./pinot/config:/config
      pinot-broker:
        image: apachepinot/pinot:1.1.0
        command: "StartBroker -zkAddress zookeeper:2181"
        restart: unless-stopped
        container_name: "pinot-broker"
        ports:
          - "8099:8099"
        depends_on:
          pinot-controller:
            condition: service_healthy
        healthcheck:
          test: [ "CMD-SHELL", "curl -f http://localhost:8099/health || exit 1" ]
          interval: 30s
          timeout: 10s
          retries: 3
          start_period: 10s
      pinot-server:
        image: apachepinot/pinot:1.1.0
        container_name: "pinot-server"
        command: "StartServer -zkAddress zookeeper:2181"
        restart: unless-stopped
        depends_on:
          pinot-broker:
            condition: service_healthy
        volumes:
          - ./pinot/data:/var/pinot/data
      dashboard-enriched:
        build: streamlit
        restart: unless-stopped
        container_name: dashboard-enriched
        ports:
          - "8502:8501"
        depends_on:
          pinot-controller:
            condition: service_healthy
        volumes:
          - ./streamlit/app_enriched.py:/workdir/app.py
        environment:
          - PINOT_SERVER
          - PINOT_PORT
      orders-service:
        build: orders-service
        restart: unless-stopped
        container_name: orders-service
        depends_on:
          - mysql
          - kafka
        environment:
          - MYSQL_SERVER=mysql
          - KAFKA_BROKER_HOSTNAME=kafka
          - KAFKA_BROKER_PORT=9092
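    On the persistence question itself: table configs and schemas are stored in ZooKeeper and segment data under the controller/server data directories, so those need volumes as well; mounting only ./pinot/config is not enough. A rough sketch under those assumptions (the container paths below are guesses and should be checked against the actual images):
      zookeeper:
        volumes:
          - ./zookeeper/data:/var/lib/zookeeper/data
          - ./zookeeper/log:/var/lib/zookeeper/log
      pinot-controller:
        command: "StartController -zkAddress zookeeper:2181 -dataDir /var/pinot/controller/data"
        volumes:
          - ./pinot/config:/config
          - ./pinot/controller:/var/pinot/controller/data
      pinot-server:
        volumes:
          - ./pinot/data:/var/pinot/data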
  • a

    anmol

    05/25/2025, 5:20 AM
    Hi team, I'm trying to ingest a stringified array field from Kafka into Apache Pinot and convert it into a proper multi-value column (STRING[]), but I'm facing segment consumption errors. Here's what I've done so far:
    • Pinot version: 1.0+
    • In Kafka, the message contains a field like: "txn_notification_event_list": "[\"ZORBLAT\",\"QUIXANO\",\"FLUMPTION\",\"WIBBLEX\",\"SNOOFLE-VORTEX\",\"PLINKO-FUND-SWAP\",\"ZOOGLE\",\"FX-TRANSMOG\"]"
    • I tried the following transformConfigs in the table config:
      Config 1: { "columnName": "txn_notification_event_list_array", "transformFunction": "jsonExtractArray(txn_notification_event_list, '$', 'STRING')" }
      Config 2: "transformConfigs": [ { "columnName": "txn_notification_event_list_array", "transformFunction": "jsonFormatArray(JSONPARSE(txn_notification_event_list))" } ]
    • And defined this in the schema: { "name": "txn_notification_event_list_array", "dataType": "STRING", "singleValueField": false }
    • The segment ends up in ERROR state on both servers. No segmentSize, consumerInfo, or errorInfo is shown in the Pinot UI.
    I've verified that the Kafka messages are correctly formatted as stringified arrays, but I'm not sure if Pinot is parsing this properly or if something's misconfigured in my schema/table setup. Would appreciate any help or pointers! Thanks!
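    One thing worth trying (a hedged sketch; jsonPathArray is a built-in ingestion transform function, but whether it resolves this particular ERROR state is unverified): extract the array with a JSON-path transform and keep the multi-value STRING column in the schema.
    "transformConfigs": [
      {
        "columnName": "txn_notification_event_list_array",
        "transformFunction": "jsonPathArray(txn_notification_event_list, '$')"
      }
    ]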
  • y

    Yeshwanth

    05/26/2025, 11:36 AM
    Hi Team, a small doubt about segments in Pinot. The deep store doc says the deep store contains a compressed version of segments that typically won't include indexes, but I tried downloading the tarball from the deep store, and on untarring it, it contains index data too. Is there any situation in which index data is stripped from a segment before it is uploaded to the deep store?
  • m

    Monika reddy

    05/26/2025, 4:21 PM
    Hello, Any reason why Realtime tables cannot be a dimension table ?
  • g

    Georgi Varbanov

    05/27/2025, 2:09 PM
    Hello, I have a few questions regarding server metrics: why are some metrics reported as pinot_server_pinot_server_realtimeRowsFetched_OneMinuteRate, for example, while others are reported as pinot_server_3_Value{database="upsertPrimaryKeysCount",...}, making them server-specific?
  • m

    mathew

    05/28/2025, 10:58 AM
    Hi Team, does Pinot support upserting instead of appending? My Pinot data gets updated every 24 hours, so there can be duplicate tickets in Pinot, right? Is there an effective way to counter this? Can I set ticket_id as a primary key so that, if the same ticket comes to Pinot again, it just updates the details instead of appending them? Please help, we are stuck here! 🙂
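    Pinot's real-time upsert feature covers this case: declare a primary key in the schema and enable upsert in the REALTIME table config. A minimal sketch, assuming a ticket_id column (names are placeholders); note that upserts apply to stream-ingested REALTIME tables and require strictReplicaGroup routing.
    Schema excerpt:
      "primaryKeyColumns": ["ticket_id"]
    Table config excerpt (REALTIME):
      "upsertConfig": { "mode": "FULL" },
      "routing": { "instanceSelectorType": "strictReplicaGroup" }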
  • f

    Fatlind Hoxha

    05/28/2025, 1:49 PM
    Hi everyone, I'm facing a critical issue with Apache Pinot realtime ingestion from Pulsar. When my Pinot table crashes or restarts, the consumer subscription gets deleted from Pulsar, causing data loss.
    The problem:
    • Pinot creates auto-generated subscription names (like reader-494b86923d)
    • When Pinot crashes, this subscription gets deleted from Pulsar
    • On restart, Pinot generates a NEW subscription name (reader-383a75812c)
    • Since the old subscription is gone, Pinot starts from latest instead of resuming where it left off
    • This causes data loss during the downtime period
    What I've tried:
    • Setting stream.pulsar.consumer.prop.subscriptionName in the table config (seems to be ignored)
    • Configuring Pulsar retention and subscription expiration policies
    • Using auto.offset.reset=smallest, but it doesn't help when the subscription is deleted
    Questions:
    • How can I prevent Pinot's consumer subscriptions from being deleted when it crashes?
    • Does Pinot actually respect the subscriptionName property, or does it always auto-generate?
    • What's the recommended pattern to handle consumer failures without data loss?
    Any help would be greatly appreciated! This is causing production data gaps that are hard to recover from. Thanks!
  • l

    Luis P Fernandes

    05/30/2025, 10:13 AM
    👋 Hello, team! Wondering if anyone can help diagnose an issue we are experiencing with our cluster. We are consuming data from several Kafka topics into realtime tables; ingestion stops, but we can't seem to find any issue in the logs, and the checks on the tables don't show any issue either. Consuming segments:
    Copy code
    {
      "serversFailingToRespond": 0,
      "serversUnparsableRespond": 0,
      "_segmentToConsumingInfoMap": {
        "ericsson_ran_enodebfunction__0__16__20250530T0357Z": [
          {
            "serverName": "Server_100.100.25.106_8098",
            "consumerState": "CONSUMING",
            "lastConsumedTimestamp": 1748580550862,
            "partitionToOffsetMap": {
              "0": "465530"
            },
            "partitionOffsetInfo": {
              "currentOffsetsMap": {
                "0": "465530"
              },
              "latestUpstreamOffsetMap": {
                "0": "528919"
              },
              "recordsLagMap": {
                "0": "63389"
              },
              "availabilityLagMsMap": {
                "0": "8"
              }
            }
          }
        ]
      }
    }
    Table debug:
    [
      {
        "tableName": "ericsson_ran_enodebfunction_REALTIME",
        "numSegments": 5,
        "numServers": 4,
        "numBrokers": 2,
        "segmentDebugInfos": [],
        "serverDebugInfos": [],
        "brokerDebugInfos": [
          {
            "brokerName": "Broker_100.100.123.30_8099",
            "idealState": "ONLINE",
            "externalView": "ONLINE"
          },
          {
            "brokerName": "Broker_100.100.57.158_8099",
            "idealState": "ONLINE",
            "externalView": "ONLINE"
          }
        ],
        "ingestionStatus": {
          "ingestionState": "HEALTHY",
          "errorMessage": ""
        },
        "tableSize": {
          "reportedSize": "9 MB",
          "estimatedSize": "9 MB"
        }
      }
    ]
  • r

    Rajat

    05/30/2025, 10:39 AM
    Hi guys, these are the two messages in my kafka topic: @Xiang Fu @Mayank @Jackie
    Copy code
    Consumed message: key = 844881521, value = {"bch_name": "SHOPIFY", "bch_code": "SH", "ar_awb_id": null, "ar_zone": null, "ch_id": 2993000, "ch_name": "Shopify", "ch_company_id": 18682, "ch_base_channel_code": "SH", "am_awb_code": null, "am_ofd1": null, "am_picked_up_date": null, "o_id": 848523000, "o_company_id": 18682, "o_channel_id": 2993000, "o_shipping_method": "SR", "o_sla": 48, "o_customer_city": "Fatehpur", "o_customer_state": "Uttar Pradesh", "o_customer_pincode": "212655", "o_payment_method": "cod", "o_net_total": "\u0002¾¼", "o_total": "\u0002Ñà", "o_created_at": "2025-05-29 10:22:22", "co_id": null, "co_mode": null, "a_id": null, "a_awb_code": null, "a_cod": null, "a_shipment_id": null, "a_applied_weight_amount": null, "a_charge_weight_amount": null, "s_id": 844881521, "s_order_id": 848523000, "s_company_id": 18682, "s_courier": null, "s_sr_courier_id": null, "s_awb": null, "s_awb_assign_date": null, "s_total": "\u0002Ñà", "s_status": 11, "s_shipped_date": null, "s_delivered_date": null, "s_rto_initiated_date": null, "s_rto_delivered_date": null, "s_created_at": "2025-05-29 10:22:33", "s_updated_at": "2025-05-29 10:22:35", "s_pickup_scheduled_date": null, "s_etd": null, "op": "u", "awbs_source": "Unknown", "couriers_source": "Unknown", "ts_ms_source_kafka": 1748494354367, "ts_ms_merged_kafka": 1748494355210}, partition = 3, offset = 15113790, timestamp = 2025-05-29T04:52:35.213Z
    Consumed message: key = 844881521, value = {"bch_name": null, "bch_code": null, "ar_awb_id": null, "ar_zone": null, "ch_id": null, "ch_name": null, "ch_company_id": null, "ch_base_channel_code": null, "am_awb_code": null, "am_ofd1": null, "am_picked_up_date": null, "o_id": null, "o_company_id": null, "o_channel_id": null, "o_shipping_method": null, "o_sla": null, "o_customer_city": null, "o_customer_state": null, "o_customer_pincode": null, "o_payment_method": null, "o_net_total": null, "o_total": null, "o_created_at": null, "co_id": null, "co_mode": null, "a_id": null, "a_awb_code": null, "a_cod": null, "a_shipment_id": null, "a_applied_weight_amount": null, "a_charge_weight_amount": null, "s_id": 844881521, "s_order_id": null, "s_company_id": null, "s_courier": null, "s_sr_courier_id": null, "s_awb": null, "s_awb_assign_date": null, "s_total": null, "s_status": null, "s_shipped_date": null, "s_delivered_date": null, "s_rto_initiated_date": null, "s_rto_delivered_date": null, "s_created_at": "2025-05-29 10:22:33", "s_updated_at": null, "s_pickup_scheduled_date": null, "s_etd": null, "op": "d", "awbs_source": null, "couriers_source": null, "ts_ms_source_kafka": 1748494359974, "ts_ms_merged_kafka": null}, partition = 3, offset = 15114045, timestamp = 2025-05-29T04:52:41.778Z
    Ideally Pinot should delete these records, since I am using this table config:
    Copy code
    {
      "tableName": "shipmentMerged_final",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "timeColumnName": "s_created_at",
        "timeType": "DAYS",
        "replication": "2",
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "3",
        "minimizeDataMovement": false
      },
      "tableIndexConfig": {
        "loadMode": "MMAP",
        "nullHandlingEnabled": true,
        "createInvertedIndexDuringSegmentGeneration": true,
        "invertedIndexColumns": [
          "o_customer_city",
          "o_customer_pincode",
          "o_customer_state",
          "s_company_id",
          "s_courier",
          "o_shipping_method",
          "o_payment_method",
          "s_status",
          "s_sr_courier_id"
        ],
        "noDictionaryColumns": [
          "s_etd",
          "s_shipped_date",
          "a_awb_code",
          "s_order_id",
          "s_id",
          "a_id",
          "o_id",
          "a_shipment_id",
          "s_awb_assign_date",
          "s_delivered_date",
          "s_awb",
          "s_rto_initiated_date",
          "s_pickup_scheduled_date",
          "a_applied_weight_amount_double",
          "o_total_double",
          "o_created_at",
          "s_updated_at",
          "o_net_total_double",
          "a_charge_weight_amount_double",
          "s_rto_delivered_date",
          "ar_awb_id",
          "am_picked_up_date",
          "am_ofd1",
          "s_total_double",
          "ts_ms_merged_kafka"
        ],
        "bloomFilterColumns": [
          "s_company_id"
        ],
        "sortedColumn": [
          "s_company_id"
        ],
        "varLengthDictionaryColumns": [
          "o_customer_state",
          "ar_zone",
          "s_courier",
          "o_customer_pincode",
          "o_payment_method",
          "o_shipping_method",
          "o_customer_city"
        ]
      },
      "ingestionConfig": {
        "streamIngestionConfig": {
          "streamConfigMaps": [
            {
              "streamType": "kafka",
              "stream.kafka.consumer.type": "lowlevel",
              "stream.kafka.decoder.prop.format": "AVRO",
              "stream.kafka.consumer.group.id": "shipmentMerged-consumer-group",
              "stream.kafka.decoder.prop.schema.registry.rest.url": "<http://internal-adfbe53cf874c419b80ef29810ee56b7-1168949678.ap-south-1.elb.amazonaws.com:8081>",
              "stream.kafka.topic.name": "pinot_d0_d2_realtime",
              "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
              "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
              "stream.kafka.broker.list": "<http://internal-a01a7420dce764739aecf132fdd316d8-1810051101.ap-south-1.elb.amazonaws.com:9094|internal-a01a7420dce764739aecf132fdd316d8-1810051101.ap-south-1.elb.amazonaws.com:9094>",
              "stream.kafka.schema.registry.url": "<http://internal-adfbe53cf874c419b80ef29810ee56b7-1168949678.ap-south-1.elb.amazonaws.com:8081>",
              "realtime.segment.flush.threshold.time": "24h",
              "realtime.segment.flush.threshold.segment.size": "150M",
              "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
            }
          ]
        },
        "transformConfigs": [
          {
            "columnName": "ingestion_ts",
            "transformFunction": "now()"
          },
          {
            "columnName": "is_deleted",
            "transformFunction": "compareFields(op, 'd')"
          },
          {
            "columnName": "s_total_double",
            "transformFunction": "bytesToDouble(o_net_total, 10, 2)"
          },
          {
            "columnName": "o_net_total_double",
            "transformFunction": "bytesToDouble(o_net_total, 10, 2)"
          },
          {
            "columnName": "o_total_double",
            "transformFunction": "bytesToDouble(o_total, 10, 2)"
          },
          {
            "columnName": "a_applied_weight_amount_double",
            "transformFunction": "bytesToDouble(a_applied_weight_amount, 10, 2)"
          },
          {
            "columnName": "a_charge_weight_amount_double",
            "transformFunction": "bytesToDouble(a_charge_weight_amount, 10, 2)"
          }
        ]
      },
      "routing": {
        "instanceSelectorType": "strictReplicaGroup"
      },
      "upsertConfig": {
        "mode": "FULL",
        "consistencyMode": "SYNC",
        "comparisonColumns": [
          "ts_ms_source_kafka"
        ],
        "deleteRecordColumn": "is_deleted",
        "dropOutOfOrderRecord": true
      },
      "tenants": {},
      "metadata": {
        "customConfigs": {}
      }
    }
    Here I am using upsert's deleteRecordColumn config to delete records once is_deleted is true, and is_deleted becomes true when op = 'd'. Why did this happen? Can anyone suggest where the loophole is?
  • r

    Rajat

    05/30/2025, 10:42 AM
    In Pinot this record is still present, with op = u.
  • a

    AG

    06/02/2025, 5:36 AM
    Documentation here https://docs.pinot.apache.org/basics/getting-started/kubernetes-quickstart says use
    Copy code
    helm repo add pinot https://raw.githubusercontent.com/apache/pinot/master/helm
    kubectl create ns pinot-quickstart
    helm install pinot pinot/pinot \
        -n pinot-quickstart \
        --set cluster.name=pinot \
        --set server.replicaCount=2
    but
    Copy code
    https://raw.githubusercontent.com/apache/pinot/master/helm
    returns 404, where is the right path?
  • v

    Vipin Rohilla

    06/02/2025, 1:24 PM
    Hi Team, is there a way to increase the queue size? The Pinot server has built up around 10 days of lag, with many tasks piled up in the queue.
    Copy code
    2025/06/02 18:28:55.712 INFO [HelixTaskExecutor] [ZkClient-EventThread-119-xxxxxxxxx:2181,xxxxxxxx:2181,xxxxxxx:2181] Submit task: d81d401a-b902-4ca4-bcf9-ade6675e0e44 to pool: java.util.concurrent.ThreadPoolExecutor@1339f332[Running, pool size = 40, active threads = 40, queued tasks = 1373, completed tasks = 10622]
  • g

    Georgi Varbanov

    06/03/2025, 7:27 AM
    Hello, can you advise whether the following error is still relevant? When I added metadataTTL to my upsert table, I got an error saying the comparisonColumns should be long but a Timestamp was found. Is this relevant, or can I ignore it via the Swagger API?
  • a

    AG

    06/03/2025, 8:45 AM
    Where can I find a list of all the possible values and explanations for tableIndexConfig?
  • g

    Georgi Varbanov

    06/03/2025, 10:17 AM
    We have configured the deep store according to the docs and have found that, even though no specific errors pop up, the segments are not uploaded to the deep store. The relevant configs will be in the thread.
  • d

    Dong Zhou

    06/03/2025, 11:19 PM
    Hi team, I am trying out ZkBasicAuthAccessControl in our dev environment. So far it has been working well: I am able to log in with the initial user and also able to create more users. However, I notice that if I create a USER with READ permission on a table, this user will NOT be able to log into the Controller UI, but is able to query the table via the /sql endpoint. I will post the reproduction steps in the thread.
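    For context, the working path described above (querying via /sql with basic auth) corresponds to something like this hedged example; host, table, and credentials are placeholders:
    curl -s -u user:secret \
      -H 'Content-Type: application/json' \
      -X POST http://<controller-host>:9000/sql \
      -d '{"sql": "SELECT * FROM myTable LIMIT 10"}'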
  • g

    Georgi Varbanov

    06/04/2025, 7:29 AM
    I have one question about something that doesn't seem like it should happen, but still. We have Apache Pinot 1.3 in prod: 21 servers (64 GB RAM, 8 CPU, "-XX:ActiveProcessorCount=8 -Xms24G -Xmx24G"), deployed in k8s using YAMLs, not Helm. When we started consumption of a Kafka topic with 21 partitions and replication factor 3, without upsert or minion tasks, I saw steady RAM usage at 16 GB for quite a while (only ingesting, no queries, at a constant 300-400 msg/s); up to 20-30M rows it was stable, but as we ingested more data (currently at 520M rows) RAM continued to grow. We have 8 billion more rows of historical data to ingest, and we need to know the servers are not going to just die. We have integrated the deep store as well and see the segments there, if that is relevant. Based on server metrics it seems the MMap buffer just keeps growing. Is that expected?
  • p

    prasanna

    06/04/2025, 8:15 AM
    Hi Team, can someone give me some context on my query below, related to how S3 is involved/behaves when used as a deep store?
    My current setup: Pinot is configured to bypass the controller and use S3 as the deep store, so servers push and pull data directly from the deep store. I believe there is a config that can be defined (an age, I guess) in the realtime table so that realtime segments are still kept on disk, but we are not using it.
    Recent problem: we had an outage and lost connectivity to S3 from Pinot, which resulted in ingestion being stopped. This seems to be our mistake: along with bypassing the controller, we need to configure the realtime table to keep ingesting and keep segments locally while S3 comes back up.
    My queries: we also currently have intermittent HTTP timeouts with S3. Due to this, segments' external view state goes to ERROR and the table goes to a bad state. While we work on the network side of the issue, I need to understand the following:
    1. What is the involvement of S3: is it just a deep store for backup? Even with our config, does Pinot still keep segments on disk?
    2. When a query is executed, does it always go to S3, or does it query data on disk (provided Pinot keeps it) and only fetch from S3 the segments that are not on disk? I am trying to understand how and when S3 is hit: is it used for upload, query, or both?
    Below is the error we currently face; I need to understand the behavior to set up Pinot better, and if there is any related documentation it would help.
    Caught exception while fetching segment from: s3://xyz/abc/abc__31__73__20250604T0211Z to: /var/pinot/server/data/index/abc_REALTIME/tmp/tmp-abc__31__73__20250604T0211Z-7b64a679-f65a-4325-abc8-956b372fea55/abc__31__73__20250604T0211Z.tar.gz org.apache.pinot.shaded.software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Connect to xyz:9020 [xyz/0.0.0.0]
  • c

    coco

    06/05/2025, 6:34 AM
    Hi, Pinot team. Have you ever experienced a phenomenon in Pinot 1.2 where one replica of the consuming segment remains as CONSUMING? The previous consuming segment has 2 ONLINE and 1 CONSUMING replicas, and the newly created consuming segment becomes under-replicated, as only 2 CONSUMING replicas are created. e.g.
    segment__9__2027__timestampZ { "server1": "ONLINE", "server2": "ONLINE", "server3": "CONSUMING" }
    segment__9__2028__timestampZ { "server1": "CONSUMING", "server2": "CONSUMING" }
    I've been analyzing this phenomenon for a while, but I can't find the cause.
  • r

    Rajat

    06/05/2025, 6:45 AM
    Hi Team, I have a doubt. There was a change in a transformation function in the table config of my realtime table, but the change is not reflected on the table. The transform function is a UDF, and it works when I recreate the table, but I want the older table to be refreshed with the new settings and configs. How can I do that? @Xiang Fu @Jackie I cannot point queries to a new table; how can I get the older table refreshed?
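    Background that may help frame the replies (hedged, not a verified fix): ingestion transforms run at consumption time, so rows that were already ingested are generally not re-transformed by a table-config change; new settings apply to newly consumed data. A segment reload can be triggered through the controller API, for example:
    curl -X POST "http://<controller-host>:9000/segments/<tableName>/reload?type=REALTIME"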
  • g

    Gaurav

    06/06/2025, 9:31 AM
    Hi Team, we have deployed Pinot in our Kubernetes infra. Is there a way to expose the Pinot controller UI under a sub-path instead of /? We are using Kong to expose the Pinot controller UI.
  • g

    Georgi Varbanov

    06/06/2025, 9:45 AM
    Hello, I have the following case; can you tell me what will happen? We have a realtime table without upsert that has time and partition pruning. During consumption there was a problem with a single record from the application's point of view (the record was skipped and not published), and other records with newer timestamps have since been published. Can we backfill this lost record with a new Kafka timestamp but an old, out-of-order creation time (the one used for time pruning)? Will this somehow break the pruning process, or will it manage to sort itself out?
  • v

    Vipin Rohilla

    06/09/2025, 5:14 PM
    Hi team, any plans to support multiple data dirs on a single Pinot server instance? (Currently the server conf is limited to a single data dir.)
  • e

    Eddie Simeon

    06/09/2025, 7:06 PM
    Hi Team, 1. I have a Pinot server receiving requests for segments it does not own; the server is reporting exception-processing errors and slow query processing times. 2. Is it normal, in a 3-broker setup, that only 2 of the 3 brokers are receiving query requests?