# general
  • r

    Raghavendra M

    07/31/2025, 4:46 AM
Hi Team, has anyone done a Pinot migration from one cluster (A) to another (B)? Do we have docs for this migration? Basically, we want to migrate HDFS data from one cluster to the other and make the segments available on the target cluster for querying.
  • a

    Aman Satya

    07/31/2025, 9:46 AM
    Hi team, Is it possible to introduce or configure tenants after the Pinot cluster has already been deployed? Also, since I’m deploying Pinot using Kubernetes with Helm, can I directly upgrade the cluster using Helm to add these tenants?
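For reference: tenants can be added after a cluster is deployed; brokers and servers are assigned tenant tags at runtime (for example via the controller's Tenants REST API), and tables then reference those tags. A minimal sketch of the relevant table-config section, assuming hypothetical tenant names brokerTenantA and serverTenantA have already been created:

```json
{
  "tableName": "myTable_OFFLINE",
  "tenants": {
    "broker": "brokerTenantA",
    "server": "serverTenantA"
  }
}
```

Helm scaling adds the instances themselves; assigning them to tenants is done through the controller, not through Helm values.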
  • s

    Shubham Kumar

    07/31/2025, 5:35 PM
Hi Team, could you please advise how I can convert the following files into a human-readable format, such as .txt?
• columns.psf
• creation.meta
• validdocids.bitmap.snapshot
• ttl.watermark.partition.0
Additionally, I would appreciate it if you could explain the purpose of each of these files.
  • s

    Shivam Sharma

    08/01/2025, 11:12 AM
Hi team, where can I find the release notes of Apache Pinot, to get the details of the features added in new versions? Also, does the latest Docker image reference the latest stable version of Pinot? Is 1.3.0 the latest stable version? CC: @Mayank @Xiang Fu
  • s

    San Kumar

    08/02/2025, 3:43 AM
Hello, does JOIN work well in Pinot? We have a small table of a few rows.
  • x

    Xiang Fu

    08/05/2025, 11:19 AM
🚨 Reminder: [Apache Pinot Contributor Call #3] is happening today!
📅 Date: August 5, 2025
⏰ Time: 8:30 AM – 9:30 AM PDT (11:30 AM – 12:30 PM EDT / 3:30 PM – 4:30 PM UTC / 9:00 PM – 10:00 PM IST)
👉 New Zoom Link (updated!): https://startreedata.zoom.us/j/89751791664?pwd=FpqfyztyKmf8TUPa4C8WhbsNYXGYHV.1&jst=2
🧭 Agenda:
• 8:30 AM: Graceful Node Replacement — @X G
• 9:00 AM: Timeseries Engine GA: New Features & Roadmap — @Shaurya Chaturvedi
⏱️ Please join promptly at 8:30 AM PDT. We'll record the session and share it afterward in Slack. See you there!
  • s

    San Kumar

    08/05/2025, 4:22 PM
Hello team, what tuning parameters are required to upload a segment to an offline Pinot table? I see Pinot creates a tar.gz before uploading the file, and after compression our file is around 8 GB. Can anyone suggest how to speed up the upload?
  • x

    Xiang Fu

    08/05/2025, 5:31 PM
    🎬 Apache Pinot Contributor Call #3 Recording is Here! 🟢 Stream the full session on YouTube:

https://youtu.be/YniO1cXJEas

🗓️ Recorded on August 5, 2025 → Featuring:
• Graceful Node Replacement by Xin Gao
• Timeseries Engine GA: Features & Roadmap by Shaurya Chaturvedi
Why watch? 🚀
– Learn automation best practices for real-time Pinot clusters
– Preview new features and roadmap direction
– Hear from active contributors shaping the future of Pinot
Learn more about Apache Pinot, a real-time, high-throughput OLAP datastore powering companies like LinkedIn and Uber. Join our Slack community (5K+ members!) to ask questions, share feedback, or get involved. ✅ Watch now, join the conversation, and stay tuned for future calls!
  • m

    Mohemmad Zaid

    08/06/2025, 6:30 AM
Why can't we use a multi-value column in a function_column pair of the StarTree index? I understand the limitation of using a multi-value column in the split order, but we should be able to use it in an aggregation. E.g., spaces is a multi-value column:
    {
      "dimensionsSplitOrder": [
        "pdate"
      ],
      "functionColumnPairs": [
        "DISTINCTCOUNTHLLMV__spaces"
      ]
    }
https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java#L1309 IMO, we can avoid this check for aggregation columns.
  • r

    Raghavendra M

    08/06/2025, 7:52 AM
Hello Team, do we have any class for a gzip record reader (Record Reader Spec)? https://docs.pinot.apache.org/configuration-reference/job-specification From the above doc I don't see a gzip record reader. I'm trying to read gzip files and push them to Pinot using a batch job. @Mayank any idea on this?
  • s

    Shubham Kumar

    08/07/2025, 9:33 AM
Hello team, in batch ingestion I want to add data partitioning. According to the Pinot doc, is adding the segmentPartitionConfig property to the table index config enough, or do I also have to create segments according to segmentPartitionConfig during ingestion for partitioning to work? https://docs.pinot.apache.org/configuration-reference/table#table-index-config
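For reference, partitioning needs both sides: the table config declares the partition function, and the segments must actually be built with data partitioned that way (if the data in a segment doesn't match, Pinot skips partition-based pruning for it). A minimal sketch, with memberId and the partition count as placeholder values:

```json
{
  "tableIndexConfig": {
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "memberId": {
          "functionName": "Murmur",
          "numPartitions": 4
        }
      }
    }
  },
  "routing": {
    "segmentPrunerTypes": ["partition"]
  }
}
```

The routing section enables the broker-side partition pruner so queries with an equality filter on the partition column only hit matching segments.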
  • z

    Zaeem Arshad

    08/08/2025, 1:01 PM
Hi folks, I am investigating whether Pinot can be used as a realtime monitoring system. We have a federated Prometheus setup right now for tracking various metrics produced by our services. However, Prometheus has some serious drawbacks at scale, namely:
• high cardinality kills performance
• aggregation gets slower the finer the resolution
• overall performance issues when going over 500M timeseries
I am exploring Pinot as a potential replacement for some parts of this system. The idea is to produce high-cardinality metrics but have them ingested into and queried from Pinot. It looks doable, but I am looking for validation from the community. Also, can Pinot understand PromQL? I saw something about support being added but I'm not sure of its status.
  • p

    Prathamesh

    08/09/2025, 9:52 AM
Hello Team, we are exploring Apache Pinot to move away from a Postgres DB and leverage Pinot's capabilities. We use Hive data as the raw layer and Iceberg data at the final layer, which is loaded into Postgres using dbt-trino. Now we want to ingest the final data into Pinot and use it for a UI. A few queries: 1. Is Pinot capable of handling Iceberg data? 2. Since for now it is a batch upload and the table/schema structure needs to be built, is it feasible to use "batchIngestionConfig": { "segmentIngestionType": "REFRESH", "segmentIngestionFrequency": "DAILY" }? Happy to take suggestions as we are still in an exploratory phase. Thanks
  • s

    San Kumar

    08/12/2025, 3:25 AM
Hello Team, we want to replace/create a segment keyed by the combination dd-mm-yy-hh-<productid>-<country> in an offline table. Is it possible to do so? Please help me understand how I can define the segment.
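For reference, offline segment names are controlled by the segmentNameGeneratorSpec in the batch ingestion job spec. One hedged option is to run one ingestion job per product/country batch and encode those values in the prefix; the prefix below is a placeholder:

```json
"segmentNameGeneratorSpec": {
  "type": "normalizedDate",
  "configs": {
    "segment.name.prefix": "myTable-p123-US",
    "exclude.sequence.id": "false"
  }
}
```

With segmentIngestionType REFRESH, re-running the job with the same resulting segment name replaces that segment in place.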
  • z

    Zaeem Arshad

    08/12/2025, 3:47 AM
    Are there any docs/videos exploring the structure of Pinot and what makes it so performant and what are the scaling/performance boundaries?
  • a

    Arnav

    08/12/2025, 4:23 AM
    Hello team I’m currently aiming to keep all segments generated during batch ingestion close to 256MB in size. To achieve this, I’ve implemented a logic that sets a maximum document count per segment, which I adjust dynamically based on the characteristics of the data, so that the segment size stays approximately within the target. I’m wondering if there’s a more efficient or standardized approach to achieve this?
  • a

    arnavshi

    08/12/2025, 7:05 AM
    Hi team, I’ve set up an EKS cluster for Pinot stack in our ArrowEverywhereCDK package. The cluster is already running, and I’m now trying to configure Deep Store for a Pinot table using this guide. While deploying the changes, I’m encountering the following error:
Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
    While I understand that this is a Kubernetes issue/limitation, I wanted your guidance on what can be done to resolve this.
  • s

    San Kumar

    08/12/2025, 11:09 AM
Hello team, how can I create a segment per hour and product ID (e.g. 1hour_product_id)? Can we create such a segment and append to it when we get more records for the same product ID in the same hour?
  • a

    am_developer

    08/12/2025, 11:31 AM
We are creating one big realtime table in Pinot for all analytics use cases. How big is too big for Pinot in terms of the number of columns in one table? In this case there are 250 columns.
  • a

    Abdulaziz Alqahtani

    08/14/2025, 11:02 AM
Hey, I’m trying to measure ingestion lag and came across two metrics:
• availabilityLagMsMap from /consumingSegmentsInfo → reports ~200–400 ms for me.
• endToEndRealtimeIngestionDelayMs from Prometheus → shows a “saw-tooth” pattern, peaking around 5 seconds.
Can someone explain the difference between these two metrics, why they report different values, and whether the saw-tooth pattern is expected?
  • i

    Idlan Amran

    08/18/2025, 2:38 AM
Hi team. At the moment we have a working POC to "roll up/dedup" our realtime-table data: we query historical data with Python for a fixed time range (e.g. the last week), group it, flush it to JSON, and push segments to our historical offline table using an ingestion job spec. We managed to reduce the data from 130+ GB of segments on the realtime table to 13+ GB of segments on the offline table. I guess this is an unconventional way of doing things; it's hard for us to use an upsert table because it is quite memory-consuming and has taken down our server a few times. Has anyone done this kind of workaround, or something similar, to support your needs/use case? Our server spec: EC2 m7a.xlarge, 4 vCPU, 16 GB RAM, running all components: ZK, Kafka, 1 controller, 1 broker, 1 server, 1 minion. We are targeting a modest query volume, roughly 10–15 QPS and infrequent, since this is historical data that is rarely used: only during debugging and a handful of use cases in our application. We resorted to this because there are too many duplicates, and the difference between duplicates is two columns: the timestamp and a log ID column (the log ID refers to our main Postgres DB). So I grouped the data with the query below and flushed the response to JSON for each profile; each JSON has around 5M rows, so the JSON and segment sizes stay consistent:
    SELECT shop, svid, spid, type, profile, "key", message, product,
                       CAST(MAX(created_at) AS TIMESTAMP) AS created_at,
                       ARRAY_AGG(product_log, 'STRING', TRUE) AS product_log
                FROM   product_tracking
                WHERE  profile = {profile}
                  AND  created_at >= CAST(DATE_TRUNC('DAY', timestampAdd(DAY,{-lookback_days},NOW()), 'MILLISECONDS','GMT-04:00') AS TIMESTAMP)
                  AND  created_at <  CAST(DATE_TRUNC('DAY', timestampAdd(DAY,0,NOW()), 'MILLISECONDS','GMT-04:00') AS TIMESTAMP)
                GROUP BY shop, svid, spid, type, profile, "key", message, product
                LIMIT 999999999
Would appreciate any insights/feedback from other Pinot OSS users, thanks.
  • r

    Rishabh Sharma

    08/18/2025, 12:37 PM
Hi Team, we have an analytics use case with a special requirement: we provide dynamic columns to the user, which need not be defined beforehand when deciding the schema, and we provide querying capabilities on those fields as well. We have been exploring Pinot, and it fits well except for these dynamic fields. To solve this we first explored JSON-type columns, but the performance was not up to the mark; now we are looking into dynamically changing the schema, adding a column whenever we see a new dynamic field (which should not happen frequently) while processing the record, and then putting that record into Pinot. I have a few questions: 1. The table can have hundreds of millions of records when a new field appears; would that be an issue when the schema changes or when segments are reloaded after the schema change? 2. We plan to keep sending records which do not have new fields to Pinot even while we see some record with a new field and are processing the schema change for it. In the Pinot docs we found instructions to pause data consumption while changing the schema. We are holding back the records with new fields, but if a record has no new field we continue putting it into the Pinot Kafka topic. Can this result in corrupt data?
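For context on question 1: Pinot's schema evolution is additive, so introducing a dynamic field typically means adding a new column with a default null value and then reloading segments; existing segments serve the default for old records, so the reload does not rewrite data. A minimal sketch of the new schema entry (the name and default are placeholders):

```json
{
  "dimensionFieldSpecs": [
    {
      "name": "newDynamicField",
      "dataType": "STRING",
      "defaultNullValue": ""
    }
  ]
}
```

After updating the schema, a segment reload (via the controller's reload API) makes the new column queryable across old and new segments.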
  • s

    San Kumar

    08/19/2025, 5:28 AM
Hello Team, in our offline table we have many small segments, i.e. one segment created per hour; sometimes we get only 20 to 50 records. Is there any Minion task configuration to merge smaller segments into a larger segment where the segments are older than 30 days? Also, how will the Minion job trigger, and what configuration do I need to follow?
  • s

    San Kumar

    08/19/2025, 5:54 AM
Is merge rollup supported for OFFLINE tables on APPEND only? Is it supported on REFRESH? Can we schedule MergeRollupTask with a cron expression? Can you please help me with this?
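For reference, MergeRollupTask is configured per table under task.taskTypeConfigsMap (the docs describe it for APPEND tables). A hedged sketch that rolls segments older than 30 days into 30-day buckets; the bucket/buffer values are placeholders, and the schedule key assumes the controller's periodic task scheduler is enabled:

```json
"task": {
  "taskTypeConfigsMap": {
    "MergeRollupTask": {
      "30days.mergeType": "rollup",
      "30days.bucketTimePeriod": "30d",
      "30days.bufferTimePeriod": "30d",
      "schedule": "0 0 2 * * ?"
    }
  }
}
```

The bufferTimePeriod keeps recent segments untouched, so only segments older than the buffer get merged.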
  • k

    kranthi kumar

    08/19/2025, 1:29 PM
Hi team, I want to understand how the consumption flow works during a server restart after a crash or dead state. For my use case each individual record is very critical, and I want no duplicates in Pinot. As per my understanding, when a server crashes, the segment that is actively consuming is paused, and when the server restarts, the paused segment starts reconsuming from the last committed offset in ZK; this way duplicates might creep in. Is that the correct flow, and if yes, are there ways to avoid duplicates without losing any records?
  • m

    Milind Chaudhary

    08/20/2025, 5:49 AM
Hi Team, can I override a field's value to blank in an ingestion transformation?
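On overriding a field to blank: ingestion transforms can derive a column's value from an expression, so one hedged option is a Groovy transform that always returns an empty string. A minimal sketch; the column names are placeholders, and it assumes the destination column differs from the source field:

```json
"ingestionConfig": {
  "transformConfigs": [
    {
      "columnName": "blankedColumn",
      "transformFunction": "Groovy({''}, rawColumn)"
    }
  ]
}
```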
  • i

    Indira Vashisth

    08/21/2025, 12:52 PM
    Hi team, we are planning to eliminate the intermediate step where the server sends the segment to the controller, and the controller pushes it to deepstore. Instead, the proposal is for the server to write directly to deepstore. Could someone help us understand the pros and cons of both approaches so that we can make a more informed decision?
  • s

    Shubham Kumar

    08/21/2025, 1:00 PM
Hi Team, I have a couple of queries regarding Apache Pinot:
1. Does Pinot support segment compression formats other than tar.gz, such as zstd or Snappy?
2. I created an index on a column (col1) and ingested data. Suppose a segment contains 50 records, and I run a query with the condition col1 = 'xyz'. In this case, does Pinot load the entire segment into memory and then filter the records, or does it directly fetch only the matching data from the segment?
  • s

    Sandeep R

    08/25/2025, 11:36 PM
    Pinot server: single big LV vs multiple mount points for segment storage?
  • j

    Jan Siekierski

    08/27/2025, 11:33 AM
    I understand that Iceberg support on Apache Pinot is only available in StarTree cloud right now, correct? Are there plans to add this to Apache Pinot in the near future?