# general
  • y

    Yarden Rokach

    03/25/2025, 11:54 AM
    Hey all! Thanks again for attending the meetup last week! As promised, here’s the

    meetup recording

    📖 Slides: Available on Barkha’s LinkedIn here. 📩 Stay updated on future meetups, blogs, and more: *RSVP to our community newsletter* (plus, get a chance to win a t-shirt! 🎉) See you at the next event!
  • s

    San Kumar

    03/25/2025, 1:27 PM
    Hello Team, we are using hourly segments and pushing the data to an offline table. We want to store 5 years of data. Is there any limitation on the number of segments? 5 years is 43,800 hours, so this will create 43,800 segments. Will Pinot allow that many segments?
    m
    • 2
    • 1
  • s

    San Kumar

    03/26/2025, 12:42 PM
    Hello Team
  • s

    San Kumar

    03/26/2025, 12:42 PM
    We created an offline table and are using a job spec to load or update segments, where all input files are in CSV format. Is it necessary to use JSON, Proto, or Avro as the input format in production? Will these formats consume more disk space or cause performance issues? I’ve noticed that CSV uploads segments faster, and we want to process around 3.5GB of data per hour.
    m
    • 2
    • 3
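    For illustration (not from the thread): Pinot converts whatever input it reads into its own columnar segment format, so CSV vs. JSON/Proto/Avro mainly affects ingestion convenience and speed, not how segments are stored. Below is a minimal sketch of the CSV portion of a batch ingestion job spec; the config class and keys follow the CSV plugin's documented names but should be verified against your Pinot version, and the column list is hypothetical.
    Copy code
    recordReaderSpec:
      dataFormat: 'csv'
      className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
      configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
      configs:
        delimiter: ','                                      # assumed config key
        header: 'event_time,source,country,event_type'      # hypothetical column list for header-less files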
  • k

    KC Satish

    03/27/2025, 9:28 AM
    Subject: How to access Pinot system.tables? Hello, I'm able to access our application Pinot tables through the pinot-controller UI; so far, OK. But I'm not able to access tables like "system.connections". Could someone guide me on how I can access system tables? Below is the error I see when I try to access one of the system tables.
  • p

    piby

    03/28/2025, 9:47 AM
    Hey all! Is there a way to mirror traffic (requests and responses) in Pinot? We want to monitor and analyse the traffic and react quickly in case of errors. Broker logs are unstructured and hard to analyse: we don’t know which user made a query, the source IP address, the query latency, or the number of segments scanned for a particular query. From time to time, we have found that Prometheus metrics are not accurate. Simply pushing all requests (with username and all other headers) and responses (just the query stats and error traces in case of errors, no data) to Kafka would be immensely useful to keep an eye on how the Pinot cluster is utilized. We could even connect this Kafka topic to an internal realtime table in Pinot, monitor everything right within Pinot, and analyse traffic using SQL. Can this be done via some kind of plugin? I am willing to work on this if someone can point me in the right direction. Thanks!
  • m

    Mayank

    03/28/2025, 1:53 PM
    Aggregate stats are available via metrics. If you want individual query level stats, the broker query log is very structured
    p
    p
    • 3
    • 2
  • c

    charlie

    03/28/2025, 7:43 PM
    How have folks approached writing a test suite to be used while upgrading? These are the questions I'm most interested in right now: • Do you write tests for all of your queries or a subset? What does it take to gain confidence in the upgrade? • How do you deal with shared state between tests as your test suite grows (they all read from the same underlying table with data added for each test)? How do you reason about tests in a large suite given shared state?
    • 1
    • 1
  • y

    Yarden Rokach

    04/03/2025, 10:56 AM
    Host an Apache Pinot Meetup in Your City with 𝗠𝗲𝗲𝘁𝘂𝗽 𝗶𝗻 𝗮 𝗕𝗼𝘅! 🎁 Whether you’re a seasoned organizer or planning your first-ever event, Meetup-in-a-Box makes it simple and fun to bring the community together! This complete toolkit has everything you need: ✅ Pre-made decks, demos & resource lists ✅ Community presentation template ✅ QR codes & links to Pinot resources ✅ Swag designs, mockups & official logos ✅ Event tracker & logistical support from StarTree (meetup setup, cross-promotion, swag & F&B funding) 𝗥𝗲𝗮𝗱𝘆 𝘁𝗼 𝗴𝗲𝘁 𝘀𝘁𝗮𝗿𝘁𝗲𝗱? Visit the program page, fill out the interest form, and we’ll get back to you shortly for the kickoff call: https://startree.ai/meetupinabox
  • t

    telugu bharadwaj

    04/04/2025, 8:30 AM
    Hello team, I am trying to set up S3 as the deep store for Pinot, but I’m facing issues. The configuration provided in the documentation is for version 0.6.0, and in this version, the joins are not working as expected. I want to use the latest version, but the configuration for that version isn’t working as it does in 0.6.0. Can you please assist me with this?
  • s

    San Kumar

    04/04/2025, 8:46 AM
    Hello Team, if a realtime table is designed for the Proto serializer, i.e. "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.protobuf.ProtoBufMessageDecoder", and it has this minion configuration:
    Copy code
    "taskTypeConfigsMap": {
                "RealtimeToOfflineSegmentsTask": {
                  "schedule": "*/15 * * ** ?",
                  "bucketTimePeriod": "6h",
                  "bufferTimePeriod": "1d"
                }
              }
    When we create an offline table on the same schema, what file format will the segments be stored in? In the offline table we have this config:
    Copy code
    "ingestionConfig": {
              "continueOnError": false,
              "rowTimeValueCheck": false,
              "batchIngestionConfig": {
                "consistentDataPush": false,
                "segmentIngestionType": "APPEND",
                "segmentIngestionFrequency": "HOURLY"
              },
    Is there anything we have to specify for the file format in the offline table when it is part of a hybrid table? How do I ensure that the offline table also uses the same file compression as the realtime one? In summary, for a hybrid table definition where the realtime side uses Protobuf, what will the offline table segment format be: CSV, JSON, or Proto? And how can I confirm, for a hybrid table definition where the realtime side uses Avro, what the offline table segment type will be?
  • s

    San Kumar

    04/04/2025, 8:52 AM
    Hello Team, how do we create multiple tenants in Pinot?
    m
    • 2
    • 1
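    As a rough sketch (names are hypothetical, not from the thread): broker and server instances are tagged with a tenant name, and the table config references those tags in its tenants section.
    Copy code
    {
      "tableName": "orders_OFFLINE",
      "tableType": "OFFLINE",
      "tenants": {
        "broker": "analyticsBroker",
        "server": "analyticsServer"
      }
    }
    In this sketch, servers would carry tags such as analyticsServer_OFFLINE (or analyticsServer_REALTIME) and brokers analyticsBroker_BROKER, so the table is served only by that tenant's instances.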
  • s

    Shiva Sharana

    04/04/2025, 9:35 AM
    Hello all, 1] I wanted to know how to calculate the median value for a particular column in Pinot v1.0.1 (not using Percentile though, as it returns the max - min value). 2] And how to find the diff() between two rows, the current and previous row, without using a join.
    m
    • 2
    • 2
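    For illustration, hedged SQL sketches with hypothetical table and column names: an approximate median can be written as a percentile aggregation, and a row-over-row difference as a window function, which requires the multi-stage query engine and a version that supports LAG.
    Copy code
    -- Approximate median of a column
    SELECT PERCENTILETDIGEST(price, 50) AS median_price
    FROM trades;

    -- Difference between the current and previous row, ordered by time
    -- (multi-stage engine; verify LAG support on your Pinot version)
    SELECT ts,
           price,
           price - LAG(price) OVER (ORDER BY ts) AS price_diff
    FROM trades;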
  • s

    San Kumar

    04/05/2025, 1:10 PM
    Hello Team, can we provide a primary key in an offline APPEND-type table and push the segment so that a record is replaced if it exists and appended otherwise? Is that possible in Pinot?
    m
    • 2
    • 1
  • s

    San Kumar

    04/06/2025, 4:50 AM
    Hello Team I am currently looking for documentation related to segment file compression and storage formats in Apache Pinot. I find the existing documentation somewhat unclear. From my understanding, regardless of whether we use JSON, Proto, or Avro, segments are always stored using some form of file compression. This seems to be independent of the serialization format used when writing messages to the topic. Is my understanding correct? We are planning to implement a Pinot-based solution in the cloud, but without a clear understanding of segment compression, we are unable to determine whether Pinot will be cost-effective in terms of storage. Could someone please provide clarification on this topic? Your insights will greatly assist us in making an informed decision regarding our Pinot-based application.
    x
    • 2
    • 2
  • g

    Georgi Andonov

    04/06/2025, 9:34 PM
    Hello, everyone! I currently have this set up for Kafka ingestion in Pinot - I ingest data from a Kafka topic of the form Id - INT, Price - DOUBLE, UnixTime - LONG into a Pinot REALTIME table. I wanted to ask if there is a way to use the UnixTime timestamp from the ingested data as the timestamp column in the table - to use the value in the ingested message as Timestamp instead of ingestion time?
    m
    x
    • 3
    • 16
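    For illustration, a minimal sketch of using the ingested UnixTime value as the table's time column instead of ingestion time, assuming UnixTime holds epoch milliseconds. In the schema, declare it as a date-time field:
    Copy code
    "dateTimeFieldSpecs": [
      {
        "name": "UnixTime",
        "dataType": "LONG",
        "format": "1:MILLISECONDS:EPOCH",
        "granularity": "1:MILLISECONDS"
      }
    ]
    and in the table config, point the time column at it:
    Copy code
    "segmentsConfig": {
      "timeColumnName": "UnixTime"
    }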
  • k

    kranthi kumar

    04/09/2025, 8:54 AM
    Hello Everyone, I am working on a project to ingest historic data into Pinot. We have nearly 50 petabytes of data stored in S3. We have to process that data to extract metadata and store it in Pinot tables. Previously, we were publishing to Kafka and Pinot was reading from Kafka. Now, since this is historic data, we want to move the flow away from realtime streaming. I have read about batch ingestion via minions and Spark jobs in the documentation. I want to know which of those serves our case better, considering the huge amount of data we have. Any suggestions, and if possible any references, will be very helpful to my work.
    x
    • 2
    • 2
  • k

    kranthi kumar

    04/09/2025, 9:43 AM
    Hello all, if anyone has worked on running Spark jobs to batch ingest data directly into Pinot tables, please explain the steps and share your knowledge on it; any help is appreciated.
  • s

    San Kumar

    04/10/2025, 9:27 AM
    We are currently performing batch ingestion into a Pinot table using the provided example. In our first iteration, we have the following data:
    Copy code
    event_time, source, country, event_type
    epochmilli1, nokia, ind, event1
    epochmilli1, nokia, ind, event2
    epochmilli2, samsung, USA, event2
    We uploaded this data to Pinot, resulting in the following records:
    Copy code
    epochmilli1, nokia, ind, event2
    epochmilli2, samsung, USA, event0
    Subsequently, we received new data:
    Copy code
    epochmilli1, nokia, ind, event3
    epochmilli1, Apple, ind, event0
    Our program processed this new data and prepared it as follows:
    Copy code
    epochmilli1, nokia, ind, event3
    epochmilli1, Apple, ind, event0
    We aim to update the segment to reflect the final records, which should be:
    Copy code
    epochmilli1, nokia, ind, event3
    epochmilli2, samsung, USA, event0
    epochmilli1, Apple, ind, event0
    To clarify, we want to replace the record for epochmilli1 where the device type is nokia, and add any new records for device types that are not already in the database. Could you please provide guidance on how to achieve this update in the Pinot table? We will use an offline table only: no Kafka topic and no upsert table.
    x
    • 2
    • 3
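    For illustration (not from the thread): offline APPEND tables have no primary-key upsert, so a common approach is to regenerate the segment(s) covering the affected time range with the corrected rows and re-push them under the same segment name, which replaces the previously pushed segment. A hedged sketch of pinning the segment name in the ingestion job spec; the 'fixed' generator type and the name shown are assumptions to verify against your Pinot version.
    Copy code
    segmentNameGeneratorSpec:
      type: fixed
      configs:
        segment.name: 'events_epochmilli1_batch'   # hypothetical; re-pushing this name overwrites the prior segment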
  • l

    Lakshya Devani

    04/10/2025, 9:41 AM
    Hi team, I have a cluster which has a realtime table ingesting from Kafka. Every time I delete the segments, the connection between the table and Kafka breaks. Can anyone help me understand the root cause here?
    Copy code
    2025/04/10 09:34:36.948 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread_21] Deleted segment: ruleset_evaluation_shadow_test_realtime_2__2__47__20250410T0520Z from table: ruleset_evaluation_shadow_test_realtime_2_REALTIME
    2025/04/10 09:34:37.177 INFO [RealtimeSegmentDataManager_ruleset_evaluation_shadow_test_realtime_2__2__49__20250410T0716Z] [ruleset_evaluation_shadow_test_realtime_2__2__49__20250410T0716Z] Recreating stream consumer for topic partition ruleset_evaluation_shadow_test_realtime_2_REALTIME-topic.ruleset.evaluation.shadow.test-2, reason: Too many transient errors
    2025/04/10 09:34:37.177 ERROR [KafkaConsumer] [ruleset_evaluation_shadow_test_realtime_2__2__49__20250410T0716Z] [Consumer clientId=ruleset_evaluation_shadow_test_realtime_2_REALTIME-topic.ruleset.evaluation.shadow.test-2, groupId=msk_consumer] Failed to close coordinator
    org.apache.kafka.common.errors.InterruptException: java.lang.InterruptedException
            at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeThrowInterruptException(ConsumerNetworkClient.java:520) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:281) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:245) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1026) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.maybeAutoCommitOffsetsSync(ConsumerCoordinator.java:1087) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.close(ConsumerCoordinator.java:919) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:2366) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:2333) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:2283) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.pinot.plugin.stream.kafka20.KafkaPartitionLevelConnectionHandler.close(KafkaPartitionLevelConnectionHandler.java:118) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager.closePartitionGroupConsumer(RealtimeSegmentDataManager.java:1256) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager.recreateStreamConsumer(RealtimeSegmentDataManager.java:1860) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager.handleTransientStreamErrors(RealtimeSegmentDataManager.java:436) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager.consumeLoop(RealtimeSegmentDataManager.java:484) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager$PartitionConsumer.run(RealtimeSegmentDataManager.java:766) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
            at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
    Caused by: java.lang.InterruptedException
            ... 18 more
    2025/04/10 09:34:37.177 INFO [Metrics] [ruleset_evaluation_shadow_test_realtime_2__2__49__20250410T0716Z] Metrics scheduler closed
    2025/04/10 09:34:37.177 INFO [Metrics] [ruleset_evaluation_shadow_test_realtime_2__2__49__20250410T0716Z] Closing reporter org.apache.kafka.common.metrics.JmxReporter
    2025/04/10 09:34:37.177 INFO [Metrics] [ruleset_evaluation_shadow_test_realtime_2__2__49__20250410T0716Z] Metrics reporters closed
    2025/04/10 09:34:37.178 INFO [AppInfoParser] [ruleset_evaluation_shadow_test_realtime_2__2__49__20250410T0716Z] App info kafka.consumer for ruleset_evaluation_shadow_test_realtime_2_REALTIME-topic.ruleset.evaluation.shadow.test-2 unregistered
    What is happening in these logs?
    x
    • 2
    • 9
  • k

    kranthi kumar

    04/10/2025, 11:36 AM
    Hello Everyone, I am trying to batch ingest data into Pinot via Spark jobs. I am running Spark on Amazon EMR and my Pinot is hosted on an EKS cluster. I gave full S3 access policies to my EMR and EKS IAM roles, but I am still blocked by this error:
    Copy code
    Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from any of the providers in the chain AwsCredentialsProviderChain(credentialsProviders=[SystemPropertyCredentialsProvider(), EnvironmentVariableCredentialsProvider(), WebIdentityTokenCredentialsProvider(), ProfileCredentialsProvider(profileName=default, profileFile=ProfileFile(sections=[])), ContainerCredentialsProvider(), InstanceProfileCredentialsProvider()]) : [SystemPropertyCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., EnvironmentVariableCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., WebIdentityTokenCredentialsProvider(): Either the environment variable AWS_WEB_IDENTITY_TOKEN_FILE or the javaproperty aws.webIdentityTokenFile must be set., ProfileCredentialsProvider(profileName=default, profileFile=ProfileFile(sections=[])): Profile file contained no credentials for profile 'default': ProfileFile(sections=[]), ContainerCredentialsProvider(): Cannot fetch credentials from container - neither AWS_CONTAINER_CREDENTIALS_FULL_URI or AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variables are set., InstanceProfileCredentialsProvider(): Failed to load credentials from IMDS.]
    	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
    	at software.amazon.awssdk.auth.credentials.AwsCredentialsProviderChain.resolveCredentials(AwsCredentialsProviderChain.java:130)
    	at software.amazon.awssdk.auth.credentials.internal.LazyAwsCredentialsProvider.resolveCredentials(LazyAwsCredentialsProvider.java:45)
    	at software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider.resolveCredentials(DefaultCredentialsProvider.java:129)
    	at software.amazon.awssdk.auth.credentials.AwsCredentialsProvider.resolveIdentity(AwsCredentialsProvider.java:54)
    	at software.amazon.awssdk.services.s3.auth.scheme.internal.S3AuthSchemeInterceptor.lambda$trySelectAuthScheme$4(S3AuthSchemeInterceptor.java:163)
    	at software.amazon.awssdk.core.internal.util.MetricUtils.reportDuration(MetricUtils.java:77)
    	at software.amazon.awssdk.services.s3.auth.scheme.internal.S3AuthSchemeInterceptor.trySelectAuthScheme(S3AuthSchemeInterceptor.java:163)
    	at software.amazon.awssdk.services.s3.auth.scheme.internal.S3AuthSchemeInterceptor.selectAuthScheme(S3AuthSchemeInterceptor.java:84)
    	at software.amazon.awssdk.services.s3.auth.scheme.internal.S3AuthSchemeInterceptor.beforeExecution(S3AuthSchemeInterceptor.java:64)
    	at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.lambda$beforeExecution$1(ExecutionInterceptorChain.java:59)
    	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
    	at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.beforeExecution(ExecutionInterceptorChain.java:59)
    	at software.amazon.awssdk.awscore.internal.AwsExecutionContextBuilder.runInitialInterceptors(AwsExecutionContextBuilder.java:248)
    	at software.amazon.awssdk.awscore.internal.AwsExecutionContextBuilder.invokeInterceptorsAndCreateExecutionContext(AwsExecutionContextBuilder.java:138)
    	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.invokeInterceptorsAndCreateExecutionContext(AwsSyncClientHandler.java:67)
    	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$0(BaseSyncClientHandler.java:62)
    	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
    	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:60)
    	at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:52)
    	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:60)
    	at software.amazon.awssdk.services.s3.DefaultS3Client.getObject(DefaultS3Client.java:5570)
    	at org.apache.pinot.plugin.filesystem.S3PinotFS.copyToLocalFile(S3PinotFS.java:569)
    	at org.apache.pinot.spi.filesystem.NoClosePinotFS.copyToLocalFile(NoClosePinotFS.java:98)
    	at org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentGenerationJobRunner$1.call(SparkSegmentGenerationJobRunner.java:263)
    	at org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentGenerationJobRunner$1.call(SparkSegmentGenerationJobRunner.java:212)
    	at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1(JavaRDDLike.scala:352)
    	at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1$adapted(JavaRDDLike.scala:352)
    	at scala.collection.Iterator.foreach(Iterator.scala:943)
    	at scala.collection.Iterator.foreach$(Iterator.scala:943)
    	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    	at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1047)
    	at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1047)
    	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2539)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:174)
    	at org.apache.spark.scheduler.Task.run(Task.scala:152)
    	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:632)
    	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:96)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:635)
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    	at java.base/java.lang.Thread.run(Thread.java:840)
    My JobSpec:
    Copy code
    executionFrameworkSpec:
      name: 'spark'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentGenerationJobRunner'
      segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentTarPushJobRunner'
      segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentUriPushJobRunner'
      segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentMetadataPushJobRunner'
      extraConfigs:
        stagingDir: s3://aws-logs-686118793080-us-east-1/metadata/staging/
    jobType: SegmentCreationAndMetadataPush
    inputDirURI: 's3://aws-logs-686118793080-us-east-1/metadata/basic-attribute/date=2025-04-10/'
    outputDirURI: 's3://aws-logs-686118793080-us-east-1/metadata/Output_Metadata/'
    overwriteOutput: true
    pinotFSSpecs:
      - scheme: s3
        className: org.apache.pinot.plugin.filesystem.S3PinotFS
        configs:
          region: 'us-east-1'
    recordReaderSpec:
      dataFormat: 'json'
      className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
    tableSpec:
      tableName: 'bat_backfill_emr_REALTIME'
      schemaURI: 's3://aws-logs-686118793080-us-east-1/metadata/batSchema.json'
      tableConfigURI: 's3://aws-logs-686118793080-us-east-1/metadata/batTable.json'
    pinotClusterSpecs:
      - controllerURI: 'http://localhost:9000'
    pushJobSpec:
      pushParallelism: 2
      pushAttempts: 20
      pushRetryIntervalMillis: 1000
    My spark-submit command:
    Copy code
    spark-submit --deploy-mode cluster \
      --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
      --master yarn \
      --conf spark.driver.extraJavaOptions=-Dplugins.dir=s3://aws-logs-686118793080-us-east-1/BatchIngestionJars/ \
      --conf spark.driver.extraClassPath=s3://aws-logs-686118793080-us-east-1/BatchIngestionJars/pinot-batch-ingestion-spark-3-1.1.0-shaded.jar:s3://aws-logs-686118793080-us-east-1/BatchIngestionJars/pinot-all-1.1.0-jar-with-dependencies.jar:s3://aws-logs-686118793080-us-east-1/BatchIngestionJars/pinot-s3-1.1.0-shaded.jar:s3://aws-logs-686118793080-us-east-1/BatchIngestionJars/pinot-json-1.1.0-shaded.jar \
      --jars s3://aws-logs-686118793080-us-east-1/BatchIngestionJars/pinot-all-1.1.0-jar-with-dependencies.jar,s3://aws-logs-686118793080-us-east-1/BatchIngestionJars/pinot-batch-ingestion-spark-3-1.1.0-shaded.jar,s3://aws-logs-686118793080-us-east-1/BatchIngestionJars/pinot-json-1.1.0-shaded.jar,s3://aws-logs-686118793080-us-east-1/BatchIngestionJars/pinot-s3-1.1.0-shaded.jar,s3://aws-logs-686118793080-us-east-1/BatchIngestionJars/pinot-tools-1.1.0.jar,s3://aws-logs-686118793080-us-east-1/BatchIngestionJars/pinot-spi-1.1.0.jar \
      --files s3://aws-logs-686118793080-us-east-1/metadata/executionFrameworkSpec4.yaml \
      local://pinot-all-0.11.0-jar-with-dependencies.jar \
      -jobSpecFile executionFrameworkSpec4.yaml
    x
    • 2
    • 5
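    For illustration, two hedged things to check: the exception is thrown on a Spark executor, so the executors (not only the driver) need the Pinot plugin settings and a reachable credential source (e.g. spark.executor.extraJavaOptions=-Dplugins.dir=... alongside the driver options); and the S3PinotFS configs can carry explicit credentials if the instance-profile/IRSA chain is not visible from the executors. The keys below are placeholders, not real credentials, and the accessKey/secretKey config names are assumptions to verify against the S3 plugin docs.
    Copy code
    pinotFSSpecs:
      - scheme: s3
        className: org.apache.pinot.plugin.filesystem.S3PinotFS
        configs:
          region: 'us-east-1'
          accessKey: '<AWS_ACCESS_KEY_ID>'       # or rely on an IAM role the executors can reach
          secretKey: '<AWS_SECRET_ACCESS_KEY>'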
  • g

    Georgi Andonov

    04/11/2025, 2:51 PM
    Hello everyone! I am not 100% sure if my understanding for the merge rollup task is correct and I am looking for some clarification. I have the following table and schema config:
    Copy code
    {
      "REALTIME": {
        "tableName": "TestRollup_REALTIME",
        "tableType": "REALTIME",
        "segmentsConfig": {
          "schemaName": "TestRollup",
          "replication": "1",
          "replicasPerPartition": "1",
          "timeColumnName": "ValueTimestamp",
          "minimizeDataMovement": false
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant",
          "tagOverrideConfig": {}
        },
        "tableIndexConfig": {
          "invertedIndexColumns": [],
          "noDictionaryColumns": [],
          "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.topic.name": "rollup-price-aggregation",
            "stream.kafka.broker.list": "kafka:9092",
            "stream.kafka.consumer.type": "lowlevel",
            "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
            "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
            "realtime.segment.flush.threshold.rows": "0",
            "realtime.segment.flush.threshold.segment.rows": "500",
            "realtime.segment.flush.threshold.time": "20m"
          },
          "aggregateMetrics": false,
          "enableDefaultStarTree": false,
          "nullHandlingEnabled": false,
          "bloomFilterColumns": [],
          "onHeapDictionaryColumns": [],
          "rangeIndexColumns": [],
          "sortedColumn": [],
          "varLengthDictionaryColumns": [],
          "rangeIndexVersion": 2,
          "optimizeDictionaryForMetrics": false,
          "optimizeDictionary": false,
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false,
          "loadMode": "MMAP",
          "enableDynamicStarTreeCreation": false,
          "columnMajorSegmentBuilderEnabled": true,
          "optimizeDictionaryType": false,
          "noDictionarySizeRatioThreshold": 0.85
        },
        "metadata": {},
        "quota": {},
        "task": {
          "taskTypeConfigsMap": {
            "MergeRollupTask": {
              "5m_2m.mergeType": "rollup",
              "5m_2m.bucketTimePeriod": "5m",
              "5m_2m.bufferTimePeriod": "2m",
              "5m_2m.roundBucketTimePeriod": "1m",
              "10m_2m.mergeType": "rollup",
              "10m_2m.bucketTimePeriod": "10m",
              "10m_2m.bufferTimePeriod": "12m",
              "10m_2m.roundBucketTimePeriod": "5m",
              "Price.aggregationType": "sum",
              "schedule": "0 * * * * ?"
            }
          }
        },
        "routing": {},
        "query": {},
        "ingestionConfig": {
          "continueOnError": false,
          "rowTimeValueCheck": false,
          "segmentTimeValueCheck": true
        },
        "isDimTable": false
      }
    }
    
    {
      "schemaName": "TestRollup",
      "enableColumnBasedNullHandling": false,
      "dimensionFieldSpecs": [
        {
          "name": "Id",
          "dataType": "INT",
          "fieldType": "DIMENSION",
          "notNull": false
        }
      ],
      "metricFieldSpecs": [
        {
          "name": "Price",
          "dataType": "DOUBLE",
          "fieldType": "METRIC",
          "notNull": false
        }
      ],
      "dateTimeFieldSpecs": [
        {
          "name": "ValueTimestamp",
          "dataType": "TIMESTAMP",
          "fieldType": "DATE_TIME",
          "notNull": false,
          "format": "TIMESTAMP",
          "granularity": "1:MILLISECONDS"
        }
      ]
    }
    I am ingesting data from a Kafka topic, and all the records have a ValueTimestamp equal to now. From my understanding, with this configuration the merge rollup tasks will be executed once a minute and will create merged segments based on the provided config: if there are records whose timestamp falls within a bucket (bucketStart -> bucketStart + 5m, or bucketStart -> bucketStart + 10m) and the bucket is older than the 2m or 12m buffer, they will be added to the merged segment. Is that correct? Also, for the rounding of the timestamp: will the records in the merged segment have timestamps rounded to 1m or to 5m (for example, timestamps like 2025-04-11 11:50/11:51 or 2025-04-11 11:50/11:55)?
    m
    r
    • 3
    • 6
  • s

    San Kumar

    04/13/2025, 10:30 AM
    Hello Team, we are planning to build a UI on top of our Pinot table. Can you suggest which will be better: using the Pinot Java driver, or using the Pinot-provided REST API (i.e., the broker endpoint) to retrieve the data? The table is an offline table and has at least 5 years of data.
    x
    • 2
    • 6
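    For illustration (not from the thread): both options run the same SQL against the broker; the Java client is essentially a thin wrapper over the broker's query endpoint, so the choice is mostly about convenience in your UI backend. A minimal sketch using the pinot-java-client, with hypothetical broker hosts, table, and columns.
    Copy code
    import org.apache.pinot.client.Connection;
    import org.apache.pinot.client.ConnectionFactory;
    import org.apache.pinot.client.ResultSetGroup;

    public class PinotUiBackendSketch {
      public static void main(String[] args) {
        // Connect directly to one or more brokers (placeholder host:port values).
        Connection connection = ConnectionFactory.fromHostList("broker-1:8099", "broker-2:8099");
        // The same SQL could be POSTed to the broker's REST query endpoint instead.
        ResultSetGroup results = connection.execute(
            "SELECT country, COUNT(*) FROM events GROUP BY country LIMIT 10");
        System.out.println(results.getResultSet(0));
        connection.close();
      }
    }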
  • s

    San Kumar

    04/13/2025, 10:31 AM
    We also have additional text columns marked as raw index
  • g

    Georgi Varbanov

    04/14/2025, 12:04 PM
    Hello team, we are still in the POC phase with Apache Pinot for our use case, but we are happy to go fully test the use case in production, and I want to ask a few questions regarding capacity planning. I have gone through https://startree.ai/resources/capacity-planning-in-apache-pinot-part-1 a few times, so here is my use case and my questions; I will be really grateful if you can assist.
    Use case:
    • Data retention - infinite, all data is in the hot tier.
    • Current data size - 12B rows with nesting/arrays (around 48B rows unnested); each nested row is around 3 KB of data when serialized as JSON bytes (raw, without compression from Pinot), with roughly 60 columns in Pinot.
    • QPS requirements - 1,500-2,000 for current needs, possibly more in the future; P99 under 500 ms, P90/P50 under 100 ms.
    • Kafka ingestion rate - 1,000 msg/s normally; during spikes we would need 20k-30k msg/s overall for the topic (we currently have 100 partitions for the topic, but can scale further if needed). Per partition that is ~10 msg/s normally and ~300 msg/s during spikes.
    • Daily ingested rows - 12M per day with nesting/arrays.
    • Replication factor - 3.
    • Segment size - 500 MB (around 70k segments once all data is loaded, ×3 replication = 210k segments).
    • Types of queries - 99% are of the form select Col1, count, avg, sum, min, max from tbl where customerId = {customerId} group by Col1, or similar. There are some use cases where we need to fetch raw data of up to 1,000-2,000 rows, but those are rare and not covered by the QPS requirements above.
    • Number of tables - 1, possibly a few more if we add DIM tables for join queries with nomenclatures, but currently we don't need them.
    Questions:
    1. Is it better to have many smaller machines or fewer bigger ones? There are examples that go up to 32 cores and 126 GB of memory per machine.
    2. Is there a benefit to using offline and realtime tables in combination if all data is ingested through Kafka? Our ingestion rate is not as high as in other use cases, and as far as I have researched we should be fine using only a realtime table without retention.
    3. Do you have any recommendations for the cluster?
    a. Controller/Zookeeper - 3x (16 vCPU, 64 GB mem, 200 GB storage each). How do I calculate storage for the controller/Zookeeper? (As far as I understood, Controller and Zookeeper are singleton instances, so multiplying them is just for redundancy.)
    b. Broker - 5x (8 vCPU, 32 GB mem) or 10x (4 vCPU, 16 GB mem)?
    c. Server - 20x (8 vCPU, 64 GB mem, 2 TB storage), or should I go with smaller/bigger machines?
    m
    • 2
    • 5
  • k

    kranthi kumar

    04/15/2025, 12:23 PM
    Hi, I am getting this error while doing batch ingestion via Spark jobs.
    Copy code
    java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
    	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) ~[scala-library-2.12.18.jar:?]
    	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) ~[scala-library-2.12.18.jar:?]
    	at org.apache.spark.util.SparkThreadUtils$.awaitResult(SparkThreadUtils.scala:48) ~[spark-common-utils_2.12-3.5.3-amzn-0.jar:3.5.3-amzn-0]
    	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:310) ~[spark-core_2.12-3.5.3-amzn-0.jar:3.5.3-amzn-0]
    	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:509) ~[spark-yarn_2.12-3.5.3-amzn-0.jar:3.5.3-amzn-0]
    	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268) ~[spark-yarn_2.12-3.5.3-amzn-0.jar:3.5.3-amzn-0]
    	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:937) ~[spark-yarn_2.12-3.5.3-amzn-0.jar:3.5.3-amzn-0]
    	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:936) ~[spark-yarn_2.12-3.5.3-amzn-0.jar:3.5.3-amzn-0]
    	at java.security.AccessController.doPrivileged(AccessController.java:712) [?:?]
    	at javax.security.auth.Subject.doAs(Subject.java:439) [?:?]
    	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953) [hadoop-client-api-3.4.0-amzn-2.jar:?]
    	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:936) [spark-yarn_2.12-3.5.3-amzn-0.jar:3.5.3-amzn-0]
    	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) [spark-yarn_2.12-3.5.3-amzn-0.jar:3.5.3-amzn-0]
    I have almost 200k files with 5,000 records each. Is there any limitation on the Pinot side that's causing this issue?
    x
    • 2
    • 4
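    For illustration (not from the thread): the Futures timed out after [100000 milliseconds] in ApplicationMaster.runDriver is Spark-on-YARN's application master giving up while waiting for the SparkContext (spark.yarn.am.waitTime, default 100s), not a Pinot-side limit on file or segment counts; with ~200k input files, listing the input can easily take longer than that before the context comes up. A hedged tweak to try, leaving the rest of the existing submit command unchanged:
    Copy code
    spark-submit --deploy-mode cluster \
      --conf spark.yarn.am.waitTime=600s \
      ...   # remaining --class/--conf/--jars/--files options as before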
  • y

    Yarden Rokach

    04/15/2025, 6:30 PM
    Real-Time Analytics Summit: ONE MONTH TO GO! 🔥 We’re thrilled to see the numbers: already over 3,000 registrants! If you haven’t yet, now is the time to SAVE YOUR SEAT; it’s online and free! 🎫 Here are 4 of the great sessions that will be held: 🔹 Flexible Forecasting and Insights with Apache Pinot - Fetch: Discover how Fetch built a high-performance Sales Intelligence Platform using Apache Pinot to deliver real-time shopper insights. 🔹 Data Mesh: Transforming Real-Time Analytics at Netflix: Go behind the scenes with Netflix to explore how they built a transformative data mesh that powers personalized user experiences, from real-time recommendations to game analytics, live events, and ad processing. 🔹 Managing Trade-Offs in System Design: Migrating from Flink to Pinot - 7Signal: Explore how 7Signal transformed its data architecture to deliver real-time, low-latency analytics, enabling Wi-Fi performance insights at scale across industries such as healthcare and sports venues. 🔹 Apache Pinot in Action: Real-Time Analytics at CrowdStrike: Get a firsthand look at how CrowdStrike uses Apache Pinot to process tens of thousands of events per second, enhancing security operations in real time. 👉 See the full agenda here.
  • m

    Mannoj

    04/18/2025, 8:07 AM
    A quick question: I have only enabled Pinot servers without HDFS, and data has been populating on them. Now when I enable HDFS, how does the data get sent to HDFS? 1. Will it automatically send data into HDFS? 2. Or is a restart of the Pinot servers and controllers required? 3. Or will a rebalance or table reload do the magic of sending data into HDFS? 4. Or none of the above: after enabling HDFS, you need to drop the table and recreate it in Pinot so it starts sending data to both Pinot and HDFS?
    x
    • 2
    • 24
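    For illustration, a hedged sketch of the controller-side properties involved in enabling an HDFS deep store; the property names follow the HDFS filesystem plugin, and the namenode URI and paths are placeholders. Changes to these config files take effect only after the corresponding components are restarted.
    Copy code
    controller.data.dir=hdfs://namenode:8020/pinot/segments
    pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
    pinot.controller.storage.factory.hdfs.hadoop.conf.path=/path/to/hadoop/conf
    pinot.controller.segment.fetcher.protocols=file,http,hdfs
    pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher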
  • s

    Saravanan Subburayal

    04/19/2025, 9:01 AM
    I want to use Azure Disk for the deep store instead of S3 or HDFS. Are there any supported file systems for storing the files locally (Azure Disk through a PVC)?
    x
    m
    • 3
    • 14
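    As a rough sketch, assuming an Azure Disk PVC mounted on the controller at /var/pinot/deepstore (path and property usage are assumptions to verify): a locally mounted path can back the deep store through the default local filesystem, with the caveat that it is only as shared and durable as that volume. (For Azure object storage, an ADLS Gen2 PinotFS plugin also exists.)
    Copy code
    controller.data.dir=/var/pinot/deepstore
    pinot.controller.storage.factory.class.file=org.apache.pinot.spi.filesystem.LocalPinotFS
    pinot.controller.segment.fetcher.protocols=file,http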
  • m

    Mannoj

    04/21/2025, 9:12 AM
    What training material does Apache Pinot recommend for SREs or DevOps, and is there any certification for Apache Pinot?
    x
    p
    • 3
    • 3