Apache Pinot #troubleshooting

Arkoprovo Sircar

09/26/2025, 11:22 AM

Hi Team. We are facing an issue with a realtime ingestion table in production: no ingestion is happening. The controller's up checks show down for controller-1.

It seems there is some issue with the controller trying to commit segments to our S3 deep store? We had enabled the table setting and the server setting to commit directly from the server. Can somebody please help us understand why this has happened? We have tried restarting all controllers and servers. These are the logs for the Controller:

2025/09/26 05:53:56.694 INFO [LLCSegmentCompletionHandlers] [grizzly-http-server-1] Processing segmentCommitEndWithMetadata:Offset: -1,Segment name: GatewayTest__9__2651__20250922T1206Z,Instance Id: Server_pinot-server-3.pinot-server-headless.pinot-prod.svc.cluster.local_8098,Reason: rowLimit,NumRows: 64257,BuildTimeMillis: 17312,WaitTimeMillis: 0,ExtraTimeSec: -1,SegmentLocation: s3://zpinot-prod/GatewayTest/GatewayTest__9__2651__20250922T1206Z.tmp.016a1c02-a9e6-465d-aa20-01fe6c76fe0d,MemoryUsedBytes: 0,SegmentSizeBytes: 305005624,StreamPartitionMsgOffset: 340975150
2025/09/26 05:53:56.699 INFO [SegmentCompletionFSM_GatewayTest__9__2651__20250922T1206Z] [grizzly-http-server-1] Processing segmentCommitEnd(Server_pinot-server-3.pinot-server-headless.pinot-prod.svc.cluster.local_8098, 340975150)
2025/09/26 05:53:56.699 INFO [SegmentCompletionFSM_GatewayTest__9__2651__20250922T1206Z] [grizzly-http-server-1] Committing segment GatewayTest__9__2651__20250922T1206Z at offset 340975150 winner Server_pinot-server-3.pinot-server-headless.pinot-prod.svc.cluster.local_8098
2025/09/26 05:53:56.699 INFO [PinotLLCRealtimeSegmentManager] [grizzly-http-server-1] Committing segment file for segment: GatewayTest__9__2651__20250922T1206Z
2025/09/26 05:53:56.700 INFO [LLCSegmentCompletionHandlers] [grizzly-http-server-0] Processing segmentConsumed:Offset: -1,Segment name: FintechOnboarding__24__437__20250922T1517Z,Instance Id: Server_pinot-server-7.pinot-server-headless.pinot-prod.svc.cluster.local_8098,Reason: timeLimit,NumRows: 39,BuildTimeMillis: -1,WaitTimeMillis: -1,ExtraTimeSec: -1,SegmentLocation: null,MemoryUsedBytes: 0,SegmentSizeBytes: -1,StreamPartitionMsgOffset: 1546
2025/09/26 05:53:56.700 ERROR [SegmentCompletionFSM_GatewayTest__9__2651__20250922T1206Z] [grizzly-http-server-1] Caught exception while committing segment file for segment: GatewayTest__9__2651__20250922T1206Z
java.lang.IllegalStateException: Connection pool shut down
	at org.apache.pinot.shaded.org.apache.http.util.Asserts.check(Asserts.java:34) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
	at org.apache.pinot.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:269) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
	at org.apache.pinot.shaded.software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$DelegatingHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:75) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
	at org.apache.pinot.shaded.software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$InstrumentedHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:57) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
	at org.apache.pinot.shaded.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
	at org.apache.pinot.shaded.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
	at org.apache.pinot.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
	at org.apache.pinot.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
	at org.apache.pinot.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
	at org.apache.pinot.shaded.software.amazon.awssdk.http.apache.internal.impl.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72) ~[pinot-all-

Priyank Bagrecha

09/29/2025, 9:43 PM

Is DistinctCount[Raw]ThetaSketch the only Apache DataSketches sketch algorithm that is supported out of the box in Pinot? Looking at the code, it seems DistinctCountHLL and DistinctCountHLLPlus are coming from the stream-lib implementation.

Mannoj

09/30/2025, 10:53 AM

Team, in case I need to know whether a Pinot segment has committed its full data to the deep store: should I just look for "segment.realtime.status": "DONE"? Does that value mean everything is committed to the deep store? Can you please clarify?
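One way to frame the check being asked about, as a minimal sketch rather than a confirmed answer: it assumes the segment's ZooKeeper metadata has already been fetched as a dict (for example from the controller's segment metadata endpoint), that the download URL is stored under a key such as "segment.download.url", and that a segment which reached the deep store carries a URL with the deep store's scheme. The key names other than "segment.realtime.status" and the helper name are assumptions.

```python
def looks_committed(zk_metadata: dict, deep_store_prefix: str) -> bool:
    """Heuristic: status is DONE and the recorded download URL points at the deep store.

    The download-URL key names below are assumptions and may differ between
    Pinot versions; only "segment.realtime.status" comes from the question.
    """
    status = zk_metadata.get("segment.realtime.status")
    url = (
        zk_metadata.get("segment.download.url")
        or zk_metadata.get("segment.realtime.download.url")
        or ""
    )
    # Require both signals, since DONE alone does not show where the copy lives.
    return status == "DONE" and url.startswith(deep_store_prefix)
```

Checking the URL as well as the status avoids relying on the status flag alone, which is the part the question is unsure about.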

Priyank Bagrecha

10/02/2025, 7:23 PM

Is there a newer version of the Grafana dashboard that is shared at https://docs.pinot.apache.org/operators/tutorials/monitor-pinot-using-prometheus-and-grafana ? This one uses Angular, which looks like it has been deprecated.

Priyank Bagrecha

10/02/2025, 10:27 PM

I am ingesting data into an offline pinot table and noticed that the time to ingest data is increasing with every hourly partition. I also see logs in the spark job's logs like

2025-10-02 12:09:58.319  25/10/02 19:09:58 [dag-scheduler-event-loop] INFO DAGScheduler: Got job 1 (foreach at SparkSegmentMetadataPushJobRunner.java:219) with 2 output partitions
2025-10-02 12:09:58.319  25/10/02 19:09:58 [main] INFO SparkContext: Starting job: foreach at SparkSegmentMetadataPushJobRunner.java:219
2025-10-02 12:09:58.301  25/10/02 19:09:58 [main] INFO ConsistentDataPushUtils: Consistent data push is: disabled
2025-10-02 12:09:58.273  25/10/02 19:09:58 [main] INFO GcsPinotFS: Listed 654 files from URI: gs://segment-store/pinot/controller-data/lego_gc, is recursive: true

which makes me believe that it is trying to push all segments and not just the most recent hour's segments to pinot. this is what the batch ingestion spec yaml looks like

executionFrameworkSpec:
  name: "spark"
  segmentGenerationJobRunnerClassName: "org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentGenerationJobRunner"
  segmentTarPushJobRunnerClassName: "org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentTarPushJobRunner"
  segmentUriPushJobRunnerClassName: "org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentUriPushJobRunner"
  segmentMetadataPushJobRunnerClassName: "org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentMetadataPushJobRunner"
jobType: "SegmentCreationAndMetadataPush"
inputDirURI: "gs://input-store/lego_gc/date_key=2025-10-01/hour=00/"
includeFileNamePattern: "glob:**/*.parquet"
outputDirURI: "gs://segment-store/pinot/controller-data/lego_gc"
overwriteOutput: true
pinotFSSpecs:
  - scheme: "gs"
    className: "org.apache.pinot.plugin.filesystem.GcsPinotFS"
pinotClusterSpecs:
  - controllerURI: "pinot.something.com"
recordReaderSpec:
  dataFormat: "PARQUET"
  className: "org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader"
tableSpec:
  tableName: "lego_gc"
  schemaURI: "pinot.something.com/tables/lego_gc/schema"
  tableConfigURI: "pinot.something.com/tables/lego_gc"
segmentNameGeneratorSpec:
  type: "simple"
  configs:
    segment.name.prefix: "lego_gc_2025-10-01-00"
    exclude.sequence.id: true
pushJobSpec:
  pushParallelism: 2
  pushAttempts: 2
  pushRetryIntervalMillis: 1000

Do I need to configure outputDirURI to be partitioned as well (for example gs://segment-store/pinot/controller-data/lego_gc/date_key=2025-10-01/hour=00/)? Please note that gs://segment-store/pinot/controller-data/ is the location of the segment store as configured in the servers.
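To make the partitioned-output idea concrete, here is a minimal sketch that derives a per-hour outputDirURI and segment name prefix for each run. It assumes the push job lists everything under outputDirURI recursively (as the "Listed 654 files ... is recursive: true" log line suggests), so a narrower directory would limit what each run sees. The helper name and the idea of templating the spec per run are illustrative assumptions, not something confirmed in this thread.

```python
from datetime import datetime


def hourly_job_paths(base_output: str, table: str, hour: datetime) -> dict:
    """Build per-hour values for the job spec fields shown above.

    hourly_job_paths is a hypothetical helper; the path layout mirrors the
    date_key=/hour= layout already used by inputDirURI in the spec.
    """
    partition = f"date_key={hour:%Y-%m-%d}/hour={hour:%H}"
    return {
        # One directory per hour, so a recursive listing only finds this run's segments.
        "outputDirURI": f"{base_output.rstrip('/')}/{table}/{partition}/",
        # Same naming scheme as the segment.name.prefix in the spec.
        "segment.name.prefix": f"{table}_{hour:%Y-%m-%d-%H}",
    }


paths = hourly_job_paths(
    "gs://segment-store/pinot/controller-data", "lego_gc", datetime(2025, 10, 1, 0)
)
print(paths["outputDirURI"])
```

Whether the job output should live under the servers' segment store path at all is a separate question this sketch does not settle.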

Milind Chaudhary

10/06/2025, 5:41 AM

While running GROUP BY queries in Pinot, some results are getting omitted. When I apply a specific filter for the missing records they are visible, but they are omitted from the grouped results.

Richa Kumari

10/06/2025, 5:46 AM

Hi Team, we are facing issues implementing pagination using cursors: we are not getting a cursor in the response with which the next batch of rows could be fetched, even after enabling it in the ConfigMap as mentioned in the documentation. May I know which steps I am missing, given that our Pinot version is > 1.3.0?
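For reference, a minimal sketch of the request shape a cursor workflow would use, assuming the broker exposes cursors through a getCursor=true query parameter on /query/sql and serves later pages from /responseStore/{requestId}/results. Both paths and parameter names are recalled from the Pinot cursor documentation and should be verified against the deployed version. The helper names and the broker address are hypothetical.

```python
from urllib.parse import urlencode


def first_page_url(broker: str, num_rows: int) -> str:
    """URL for the initial query; the cursor is only requested if getCursor=true is passed.

    Endpoint and parameter names are assumptions to check against the docs.
    """
    query = urlencode({"getCursor": "true", "numRows": num_rows})
    return f"{broker.rstrip('/')}/query/sql?{query}"


def next_page_url(broker: str, request_id: str, offset: int, num_rows: int) -> str:
    """URL for a later page, keyed by the requestId returned with the first page."""
    query = urlencode({"offset": offset, "numRows": num_rows})
    return f"{broker.rstrip('/')}/responseStore/{request_id}/results?{query}"


print(first_page_url("http://pinot-broker:8099", 100))
```

If this assumption holds, enabling the feature in the ConfigMap alone would not add a cursor to the response; the query request itself would also need the parameter.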

madhulika

10/06/2025, 1:13 PM

Hi @Kartik Khare / @Mayank, I was going through the instance_assignment doc, and it seems that for LLC the ideal config is a one-to-one mapping of stream partition to Pinot table partition. Is my understanding correct? My tables are upsert (both partial and full). In the non-upsert case, the balanced segment strategy should be fine. What do we do in a situation where some tables have a high ingestion rate?

Priyank Bagrecha

10/06/2025, 10:19 PM

If I want to continue to use Druid's legacy datasketches for HLL and Quantile in Pinot - would it be possible to ingest hll / quantile states and merge them at query time via udf? or would i have to contribute functions to be able to use them?

Mayank

10/06/2025, 10:20 PM

I assume you are referring to Apache Datasketches (Druid just uses it)? If so, contributing to code is preferred.

Arkoprovo Sircar

10/07/2025, 1:26 PM

Hi Team, what is the best practice for migrating from one Kafka endpoint to a different one? We are doing this in production because we are switching from AWS MSK to Azure's Kafka provider. How do we go about this without affecting the Pinot cluster and without losing data?

Андрей Морозов

10/08/2025, 10:59 AM

Hi all! I have a Docker Compose instance of pinot:latest and am trying batch ingestion from a Parquet file. I have a problem with deleting old segments: to reload the data, I delete the segments via the API, then from disk, then delete the table and create it again. After the ingestion job I see both the old and the new segments, and the row count and segment count keep increasing. What am I doing wrong?

Tommaso Peresson

10/08/2025, 3:26 PM

Hello there, how can I configure Merge Rollup tasks to use metadata push mode to save the segments in the deep store and keep only metadata on the Controller?

Shubham Kumar

10/09/2025, 6:21 AM

Hi team, My current primary key count is around 100 million. Whenever I restart the server, the primary key count increases to around 260 million and then drops back to 100 million. Could you please help me understand why this behavior occurs?

madhulika

10/09/2025, 5:18 PM

SELECT tripId,
  CASE
    WHEN total_task = delivered_task THEN 'COMPLETE_DELIVERY'
    WHEN total_task = delivered_task + returned_task THEN 'DELIVERED_RETURNED'
    WHEN delivered_task = '0' THEN 'NO_DELIVERY'
    ELSE 'PARTIAL_DELIVERY'
  END AS delivery_Type
FROM (
    SELECT DISTINCT tripId,
      COUNT(DISTINCT taskId) AS total_task,
      SUM(deliveredOrder) AS delivered_task,
      SUM(returnedOrder) AS returned_task
    FROM (
        SELECT tripId,
          CASE
            WHEN deliveryStatus IN ('DELIVERED') THEN 1
            ELSE 0
          END AS deliveredOrder,
          CASE
            WHEN deliveryStatus IN ('RETURNED') THEN 1
            ELSE 0
          END AS returnedOrder,
          taskId
        FROM lmd_task_db_snapshot task
        WHERE tripId IN (
           
          ) AND scheduleStart >= '2025-10-08 13:00:00.0'
          AND scheduleStart < '2025-10-10 08:00:00.0'
      ) I
    GROUP BY tripId
  )

Victor Bivolaru

10/10/2025, 11:09 AM

Hi, I would like to ask for some clarification regarding minions and using the REST API to execute a task (segmentGeneration). When I have no minions started and I try to execute a task, the task shows up as

"Task_SegmentGenerationAndPushTask_smth_f11b81f0-cc0f-4c8d-b205-4873963f49d4": "IN_PROGRESS"

when calling

GET /tasks/SegmentGenerationAndPushTask/state

, but when checking with

GET tasks/subtask/Task_SegmentGenerationAndPushTask_smth_f11b81f0-cc0f-4c8d-b205-4873963f49d4/state

I get

{
  "Task_SegmentGenerationAndPushTask_smth_f11b81f0-cc0f-4c8d-b205-4873963f49d4_0": null
}

The controller logs clearly state

2025/10/10 11:04:55.086 ERROR [JobDispatcher] [HelixController-pipeline-task-smth-(2c58d6d3_TASK)] No available instance found for job: TaskQueue_SegmentGenerationAndPushTask_Task_SegmentGenerationAndPushTask_smth_f11b81f0-cc0f-4c8d-b205-4873963f49d4

I was expecting the status of the task to also reflect that by showing

"NOT_STARTED"

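Until the reported state reflects this, the two responses quoted above can be combined on the client side. The sketch below is a workaround under stated assumptions, not Pinot behavior: it treats a task that is IN_PROGRESS while every subtask state is null as not yet picked up by any minion. The label "UNASSIGNED" and the helper name are hypothetical and are not Pinot task states.

```python
def diagnose_task(task_state: str, subtask_states: dict) -> str:
    """Combine the task-level state with the per-subtask states.

    task_state: value from GET /tasks/{taskType}/state for one task.
    subtask_states: body of GET /tasks/subtask/{taskName}/state.
    "UNASSIGNED" is a made-up label for this sketch, not a Pinot state.
    """
    no_subtask_started = bool(subtask_states) and all(
        state is None for state in subtask_states.values()
    )
    if task_state == "IN_PROGRESS" and no_subtask_started:
        return "UNASSIGNED"
    return task_state


name = "Task_SegmentGenerationAndPushTask_smth_f11b81f0-cc0f-4c8d-b205-4873963f49d4"
print(diagnose_task("IN_PROGRESS", {name + "_0": None}))
```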
Victor Bivolaru

10/10/2025, 1:46 PM

One more question, unrelated to the previous message: suppose our setup creates small segments in which our data is sorted by a column C1 that appears in the table config as a sortedColumn. Nightly we would like to run a merge task, but I am not sure whether this task would keep the data sorted in the newly created segment. I am afraid the only way is to write a custom task.

francoisa

10/10/2025, 2:05 PM

Hi team 😉 Really quick question about derived columns and performance. I've got a few derived columns in my table and I'm planning to add many more using JSONEXTRACTSCALAR. My main concern is performance on a REALTIME table with already existing segments. Does the reload that is needed to make the new column available calculate it for each row of all segments? Or is it calculated on the fly at query time (I just hope that's not the case)?

raghav

10/10/2025, 2:09 PM

Hey Team, we have been using Pinot in prod for more than 6 months now. Suddenly we started facing issues where ingestion drops and then recovers after some time. I checked the logs thoroughly and was able to find two errors on the servers. Controller logs look fine. We have 24 servers, 36 Kafka partitions, 50 GB memory each, peak ingestion rate 1MM rps, segment size 300 MB. Can anyone please help us understand and mitigate this issue? ERROR/WARN logs:

pinotServer.2025-10-10.9.log.gz:2025/10/10 13:22:34.067 ERROR [RealtimeSegmentDataManager_metric_numerical_agg_1H__16__182629__20251010T1321Z] [metric_numerical_agg_1H__16__182629__20251010T1321Z] Holding after response from Controller: {"buildTimeSec":-1,"isSplitCommitType":true,"streamPartitionMsgOffset":null,"status":"NOT_SENT"}
pinotServer.2025-10-10.9.log.gz:2025/10/10 13:22:52.653 ERROR [ServerSegmentCompletionProtocolHandler] [metric_numerical_agg_1H__28__180921__20251010T1322Z] Could not send request http://pinot-controller-0.pinot-controller-headless.d3-pinot-cluster.svc.cluster.local:9000/segmentConsumed?reason=rowLimit&streamPartitionMsgOffset=172544503662&instance=Server_pinot-server-1.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098&name=metric_numerical_agg_1H__28__180921__20251010T1322Z&rowCount=696146&memoryUsedBytes=338498296

2025/10/10 13:14:41.871 WARN [AppInfoParser] [HelixTaskExecutor-message_handle_thread_5] Error registering AppInfo mbean
javax.management.InstanceAlreadyExistsException: kafka.consumer:type=app-info,id=metric_numerical_agg_1H_REALTIME-D3NumericalSketchPartitioned-28
at java.management/com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:322)

Yash Lohade

10/10/2025, 4:46 PM

Hello guys, I just implemented Basic Auth on my Pinot cluster, but I am stuck inserting data into tables from an Apache Flink stream with this code:

public static void insertIntoPinot(DataStream<Row> sinkStream, String sinkTableName, String controllerURL) throws Exception {
  HttpClient httpClient = HttpClient.getInstance();
  ControllerRequestClient client = new ControllerRequestClient(
      ControllerRequestURLBuilder.baseUrl(controllerURL), httpClient);
  Schema schema = PinotConnectionUtils.getSchema(client, sinkTableName);
  TableConfig tableConfig = PinotConnectionUtils.getTableConfig(client, sinkTableName, "OFFLINE");
  Logging.log(jobName, "PROCESS", "Job Inserting Data into Pinot " + controllerURL);
  sinkStream.addSink(
      new PinotSinkFunction<>(
          new FlinkRowGenericRowConverter(TYPE_INFO), tableConfig, schema))
      .name("PinotSink_" + sinkTableName)
      .setParallelism(PARALLELISM);
  Logging.log(jobName, "PROCESS", "Job Inserted Data into Pinot " + controllerURL);
}

But these connectors/drivers don't have an option for basic auth or for passing headers. How could I ingest my data now? I also tried a basic HTTP client request to ingest data, but I was running into issues with the batch ingestion config:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.pinot.spi.config.table.TableConfig;
import org.apache.pinot.spi.config.table.ingestion.BatchIngestionConfig;
import org.apache.pinot.spi.config.table.ingestion.IngestionConfig;
import org.apache.pinot.spi.data.Schema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.types.Row;

public static void insertIntoPinot(DataStream<Row> sinkStream, String sinkTableName, String controllerURL, String username, String password) throws Exception {
  // Encode credentials for basic auth
  String authString = username + ":" + password;
  String encodedAuth = Base64.getEncoder().encodeToString(authString.getBytes());
  HttpClient httpClient = HttpClient.newHttpClient();
  ObjectMapper mapper = new ObjectMapper();

  // 1) Fetch schema JSON from Pinot Controller REST API
  String schemaUrl = controllerURL + "/schemas/" + sinkTableName;
  HttpRequest schemaRequest = HttpRequest.newBuilder()
      .uri(URI.create(schemaUrl))
      .header("Authorization", "Basic " + encodedAuth)
      .GET()
      .build();
  HttpResponse<String> schemaResponse = httpClient.send(schemaRequest, HttpResponse.BodyHandlers.ofString());
  if (schemaResponse.statusCode() >= 300) {
    throw new RuntimeException("Failed to fetch schema: " + schemaResponse.body());
  }
  Schema schema = mapper.readValue(schemaResponse.body(), Schema.class);

  // 2) Fetch table config JSON for OFFLINE table
  String tableConfigUrl = String.format("%s/tables/%s?type=OFFLINE", controllerURL, sinkTableName);
  HttpRequest tableConfigRequest = HttpRequest.newBuilder()
      .uri(URI.create(tableConfigUrl))
      .header("Authorization", "Basic " + encodedAuth)
      .GET()
      .build();
  HttpResponse<String> tableConfigResponse = httpClient.send(tableConfigRequest, HttpResponse.BodyHandlers.ofString());
  if (tableConfigResponse.statusCode() >= 300) {
    throw new RuntimeException("Failed to fetch table config: " + tableConfigResponse.body());
  }
  JsonNode rootNode = mapper.readTree(tableConfigResponse.body());
  JsonNode offlineNode = rootNode.get("OFFLINE");
  if (offlineNode == null) {
    throw new RuntimeException("OFFLINE config section not found in table config response");
  }
  TableConfig tableConfig = mapper.treeToValue(offlineNode, TableConfig.class);

  // 3) Fix missing ingestionConfig->batchIngestionConfig->batchConfigMaps to avoid Pinot errors during ingestion
  if (tableConfig.getIngestionConfig() == null) {
    tableConfig.setIngestionConfig(new IngestionConfig());
  }
  IngestionConfig ingestionConfig = tableConfig.getIngestionConfig();
  if (ingestionConfig.getBatchIngestionConfig() == null) {
    // Must provide batchConfigMaps as empty list (required)
    List<Map<String, String>> batchConfigMaps = new ArrayList<>();
    BatchIngestionConfig batchIngestionConfig = new BatchIngestionConfig(batchConfigMaps, null, null);
    ingestionConfig.setBatchIngestionConfig(batchIngestionConfig);
  } else {
    BatchIngestionConfig batchIngestionConfig = ingestionConfig.getBatchIngestionConfig();
    if (batchIngestionConfig.getBatchConfigMaps() == null) {
      batchIngestionConfig.setBatchConfigMaps(new ArrayList<>());
    }
  }

  // 4) Add PinotSinkFunction to Flink DataStream using fetched schema and updated table config
  sinkStream.addSink(
      new PinotSinkFunction<>(
          new FlinkRowGenericRowConverter(TYPE_INFO), // Your converter according to Row types
          tableConfig, schema))
      .name("PinotSink_" + sinkTableName)
      .setParallelism(PARALLELISM);
}

I would appreciate you guys helping us out.

Satya Mahesh

10/13/2025, 1:40 PM

Hello guys, I've optimized the queries: when I run them directly in the Pinot controller, they complete in about 100–150 ms, and through the Java integration they usually take around 200 ms. However, sometimes the execution time spikes to over 10 seconds. Could you please help me understand what might be causing this? Is it related to the query itself or to the Pinot table configuration? I set the timeout to 10 sec.

2025-10-13 154731.738 log=[{"errorCode":200,"message":"QueryExecutionError:\nReceived error query execution result block: {250=ExecutionTimeoutError\nProcessingException(errorCode:250, message:ExecutionTimeoutError)\n\tat org.apache.pinot.common.exception.QueryException.<clinit>(QueryException.java:113)\n\tat org.apache.pinot.common.datablock.DataBlockUtils.extractErrorMsg(DataBlockUtils.java:55)\n\tat org.apache.pinot.common.datablock.DataBlockUtils.getErrorDataBlock(DataBlockUtils.java:47)\n\tat org.apache.pinot.query.runtime.blocks.TransferableBlockUtils.getErrorTransferableBlock(TransferableBlockUtils.java:54)}\norg.apache.pinot.query.service.dispatch.QueryDispatcher.runReducer(QueryDispatcher.java:306)\norg.apache.pinot.query.service.dispatch.QueryDispatcher.submitAndReduce(QueryDispatcher.java:96)\norg.apache.pinot.broker.requesthandler.MultiStageBrokerRequestHandler.handleRequest(MultiStageBrokerRequestHandler.java:219)\norg.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleRequest(BaseBrokerRequestHandler.java:133)\n"}]
2025-10-13 154731.739 log=101731 ERROR traceId=, parentId=, spanId=, sampled= [io.qu.mu.ru.MutinyInfrastructure] (executor-thread-33) Mutiny had to drop the following exception: io.fastpix.metrix.AppException: something went wrong in pinot
2025-10-13 154731.739 log= at io.quarkus.vertx.core.runtime.VertxCoreRecorder$15.runWith(VertxCoreRecorder.java:638)
2025-10-13 154731.739 log= at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1594)
2025-10-13 154731.739 log= at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:11)
2025-10-13 154731.739 log= at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:11)
2025-10-13 154731.739 log= at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2654)
2025-10-13 154731.739 log= at io.smallrye.context.impl.wrappers.SlowContextualSupplier.get(SlowContextualSupplier.java:21)
2025-10-13 154731.739 log= at java.base/java.lang.Thread.run(Thread.java:1583)
2025-10-13 154731.739 log= at io.smallrye.mutiny.operators.uni.UniRunSubscribeOn.lambda$subscribe$0(UniRunSubscribeOn.java:27)
2025-10-13 154731.739 log= at io.fastpix.metrix.utils.PinotClientConfig_ClientProxy.executeQueryAsync(Unknown Source)
2025-10-13 154731.739 log= at io.fastpix.metrix.services.impl.MetricServiceImpl.lambda$getMetricsOfBreakdown$1(MetricServiceImpl.java:420)
2025-10-13 154731.739 log= at org.jboss.threads.EnhancedQueueExecutor.runThreadBody(EnhancedQueueExecutor.java:1627)
2025-10-13 154731.739 log= at org.jboss.threads.EnhancedQueueExecutor$Task.doRunWith(EnhancedQueueExecutor.java:2675)
2025-10-13 154731.739 log=
2025-10-13 154731.739 log= at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:35)
2025-10-13 154731.739 log= at io.fastpix.metrix.utils.PinotClientConfig.executeQueryAsync(PinotClientConfig.java:61)
2025-10-13 154731.739 log= at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

raghav

10/13/2025, 3:19 PM

Hey Team, we are facing an issue with ingestion in Pinot. Our prod cluster has stopped ingesting data. In the server Helix logs I can see that the servers can't connect to ZooKeeper. I have tried restarting all the components. Disk usage on ZooKeeper is <5% and CPU on ZooKeeper is ~10%. We have 24 servers, 36 Kafka partitions, 50 GB memory each, peak ingestion rate 1MM rps, segment size 300 MB. Can anyone please help us understand and mitigate this issue?

2025/10/13 07:46:07.467 INFO [ZkClient] [Start a Pinot [SERVER]-EventThread] zkclient 3, zookeeper state changed ( Disconnected )
2025/10/13 07:46:07.472 WARN [ZKHelixManager] [ZkClient-EventThread-125-pinot-zookeeper:2181] KeeperState:Disconnected, SessionId: 10000184ff502de, instance: Server_pinot-server-4.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098, type: PARTICIPANT
2025/10/13 07:46:09.059 INFO [ZkClient] [Start a Pinot [SERVER]-EventThread] zkclient 3, zookeeper state changed ( SyncConnected )
2025/10/13 07:46:09.059 INFO [ZKHelixManager] [ZkClient-EventThread-125-pinot-zookeeper:2181] KeeperState: SyncConnected, instance: Server_pinot-server-4.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098, type: PARTICIPANT
2025/10/13 07:46:21.387 INFO [ZkClient] [Start a Pinot [SERVER]-EventThread] zkclient 3, zookeeper state changed ( Disconnected )
2025/10/13 07:46:21.387 WARN [ZKHelixManager] [ZkClient-EventThread-125-pinot-zookeeper:2181] KeeperState:Disconnected, SessionId: 10000184ff502de, instance: Server_pinot-server-4.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098, type: PARTICIPANT
2025/10/13 07:46:22.025 WARN [ZKHelixManager] [message-count-scheduler-0] zkClient to pinot-zookeeper:2181 is not connected, wait for 10000ms.
2025/10/13 07:46:32.028 ERROR [ZKHelixManager] [message-count-scheduler-0] zkClient is not connected after waiting 10000ms., clusterName: d3-pinot-cluster, zkAddress: pinot-zookeeper:2181
2025/10/13 07:46:34.790 INFO [ZkClient] [Start a Pinot [SERVER]-EventThread] zkclient 3, zookeeper state changed ( SyncConnected )
2025/10/13 07:46:34.790 INFO [ZKHelixManager] [ZkClient-EventThread-125-pinot-zookeeper:2181] KeeperState: SyncConnected, instance: Server_pinot-server-4.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098, type: PARTICIPANT
2025/10/13 12:34:34.225 INFO [CallbackHandler] [ZkClient-EventThread-125-pinot-zookeeper:2181] 125 START: CallbackHandler 0, INVOKE /d3-pinot-cluster/INSTANCES/Server_pinot-server-4.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098/MESSAGES listener: org.apache.helix.messaging.handling.HelixTaskExecutor@1b9d313c type: CALLBACK
2025/10/13 12:34:34.226 INFO [CallbackHandler] [ZkClient-EventThread-125-pinot-zookeeper:2181] CallbackHandler 0 subscribing changes listener to path: /d3-pinot-cluster/INSTANCES/Server_pinot-server-4.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098/MESSAGES, callback type: CALLBACK, event types: [NodeChildrenChanged], listener: org.apache.helix.messaging.handling.HelixTaskExecutor@1b9d313c, watchChild: false
2025/10/13 12:34:34.227 INFO [CallbackHandler] [ZkClient-EventThread-125-pinot-zookeeper:2181] CallbackHandler0, Subscribing to path: /d3-pinot-cluster/INSTANCES/Server_pinot-server-4.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098/MESSAGES took: 1
2025/10/13 12:34:34.231 INFO [MessageLatencyMonitor] [ZkClient-EventThread-125-pinot-zookeeper:2181] The latency of message 89f57203-2271-4d7a-abc3-1087222fc439 is 853 ms
2025/10/13 12:34:34.246 INFO [HelixTaskExecutor] [ZkClient-EventThread-125-pinot-zookeeper:2181] Scheduling message 89f57203-2271-4d7a-abc3-1087222fc439: metric_numerical_agg_1H_REALTIME:, null->null
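
To quantify the flapping in logs like these, the ZkClient state-change lines can be paired up to measure how long each disconnect lasted. This is a minimal sketch that assumes the exact line format shown above ("zookeeper state changed ( ... )" logged by ZkClient at INFO). The function name is hypothetical.

```python
import re
from datetime import datetime

# Matches the ZkClient state-change lines in the format shown in the excerpt above.
STATE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) INFO \[ZkClient\]"
    r".*zookeeper state changed \( (\w+) \)"
)


def disconnect_gaps(lines):
    """Return (disconnect_time, seconds_until_reconnect) for each Disconnected/SyncConnected pair."""
    gaps = []
    down_since = None
    for line in lines:
        match = STATE_RE.match(line)
        if not match:
            continue  # not a ZkClient state-change line
        ts = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S.%f")
        state = match.group(2)
        if state == "Disconnected" and down_since is None:
            down_since = ts
        elif state == "SyncConnected" and down_since is not None:
            gaps.append((down_since, (ts - down_since).total_seconds()))
            down_since = None
    return gaps
```

Run over the full server log, the list of gaps can be compared with the configured ZooKeeper session timeout to see whether sessions are likely expiring or only briefly disconnecting.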

Андрей Морозов

10/14/2025, 6:53 AM

Hi all! I am trying batch ingestion of multiple Parquet files from a directory. The job created all the segments in the mounted directory, but didn't push them to Pinot. Before this, my table already had a single old segment from a previous job, and that data was pushed successfully. My cluster configuration: Docker [controller, broker, server1, server2, server3], 16CPU / 64RAM / 1TB SSD / Ubuntu Server. Job spec:

executionFrameworkSpec:
  name: standalone
  segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
  segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
  segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner

jobType: SegmentCreationAndTarPush

inputDirURI: '/var/imports/insights_ch1_fff_seg/'
includeFileNamePattern: "glob:**/*.parquet"
outputDirURI: '/tmp/pinot-segments/insights_ch1_fff_sm'
overwriteOutput: true

pushJobSpec:
  pushFileNamePattern: 'glob:**/*.tar.gz'
  pushParallelism: 2
  pushAttempts: 2

recordReaderSpec:
  dataFormat: parquet
  className: org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader

pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS

tableSpec:
  tableName: insights_ch1_4
  schemaURI: 'http://pinot-controller:9000/tables/insights_ch1_4/schema'
  tableConfigURI: 'http://pinot-controller:9000/tables/insights_ch1_4'

pinotClusterSpecs:
  - controllerURI: 'http://pinot-controller:9000'

Segments created in the mounted dir after the job ran: (screenshot)

Command for running the job:

docker exec -e JAVA_OPTS="-Xms16g -Xmx40g" -it pinot-controller \
  bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /config/insights_ch1_4_job.yaml

I don't see any log on stdout, only when it fails. I use Xmx40g (when it was 24g, the job failed with out of heap space). What is wrong?

madhulika

10/14/2025, 4:07 PM

Hi @Mayank I was changing table configuration from replica config instance assignment to balanced segment strategy and noticed the segment count did not change much but table size got doubled.

Sonit Rathi

10/15/2025, 4:37 AM

Hi team, I am trying to remove the sorted index on one of the columns and have tried reloading all segments. After reloading, the segments still show sorted as true, and it still appears in queries.

madhulika

10/15/2025, 3:28 PM

Hi @Mayank Even with the balanced segment strategy, some tables' segments are being assigned to only a few servers. I was thinking all servers would participate in segment assignment in a round-robin fashion.

10/16/2025, 9:00 AM

Hi team, I'm running a real-time table with Kafka ingestion, and although data ingestion is working perfectly fine and the table status is green, I am getting a recurring stream of WARN logs in the Controller that I'd like to clarify. It appears the underlying Kafka client's ConsumerConfig is flagging Pinot-specific properties as unknown, likely because they are wrappers around the core Kafka properties. Are these warnings benign and expected, or does this indicate a potential issue with our configuration style? I'm seeking recommendations on whether we can suppress these warnings or if there's an updated configuration pattern we should use to avoid passing these metadata properties to the Kafka client.

1. Controller WARN Logs (Example)

2025/10/16 08:20:15.667 WARN [ConsumerConfig] [pool-14-thread-9] The configuration 'stream.kafka.decoder.class.name' was supplied but isn't a known config.
2025/10/16 08:20:15.667 WARN [ConsumerConfig] [pool-14-thread-9] The configuration 'streamType' was supplied but isn't a known config.
2025/10/16 08:20:15.667 WARN [ConsumerConfig] [pool-14-thread-9] The configuration 'stream.kafka.consumer.type' was supplied but isn't a known config.
2025/10/16 08:20:15.667 WARN [ConsumerConfig] [pool-14-thread-9] The configuration 'stream.kafka.broker.list' was supplied but isn't a known config.
2025/10/16 08:20:15.667 WARN [ConsumerConfig] [pool-14-thread-9] The configuration 'stream.kafka.consumer.factory.class.name' was supplied but isn't a known config.
2025/10/16 08:20:15.667 WARN [ConsumerConfig] [pool-14-thread-9] The configuration 'stream.kafka.topic.name' was supplied but isn't a known config.

2. Relevant Table Config (streamConfigs)

{
  "REALTIME": {
    "tableName": "XYZ",
    "tableType": "REALTIME",
    "segmentsConfig": {...},
    "tenants": {...},
    "tableIndexConfig": {
      "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "LowLevel",
      "stream.kafka.topic.name": "test.airlineStats",
      "stream.kafka.broker.list": "kafka-bootstrap.kafka.svc:9093",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka30.KafkaConsumerFactory",
      "security.protocol": "SSL",
      // SSL config continues...
    },
    "other-configs": ...
   },
   "metadata": {},
   "other-configs": ...
  }
}

Any guidance on best practices for stream config in recent Pinot versions, or a way to silence these specific ConsumerConfig warnings, would be highly appreciated! Thanks!

Tommaso Peresson

10/16/2025, 10:55 AM

Is there a cluster config to periodically clean up the task history to avoid bogging down ZK? I know there's an API, I just wanted to know if it could be self-contained, without having to schedule a job external to Pinot to call it.

Андрей Морозов

10/17/2025, 11:43 AM

Hi Team! I have a problem with ingestion from a CSV file that contains STRING values in a column, such as "#1082;аБ....". I get this error:

Caused by: java.lang.IllegalArgumentException: Cannot read single-value from Object[]: [Б, а, р,......] for column: ext_id

The parser reads this as an array, but I want to load it into Pinot as is, as a STRING. How can I fix this? Another problem is with parsing a STRING such as " Text , text text": the parser reads it as Object[].

Mustafa Shams

10/20/2025, 7:02 PM

I'm having an issue with the UI in Pinot 1.4.0 when trying to add an Offline or Realtime table: sometimes the Table Type option is unselected and grayed out, so I'm not able to select it. I have to switch to the JSON editor and enter the table type for it to work. I was wondering if this is a known issue or a bug in 1.4.0. Is there a way to fix it, or a version where this doesn't happen?