Андрей Морозов
10/08/2025, 10:59 AMTommaso Peresson
10/08/2025, 3:26 PMmetadata push mode to save the segments in the deep store and keep only metadata on the Controller?Shubham Kumar
10/09/2025, 6:21 AMmadhulika
10/09/2025, 5:18 PMSELECT tripId,
CASE
WHEN total_task = delivered_task THEN 'COMPLETE_DELIVERY'
WHEN total_task = delivered_task + returned_task THEN 'DELIVERED_RETURNED'
WHEN delivered_task = '0' THEN 'NO_DELIVERY'
ELSE 'PARTIAL_DELIVERY'
END AS delivery_Type
FROM (
SELECT DISTINCT tripId,
COUNT(DISTINCT taskId) AS total_task,
SUM(deliveredOrder) AS delivered_task,
SUM(returnedOrder) AS returned_task
FROM (
SELECT tripId,
CASE
WHEN deliveryStatus IN ('DELIVERED') THEN 1
ELSE 0
END AS deliveredOrder,
CASE
WHEN deliveryStatus IN ('RETURNED') THEN 1
ELSE 0
END AS returnedOrder,
taskId
FROM lmd_task_db_snapshot task
WHERE tripId IN (
) AND scheduleStart >= '2025-10-08 13:00:00.0'
AND scheduleStart < '2025-10-10 08:00:00.0'
) I
GROUP BY tripId
)Victor Bivolaru
10/10/2025, 11:09 AM"Task_SegmentGenerationAndPushTask_smth_f11b81f0-cc0f-4c8d-b205-4873963f49d4": "IN_PROGRESS" when calling GET /tasks/SegmentGenerationAndPushTask/state, but when checking with GET tasks/subtask/Task_SegmentGenerationAndPushTask_smth_f11b81f0-cc0f-4c8d-b205-4873963f49d4/state I get
{
"Task_SegmentGenerationAndPushTask_smth_f11b81f0-cc0f-4c8d-b205-4873963f49d4_0": null
}
The controller logs clearly state:
2025/10/10 11:04:55.086 ERROR [JobDispatcher] [HelixController-pipeline-task-smth-(2c58d6d3_TASK)] No available instance found for job: TaskQueue_SegmentGenerationAndPushTask_Task_SegmentGenerationAndPushTask_smth_f11b81f0-cc0f-4c8d-b205-4873963f49d4
I was expecting the status of the task to reflect that as well by showing "NOT_STARTED".
Victor Bivolaru
10/10/2025, 1:46 PMC1 that in the table config appears as a sortedColumn
Nightly we would like to run a merge task, but I am not sure whether this task would keep the data sorted in the newly created segment. I am afraid the only way is to write a custom task.
francoisa
10/10/2025, 2:05 PMraghav
10/10/2025, 2:09 PMpinotServer.2025-10-10.9.log.gz:2025/10/10 13:22:34.067 ERROR [RealtimeSegmentDataManager_metric_numerical_agg_1H__16__182629__20251010T1321Z] [metric_numerical_agg_1H__16__182629__20251010T1321Z] Holding after response from Controller: {"buildTimeSec":-1,"isSplitCommitType":true,"streamPartitionMsgOffset":null,"status":"NOT_SENT"}
pinotServer.2025-10-10.9.log.gz:2025/10/10 13:22:52.653 ERROR [ServerSegmentCompletionProtocolHandler] [metric_numerical_agg_1H__28__180921__20251010T1322Z] Could not send request <http://pinot-controller-0.pinot-controller-headless.d3-pinot-cluster.svc.cluster.local:9000/segmentConsumed?reason=rowLimit&streamPartitionMsgOffset=172544503662&instance=Server_pinot-server-1.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098&name=metric_numerical_agg_1H__28__180921__20251010T1322Z&rowCount=696146&memoryUsedBytes=338498296>
2025/10/10 13:14:41.871 WARN [AppInfoParser] [HelixTaskExecutor-message_handle_thread_5] Error registering AppInfo mbean
javax.management.InstanceAlreadyExistsException: kafka.consumer:type=app-info,id=metric_numerical_agg_1H_REALTIME-D3NumericalSketchPartitioned-28
at java.management/com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:322)Yash Lohade
10/10/2025, 4:46 PMSatya Mahesh
10/13/2025, 1:40 PMraghav
10/13/2025, 3:19 PM2025/10/13 07:46:07.467 INFO [ZkClient] [Start a Pinot [SERVER]-EventThread] zkclient 3, zookeeper state changed ( Disconnected )
2025/10/13 07:46:07.472 WARN [ZKHelixManager] [ZkClient-EventThread-125-pinot-zookeeper:2181] KeeperState:Disconnected, SessionId: 10000184ff502de, instance: Server_pinot-server-4.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098, type: PARTICIPANT
2025/10/13 07:46:09.059 INFO [ZkClient] [Start a Pinot [SERVER]-EventThread] zkclient 3, zookeeper state changed ( SyncConnected )
2025/10/13 07:46:09.059 INFO [ZKHelixManager] [ZkClient-EventThread-125-pinot-zookeeper:2181] KeeperState: SyncConnected, instance: Server_pinot-server-4.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098, type: PARTICIPANT
2025/10/13 07:46:21.387 INFO [ZkClient] [Start a Pinot [SERVER]-EventThread] zkclient 3, zookeeper state changed ( Disconnected )
2025/10/13 07:46:21.387 WARN [ZKHelixManager] [ZkClient-EventThread-125-pinot-zookeeper:2181] KeeperState:Disconnected, SessionId: 10000184ff502de, instance: Server_pinot-server-4.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098, type: PARTICIPANT
2025/10/13 07:46:22.025 WARN [ZKHelixManager] [message-count-scheduler-0] zkClient to pinot-zookeeper:2181 is not connected, wait for 10000ms.
2025/10/13 07:46:32.028 ERROR [ZKHelixManager] [message-count-scheduler-0] zkClient is not connected after waiting 10000ms., clusterName: d3-pinot-cluster, zkAddress: pinot-zookeeper:2181
2025/10/13 07:46:34.790 INFO [ZkClient] [Start a Pinot [SERVER]-EventThread] zkclient 3, zookeeper state changed ( SyncConnected )
2025/10/13 07:46:34.790 INFO [ZKHelixManager] [ZkClient-EventThread-125-pinot-zookeeper:2181] KeeperState: SyncConnected, instance: Server_pinot-server-4.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098, type: PARTICIPANT
2025/10/13 12:34:34.225 INFO [CallbackHandler] [ZkClient-EventThread-125-pinot-zookeeper:2181] 125 START: CallbackHandler 0, INVOKE /d3-pinot-cluster/INSTANCES/Server_pinot-server-4.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098/MESSAGES listener: org.apache.helix.messaging.handling.HelixTaskExecutor@1b9d313c type: CALLBACK
2025/10/13 12:34:34.226 INFO [CallbackHandler] [ZkClient-EventThread-125-pinot-zookeeper:2181] CallbackHandler 0 subscribing changes listener to path: /d3-pinot-cluster/INSTANCES/Server_pinot-server-4.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098/MESSAGES, callback type: CALLBACK, event types: [NodeChildrenChanged], listener: org.apache.helix.messaging.handling.HelixTaskExecutor@1b9d313c, watchChild: false
2025/10/13 12:34:34.227 INFO [CallbackHandler] [ZkClient-EventThread-125-pinot-zookeeper:2181] CallbackHandler0, Subscribing to path: /d3-pinot-cluster/INSTANCES/Server_pinot-server-4.pinot-server-headless.d3-pinot-cluster.svc.cluster.local_8098/MESSAGES took: 1
2025/10/13 12:34:34.231 INFO [MessageLatencyMonitor] [ZkClient-EventThread-125-pinot-zookeeper:2181] The latency of message 89f57203-2271-4d7a-abc3-1087222fc439 is 853 ms
2025/10/13 12:34:34.246 INFO [HelixTaskExecutor] [ZkClient-EventThread-125-pinot-zookeeper:2181] Scheduling message 89f57203-2271-4d7a-abc3-1087222fc439: metric_numerical_agg_1H_REALTIME:, null->nullАндрей Морозов
10/14/2025, 6:53 AM
executionFrameworkSpec:
  name: standalone
  segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
  segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
  segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner
jobType: SegmentCreationAndTarPush
inputDirURI: '/var/imports/insights_ch1_fff_seg/'
includeFileNamePattern: "glob:**/*.parquet"
outputDirURI: '/tmp/pinot-segments/insights_ch1_fff_sm'
overwriteOutput: true
pushJobSpec:
  pushFileNamePattern: 'glob:**/*.tar.gz'
  pushParallelism: 2
  pushAttempts: 2
recordReaderSpec:
  dataFormat: parquet
  className: org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
tableSpec:
  tableName: insights_ch1_4
  schemaURI: 'http://pinot-controller:9000/tables/insights_ch1_4/schema'
  tableConfigURI: 'http://pinot-controller:9000/tables/insights_ch1_4'
pinotClusterSpecs:
  - controllerURI: 'http://pinot-controller:9000'
Segments created in the mounted directory after the job ran:
(screenshot)
Command for running job:
docker exec -e JAVA_OPTS="-Xms16g -Xmx40g" -it pinot-controller \
bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /config/insights_ch1_4_job.yaml
I don't see any log output on stdout, only when the job fails.
Xmx is set to 40g (with 24g the job failed with an out-of-heap-space error).
What is wrong?
madhulika
10/14/2025, 4:07 PMSonit Rathi
10/15/2025, 4:37 AMmadhulika
10/15/2025, 3:28 PMmg
10/16/2025, 9:00 AMConsumerConfig is flagging Pinot-specific properties as unknown, likely because they are wrappers around the core Kafka properties.
Are these warnings benign and expected, or does this indicate a potential issue with our configuration style?
I'm seeking recommendations on whether we can suppress these warnings or if there's an updated configuration pattern we should use to avoid passing these metadata properties to the Kafka client.
1. Controller WARN Logs (Example)
2025/10/16 08:20:15.667 WARN [ConsumerConfig] [pool-14-thread-9] The configuration 'stream.kafka.decoder.class.name' was supplied but isn't a known config.
2025/10/16 08:20:15.667 WARN [ConsumerConfig] [pool-14-thread-9] The configuration 'streamType' was supplied but isn't a known config.
2025/10/16 08:20:15.667 WARN [ConsumerConfig] [pool-14-thread-9] The configuration 'stream.kafka.consumer.type' was supplied but isn't a known config.
2025/10/16 08:20:15.667 WARN [ConsumerConfig] [pool-14-thread-9] The configuration 'stream.kafka.broker.list' was supplied but isn't a known config.
2025/10/16 08:20:15.667 WARN [ConsumerConfig] [pool-14-thread-9] The configuration 'stream.kafka.consumer.factory.class.name' was supplied but isn't a known config.
2025/10/16 08:20:15.667 WARN [ConsumerConfig] [pool-14-thread-9] The configuration 'stream.kafka.topic.name' was supplied but isn't a known config.
2. Relevant Table Config (streamConfigs)
{
  "REALTIME": {
    "tableName": "XYZ",
    "tableType": "REALTIME",
    "segmentsConfig": {...},
    "tenants": {...},
    "tableIndexConfig": {
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.consumer.type": "LowLevel",
        "stream.kafka.topic.name": "test.airlineStats",
        "stream.kafka.broker.list": "kafka-bootstrap.kafka.svc:9093",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka30.KafkaConsumerFactory",
        "security.protocol": "SSL",
        // SSL config continues...
      },
      "other-configs": ...
    },
    "metadata": {},
    "other-configs": ...
  }
}
Any guidance on best practices for stream config in recent Pinot versions, or a way to silence these specific ConsumerConfig warnings, would be highly appreciated!
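A minimal Python sketch of the hypothesis above, assuming the entire streamConfigs map is handed to the Kafka client unchanged. The helper `split_stream_configs` and its prefix rule are hypothetical and only for illustration; this is not Pinot's actual implementation. It separates the keys the Kafka `ConsumerConfig` would probably not recognize from those it would:

```python
# Hypothetical illustration only: the prefix rule is an assumption,
# not how Pinot itself decides which properties reach the Kafka client.
PINOT_ONLY_PREFIXES = ("stream.",)
PINOT_ONLY_KEYS = {"streamType"}


def split_stream_configs(stream_configs):
    """Split a streamConfigs map into (pinot_side, kafka_side) dictionaries."""
    pinot_side, kafka_side = {}, {}
    for key, value in stream_configs.items():
        if key in PINOT_ONLY_KEYS or key.startswith(PINOT_ONLY_PREFIXES):
            pinot_side[key] = value  # Pinot metadata; unknown to the Kafka client
        else:
            kafka_side[key] = value  # plain Kafka client property
    return pinot_side, kafka_side


# Values copied from the table config in this message.
stream_configs = {
    "streamType": "kafka",
    "stream.kafka.consumer.type": "LowLevel",
    "stream.kafka.topic.name": "test.airlineStats",
    "stream.kafka.broker.list": "kafka-bootstrap.kafka.svc:9093",
    "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
    "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka30.KafkaConsumerFactory",
    "security.protocol": "SSL",
}

pinot_side, kafka_side = split_stream_configs(stream_configs)
print(sorted(pinot_side))  # the keys named in the WARN lines
print(sorted(kafka_side))  # the keys a Kafka client understands
```

Under that assumption, the six keys on the Pinot side are exactly the ones named in the WARN lines, while `security.protocol` is an ordinary Kafka client property.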
Thanks!Tommaso Peresson
10/16/2025, 10:55 AMАндрей Морозов
10/17/2025, 11:43 AMMustafa Shams
10/20/2025, 7:02 PMAlaa Halawani
10/22/2025, 5:47 AMschedulerWaitMs
Additional details:
• Ingestion is stopped (so no extra Kafka load)
• Increasing pinot.query.scheduler.query_runner_threads helped slightly, but performance is still slower than before the restart
• Tried both MMAP and HEAP loading modes with similar results
• I am running Pinot cluster on k8s nodes
Has anyone run into similar behavior after a restart? Any idea why it happens?
Any recommendations or configuration tips to improve performance would be much appreciated.
Rahul Sharma
10/22/2025, 7:56 PMupsertCompactionTask is visible, but its task configuration is empty. As a result, compaction is not working, and the number of records in my table remains the same.
Can anyone please help?
Conf:
"task": {
  "taskTypeConfigsMap": {
    "UpsertCompactionTask": {
      "schedule": "0 */5 * ? * *",
      "bufferTimePeriod": "0d",
      "invalidRecordsThresholdPercent": "10",
      "invalidRecordsThresholdCount": "1000"
    }
  }
},
Krupa
10/24/2025, 11:19 AMUtsav Jain
10/29/2025, 5:15 AMRajat
10/29/2025, 10:16 AMSELECT s_id, count(*)
FROM shipmentMerged_final
GROUP BY s_id
HAVING COUNT(*) > 1
Sometimes it returns no records, but sometimes it returns rows with a count of 2.
Rajat
10/29/2025, 10:49 AMSELECT COUNT(*) AS aggregate,
s_id
FROM shipmentMerged_final
WHERE o_company_id = 2449226
AND o_created_at BETWEEN TIMESTAMP '2025-10-10 00:00:00' AND TIMESTAMP '2025-10-26 23:59:59'
AND o_shipping_method IN ('SR', 'SRE', 'AC')
AND o_is_return = 0
AND o_state = 0
group by 2
limit 1500
The query above returns:
1150 total records
But when running:
SELECT COUNT(*) AS aggregate
FROM shipmentMerged_final
WHERE o_company_id = 2449226
AND o_created_at BETWEEN TIMESTAMP '2025-10-10 00:00:00' AND TIMESTAMP '2025-10-26 23:59:59'
AND o_shipping_method IN ('SR', 'SRE', 'AC')
AND o_is_return = 0
AND o_state = 0
The count comes back as:
1162
Rajat
10/29/2025, 10:49 AMRashpal Singh
10/29/2025, 11:24 PMnullHandlingEnabled=true at table config level
"enableColumnBasedNullHandling": true at schema level
{
  "name": "notNullColumn",
  "dataType": "DOUBLE",
  "notNull": false
}
Still, when I query, I get "0" instead of null.
How can I get null (the original value) instead of 0 in the query response without adding "SET enableNullHandling=true" to my query?
Rahul Sharma
10/30/2025, 4:23 AMpinot_controller_numMinionSubtasksWaiting_Value and pinot_controller_numMinionSubtasksRunning_Value. However, for each task type, they always show a value of 0 even when tasks are running. Am I using the wrong metrics? Which metrics should I use to build a custom autoscaler for minions?francoisa
10/30/2025, 8:49 AMBadhusha Muhammed
10/30/2025, 4:17 PM