Kartik Khare
07/24/2025, 11:56 AM
San Kumar
07/25/2025, 3:37 AM
Veerendra
07/25/2025, 9:10 AM
Evan Galpin
07/25/2025, 5:41 PM
San Kumar
07/27/2025, 11:52 AM
executionFrameworkSpec:
  name: 'standalone'
What is the maximum CSV file size that can be uploaded to a Pinot offline table? Can we upload a 100 GB file to an offline table using standalone mode?
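For reference, a minimal standalone batch ingestion job spec sketch is shown below; the spec itself has no file-size field, and a common approach for very large inputs is to split the CSV into many smaller files under inputDirURI so each becomes its own segment. Class names are the standard Pinot plugins; paths, table name, and controller URI are placeholders.
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/data/csv/'            # directory holding the (pre-split) CSV files
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/data/segments/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'myTable'                       # placeholder
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'   # placeholder
pushJobSpec:
  pushAttempts: 2
  pushRetryIntervalMillis: 1000
The practical limit is usually the memory and disk available to the single standalone JVM that builds each segment rather than a fixed cap, so splitting the 100 GB file is generally the safer route.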
Jacek Skrzypacz
07/28/2025, 12:26 PM
Muller Liu
07/28/2025, 5:25 PM
San Kumar
07/29/2025, 5:46 AM
Shubham Kumar
07/30/2025, 11:49 AM
I set is_deleted (configured as the deleteRecordColumn) to true for soft deletion of 5K rows. However, after the deleteRecordTTL period, the primary keys are not being deleted. I can see that the documents are unqueryable, but they are still present in the primary-key map.
Is there a scheduler that handles this cleanup?
Does it run automatically, or do we need to trigger it manually?
Can we configure the frequency at which it runs?
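For reference, a minimal sketch of the delete-related upsert settings in the table config; the TTL field appears as deletedKeysTTL in the UpsertConfig I'm aware of, the value below is only illustrative (the unit should match the comparison-column values, as far as I understand), and enabling the validDocIds snapshot is typically expected for TTL-based cleanup of deleted keys.
"upsertConfig": {
  "mode": "FULL",
  "deleteRecordColumn": "is_deleted",
  "deletedKeysTTL": 86400000,
  "enableSnapshot": true
}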
박민지
07/30/2025, 11:55 AM
2025/07/30 13:27:36.981 ERROR [RealtimeSegmentDataManager_order_events_test__10001__0__20250730T1123Z] [order_events_test__10001__0__20250730T1123Z] Exception while in work
org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:823) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:665) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:646) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:626) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at org.apache.pinot.plugin.stream.kafka20.KafkaPartitionLevelConnectionHandler.lambda$createConsumer$0(KafkaPartitionLevelConnectionHandler.java:86) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at org.apache.pinot.plugin.stream.kafka20.KafkaPartitionLevelConnectionHandler.retry(KafkaPartitionLevelConnectionHandler.java:100) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at org.apache.pinot.plugin.stream.kafka20.KafkaPartitionLevelConnectionHandler.createConsumer(KafkaPartitionLevelConnectionHandler.java:86) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at org.apache.pinot.plugin.stream.kafka20.KafkaPartitionLevelConnectionHandler.<init>(KafkaPartitionLevelConnectionHandler.java:67) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at org.apache.pinot.plugin.stream.kafka20.KafkaPartitionLevelConsumer.<init>(KafkaPartitionLevelConsumer.java:52) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory.createPartitionGroupConsumer(KafkaConsumerFactory.java:51) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager.recreateStreamConsumer(RealtimeSegmentDataManager.java:1830) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager.consumeLoop(RealtimeSegmentDataManager.java:529) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager$PartitionConsumer.run(RealtimeSegmentDataManager.java:765) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers
at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:89) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:48) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:731) ~[pinot-all-1.3.0-jar-with-dependencies.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
... 13 more
Since I was able to ingest data from the first topic, I don't think the broker URL is the issue. Is there anything I might be missing?
I set the table config like this:
"streamConfigMaps": [
{
"streamType": "kafka",
"stream.kafka.consumer.prop.auto.offset.reset": "smallest",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.broker.list": "{BROKER}:9092",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.protobuf.KafkaConfluentSchemaRegistryProtoBufMessageDecoder",
"stream.kafka.decoder.prop.schema.registry.rest.url": "https://{REGISTRY}}",
"stream.kafka.decoder.prop.basic.auth.credentials.source": "USER_INFO",
"<http://stream.kafka.decoder.prop.schema.registry.basic.auth.user.info|stream.kafka.decoder.prop.schema.registry.basic.auth.user.info>": "{USER_INFO}",
"stream.kafka.consumer.type": "LOWLEVEL",
"security.protocol": "SASL_SSL",
"sasl.mechanism": "PLAIN",
"sasl.jaas.config": "{SASL}",
"realtime.segment.flush.threshold.rows": "500000",
"realtime.segment.flush.autotune.initialRows": "500000",
"stream.kafka.topic.name": "{TOPIC_NAME1}"
},
{
"streamType": "kafka",
"stream.kafka.consumer.prop.auto.offset.reset": "smallest",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.broker.list": "{BROKER}:9092",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.protobuf.KafkaConfluentSchemaRegistryProtoBufMessageDecoder",
"stream.kafka.decoder.prop.schema.registry.rest.url": "https://{REGISTRY}}",
"stream.kafka.decoder.prop.basic.auth.credentials.source": "USER_INFO",
"<http://stream.kafka.decoder.prop.schema.registry.basic.auth.user.info|stream.kafka.decoder.prop.schema.registry.basic.auth.user.info>": "{USER_INFO}",
"stream.kafka.consumer.type": "LOWLEVEL",
"security.protocol": "SASL_SSL",
"sasl.mechanism": "PLAIN",
"sasl.jaas.config": "{SASL}",
"realtime.segment.flush.threshold.rows": "500000",
"realtime.segment.flush.autotune.initialRows": "500000",
"stream.kafka.topic.name": "{TOPIC_NAME2}"
}
],
박민지
07/30/2025, 3:06 PM
Raghavendra M
07/31/2025, 4:46 AM
Aman Satya
07/31/2025, 9:46 AM
Shubham Kumar
07/31/2025, 5:35 PM
.txt
?
• columns.psf
• creation.meta
• validdocids.bitmap.snapshot
• ttl.watermark.partition.0
Additionally, I would appreciate it if you could explain the purpose of each of these files.
Shivam Sharma
08/01/2025, 11:12 AM
San Kumar
08/02/2025, 3:43 AM
Xiang Fu
San Kumar
08/05/2025, 4:22 PM
Xiang Fu
Mohemmad Zaid
08/06/2025, 6:30 AM
spaces is a multi-value column.
{
  "dimensionsSplitOrder": [
    "pdate"
  ],
  "functionColumnPairs": [
    "DISTINCTCOUNTHLLMV__spaces"
  ]
}
https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java#L1309
IMO, we can avoid this check for aggregation columns.
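For context, a sketch of where that snippet sits in the table config, under tableIndexConfig.starTreeIndexConfigs; skipStarNodeCreationForDimensions and maxLeafRecords carry only illustrative values here.
"tableIndexConfig": {
  "starTreeIndexConfigs": [
    {
      "dimensionsSplitOrder": ["pdate"],
      "skipStarNodeCreationForDimensions": [],
      "functionColumnPairs": ["DISTINCTCOUNTHLLMV__spaces"],
      "maxLeafRecords": 10000
    }
  ]
}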
Raghavendra M
08/06/2025, 7:52 AM
Shubham Kumar
08/07/2025, 9:33 AM
Zaeem Arshad
08/08/2025, 1:01 PM
Prathamesh
08/09/2025, 9:52 AM
San Kumar
08/12/2025, 3:25 AM
Zaeem Arshad
08/12/2025, 3:47 AM
Arnav
08/12/2025, 4:23 AM
arnavshi
08/12/2025, 7:05 AM
Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
While I understand that this is a Kubernetes issue/limitation, I wanted your guidance on what can be done to resolve this.
San Kumar
08/12/2025, 11:09 AM
am_developer
08/12/2025, 11:31 AM