Rishika
07/11/2025, 5:29 PMLuis P Fernandes
07/14/2025, 3:50 PM
{
  "tableName": "tiered",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "minimizeDataMovement": false,
    "timeColumnName": "timestamp",
    "timeType": "MILLISECONDS",
    "replicasPerPartition": "1",
    "schemaName": "tiered",
    "replication": "2"
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant",
    "tagOverrideConfig": {}
  },
  "tableIndexConfig": {
    "autoGeneratedInvertedIndex": false,
    "createInvertedIndexDuringSegmentGeneration": false,
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "tiered",
      "stream.kafka.broker.list": "localhost:19092",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.segment.rows": "0",
      "realtime.segment.flush.threshold.time": "1m",
      "realtime.segment.flush.threshold.segment.size": "100M"
    },
    "enableDefaultStarTree": false,
    "enableDynamicStarTreeCreation": false,
    "aggregateMetrics": false,
    "nullHandlingEnabled": false,
    "columnMajorSegmentBuilderEnabled": true,
    "optimizeDictionary": false,
    "optimizeDictionaryForMetrics": false,
    "optimizeDictionaryType": false,
    "noDictionarySizeRatioThreshold": 0.85,
    "rangeIndexVersion": 2,
    "invertedIndexColumns": [],
    "noDictionaryColumns": [],
    "bloomFilterColumns": [],
    "onHeapDictionaryColumns": [],
    "rangeIndexColumns": [],
    "sortedColumn": [],
    "varLengthDictionaryColumns": []
  },
  "quota": {},
  "query": {},
  "ingestionConfig": {
    "continueOnError": false,
    "rowTimeValueCheck": false,
    "segmentTimeValueCheck": true
  },
  "tierConfigs": [
    {
      "name": "hotTier",
      "segmentSelectorType": "time",
      "segmentAge": "1m",
      "storageType": "pinot_server",
      "serverTag": "DefaultTenant_OFFLINE"
    },
    {
      "name": "coldTier",
      "segmentSelectorType": "time",
      "segmentAge": "10m",
      "storageType": "pinot_server",
      "serverTag": "DefaultTenant_OFFLINE"
    }
  ]
}
Table_Schema: {
  "schemaName": "tiered",
  "enableColumnBasedNullHandling": true,
  "dimensionFieldSpecs": [
    {
      "name": "product_name",
      "dataType": "STRING",
      "notNull": true
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "price",
      "dataType": "LONG",
      "notNull": false
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestamp",
      "dataType": "TIMESTAMP",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
Felipe
07/16/2025, 9:48 AM
[PerQueryCPUMemAccountantFactory$PerQueryCPUMemResourceUsageAccountant] [CPUMemThreadAccountant] Heap used bytes 6301800816 exceeds critical level 6184752768
Are there any configurations I can use to increase the heap size, or shouldn't this be happening at all?
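For context: the heap size itself is not a Pinot table or cluster config; it comes from the JVM options of the server process, and as far as I recall the accountant's critical level is derived as a ratio of the max heap, so raising -Xmx raises it too. With the official Docker image the options are typically passed through the JAVA_OPTS environment variable. A minimal Kubernetes-style sketch, assuming your deployment honours that variable (sizes and names below are placeholders, not taken from this thread):
{
  "name": "pinot-server",
  "image": "apachepinot/pinot:1.3.0",
  "env": [
    { "name": "JAVA_OPTS", "value": "-Xms8G -Xmx8G -XX:+UseG1GC" }
  ]
}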
Felipe
07/16/2025, 9:49 AMMonika reddy
07/16/2025, 5:25 PMKiril Kalchev
07/16/2025, 7:41 PMYeshwanth
07/17/2025, 7:30 AM
Error occurred during initialization of VM
agent library failed to init: instrument
Error opening zip file or JAR manifest missing : /opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent.jar
I can see a similar issue was reported here - https://github.com/apache/pinot/issues/16283
I don't think the fix was applied to this tag -> https://hub.docker.com/layers/apachepinot/pinot/1.3.0/images/sha256-27d64d558cd8a90efdf2c15d92dfd713b173120606942fd6faef9b19d20ec2dd
Can someone please look into this?
Ricardo Machado
07/17/2025, 3:41 PMAn error occurred while calling o4276.count. : org.apache.pinot.connector.spark.common.PinotException: An error occurred while getting routing table for query, '<REDACTED' at org.apache.pinot.connector.spark.common.PinotClusterClient$.getRoutingTableForQuery(PinotClusterClient.scala:208) at org.apache.pinot.connector.spark.common.PinotClusterClient$.getRoutingTable(PinotClusterClient.scala:153) at org.apache.pinot.connector.spark.v3.datasource.PinotScan.planInputPartitions(PinotScan.scala:57) at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputPartitions$lzycompute(BatchScanExec.scala:63) at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputPartitions(BatchScanExec.scala:63) at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar(DataSourceV2ScanExecBase.scala:179) at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar$(DataSourceV2ScanExecBase.scala:175) at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.supportsColumnar(BatchScanExec.scala:39) at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy.apply(DataSourceV2Strategy.scala:184) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199) at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192) at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199) at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192) at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) at 
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199) at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192) at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196) at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199) at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192) at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74) at org.apache.spark.sql.execution.QueryExecution$.createSparkPlan(QueryExecution.scala:658) at org.apache.spark.sql.execution.QueryExecution.$anonfun$getSparkPlan$1(QueryExecution.scala:195) at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:219) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:277) at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:714) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:277) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901) at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:276) at org.apache.spark.sql.execution.QueryExecution.getSparkPlan(QueryExecution.scala:195) at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:187) at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:187) at org.apache.spark.sql.execution.QueryExecution.$anonfun$getExecutedPlan$1(QueryExecution.scala:211) at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:219) at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:277) at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:714) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:277) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901) at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:276) at org.apache.spark.sql.execution.QueryExecution.getExecutedPlan(QueryExecution.scala:208) at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:203) at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:203) at org.apache.spark.sql.execution.QueryExecution.$anonfun$writeProcessedPlans$10(QueryExecution.scala:417) at org.apache.spark.sql.catalyst.plans.QueryPlan$.append(QueryPlan.scala:747) at org.apache.spark.sql.execution.QueryExecution.writeProcessedPlans(QueryExecution.scala:417) at org.apache.spark.sql.execution.QueryExecution.writePlans(QueryExecution.scala:393) at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:432) at <http://org.apache.spark.sql.execution.QueryExecution.org|org.apache.spark.sql.execution.QueryExecution.org>$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:333) at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:311) at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:146) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$10(SQLExecution.scala:220) at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108) at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:384) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:220) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:405) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:219) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:83) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4390) at org.apache.spark.sql.Dataset.count(Dataset.scala:3661) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:569) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.base/java.lang.Thread.run(Thread.java:840) Caused by: org.apache.pinot.connector.spark.common.HttpStatusCodeException: Got error status code '400' with reason 'Bad Request' at org.apache.pinot.connector.spark.common.HttpUtils$.executeRequest(HttpUtils.scala:66) at 
org.apache.pinot.connector.spark.common.HttpUtils$.sendGetRequest(HttpUtils.scala:50) at org.apache.pinot.connector.spark.common.PinotClusterClient$.$anonfun$getRoutingTableForQuery$1(PinotClusterClient.scala:199) at scala.util.Try$.apply(Try.scala:213) at org.apache.pinot.connector.spark.common.PinotClusterClient$.getRoutingTableForQuery(PinotClusterClient.scala:196)
Victor Bivolaru
07/18/2025, 1:40 PMKiril Kalchev
07/18/2025, 9:10 PM
"auctionsStats__6__13__20250703T1233Z": {
  "Server_pinot-prod-server-0.pinot-prod-server-headless.pinot.svc.cluster.local_8098": "OFFLINE",
  "Server_pinot-prod-server-1.pinot-prod-server-headless.pinot.svc.cluster.local_8098": "OFFLINE",
  "Server_pinot-prod-server-2.pinot-prod-server-headless.pinot.svc.cluster.local_8098": "OFFLINE"
},
When I try to download the segments again, I get an error saying they are not in my deepstore. However, queries seem to work normally.
Is it expected for segments to be reported as offline and missing in deepstore? What exactly does offline mean as a segment status?
Below are the latest messages for the above segment:
INFO 2025-07-18T05:35:05.820609035Z [resource.labels.containerName: server] 2025/07/18 05:35:05.820 INFO [HttpClient] [auctionsStats__6__13__20250703T1233Z] Sending request: <http://pinot-prod-controller-1.pinot-prod-controller-headless.pinot.svc.cluster.local:9000/segmentStoppedConsuming?reason=org.apache.pinot.shaded.org.apache.kafka.common.KafkaException&streamPartitionMsgOffset=0&instance=Server_pinot-prod-server-2.pinot-prod-server-headless.pinot.svc.cluster.local_8098&offset=-1&name=auctionsStats__6__13__20250703T1233Z> to controller: pinot-prod-controller-1.pinot-prod-controller-headless.pinot.svc.cluster.local, version: Unknown
INFO 2025-07-18T05:35:05.821542868Z [resource.labels.containerName: server] 2025/07/18 05:35:05.821 INFO [ServerSegmentCompletionProtocolHandler] [auctionsStats__6__13__20250703T1233Z] Controller response {"status":"PROCESSED","streamPartitionMsgOffset":null,"isSplitCommitType":true,"buildTimeSec":-1} for <http://pinot-prod-controller-1.pinot-prod-controller-headless.pinot.svc.cluster.local:9000/segmentStoppedConsuming?reason=org.apache.pinot.shaded.org.apache.kafka.common.KafkaException&streamPartitionMsgOffset=0&instance=Server_pinot-prod-server-2.pinot-prod-server-headless.pinot.svc.cluster.local_8098&offset=-1&name=auctionsStats__6__13__20250703T1233Z>
INFO 2025-07-18T05:35:05.821571462Z [resource.labels.containerName: server] 2025/07/18 05:35:05.821 INFO [RealtimeSegmentDataManager_auctionsStats__6__13__20250703T1233Z] [auctionsStats__6__13__20250703T1233Z] Got response {"status":"PROCESSED","streamPartitionMsgOffset":null,"isSplitCommitType":true,"buildTimeSec":-1}
INFO 2025-07-18T05:35:05.983729827Z [resource.labels.containerName: server] 2025/07/18 05:35:05.976 INFO [local_8098 - SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread_7] SegmentOnlineOfflineStateModel.onBecomeOfflineFromConsuming() : ZnRecord=cc787368-9a93-42f3-8588-ebefe88f2a07, {CREATE_TIMESTAMP=1752816905933, ClusterEventName=IdealStateChange, EXECUTE_START_TIMESTAMP=1752816905976, EXE_SESSION_ID=300627ec087008e, FROM_STATE=CONSUMING, MSG_ID=cc787368-9a93-42f3-8588-ebefe88f2a07, MSG_STATE=read, MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=auctionsStats__6__13__20250703T1233Z, READ_TIMESTAMP=1752816905959, RESOURCE_NAME=auctionsStats_REALTIME, RESOURCE_TAG=auctionsStats_REALTIME, RETRY_COUNT=3, SRC_NAME=pinot-prod-controller-2.pinot-prod-controller-headless.pinot.svc.cluster.local_9000, SRC_SESSION_ID=2006281fc800087, STATE_MODEL_DEF=SegmentOnlineOfflineStateModel, STATE_MODEL_FACTORY_NAME=DEFAULT, TGT_NAME=Server_pinot-prod-server-2.pinot-prod-server-headless.pinot.svc.cluster.local_8098, TGT_SESSION_ID=300627ec087008e, TO_STATE=OFFLINE}{}{}, Stat=Stat {_version=0, _creationTime=1752816905946, _modifiedTime=1752816905946, _ephemeralOwner=0}
INFO 2025-07-18T05:35:05.984995178Z [resource.labels.containerName: server] 2025/07/18 05:35:05.983 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread_7] Removing segment: auctionsStats__6__13__20250703T1233Z from table: auctionsStats_REALTIME
INFO 2025-07-18T05:35:05.985038958Z [resource.labels.containerName: server] 2025/07/18 05:35:05.983 INFO [auctionsStats_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_7] Removing segment: auctionsStats__6__13__20250703T1233Z from table: auctionsStats_REALTIME
INFO 2025-07-18T05:35:05.985045952Z [resource.labels.containerName: server] 2025/07/18 05:35:05.983 INFO [auctionsStats_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_7] Closing segment: auctionsStats__6__13__20250703T1233Z of table: auctionsStats_REALTIME
INFO 2025-07-18T05:35:05.985110098Z [resource.labels.containerName: server] 2025/07/18 05:35:05.984 INFO [MutableSegmentImpl_auctionsStats__6__13__20250703T1233Z_auctionsStats] [HelixTaskExecutor-message_handle_thread_7] Trying to close RealtimeSegmentImpl : auctionsStats__6__13__20250703T1233Z
INFO 2025-07-18T05:35:05.985117081Z [resource.labels.containerName: server] 2025/07/18 05:35:05.984 INFO [auctionsStats_REALTIME-6-ConcurrentMapPartitionUpsertMetadataManager] [HelixTaskExecutor-message_handle_thread_7] Skip removing untracked (replaced or empty) segment: auctionsStats__6__13__20250703T1233Z
INFO 2025-07-18T05:35:05.987557288Z [resource.labels.containerName: server] 2025/07/18 05:35:05.987 INFO [MmapMemoryManager] [HelixTaskExecutor-message_handle_thread_7] Deleted file /var/pinot/server/data/index/auctionsStats_REALTIME/consumers/auctionsStats__6__13__20250703T1233Z.0
INFO 2025-07-18T05:35:05.990545309Z [resource.labels.containerName: server] 2025/07/18 05:35:05.990 INFO [auctionsStats_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_7] Closed segment: auctionsStats__6__13__20250703T1233Z of table: auctionsStats_REALTIME
INFO 2025-07-18T05:35:05.990570191Z [resource.labels.containerName: server] 2025/07/18 05:35:05.990 INFO [auctionsStats_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_7] Removed segment: auctionsStats__6__13__20250703T1233Z from table: auctionsStats_REALTIME
INFO 2025-07-18T05:35:05.990578459Z [resource.labels.containerName: server] 2025/07/18 05:35:05.990 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread_7] Removed segment: auctionsStats__6__13__20250703T1233Z from table: auctionsStats_REALTIME
INFO 2025-07-18T06:15:57.880369560Z [resource.labels.containerName: controller] 2025/07/18 06:15:57.880 INFO [PinotLLCRealtimeSegmentManager] [pool-10-thread-7] Repairing segment: auctionsStats__6__13__20250703T1233Z which is OFFLINE for all instances in IdealState
madhulika
07/21/2025, 4:19 PMmadhulika
07/21/2025, 4:21 PMKrupa
07/24/2025, 5:20 PMrobert zych
07/24/2025, 10:14 PM
SegmentGenerationAndPushTask appears to hang when configured against an S3 bucket with many (>80K) files. Besides reducing the number of files in the bucket, what can be done to handle buckets with many files? @Xiaobing @Mayank
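One thing that may help, hedged since the exact keys depend on the Pinot version (check the SegmentGenerationAndPushTask and batch ingestion docs): narrow what each run has to list via includeFileNamePattern or a more specific inputDirURI in batchConfigMaps, and cap the work per scheduler run with tableMaxNumTasks. A sketch with placeholder bucket paths:
{
  "ingestionConfig": {
    "batchIngestionConfig": {
      "segmentIngestionType": "APPEND",
      "segmentIngestionFrequency": "DAILY",
      "batchConfigMaps": [
        {
          "input.fs.className": "org.apache.pinot.plugin.filesystem.S3PinotFS",
          "inputDirURI": "s3://your-bucket/events/2025/07/24/",
          "includeFileNamePattern": "glob:**/*.json",
          "inputFormat": "json"
        }
      ]
    }
  },
  "task": {
    "taskTypeConfigsMap": {
      "SegmentGenerationAndPushTask": {
        "schedule": "0 */10 * * * ?",
        "tableMaxNumTasks": "10"
      }
    }
  }
}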
Monika reddy
07/28/2025, 4:12 PMJonathan Baxter
07/28/2025, 9:05 PM
KerberosAuthException: failure to login: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
Venkat Sai Ram
07/30/2025, 9:00 AM
SELECT
user_id,
event_name,
geo
FROM events_intraday_20250719
WHERE JSON_MATCH(
geo,
'"$.country"=''India'''
)
LIMIT 10;
Got no records found.
geo column:
"{'city': 'Delhi', 'country': 'India', 'continent': 'Asia', 'region': 'Delhi', 'sub_continent': 'Southern Asia', 'metro': '(not set)'}"
SELECT
json_extract_scalar(app_info, '$.id', 'STRING', 'null') AS app_id,
json_extract_scalar(app_info, '$.version', 'STRING', 'null') AS app_version
FROM events_intraday_20250719
LIMIT 10;
All nulls is the result I got.
app_info column:
"{'id': 'com.aadhan.hixic', 'version': '5.7.6', 'install_store': None, 'firebase_app_id': '1:700940617518:android:4c5cd93d642b6868', 'install_source': 'manual_install'}"
If I remove the default null, I get:
Error Code: 200 (QueryExecutionError)
Caught exception while doing operator: class org.apache.pinot.core.operator.query.SelectionOnlyOperator on segment events_intraday_20250719_OFFLINE_1752863401870004_1752949768919000_12: Cannot resolve JSON path on some records. Consider setting a default value.
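Worth checking, as a hedged guess from the samples above: both the geo and app_info values are pasted with single quotes and a Python-style None, which looks like a Python dict repr rather than JSON. If the stored strings really look like that, JSON_MATCH and json_extract_scalar cannot parse them, which would explain both the empty result and the all-null output. A parseable geo value, reconstructed from the sample, would look like:
{
  "city": "Delhi",
  "country": "India",
  "continent": "Asia",
  "region": "Delhi",
  "sub_continent": "Southern Asia",
  "metro": "(not set)"
}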
Table Config
{
  "OFFLINE": {
    "tableName": "events_intraday_20250719_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "replication": "1",
      "timeColumnName": "event_timestamp",
      "minimizeDataMovement": false
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "aggregateMetrics": false,
      "invertedIndexColumns": [
        "event_name",
        "platform",
        "session_traffic_source_last_click",
        "traffic_source"
      ],
      "nullHandlingEnabled": false,
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "columnMajorSegmentBuilderEnabled": true,
      "skipSegmentPreprocess": false,
      "optimizeDictionary": false,
      "optimizeDictionaryForMetrics": false,
      "optimizeDictionaryType": false,
      "noDictionarySizeRatioThreshold": 0.85,
      "rangeIndexVersion": 2,
      "jsonIndexColumns": [
        "app_info",
        "device",
        "event_params",
        "geo",
        "privacy_info",
        "user_properties"
      ],
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "loadMode": "MMAP"
    },
    "metadata": {},
    "ingestionConfig": {
      "batchIngestionConfig": {
        "segmentIngestionType": "APPEND",
        "segmentIngestionFrequency": "DAILY",
        "consistentDataPush": false
      },
      "continueOnError": false,
      "retryOnSegmentBuildPrecheckFailure": false,
      "rowTimeValueCheck": false,
      "segmentTimeValueCheck": true
    },
    "isDimTable": false
  }
}
Can you help me with this? I'm happy to provide additional information.
Kavya
07/30/2025, 12:57 PMNicolas Thiessen
07/30/2025, 1:56 PMEmerson Lesage
07/30/2025, 3:48 PMPraneeth G
07/31/2025, 6:03 AM
What is the time unit of metadataTTL in upsertConfig? According to the documentation example it is seconds, but it is also mentioned:
"Since the metadata TTL is applied on the first comparison column, the time unit of upsert TTL is the same as the first comparison column."
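A hedged reading of that sentence: metadataTTL is compared against values of the first comparison column (by default the time column), so it has to be expressed in that column's unit. With task_created_date in DAYS as in the config below, 648000 would be read as days rather than seconds; a value in days, e.g. 180 to line up with the 180-day retention, is what the quoted text implies. A sketch under that assumption:
"upsertConfig": {
  "mode": "FULL",
  "metadataTTL": 180
}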
With the below config, ingestion is failing:
"segmentsConfig": {
"schemaName": "agent_task_dimension_v1",
"retentionTimeUnit": "DAYS",
"retentionTimeValue": "180",
"replicasPerPartition": "2",
"timeType": "DAYS",
"timeColumnName": "task_created_date" .... }
"upsertConfig": {
"mode": "FULL",
"metadataTTL": 648000
.. }
java.lang.ClassCastException: null
pinot-server.log:2025/07/30 23:53:32.481 ERROR [RealtimeSegmentDataManager_agent_task_dimension_v1__16__0__20250730T1823Z] [agent_task_dimension_v1__16__0__20250730T1823Z] Caught exception while indexing the record at offset: 52
I tried a bunch of combinations and it is not due to null fields; it seems to be due to a timeUnit and metadataTTL combination mismatch.
Rajat
07/31/2025, 6:58 AMfrancoisa
07/31/2025, 3:16 PM
New uploaded LLC segment must have start/end offset in the segment metadata
Even after faking the metadata file with
segment.total.docs = 2
segment.kafka.topic.name : ressources
segment.kafka.partition.id : 0
segment.start.offset : 0
segment.end.offset : 2
and repacking, I'm still facing the same issue. Any idea? 😇 Or will nothing work like that (maybe I'm a dreamer 😄)?
Thanks by the way for the amazing work 😉
madhulika
08/05/2025, 2:00 PMApoorv Upadhyay
08/06/2025, 9:33 AM
After forceCommit, segments are not getting committed to the deep store.
I could see the log line _segmentLogger.error("Could not build segment for {}", _segmentNameStr);
but this line also no longer appears after I re-onboarded, and there are no error logs related to creating or pushing segments.
Attaching the table config and schema config.
Please suggest how I can debug this further.
Rajat
08/06/2025, 12:13 PMEmerson Lesage
08/06/2025, 2:15 PM
Are the startReplaceSegments and endReplaceSegments endpoints the correct and most robust approach for doing atomic transactions on offline tables?
For example, if my table currently contains segments s1, s2, and s3, and I need to do the following in an all-or-nothing transaction:
• Insert new segments s4 and s5.
• Replace existing segment s3 with a new segment s6.
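For what it's worth, a hedged sketch of how that protocol is usually driven (endpoint and parameter names should be double-checked against your controller's Swagger): POST startReplaceSegments for the table with a body like the one below, upload s4, s5 and s6, then POST endReplaceSegments with the returned segment lineage entry id; the new segment set should only become visible once the end call succeeds.
{
  "segmentsFrom": ["s3"],
  "segmentsTo": ["s4", "s5", "s6"]
}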
Rohini Choudhary
08/07/2025, 5:45 AM
Shrusti Patel
08/07/2025, 4:53 PM