Daniel Wunderlich
04/28/2025, 2:19 PM
jute.maxbuffer. So it's about time we reviewed and upgraded our setup, including setting up a merge rollup task to solve the small-segments situation.
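For the merge rollup part, I'm picturing something along these lines in the table config, based on the docs (the "1day" prefix and the periods are placeholders; my understanding is this also needs Minion pods and the controller task scheduler enabled):
"task": {
  "taskTypeConfigsMap": {
    "MergeRollupTask": {
      "1day.mergeType": "concat",
      "1day.bucketTimePeriod": "1d",
      "1day.bufferTimePeriod": "1d"
    }
  }
}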
Main plan:
• Upgrade Pinot to the latest version
◦ Is it best to start a fresh cluster/Helm installation and restore segments from S3, or can we just upgrade the Helm release in place? Are segments backwards-compatible?
• Use 2+ replicas for each component, and at least 2 K8S nodes for HA.
◦ Any recommendations on instance types? I hear the M/T families are generally good.
◦ I read somewhere that it's best to run ZooKeeper on its own, and not use the "embedded" Helm chart. Is this correct?
Any other suggestions/pitfalls? Really appreciate the help.
Aman Satya
04/29/2025, 9:38 AM
Vipin Rohilla
04/29/2025, 9:39 AM
vinod kumar naidu
04/29/2025, 11:36 AM
Mannoj
04/29/2025, 11:49 AM
SP
04/29/2025, 10:00 PM
pinot.controller-urls and pinot.broker-url set in the catalog properties.
Problem: Although I've pointed pinot.broker-url at the active Pinot service (which should resolve to the single active broker), Trino seems to ignore this setting and keeps trying to connect to the brokers via the headless service, including the brokers that have been scaled down.
Has anyone encountered a similar issue where Trino continues to use a headless service instead of the defined broker-url? Any advice on ensuring Trino only connects to the active broker after scaling down?
Thanks in advance for your help!
Configuration:
pinot.controller-urls=pinot-controller.pinot.svc.cluster.local:9000
pinot.broker-url=pinot-broker.pinot.svc.cluster.local:8000
Error:
trino> select * from pinot.default.airlinestats;
Query 20250429_210757_00092_aich9 failed: Failed communicating with server: http://pinot-broker-2.pinot-broker-headless.pinot.svc.cluster.local:8000/debug/routingTable/airlineStats
trino> select * from pinot.default.airlinestats;
Query 20250429_210805_00094_aich9 failed: Failed communicating with server: http://pinot-broker-1.pinot-broker-headless.pinot.svc.cluster.local:8000/debug/routingTable/airlineStats
Rajat
04/30/2025, 6:50 AM
Rajat
04/30/2025, 11:09 AM
Rajat
04/30/2025, 11:09 AM
Tsvetan
04/30/2025, 12:39 PM
2025/04/30 06:17:46.386 ERROR [ServerSegmentCompletionProtocolHandler] [player_sessions_active_minutes__9__0__20250430T0525Z] Could not send request http://pinot-controller-1.pinot-controller-headless.pinot.svc.cluster.local:9000/segmentConsumed?reason=rowLimit&streamPartitionMsgOffset=4446773&instance=Server_10.65.77.10_8098&name=player_sessions_active_minutes__9__0__20250430T0525Z&rowCount=100000&memoryUsedBytes=5248627
org.apache.pinot.common.exception.HttpErrorStatusException: Got error status code: 401 (Unauthorized) with reason: "HTTP 401 Unauthorized" while sending request: /segmentConsumed?reason=rowLimit&streamPartitionMsgOffset=4446773&instance=Server_10.64.78.10_8098&name=player_sessions_active_minutes__9__0__20250430T0525Z&rowCount=100000&memoryUsedBytes=5248627 to controller: pinot-controller-1.pinot-controller-headless.pinot.svc.cluster.local, version: Unknown
at org.apache.pinot.common.utils.http.HttpClient.wrapAndThrowHttpException(HttpClient.java:476) ~[pinot-all-1.4.0-SNAPSHOT-jar-with-dependencies.jar:1.4.0-SNAPSHOT-eb9c759344502969c80e3e9ec00fe67bd24d2965]
I have enabled pinotAuth in my helm chart values override
pinotAuth:
  enabled: true
  controllerFactoryClass: org.apache.pinot.controller.api.access.BasicAuthAccessControlFactory
  brokerFactoryClass: org.apache.pinot.broker.broker.BasicAuthAccessControlFactory
  configs:
    - access.control.principals=admin,user,viewer
    - access.control.principals.admin.password=${admin_pass}
    - access.control.principals.user.password=${user_pass}
    - access.control.principals.viewer.password=${viewer_pass}
    - access.control.principals.user.permissions=READ,WRITE
    - access.control.principals.viewer.permissions=READ
However, I cannot understand where in the Helm chart I can configure basic auth access control for the controller.
My reference point is the documentation here -> https://docs.pinot.apache.org/operators/tutorials/authentication/basic-auth-access-control
I tried passing extra configs to the Helm chart like so:
server:
  extra:
    configs: |-
      pinot.server.segment.fetcher.auth.token=Basic ${admin_pass}
      pinot.server.segment.uploader.auth.token=Basic ${admin_pass}
      pinot.server.instance.auth.token=Basic ${admin_pass}
or in jvmOpts, but neither worked 🆘
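Based on the linked docs, this is roughly what I'd expect the controller-side equivalent to look like (a sketch only, assuming the chart appends controller.extra.configs into controller.conf the same way it does for server.extra.configs):
controller:
  extra:
    configs: |-
      controller.admin.access.control.factory.class=org.apache.pinot.controller.api.access.BasicAuthAccessControlFactory
      access.control.principals=admin,user,viewer
      access.control.principals.admin.password=${admin_pass}
Is that the intended mechanism, or is the pinotAuth block supposed to generate these for the controller?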
Zhuangda Z
04/30/2025, 4:14 PM
numDocsScanned:981376,
numEntriesScannedInFilter:55072901,
numEntriesScannedPostFilter:1962752,
numSegmentsQueried:303,
numSegmentsProcessed:45,
numSegmentsMatched:28,
For example, for numEntriesScannedInFilter, does "entries" mean docs? And does "InFilter" mean that even after applying the relevant indices, there are still 55072901 entries that need to be scanned for filtering (on non-indexed cols)? And what makes numEntriesScannedPostFilter != numDocsScanned?
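My current mental model, illustrated on a hypothetical query (table and columns made up; please correct me if this is off):
SELECT colA, colB
FROM myTable
WHERE colC > 100
-- numEntriesScannedInFilter: values of colC read while evaluating the predicate
--   (close to 0 if colC has a usable index; a large number suggests a scan)
-- numDocsScanned: docs that survive the filter and get read (981376 above)
-- numEntriesScannedPostFilter: roughly numDocsScanned x columns read after the
--   filter (981376 x 2 projected columns = 1962752, matching the stats above)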
Chao Cao
05/01/2025, 12:46 AM
SET maxRowsInJoin = 10000000;
SELECT
price.amount,
price.offerId,
price.sellerId,
price.itemId
FROM price
LEFT JOIN offer
ON price.sellerId = offer.sellerId
AND price.offerId = offer.offerId
WHERE
offer.internal_item_id = <valid_id>
OR price.itemId = <valid_id>
Ilam Kanniah
05/01/2025, 4:34 PM
LONG datatype and we would now like to add a timestamp index to that field, but realized that the index can only be applied to the TIMESTAMP datatype.
Changing the data type is not a backward-compatible change and fails the schema update, even though the underlying physical value stored is in long format for both.
Is the only approach to add a new column and migrate to that column / table with all the data? I was wondering if the index can support a datetimespec LONG value, or if the schema update can be done in-place from LONG to TIMESTAMP instead. Let me know what you think. Thanks.
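If the new-column route is the way, I'm picturing roughly this, schema first and then table config (column names are hypothetical; the Groovy expression is just the column-copy pattern, assuming the existing LONG column holds epoch millis):
"dateTimeFieldSpecs": [
  {
    "name": "eventTs",
    "dataType": "TIMESTAMP",
    "format": "TIMESTAMP",
    "granularity": "1:MILLISECONDS"
  }
]
"ingestionConfig": {
  "transformConfigs": [
    {
      "columnName": "eventTs",
      "transformFunction": "Groovy({eventTimeLong}, eventTimeLong)"
    }
  ]
},
"fieldConfigList": [
  {
    "name": "eventTs",
    "timestampConfig": { "granularities": ["DAY", "WEEK", "MONTH"] }
  }
]
Existing segments would presumably still need a backfill/reingestion to get the new column populated.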
Chao Cao
05/01/2025, 5:22 PM
Puneet Singh
05/02/2025, 12:33 PM
2025/05/02 12:30:45.940 ERROR [HelixHelper] [jersey-server-managed-async-executor-13] Caught exception while updating ideal state for resource: captain_offers_kpi_REALTIME
java.lang.IllegalStateException: Failed to find partition id for segment: captain_offers_kpi_REALTIME_1717632025246_1745973288376_27_e91caabf-bfae-4eb7-a68e-10726ec6e634 of table: captain_offers_kpi_REALTIME
at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:838) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
at org.apache.pinot.controller.helix.core.assignment.segment.StrictRealtimeSegmentAssignment.getPartitionId(StrictRealtimeSegmentAssignment.java:145) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
at org.apache.pinot.controller.helix.core.assignment.segment.StrictRealtimeSegmentAssignment.assignSegment(StrictRealtimeSegmentAssignment.java:81) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
at org.apache.pinot.controller.helix.core.PinotHelixResourceManager.lambda$assignTableSegment$16(PinotHelixResourceManager.java:2306) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
at org.apache.pinot.common.utils.helix.HelixHelper$1.call(HelixHelper.java:126) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
at org.apache.pinot.common.utils.helix.HelixHelper$1.call(HelixHelper.java:112) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
at org.apache.pinot.spi.utils.retry.BaseRetryPolicy.attempt(BaseRetryPolicy.java:58) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
at org.apache.pinot.common.utils.helix.HelixHelper.updateIdealState(HelixHelper.java:112) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
at org.apache.pinot.common.utils.helix.HelixHelper.updateIdealState(HelixHelper.java:240) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
at org.apache.pinot.controller.helix.core.PinotHelixResourceManager.assignTableSegment(PinotHelixResourceManager.java:2298) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
--
at org.glassfish.jersey.internal.Errors.process(Errors.java:292) [pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
at org.glassfish.jersey.internal.Errors.process(Errors.java:274) [pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
at org.glassfish.jersey.internal.Errors.process(Errors.java:244) [pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$2.run(ServerRuntime.java:825) [pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-be7cbbc4ac08bd2ee7ecc6364f75a0199ac83a80]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
Starysn
05/05/2025, 9:34 AM
2025-05-02 only.
Below is my schema and table configuration.
Schema:
"enableColumnBasedNullHandling": true,
"dateTimeFieldSpecs": [
{
"name": "datepromised",
"dataType": "TIMESTAMP",
"format": "TIMESTAMP",
"granularity": "1:DAYS"
}
]
Table:
"ingestionConfig": {
"transformConfigs": [
{
"columnName": "datepromised",
// "transformFunction": "FromDateTime(SUBSTR(JSONPATHSTRING(_airbyte_data, '$.datepromised'), 0, 10), 'yyyy-MM-dd')"
"transformFunction": "FromDateTime(SUBSTR(REPLACE(JSONPATHSTRING(_airbyte_data, '$.datepromised'), '+', ''), 0, 10), 'yyyy-MM-dd')"
}
]
}
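Stepping through the transform on a hypothetical raw value, to make the behavior concrete (the input string is made up):
JSONPATHSTRING(_airbyte_data, '$.datepromised')  -> '2025-05-02T09:00:00+05:30'
REPLACE(..., '+', '')                            -> '2025-05-02T09:00:0005:30'
SUBSTR(..., 0, 10)                               -> '2025-05-02'  (date part only)
FromDateTime(..., 'yyyy-MM-dd')                  -> millis for 2025-05-02 00:00:00
So if the issue is the time component disappearing, the SUBSTR drops it before FromDateTime ever runs, leaving midnight of that day.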
Jose Luis
05/05/2025, 2:38 PM
ramesh.samineedi
05/06/2025, 7:45 AM
Prijo Pauly
05/06/2025, 8:49 AM
Preethi Evelyn Sadanandan
05/07/2025, 8:21 AM
org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS?
Vipin Rohilla
05/07/2025, 9:01 AM
2025-05-07 14:21:58.978 INFO [main] PinotFSFactory - Did not find any fs classes in the configuration
2025-05-07 14:21:58.979 INFO [main] PinotFSFactory - Got scheme hdfs, initializing class org.apache.pinot.plugin.filesystem.HadoopPinotFS
2025-05-07 14:21:58.979 INFO [main] PinotFSFactory - Initializing PinotFS for scheme hdfs, classname org.apache.pinot.plugin.filesystem.HadoopPinotFS
2025-05-07 14:21:59.098 INFO [zk-disconnector-1-thread-1] ZooKeeper - Session: 0x30048f6f3aa0010 closed
2025-05-07 14:21:59.100 INFO [main-EventThread] ClientCnxn - EventThread shut down for session: 0x30048f6f3aa0010
2025-05-07 14:21:59.357 WARN [main] NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2025-05-07 14:21:59.826 ERROR [main] StartServiceManagerCommand - Failed to start a Pinot [CONTROLLER] at 7.837 since launch
java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.protocol.proto.ErasureCodingProtos$GetECTopologyResultForPoliciesRequestProto tried to access method 'org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList.emptyList()' (org.apache.hadoop.hdfs.protocol.proto.ErasureCodingProtos$GetECTopologyResultForPoliciesRequestProto and org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList are in unnamed module of loader 'app')
at org.apache.hadoop.hdfs.protocol.proto.ErasureCodingProtos$GetECTopologyResultForPoliciesRequestProto.<init>(ErasureCodingProtos.java:10445) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at org.apache.hadoop.hdfs.protocol.proto.ErasureCodingProtos$GetECTopologyResultForPoliciesRequestProto.<clinit>(ErasureCodingProtos.java:10948) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
at java.base/java.lang.Class.forName0(Native Method) ~[?:?]
Rajat
05/07/2025, 9:25 AM
[{"A_applied_weight_amount_double":94.4,"A_awb_code":"**********","A_charge_weight_amount_double":null,"A_id":605966337,"A_shipment_id":820771786,"Ar_awb_id":605966337,"Ar_zone":"z_d","Is_deleted":false,"Merged_topic_ts_ms":"2025-05-07T03:57:59.695Z","O_created_at":"2025-05-02T09:53:31Z","O_customer_city":"K.V.Rangareddy","O_customer_pincode":"500079","O_customer_state":"Telangana","O_id":824402352,"O_net_total_double":1930,"O_payment_method":"prepaid","O_shipping_method":"SR","O_sla":48,"O_total_double":1930,"Op":"s","S_awb":"**********","S_awb_assign_date":"2025-05-02T09:53:32Z","S_company_id":4613330,"S_courier":"Delhivery Surface 2 Kgs","S_created_at":"2025-05-02T09:53:31Z","S_etd":"2025-05-06T10:56:41Z","S_id":820771786,"S_order_id":824402352,"S_rto_delivered_date":"1969-12-31T18:30:00Z","S_rto_initiated_date":"1969-12-31T18:30:00Z","S_sr_courier_id":44,"S_status":7,"S_updated_at":"2025-05-05T11:59:44Z","Ts_ms_kafka":"2025-05-07T03:57:59.695Z"}]
In this record the s_created_at is
"S_created_at":"2025-05-02T09:53:31Z"
but when ingesting it into Pinot via LaunchDataIngestionSpec, the same record shows the timestamp with 5:30 added. Why?
{
"columns": [
"s_id",
"s_created_at"
],
"records": [
[
820771786,
"2025-05-02 15:23:31.0"
]
]
}
here's a snap from pinot for same id:
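The delta is exactly the IST offset (just the arithmetic, shown for clarity):
2025-05-02T09:53:31Z     (source value, UTC)
       + 05:30           (Asia/Kolkata / IST offset)
= 2025-05-02 15:23:31.0  (value shown in Pinot)
My guess is that something in the path, either the ingestion job parsing the string or the way the value is rendered, is applying the local timezone instead of UTC.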
Can anyone help? @Xiang Fu @Mayank
Nithinjith Pushpakaran
05/07/2025, 9:40 AM
Georgi Varbanov
05/07/2025, 1:55 PM
Georgi Varbanov
05/07/2025, 3:10 PM
Prasad V
05/08/2025, 4:08 AM
Prasad V
05/08/2025, 4:35 AM
telugu bharadwaj
05/08/2025, 9:08 AM
Monika reddy
05/08/2025, 1:47 PM
Georgi Varbanov
05/08/2025, 2:02 PM