cheng
02/16/2023, 2:34 AM
Peter Pringle
02/16/2023, 3:24 AM
Dhar Rawal
02/16/2023, 3:52 PM
Luis Fernandez
02/16/2023, 3:53 PM
Shubham Kumar
02/20/2023, 6:17 AM
controller.admin.access.control.principals=admin,user
controller.admin.access.control.factory.class=org.apache.pinot.controller.api.access.BasicAuthAccessControlFactory
controller.admin.access.control.principals.user.password=secret
controller.admin.access.control.principals.admin.password=verysecret
controller.segment.fetcher.auth.token=Basic YWRtaW46dmVyeXNlY3JldA
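(For reference, that fetcher token is plain HTTP Basic auth: the Base64 encoding of admin:verysecret. A quick Python sketch to derive it:)

```python
import base64

# The segment-fetcher token is just base64("user:password") for Basic auth.
token = base64.b64encode(b"admin:verysecret").decode()
print("Basic " + token)  # Basic YWRtaW46dmVyeXNlY3JldA== (padding may be stripped)
```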
We don’t want to expose credentials in the Helm manifest.
HongChe Lin
02/22/2023, 3:55 AM
select count(*) from poc_table_1 limit 10
919080905
select count(*) from poc_table_1_REALTIME limit 10
5397018
select count(*) from poc_table_1_OFFLINE limit 10
919080839
Another example,
select * from poc_table_1 where uuid = 'fcc60577-ff21-4b84-b6ee-a14e3076790b';
empty result
select * from poc_table_1_REALTIME where uuid = 'fcc60577-ff21-4b84-b6ee-a14e3076790b';
has result
select * from poc_table_1_OFFLINE where uuid = 'fcc60577-ff21-4b84-b6ee-a14e3076790b';
empty result
Is this behavior expected in Pinot? If not, how can I correct it?
Mathieu Alexandre
02/22/2023, 11:46 AM
com.azure.storage.file.datalake.models.DataLakeStorageException: Status code 409, "{"error":{"code":"PathAlreadyExists","message":"The specified path already exists.
It doesn't seem to be related to Azure itself, as I can overwrite segments with azcopy or rclone, for example. Any ideas or feedback?
Leslie
02/22/2023, 6:42 PM
INSERT INTO <T> FROM FILE
. There are 10 minion replicas, 2 controllers, 2 servers, and default values for memory.
Are those ingestion times expected with that configuration, or should I add more minions?
Harish Bohara
02/22/2023, 6:49 PM
"ingestionConfig": {
"filterConfig": {
"filterFunction": "Groovy({status == 'failed'}, status)"
}
},
Schema has following:
{
"name": "status",
"dataType": "STRING"
},
Harish Bohara
02/22/2023, 6:49 PM
Harish Bohara
02/22/2023, 6:51 PM
pinot-controller.conf ->
controller.disable.ingestion.groovy=false
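As an aside on the filterConfig above: as I understand Pinot's ingestion filtering, a record is dropped when the filter function evaluates to true. A plain-Python stand-in for the Groovy expression (illustrative only, not Pinot code):

```python
# Stand-in for Groovy({status == 'failed'}, status):
# returning True means the record is filtered OUT of ingestion.
def should_drop(record: dict) -> bool:
    return record.get("status") == "failed"

records = [{"status": "failed"}, {"status": "ok"}, {"status": "failed"}]
kept = [r for r in records if not should_drop(r)]
print(kept)  # only the 'ok' record survives ingestion
```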
Sean
02/24/2023, 2:02 AM
IAM returned 403 Forbidden: Permission 'iam.serviceAccounts.getAccessToken'
Does anyone know if it is possible to use workload identity this way? I ask because the online docs talk about the service-account key location, but usually the SDK allows either. I am not that familiar with the under-the-hood implementation, so any advice would be great! TIA
Xiang Fu
Ashwin Raja
02/24/2023, 3:23 AM
We have a metric field valueUSD: BIG_DECIMAL in the schema.
It has a lot of unique values, so we don't have an inverted index, but we do have a range index:
"rangeIndexColumns": [ "valueUSD"]
However, for certain query values, we get errors like this:
select count(*) from "3d957dca-7d8e-46fa-84d0-0bb476f72d09" where valueUSD < 10
"errorCode": 200,
"message": "QueryExecutionError:\njava.lang.IllegalStateException\n\tat org.apache.pinot.core.operator.filter.predicate.RangePredicateEvaluatorFactory$UnsortedDictionaryBasedRangePredicateEvaluator.applySV(RangePredicateEvaluatorFactory.java:296)\n\tat org.apache.pinot.core.operator.dociditerators.SVScanDocIdIterator$DictIdMatcher.doesValueMatch(SVScanDocIdIterator.java:271)\n\tat org.apache.pinot.core.operator.dociditerators.SVScanDocIdIterator.advance(SVScanDocIdIterator.java:116)\n\tat org.apache.pinot.core.operator.dociditerators.AndDocIdIterator.next(AndDocIdIterator.java:51)"
But then other ones are fine:
select count(*) from "3d957dca-7d8e-46fa-84d0-0bb476f72d09" where valueUSD > 500000000
1873
Does anybody know what could be happening here?
Malte Granderath
02/24/2023, 10:11 AM
Felix Li
02/24/2023, 10:44 PM
from
), it takes upwards of 4 minutes to finish that query. In the meantime, any subsequent queries are then backed up. Below is an example of the pinot broker log detailing the queries being made and how long it took to finish them. Note that request 3349 and onwards are much simpler queries but could not be serviced because it appears that the broker only waits / services one request at a time. I’m looking at the Pinot broker configs and the closest one appears to be pinot.broker.http.server.thread.pool.corePoolSize
which defaults to 2 * the number of cores, and based on the machine should be able to handle 16 requests (if vCPUs) or 8 (if physical cores) concurrently. Are we doing anything wrong with our configuration or setup?
I guess another hypothesis is that the brokers do launch the queries concurrently but because servers are churning on that 1 expensive query, it’s unable to report back the simpler ones. Any help or insight would be greatly appreciated 🙏
requestId=3347,table=3d957dca-7d8e-46fa-84d0-0bb476f72d09,timeMs=124727,docs=0/43401403,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):56/0/0/1/0/0/0,consumingFreshnessTimeMs=1677215040985,servers=5/5,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=4,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);pinot-server-2_R=0,2,679,0,0;pinot-server-1_O=0,104938,585,0,0;pinot-server-3_O=0,107244,585,0,0;pinot-server-0_O=0,124726,585,0,0;pinot-server-2_O=0,124572,585,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=SELECT count(*) "count(*)", DISTINCTCOUNT("from") "Distinct from" FROM "3d957dca-7d8e-46fa-84d0-0bb476f72d09" WHERE "blockTimestamp" < fromDateTime('2023-01-24', 'yyyy-MM-dd') ORDER BY "count(*)" DESC LIMIT 2000
requestId=3348,table=3d957dca-7d8e-46fa-84d0-0bb476f72d09,timeMs=249105,docs=0/43401403,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):56/0/0/1/0/0/0,consumingFreshnessTimeMs=1677215040985,servers=5/5,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=4,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);pinot-server-1_O=0,224803,585,0,0;pinot-server-3_O=0,234892,585,0,0;pinot-server-0_O=0,249104,585,0,1;pinot-server-0_R=0,123682,649,0,1;pinot-server-2_O=0,246463,585,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=SELECT DISTINCTCOUNT("from") "Distinct from" FROM "3d957dca-7d8e-46fa-84d0-0bb476f72d09" WHERE "blockTimestamp" < fromDateTime('2023-01-24', 'yyyy-MM-dd') ORDER BY "Distinct from" DESC LIMIT 2000
requestId=3349,table=3d957dca-7d8e-46fa-84d0-0bb476f72d09,timeMs=240074,docs=71361386/3236442740,entries=15921137/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):2056/75/75/1/0/0/0,consumingFreshnessTimeMs=1677215040985,servers=5/5,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);pinot-server-1_R=1,215922,647,0,0;pinot-server-1_O=1,215926,557,0,0;pinot-server-3_O=1,225877,564,0,0;pinot-server-0_O=1,240073,557,0,0;pinot-server-2_O=1,237444,564,0,1,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=SELECT count(*) "count(*)" FROM "3d957dca-7d8e-46fa-84d0-0bb476f72d09" WHERE "blockTimestamp" >= fromDateTime('2023-01-24', 'yyyy-MM-dd') ORDER BY "count(*)" DESC LIMIT 2000
requestId=3350,table=3d957dca-7d8e-46fa-84d0-0bb476f72d09,timeMs=69864,docs=3234144556/3236442740,entries=14307650/13707987,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):2056/2055/2055/1/0/0/0,consumingFreshnessTimeMs=1677215040985,servers=5/5,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);pinot-server-2_R=0,67473,704,0,0;pinot-server-1_O=0,45693,609,0,0;pinot-server-3_O=0,55645,609,0,0;pinot-server-0_O=0,69864,621,0,0;pinot-server-2_O=0,67212,609,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=SELECT min("blockTimestamp") "min", max("blockTimestamp") "max" FROM "3d957dca-7d8e-46fa-84d0-0bb476f72d09" ORDER BY "min" DESC LIMIT 2000
requestId=3351,table=3d957dca-7d8e-46fa-84d0-0bb476f72d09,timeMs=64441,docs=71361386/3236442740,entries=15921137/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):2056/75/75/1/0/0/0,consumingFreshnessTimeMs=1677215040985,servers=5/5,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);pinot-server-1_O=0,40132,564,0,0;pinot-server-3_O=0,50063,558,0,0;pinot-server-0_O=0,64441,557,0,0;pinot-server-0_R=0,64437,647,0,0;pinot-server-2_O=0,61891,557,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=SELECT count(*) "count(*)" FROM "3d957dca-7d8e-46fa-84d0-0bb476f72d09" WHERE "blockTimestamp" >= fromDateTime('2023-01-24', 'yyyy-MM-dd') ORDER BY "count(*)" DESC LIMIT 2000
requestId=3352,table=3d957dca-7d8e-46fa-84d0-0bb476f72d09,timeMs=64028,docs=71361386/3236442740,entries=15921137/14135356,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):2056/75/75/1/0/0/0,consumingFreshnessTimeMs=1677215040985,servers=5/5,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);pinot-server-1_R=0,39983,690,0,0;pinot-server-1_O=0,39992,593,0,0;pinot-server-3_O=1,49688,605,0,1;pinot-server-0_O=1,64026,593,0,0;pinot-server-2_O=0,61500,605,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=SELECT min("chainId") "min", max("chainId") "max" FROM "3d957dca-7d8e-46fa-84d0-0bb476f72d09" WHERE "blockTimestamp" >= fromDateTime('2023-01-24', 'yyyy-MM-dd') ORDER BY "min" DESC LIMIT 2000
requestId=3353,table=3d957dca-7d8e-46fa-84d0-0bb476f72d09,timeMs=1382,docs=71361386/3236442740,entries=15921137/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):2056/75/75/1/0/0/0,consumingFreshnessTimeMs=1677215040985,servers=5/5,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);pinot-server-2_R=0,152,647,0,0;pinot-server-1_O=0,1381,557,0,0;pinot-server-3_O=0,5,557,0,0;pinot-server-0_O=0,61,564,0,0;pinot-server-2_O=0,155,557,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=SELECT count(*) "count(*)" FROM "3d957dca-7d8e-46fa-84d0-0bb476f72d09" WHERE "blockTimestamp" >= fromDateTime('2023-01-24', 'yyyy-MM-dd') ORDER BY "count(*)" DESC LIMIT 2000
requestId=3354,table=3d957dca-7d8e-46fa-84d0-0bb476f72d09,timeMs=1182,docs=71361386/3236442740,entries=15921137/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):2056/75/75/1/0/0/0,consumingFreshnessTimeMs=1677215040985,servers=5/5,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);pinot-server-1_O=0,1181,564,0,0;pinot-server-3_O=0,5,558,0,0;pinot-server-0_O=0,163,557,0,0;pinot-server-0_R=0,159,647,0,0;pinot-server-2_O=0,4,557,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=SELECT count(*) "count(*)" FROM "3d957dca-7d8e-46fa-84d0-0bb476f72d09" WHERE "blockTimestamp" >= fromDateTime('2023-01-24', 'yyyy-MM-dd') ORDER BY "count(*)" DESC LIMIT 2000
Here are some server logs detailing how long it took to process those requests:
Processed requestId=3348,table=3d957dca-7d8e-46fa-84d0-0bb476f72d09_OFFLINE,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/invalid/limit/value)=492/488/488/-1/0/0/0/0/4,schedulerWaitMs=103892,reqDeserMs=0,totalExecMs=101968,resSerMs=18941,totalTimeMs=224801,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-0.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=778173229,scanInFilter=1613487,scanPostFilter=1186118,sched=FCFS,threadCpuTimeNs(total/thread/sysActivity/resSer)=0/0/0/0
Processed requestId=3349,table=3d957dca-7d8e-46fa-84d0-0bb476f72d09_REALTIME,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/invalid/limit/value)=56/56/56/1/0/0/0/0/0,schedulerWaitMs=215767,reqDeserMs=0,totalExecMs=155,resSerMs=0,totalTimeMs=215922,minConsumingFreshnessMs=1677215040985,broker=Broker_pinot-broker-0.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=43398221,scanInFilter=12831869,scanPostFilter=0,sched=FCFS,threadCpuTimeNs(total/thread/sysActivity/resSer)=0/0/0/0
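If the broker thread pool really is the bottleneck, the setting named above could be raised explicitly in the broker config. A sketch (the value 16 is an assumption for an 8-core box; verify this is the relevant knob for scatter-gather concurrency before relying on it):

```
# pinot-broker.conf: override the default of 2 * number of cores
pinot.broker.http.server.thread.pool.corePoolSize=16
```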
parth
02/26/2023, 1:19 PM
Sandeep Penmetsa
02/27/2023, 5:37 PM
"segmentPartitionConfig": {
"columnPartitionMap": {
"studentId": {
"functionName": "Murmur",
"numPartitions": 40
}
}
},
"routing": {
"segmentPrunerTypes": [
"partition"
]
}
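For context on why this config helps: each segment records which studentId partitions it contains, and with the partition pruner the broker only routes a studentId-filtered query to segments whose partition equals hash(studentId) % numPartitions. A stand-in sketch of the mechanics (crc32 here instead of Pinot's actual Murmur function):

```python
import zlib

# crc32 is only a stand-in; Pinot's "Murmur" is a specific murmur2 variant.
def partition_of(student_id: str, num_partitions: int = 40) -> int:
    return zlib.crc32(student_id.encode()) % num_partitions

# The same studentId always hashes to the same partition, so a query
# filtering on one studentId can skip segments holding other partitions.
print(partition_of("student-42"))
```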
Shreeram Goyal
02/28/2023, 11:25 AM
abhinav wagle
03/01/2023, 12:27 AM
select DATETIMECONVERT(event_ts, '1:MICROSECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd-HH tz(UTC)', '1:HOURS') AS gcp_hour, column_1, count(*)
from test_table
where gcp_hour = '2023-02-28-03' and column_1 in ('R3W9AA340023')
group by gcp_hour, column_1
order by count(*) desc
Vs. without the explicit filter:
Result: shows 247 in the table for the filter I have above.
select DATETIMECONVERT(event_ts, '1:MICROSECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd-HH tz(UTC)', '1:HOURS') AS gcp_hour, column_1, count(*) as num_records
from test_table
where gcp_hour = '2023-02-28-03'
group by gcp_hour, column_1
order by count(*) desc
Felix Li
03/01/2023, 10:24 AM
apachepinot/pinot:0.13.0-SNAPSHOT-01f3528ffc-20230126
). The error I get is:
Task: Task_SegmentGenerationAndPushTask_ce222d64-c731-4aa4-a653-7b23591153b0_1677634744168_3 completed in: 492785ms
Problem running the task, report task as FAILED.
java.lang.UnsatisfiedLinkError: 'long xerial.larray.impl.LArrayNative.mmap(long, int, long, long)'
at xerial.larray.impl.LArrayNative.mmap(Native Method) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3772b55dc4c35673762a182b2ee650469560aa97]
at xerial.larray.mmap.MMapBuffer.<init>(MMapBuffer.java:94) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3772b55dc4c35673762a182b2ee650469560aa97]
at org.apache.pinot.segment.spi.memory.PinotNativeOrderLBuffer.mapFile(PinotNativeOrderLBuffer.java:49) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3772b55dc4c35673762a182b2ee650469560aa97]
at org.apache.pinot.segment.spi.memory.PinotDataBuffer.mapFile(PinotDataBuffer.java:194) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3772b55dc4c35673762a182b2ee650469560aa97]
at org.apache.pinot.segment.local.startree.v2.builder.OffHeapSingleTreeBuilder.ensureBufferReadable(OffHeapSingleTreeBuilder.java:186) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3772b55dc4c35673762a182b2ee650469560aa97]
at org.apache.pinot.segment.local.startree.v2.builder.OffHeapSingleTreeBuilder.getDimensionValue(OffHeapSingleTreeBuilder.java:174) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3772b55dc4c35673762a182b2ee650469560aa97]
at org.apache.pinot.segment.local.startree.v2.builder.BaseSingleTreeBuilder.constructNonStarNodes(BaseSingleTreeBuilder.java:368) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-3772b55dc4c35673762a182b2ee650469560aa97]
at ...
Laxman Ch
03/01/2023, 10:33 AM
Lee Wei Hern Jason
03/02/2023, 9:03 AM
[MessageGenerationPhase] [HelixController-pipeline-default-stg-mimic-pinot-(5802598c_DEFAULT)] Event 5802598c_DEFAULT : Unable to find a next state for resource: geohashAreaMapDimOffline_OFFLINE partition: dimension_geohash.area_map_batch_0 from stateModelDefinitionclass org.apache.helix.model.StateModelDefinition from:ERROR to:ONLINE
We have 2 controllers/brokers & 3 servers.
cc: @Michael Roman Wengle
Shreeram Goyal
03/02/2023, 2:09 PM
Lee Wei Hern Jason
03/03/2023, 4:20 AM
Exception happened in running task: 'long xerial.larray.impl.LArrayNative.mmap(long, int, long, long)'
Do y'all have any idea what may cause this? Some tasks succeed, but most fail with that error in the log.
Carlos
03/03/2023, 12:10 PM
[
{
"message": "QueryExecutionError:\njava.lang.ArrayIndexOutOfBoundsException",
"errorCode": 200
}
]
And I think that is caused by a row that has a multi-valued field with about 8,000 values. Am I right? If so, is there any parameter that I can tune to fix this?
Sun
03/04/2023, 1:57 AM
curl -X GET "https://some-domain/segments/truckmsg_json/truckmsg_json__0__213__20230226T2215Z" -H "accept: application/octet-stream" --output truckmsg_json__0__213__20230226T2215Z
response
{
"code": 404,
"error": "Segment truckmsg_json__0__213__20230226T2215Z or table truckmsg_json not found in /tmp/data/PinotController/truckmsg_json/truckmsg_json__0__213__20230226T2215Z"
}
But that segment does exist.
curl -X GET "https://some-domain/segments/truckmsg_json/truckmsg_json__0__213__20230226T2215Z/metadata" -H "accept: application/json"
Response
{
"segment.realtime.endOffset": "254797012",
"segment.start.time": "1676118890000",
"segment.time.unit": "MILLISECONDS",
"segment.flush.threshold.size": "6666",
"segment.realtime.startOffset": "254790346",
"segment.end.time": "1676121549000",
"segment.total.docs": "6666",
"segment.realtime.numReplicas": "3",
"segment.creation.time": "1677449750053",
"segment.index.version": "v3",
"segment.crc": "3388412023",
"segment.realtime.status": "DONE",
"segment.download.url": "http://172.16.19.79:9000/segments/truckmsg_json/truckmsg_json__0__213__20230226T2215Z"
}
What could be the reason?
pramod shenoy
03/04/2023, 2:59 AM
Jack Luo
03/04/2023, 4:33 PM
WHERE col LIKE '%xx%'
on a dictionary-encoded STRING
column performs a regex scan row by row. Is there any plan to support searching the dictionary first, getting the list of matching dictionary ids, and then matching rows exactly against those dictionary ids? This could speed up query performance by several orders of magnitude for low-cardinality columns.
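The proposed optimization can be sketched like this (hypothetical data layout, not Pinot internals): run the regex once over the small dictionary, then match rows by dictionary id:

```python
import re

dictionary = ["alpha", "xxray", "beta", "axxb"]  # dict id -> value (low cardinality)
column = [0, 1, 2, 3, 1, 3]                      # forward index: per-row dict ids

# Step 1: run the regex once per dictionary entry, not once per row.
pattern = re.compile(".*xx.*")                   # LIKE '%xx%'
matching_ids = {i for i, v in enumerate(dictionary) if pattern.fullmatch(v)}

# Step 2: cheap integer-set membership test per row.
matching_rows = [row for row, d in enumerate(column) if d in matching_ids]
print(matching_rows)  # rows whose value contains 'xx'
```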
Of course, this approach comes with the side effect that if the search is too relaxed, it could match every dictionary id. Perhaps the user could decide whether to turn this feature on.
parth
03/06/2023, 12:35 PM