Savhanna McLellan
05/15/2024, 4:39 PM
pinotAuth:
enabled: true
controllerFactoryClass: org.apache.pinot.controller.api.access.BasicAuthAccessControlFactory
brokerFactoryClass: org.apache.pinot.broker.broker.BasicAuthAccessControlFactory
configs:
- access.control.principals=admin,reader
- access.control.principals.admin.password=${ADMIN_USER_PASSWORD}
- access.control.principals.reader.password=${READER_USER_PASSWORD}
- access.control.principals.reader.permissions=READ
Is there a way to accomplish this? The same question was previously asked here.
I’m trying to avoid having to encode the entire file just to protect these values.
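One approach that would keep the passwords out of the values file, sketched under the assumption that your deployment can inject environment variables from a Kubernetes Secret (the Secret name, key names, and the `extraEnv` key below are illustrative — check what your chart actually supports, and verify that your Pinot image resolves `${...}` references at startup):

```yaml
# Create the Secret out-of-band, e.g.:
#   kubectl create secret generic pinot-auth-creds \
#     --from-literal=ADMIN_USER_PASSWORD=... \
#     --from-literal=READER_USER_PASSWORD=...
# Then expose the keys as env vars on the controller/broker pods, so the
# ${ADMIN_USER_PASSWORD} / ${READER_USER_PASSWORD} references above can
# resolve at startup (key names here are assumptions, not verified chart keys):
extraEnv:
  - name: ADMIN_USER_PASSWORD
    valueFrom:
      secretKeyRef:
        name: pinot-auth-creds
        key: ADMIN_USER_PASSWORD
  - name: READER_USER_PASSWORD
    valueFrom:
      secretKeyRef:
        name: pinot-auth-creds
        key: READER_USER_PASSWORD
```

This way only the Secret needs protecting, not the whole values file.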
raghav
05/16/2024, 6:14 PM
annotations:
service.beta.kubernetes.io/aws-load-balancer-security-groups: "sg-03baaab8b2044160d" #custom
Error message in logs for minion-stateless
2024/05/16 18:06:29.004 ERROR [StartServiceManagerCommand] [Start a Pinot [MINION]] Failed to start a Pinot [MINION] at 3.179 since launch
org.apache.helix.HelixException: fail to get config. cluster: pinot-quickstart6 is NOT setup.
I have tried the setup without aws-load-balancer-security-groups and it seems to work fine. I have also tried disabling minion-stateless, but even then the cluster does not come up and it gets stuck.
Nickel Fang
05/21/2024, 7:05 AM
"error": "Invalid schema: **. Reason: Schema is incompatible with tableConfig with name: ** and type: REALTIME"
Thanks!
Mahesh Venugopal
07/03/2024, 5:31 PM
select sum(impressions) as t_impressions, sum(spend) as t_spend, channel, arraySliceString(split(distinct_tag_group, ':'), 0, 1) as ad_id,
arraySliceString(split(distinct_tag_group, ':'), 1, 2) as tag_type,
arraySliceString(split(distinct_tag_group, ':'), 2, 3) as tag
from large_table
where created_date >= '2024-03-01' and created_date < '2024-06-01'
and ad_start_date < '2024-06-01'
group by distinct_tag_group, channel
Here created_date is the time-series date, ad_start_date is when the ad went live, channel is the platform we pull the ad from, and distinct_tag_group is a derived column that concatenates ad_id, tag_type, and tag. I originally had these as separate dimension columns in the index, but since they are only needed in the grouping to uniquely identify a tag (not for filtering), and since too many levels in the tree would hurt performance, I decided to combine them. The query took between 800 ms and 1 s in almost all cases, across different configurations of segment size and maxLeafRecords. There are also more filter columns that are not included in these tests.
So, based on my tests, these are my observations about the performance of the Star-Tree index. It depends on (but is not limited to):
i. Segment size/No. of segments
ii. Value of max leaf records
iii. Number of dimension columns/levels in the star tree
iv. Cardinality of the dimension columns
The most recent star tree index config used was:
"starTreeIndexConfigs": [
{
"dimensionsSplitOrder": [
"distinct_tag_group",
"ad_start_date",
"created_date",
"channel"
],
"functionColumnPairs": [
"SUM__impressions",
"SUM__spend",
"SUM__ctr",
"SUM__cpc",
"SUM__cpa",
"SUM__cpm",
"SUM__roas",
"COUNT__*",
"MAX__created_at"
],
"maxLeafRecords": 10000
}
],
The cardinality of distinct_tag_group was 50,000, delivery_start_date and created_date were around 100, and channel was just 6; the scan size for the query was close to 120 million records. Segment size was 7.5 million records. I tried different segment sizes and maxLeafRecords values, but the response time hovered between 800 ms and a little over a second. I understand that one big reason for the slowness would be the high cardinality, i.e. the number of groups the query has to return.
Please validate whether my understanding and observations make sense, and suggest whether this is expected or whether there are ways to optimise this use case with the Star-Tree index. What would be good values for segment size and maxLeafRecords (understanding there is no one-size-fits-all), and would they be relative to the total number of records, or at least to the number of records the queries need to scan?
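For reference, one variant I would benchmark (purely a sketch, not a verified tuning — the reordering and trimming here are my own assumptions): since the filters are on the date columns while the 50k-cardinality distinct_tag_group only appears in the grouping, it may be worth trying the filtered columns earlier in the split order, and dropping function-column pairs the query does not use (summing ratio metrics like ctr/cpc is usually not meaningful anyway, so pre-aggregating them may just inflate the tree):

```json
{
  "starTreeIndexConfigs": [
    {
      "dimensionsSplitOrder": [
        "created_date",
        "ad_start_date",
        "channel",
        "distinct_tag_group"
      ],
      "functionColumnPairs": [
        "SUM__impressions",
        "SUM__spend",
        "COUNT__*"
      ],
      "maxLeafRecords": 1000
    }
  ]
}
```

That said, as far as I understand, because the query groups by the 50k-cardinality distinct_tag_group, the star tree cannot collapse those groups, which may bound how much any config can help here.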
Just including one sample response among different config. combinations that were tried
With data of 120M records with segments of 7.5M records and star tree index of form
"starTreeIndexConfigs": [
{
"dimensionsSplitOrder": [
"distinct_tag_group",
"ad_start_date",
"created_date",
"channel"
],
"functionColumnPairs": [
"SUM__impressions",
"SUM__spend",
"SUM__ctr",
"SUM__cpc",
"SUM__cpa",
"SUM__cpm",
"SUM__roas",
"COUNT__*",
"MAX__created_at"
],
"maxLeafRecords": 10000
}
],
No. of segments: 18/18
Avg Segment size: 479MB
Storage size: 14.45GB
Query:
select sum(impressions) as t_impressions, sum(spend) as t_spend, channel, arraySliceString(split(distinct_tag_group, ':'), 0, 1) as ad_id,
arraySliceString(split(distinct_tag_group, ':'), 1, 2) as tag_type,
arraySliceString(split(distinct_tag_group, ':'), 2, 3) as tag
from large_table
where created_date >= '2024-03-01' and created_date < '2024-06-01'
and delivery_start_date < '2024-06-01'
group by distinct_tag_group, channel
timeUsedMs: 833
numDocsScanned: 81960042
totalDocs: 120000000
numServersQueried: 1
numServersResponded: 1
numSegmentsQueried: 18
numSegmentsProcessed: 16
numSegmentsMatched: 16
numConsumingSegmentsQueried: 2
numEntriesScannedInFilter: 219123287
numEntriesScannedPostFilter: 327840168
numGroupsLimitReached: false
partialResponse: -
minConsumingFreshnessTimeMs: 1720002700442
offlineThreadCpuTimeNs: 0
realtimeThreadCpuTimeNs: 0
offlineSystemActivitiesCpuTimeNs: 0
realtimeSystemActivitiesCpuTimeNs: 0
offlineResponseSerializationCpuTimeNs: 0
realtimeResponseSerializationCpuTimeNs: 0
offlineTotalCpuTimeNs: 0
realtimeTotalCpuTimeNs: 0
Baseer Baheer
07/24/2024, 7:43 AM
docker run -p 9000:9000 \
apachepinot/pinot:1.1.0-arm64 \
QuickStart -type hybrid