Luis Fernandez
10/12/2021, 2:56 PM
select * from table where user_id = x
When we first hit a query like this, it takes more than 500 ms; when we hit it again, we get good results. I guess that's because the segment gets loaded into memory. I was wondering why something like this would happen. 500 ms is definitely outside our expectations for query latency. The table is a realtime table and has indexes configured.
Our current config for noDictionaryColumns:
"noDictionaryColumns": [
"click_count",
"impression_count"
],
so that we can aggregate across our dimensions using "aggregateMetrics": true
Segment flushing configuration:
"realtime.segment.flush.threshold.rows": "0",
"realtime.segment.flush.threshold.time": "24h",
"realtime.segment.flush.segment.size": "250M"
We have a range index on serve_time, which is an epoch timestamp truncated to the hour.
We have an inverted index and a sortedColumn on user_id, as well as a partition map with 4 partitions using modulo.
We chose 4 partitions because the consuming topic has 4 partitions.
The consuming topic is getting around 5k messages a second.
Finally, we currently have 2 servers with 4 GB of Java heap, 10 GB of memory on the machine itself, 4 CPUs, and 500 GB of disk space.
At the time of writing this message we have 96 segments in this table.
Metrics from when we issue a query like the one above:
timeUsedMs: 264
numDocsScanned: 40
totalDocs: 401325330
numServersQueried: 2
numServersResponded: 2
numSegmentsQueried: 93
numSegmentsProcessed: 93
numSegmentsMatched: 4
numConsumingSegmentsQueried: 1
numEntriesScannedInFilter: 0
numEntriesScannedPostFilter: 320
numGroupsLimitReached: false
partialResponse: -
minConsumingFreshnessTimeMs: 1634050463550
offlineThreadCpuTimeNs: 0
realtimeThreadCpuTimeNs: 159743463
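As a rough reading aid for the metrics above, here is a small Python sketch that derives two figures from them: how many queried segments did not match any rows, and how the reported server thread CPU time compares with the wall-clock timeUsedMs. The function name `summarize_stats` and the output keys are hypothetical; only the input keys and values come from the metrics above.

```python
def summarize_stats(stats: dict) -> dict:
    """Derive a few simple figures from the stats fields of a query response."""
    queried = stats["numSegmentsQueried"]
    matched = stats["numSegmentsMatched"]
    cpu_ns = stats["offlineThreadCpuTimeNs"] + stats["realtimeThreadCpuTimeNs"]
    return {
        # Segments that were queried but contributed no rows.
        "segments_without_match": queried - matched,
        "matched_fraction": matched / queried,
        # Thread CPU time is reported in nanoseconds; convert to milliseconds.
        "thread_cpu_ms": cpu_ns / 1_000_000,
        "wall_ms": stats["timeUsedMs"],
    }


# Values copied from the metrics above.
stats = {
    "timeUsedMs": 264,
    "numSegmentsQueried": 93,
    "numSegmentsMatched": 4,
    "offlineThreadCpuTimeNs": 0,
    "realtimeThreadCpuTimeNs": 159743463,
}

summary = summarize_stats(stats)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these numbers, 89 of the 93 queried segments were processed without matching any rows.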
Could anyone direct me to what to look into? Based on the troubleshooting steps, even these queries don't seem to have much numDocsScanned and numEntriesScannedPostFilter.
Richard Startin
10/12/2021, 3:10 PM
Luis Fernandez
10/12/2021, 3:12 PM
Richard Startin
10/12/2021, 3:13 PM
Luis Fernandez
10/12/2021, 3:15 PM
Richard Startin
10/12/2021, 3:15 PM
Luis Fernandez
10/12/2021, 3:15 PM
Richard Startin
10/12/2021, 3:16 PM
user_id? user_id
which will incur some setup costs. (What version are you using, by the way?) Which dimension are you sorting by? Would it be possible to sort by user_id so you can use a sorted index instead?
Luis Fernandez
10/12/2021, 3:21 PM
Richard Startin
10/12/2021, 3:25 PM
user_id
and in 0.9.0 the new range index should have much lower latency for impression_count and click_count
Luis Fernandez
10/12/2021, 3:31 PM
"exceptions": [],
"numServersQueried": 2,
"numServersResponded": 2,
"numSegmentsQueried": 93,
"numSegmentsProcessed": 93,
"numSegmentsMatched": 4,
"numConsumingSegmentsQueried": 1,
"numDocsScanned": 40,
"numEntriesScannedInFilter": 0,
"numEntriesScannedPostFilter": 320,
"numGroupsLimitReached": false,
"totalDocs": 404198690,
"timeUsedMs": 505,
"offlineThreadCpuTimeNs": 0,
"realtimeThreadCpuTimeNs": 294557175,
"segmentStatistics": [],
"traceInfo": {},
"numRowsResultSet": 10,
"minConsumingFreshnessTimeMs": 1634052628512
Richard Startin
10/12/2021, 3:33 PM
-agentpath:/path/to/libasyncProfiler.so=start,event=cpu,file=cpu.html
while you run a load of queries that always query a user_id for the first time (better automated) for about a minute or so, we can see exactly what's going on
Luis Fernandez
10/12/2021, 3:35 PM
"exceptions": [],
"numServersQueried": 2,
"numServersResponded": 2,
"numSegmentsQueried": 93,
"numSegmentsProcessed": 93,
"numSegmentsMatched": 4,
"numConsumingSegmentsQueried": 1,
"numDocsScanned": 40,
"numEntriesScannedInFilter": 0,
"numEntriesScannedPostFilter": 320,
"numGroupsLimitReached": false,
"totalDocs": 404233610,
"timeUsedMs": 452,
"offlineThreadCpuTimeNs": 0,
"realtimeThreadCpuTimeNs": 181968373,
"segmentStatistics": [],
"traceInfo": {
"pinot-server-1.pinot-server-headless.pinot.svc.cluster.local": "[{\"0\":[{\"SelectionOnlyCombineOperator Time\":181},{\"InstanceResponseOperator Time\":182}]},{\"0_0\":[]},{\"0_1\":[]},{\"0_2\":[{\"SortedIndexBasedFilterOperator Time\":0},{\"DocIdSetOperator Time\":0},{\"ProjectionOperator Time\":0},{\"PassThroughTransformOperator Time\":0},{\"SelectionOnlyOperator Time\":51}]},{\"0_3\":[{\"SortedIndexBasedFilterOperator Time\":0},{\"DocIdSetOperator Time\":0},{\"ProjectionOperator Time\":0},{\"PassThroughTransformOperator Time\":0},{\"SelectionOnlyOperator Time\":180}]}]",
"pinot-server-0.pinot-server-headless.pinot.svc.cluster.local": "[{\"0\":[{\"SelectionOnlyCombineOperator Time\":1},{\"InstanceResponseOperator Time\":1}]},{\"0_0\":[]},{\"0_1\":[]},{\"0_3\":[{\"SortedIndexBasedFilterOperator Time\":0},{\"DocIdSetOperator Time\":0},{\"ProjectionOperator Time\":0},{\"PassThroughTransformOperator Time\":0},{\"SelectionOnlyOperator Time\":1}]},{\"0_2\":[{\"SortedIndexBasedFilterOperator Time\":0},{\"DocIdSetOperator Time\":0},{\"ProjectionOperator Time\":0},{\"PassThroughTransformOperator Time\":0},{\"SelectionOnlyOperator Time\":1}]}]"
},
"numRowsResultSet": 10,
"minConsumingFreshnessTimeMs": 1634052785732
Richard Startin
10/12/2021, 3:40 PM
SelectionOnlyCombineOperator and SelectionOnlyOperator are the problems in this query; forget everything I said about indexes.
Kishore G
Luis Fernandez
10/12/2021, 3:49 PM
select * from table where user_id = x
SelectionOnlyCombineOperator and SelectionOnlyOperator
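To make it easier to spot operators like these in a traceInfo payload such as the one earlier in this thread, here is a minimal Python sketch that flattens the per-server trace strings and ranks the entries by their reported time. It assumes each server's trace is a JSON-encoded list of single-key objects, as in the output above; the function name `slowest_operators` and the short server name in the sample are hypothetical.

```python
import json


def slowest_operators(trace_info: dict, top: int = 3) -> list:
    """Return the `top` (time, server, trace_id, operator) tuples, slowest first."""
    rows = []
    for server, raw in trace_info.items():
        # Each server's trace is itself a JSON string holding a list of objects.
        for entry in json.loads(raw):
            for trace_id, operators in entry.items():
                for operator in operators:
                    for name, elapsed in operator.items():
                        rows.append((elapsed, server, trace_id, name))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:top]


# Abbreviated sample using timings copied from the trace above;
# "server-1" is a shortened, hypothetical host name.
sample = {
    "server-1": json.dumps([
        {"0": [{"SelectionOnlyCombineOperator Time": 181},
               {"InstanceResponseOperator Time": 182}]},
        {"0_0": []},
        {"0_3": [{"SortedIndexBasedFilterOperator Time": 0},
                 {"SelectionOnlyOperator Time": 180}]},
    ])
}

top = slowest_operators(sample)
for elapsed, server, trace_id, name in top:
    print(f"{elapsed:>5}  {server}  {trace_id}  {name}")
```

On this sample the filter operator reports 0 while the selection and combine operators account for nearly all of the reported time.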
Richard Startin
10/12/2021, 3:56 PM
Kishore G
"numSegmentsQueried": 93,
"numSegmentsProcessed": 93,
"numSegmentsMatched": 4,
Luis Fernandez
10/12/2021, 3:56 PM
Kishore G
Luis Fernandez
10/12/2021, 3:57 PM
"numSegmentsQueried": 93
Kishore G
Luis Fernandez
10/12/2021, 3:58 PM
Kishore G
Luis Fernandez
10/12/2021, 4:01 PM
Kishore G
Luis Fernandez
10/12/2021, 4:23 PM
Kishore G
Luis Fernandez
10/12/2021, 4:25 PM
Kishore G
Luis Fernandez
10/12/2021, 4:26 PM
"segmentPartitionConfig": {
"columnPartitionMap": {
Kishore G
Luis Fernandez
10/12/2021, 4:27 PM
Kishore G
Luis Fernandez
10/12/2021, 5:14 PM
"routing": {
"segmentPrunerTypes": [
"partition"
]
},
Kenny Bastani
10/12/2021, 6:03 PM
LIMIT clause on your query. Let's see how it affects the query response time. I've noticed some wonkiness lately related to this.
Kishore G
Mayank
- Definitely sort on user_id if you are not doing so.
- If high throughput, partition on user_id as well.
- The partition function does need to be specified and matched for partition based pruning.
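To illustrate why the partition function needs to match for pruning to work, here is a conceptual Python sketch of modulo partitioning over 4 partitions, as described earlier in the thread. It is not Pinot's implementation: `modulo_partition`, `prune_segments`, and the segment names are hypothetical, and it assumes non-negative integer user_id values and that each segment records the single partition its rows came from.

```python
def modulo_partition(user_id: int, num_partitions: int = 4) -> int:
    """Map a non-negative integer key to a partition id with a plain modulo."""
    return user_id % num_partitions


def prune_segments(segment_partitions: dict, user_id: int,
                   num_partitions: int = 4) -> list:
    """Keep only the segments whose recorded partition can contain user_id."""
    target = modulo_partition(user_id, num_partitions)
    return [name for name, partition in segment_partitions.items()
            if partition == target]


# Hypothetical segment name -> partition id recorded for that segment.
segments = {
    "seg_p0_a": 0,
    "seg_p1_a": 1,
    "seg_p2_a": 2,
    "seg_p3_a": 3,
    "seg_p2_b": 2,
}

selected = prune_segments(segments, user_id=1234)
print(selected)
```

In this sketch a lookup for user_id 1234 only needs the two segments recorded under partition 2. If the data had been written with a different function than the one used at query time, the recorded partition ids would no longer predict where a given user_id lives, so skipping segments this way would not be safe.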
Luis Fernandez
10/12/2021, 7:02 PM
Mayank
Luis Fernandez
10/12/2021, 7:03 PM
Mayank
Luis Fernandez
10/12/2021, 7:03 PM
Mayank
Luis Fernandez
10/12/2021, 7:04 PM
Mayank