xtrntr
09/08/2021, 2:55 AM# schema:
user(int) | location(int) | time(long)
# 1st query (filter):
"select user, count(*) from {table} where time between {start} and {end} and location between 500 and 550 group by user having count(user) >= 24 limit 1000000"
# 2nd query (combiner):
"select time, count(distinct(user)) as count from {table} where user in ({users}) and time between {start} and {end} and location between 300 and 350 group by time limit 10000000"
the query time scales linearly with the number of selected userids from the first queryxtrntr
09/08/2021, 2:58 AMquery took 1.5208384820143692
found 132164 userids
query took 1.2651426389929838
using the java client, with roaring bitmaps to serialize userids from the first query takes longer:
query took 1.601
found 132164 userids
query took 2.586
both clients were measured solely on the execution time of the pinot queryxtrntr
09/08/2021, 3:00 AMin_subquery
or in_id_set
construct from https://docs.google.com/document/d/1s6DZ9eTPqH7vaKQlPjKiWb_OBC3hkkEGICIzcd5gozc/edit#
because i was advised by @Jackie to build the idset client side due to the nature of userid filtering in my first queryxtrntr
09/08/2021, 3:05 AMtimeMs=1159,
docs=5175479/449912692,
entries=456484558/5175479,
segments(queried/processed/matched/consuming/unavailable):10/10/10/0/0,consumingFreshnessTimeMs=0,
servers=10/10,
groupLimitReached=false,
brokerReduceTimeMs=516,
exceptions=0,
serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);
pinot-server-3_O=0,592,382743,0,-1;
pinot-server-5_O=0,580,373923,0,-1;
pinot-server-1_O=0,573,372135,0,-1;
pinot-server-7_O=0,583,378483,0,-1;
pinot-server-0_O=0,579,374595,0,-1;
pinot-server-6_O=0,640,379359,0,-1;
pinot-server-9_O=0,589,386235,0,-1;
pinot-server-4_O=0,517,381771,0,-1;
pinot-server-2_O=0,542,383007,0,1;
pinot-server-8_O=0,590,379035,0,-1,
offlineThreadCpuTimeNs=5758172708,
realtimeThreadCpuTimeNs=0
Mayank
Mayank
xtrntr
09/08/2021, 3:07 AM4000m cpu, 10GB mem
• broker pods: 1000m cpu, 1GB mem
• requests and limits are the same for both pods, as followed the default metrics in https://docs.pinot.apache.org/operators/tutorials/deployment-pinot-on-kubernetesxtrntr
09/08/2021, 3:09 AMdictionary | forward-index | inverted-index
user Y Y Y
location Y Y Y
time Y Y Y
xtrntr
09/08/2021, 3:09 AMxtrntr
09/08/2021, 3:13 AMUnsupported functions
DISTINCT_COUNT
Intermediate result Set is unbounded
Jackie
09/08/2021, 3:18 AMxtrntr
09/08/2021, 3:19 AMWhat’s the latency without having clausechecking now
Jackie
09/08/2021, 3:20 AMJackie
09/08/2021, 3:21 AMxtrntr
09/08/2021, 3:26 AMxtrntr
09/08/2021, 3:29 AMxtrntr
09/08/2021, 3:47 AM# 1st query (old):
"select user, count(*) from {table} where time between {start} and {end} and location between 500 and 550 group by user having count(user) >= 24 limit 1000000"
# 1st query (new):
"select user, count(*) as count from {table} where time between {start} and {end} and location between 500 and 550 group by user having count(user) >= 24 order by count limit 1000000"
xtrntr
09/08/2021, 3:47 AMxtrntr
09/08/2021, 5:28 AMusers = ~5-6 million
time = 96
cells = ~2500
xtrntr
09/08/2021, 8:28 AM# get relevant users
...
# step 2 (reducer type 2)
"select location, count(user) as count from {table} where user in ({users}) and time between {start} and {end} group by location limit 10000000"
# step 2 (reducer type 3)
"select count(user) from {table} where user in ({users}) and time_rounded between {start} and {end} and location between 300 and 505 group by user limit 10000000"
# step 2 (reducer type 4)
"select location, count(distinct(user)) from {table} where user in ({users}) and time_rounded between {start} and {end} and location between 300 and 310 group by location limit 10000000"
xtrntr
09/08/2021, 8:41 AMxtrntr
09/09/2021, 9:28 AMxtrntr
09/09/2021, 9:28 AMMayank
broker pods: 1000cpu, 1GB mem
?xtrntr
09/09/2021, 4:15 PM1000cpu
be 1000m
- edited the originalMayank
xtrntr
09/09/2021, 4:16 PMxtrntr
09/09/2021, 4:17 PMxtrntr
09/09/2021, 4:19 PM1,299,260
chars
• java client, using roaring bitmap as a base64 string 353,755
charsMayank
brokerReduceTimeMs=516,
-> This is the reduce phase in broker that happens after getting results from servers, so outside of parsingMayank
xtrntr
09/09/2021, 4:34 PMMayank