# troubleshooting
x
hello, i could use some help with query tuning:
# schema:
user(int) | location(int) | time(long)

# 1st query (filter):
"select user, count(*) from {table} where time between {start} and {end} and location between 500 and 550 group by user having count(user) >= 24 limit 1000000"

# 2nd query (combiner):
"select time, count(distinct(user)) as count from {table} where user in ({users}) and time between {start} and {end} and location between 300 and 350 group by time limit 10000000"
the query time scales linearly with the number of selected userids from the first query
segment size = 300mb, number of records = ~500 million per day (10 segments). using the python client takes around:
query took 1.5208384820143692
found 132164 userids
query took 1.2651426389929838
using the java client, with roaring bitmaps to serialize userids from the first query takes longer:
query took 1.601
found 132164 userids
query took 2.586
both clients were measured solely on the execution time of the pinot query
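for reference, the python side is roughly this - a rough sketch using the pinotdb client, with placeholder host/port/table/time range, and the timing is just perf_counter around execute:

import time
from pinotdb import connect

# placeholder connection details - the real broker host/port differ
conn = connect(host="pinot-broker", port=8099, path="/query/sql", scheme="http")
curs = conn.cursor()

# 1st query (filter): users with >= 24 records in location 500-550
t0 = time.perf_counter()
curs.execute(
    "select user, count(*) from mytable "
    "where time between 0 and 86400 and location between 500 and 550 "
    "group by user having count(user) >= 24 limit 1000000"
)
users = [row[0] for row in curs]
print(f"query took {time.perf_counter() - t0}")
print(f"found {len(users)} userids")

# 2nd query (combiner): distinct users per time bucket in location 300-350
in_list = ",".join(str(u) for u in users)
t0 = time.perf_counter()
curs.execute(
    "select time, count(distinct(user)) as count from mytable "
    f"where user in ({in_list}) and time between 0 and 86400 "
    "and location between 300 and 350 group by time limit 10000000"
)
print(f"query took {time.perf_counter() - t0}")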
i don’t use the in_subquery or in_id_set construct from https://docs.google.com/document/d/1s6DZ9eTPqH7vaKQlPjKiWb_OBC3hkkEGICIzcd5gozc/edit# because i was advised by @Jackie to build the idset client side due to the nature of userid filtering in my first query
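the java client’s idset building is conceptually just this - sketched with pyroaring purely for illustration; the real client uses org.roaringbitmap, and i haven’t verified that this byte layout matches pinot’s IdSet format:

import base64
from pyroaring import BitMap

users = [101, 202, 303]  # placeholder - userids returned by the 1st query
bitmap = BitMap(users)   # roaring bitmap of the userids
id_set_b64 = base64.b64encode(bitmap.serialize()).decode("ascii")
print(f"{len(id_set_b64)} chars")  # far shorter than joining the raw ints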
broker logs:
timeMs=1159,
docs=5175479/449912692,
entries=456484558/5175479,
segments(queried/processed/matched/consuming/unavailable):10/10/10/0/0,consumingFreshnessTimeMs=0,
servers=10/10,
groupLimitReached=false,
brokerReduceTimeMs=516,
exceptions=0,
serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);
pinot-server-3_O=0,592,382743,0,-1;
pinot-server-5_O=0,580,373923,0,-1;
pinot-server-1_O=0,573,372135,0,-1;
pinot-server-7_O=0,583,378483,0,-1;
pinot-server-0_O=0,579,374595,0,-1;
pinot-server-6_O=0,640,379359,0,-1;
pinot-server-9_O=0,589,386235,0,-1;
pinot-server-4_O=0,517,381771,0,-1;
pinot-server-2_O=0,542,383007,0,1;
pinot-server-8_O=0,590,379035,0,-1,
offlineThreadCpuTimeNs=5758172708,
realtimeThreadCpuTimeNs=0
m
What’s the latency without the having clause?
Also, what indexing do you have right now?
x
i don’t see anything unusual on the grafana metrics
• pinot cluster is deployed on k8s
• server pods: 4000m cpu, 10GB mem
• broker pods: 1000m cpu, 1GB mem
• requests and limits are the same for both pod types, following the defaults in https://docs.pinot.apache.org/operators/tutorials/deployment-pinot-on-kubernetes
indexing:
column   | dictionary | forward-index | inverted-index
user     | Y          | Y             | Y
location | Y          | Y             | Y
time     | Y          | Y             | Y
segments were generated from parquet files that are physically sorted by user, then time
i don’t know if star tree index will help here, since the doc lists DISTINCT_COUNT under unsupported functions (“intermediate result set is unbounded”)
j
Broker reduce takes quite long
x
What’s the latency without the having clause?
checking now
j
One thing that can potentially help is to add an order by on count for the first query and reduce the limit
With such high limit, broker might run into heavy GCs
x
latency without HAVING clause is just as bad, ~2s with brokerReduceTimeMs=~700
query time can also vary between 1.5s and 3s for either query 1 or query 2
when you say “add an order by on count”, do you mean this?
# 1st query (old):
"select user, count(*) from {table} where time between {start} and {end} and location between 500 and 550 group by user having count(user) >= 24 limit 1000000"

# 1st query (new):
"select user, count(*) as count from {table} where time between {start} and {end} and location between 500 and 550 group by user having count(user) >= 24 order by count limit 1000000"
i’ve tried that, and also reducing the limit (to a point where i don’t get truncated results). still slow 😕
cardinality of the items in schema:
users = ~5-6 million
time = 96
cells (location) = ~2500
here are other types of queries that scale linearly with the amount of users fetched from the first step:
# get relevant users
...

# step 2 (reducer type 2)
"select location, count(user) as count from {table} where user in ({users}) and time between {start} and {end} group by location limit 10000000"

# step 2 (reducer type 3)
"select count(user) from {table} where user in ({users}) and time_rounded between {start} and {end} and location between 300 and 505 group by user limit 10000000"

# step 2 (reducer type 4)
"select location, count(distinct(user)) from {table} where user in ({users}) and time_rounded between {start} and {end} and location between 300 and 310 group by location limit 10000000"
would making segments smaller, so that the computation can be further parallelized, help? they’re currently at 300mb untarred/uncompressed per segment. or would throwing more resources at the broker help?
hi @Mayank @Jackie @Xiang Fu, any suggestions based on the info that i’ve provided?
• nature of the queries
• broker/servers resources
• broker/servers logs
• data volume
• schema and cardinality
• indexes
• using java client with roaring bitmap idset / python client
what else should i be looking at?
m
What does this 1000cpu mean - “broker pods: 1000cpu, 1GB mem”?
x
sorry, it should be 1000m, not 1000cpu - edited the original
m
Still don't know how to read it, is it 1 core?
x
ah yes. 1000m = 1 virtual cpu in k8s notation
m
Also, in terms of your query, are you trying to intersect user sets that were between location A and B? If so, and if you are ok with approximations, then take a look at this video on Theta-sketches:

https://www.youtube.com/watch?v=HlVZjUCBabk&t=2652s
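something along these lines (writing this from memory off the theta sketch docs, so double check the exact argument format - note it also drops the having count(user) >= 24 part of your first query):
"select distinctCountThetaSketch(user, 'nominalEntries=4096', 'location BETWEEN 500 AND 550', 'location BETWEEN 300 AND 350', 'SET_INTERSECT($1, $2)') as users_in_both from {table} where time between {start} and {end}"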

x
unfortunately i need to have the exact number. precision is important here
i just had another thought - is parsing an extremely long query string a cause for the slowness? e.g. if my 1st query yields 100k userids, putting these 100k userids in the 2nd query’s IN clause results in a sql string length of:
• python client, just appending ints together: 1,299,260 chars
• java client, using roaring bitmap as a base64 string: 353,755 chars
m
brokerReduceTimeMs=516,
-> This is the reduce phase in broker that happens after getting results from servers, so outside of parsing
If you want to improve broker perf, you need to give it more CPU and memory. The reduce phase can happen in parallel if there are multiple servers' responses available to reduce on, and the threads can get CPU
x
i can try that, although i see that CPU utilization for both brokers and servers is below 10%
m
This is a CPU intensive query, so such low CPU utilization implies that you are memory/io bound. Another possibility is that there’s CPU contention, but that’s not obvious to me at this time.