# general
s
hey all, I observed this weird behavior: with the same setup, the latency and server CPU usage vary depending on when the test is run.
case 1 - PT done a few minutes after table creation
1000 rps -> mean latency 185 ms
CPU usage by server -> 2.5 cores
case 2 - PT done 8-10 hrs after table creation
1000 rps -> mean latency 20 ms
CPU usage by server -> 850 millicores
Any idea what may be the reason?
k
Same queries?
s
yes
k
only thing I can think of is the JVM has warmed up
and most data is in the system page cache
s
actually before the first case I ran 500 rps for 15 mins to warm up
and there was no activity between the first and second case
k
what's the query?
s
"select MAX(timeSinceEpoch), activityType FROM usermap where timeSinceEpoch >= 1587180605 and userId = '%d' group by activityType" "select MAX(timeSinceEpoch), modelYearCode FROM usermap where timeSinceEpoch >= 1587180605 and userId = '%d' group by modelYearCode top 10" "select MAX(timeSinceEpoch), cCode FROM usermap where timeSinceEpoch >= 1587180605 and userId = '%d' and activityType = 'BMO' group by cCode top 10"
k
yeah, it might just be the warm-up
restart the servers and rerun the benchmark
s
ok
1000 rps -> 17 ms, server CPU usage -> 850 millicores
s
Is it now in the same ballpark across multiple runs?
s
yes same setup
s
I recommend using PerfBenchmarkRunner. Profile each run using YourKit or the linux perf tool. It will generate the call graph and you can see what is contributing to the time
s
ok will look into that tool. thanks!
s
It is part of Pinot... you can attach a profiler to it
sh pinot-tools.sh PerfBenchmarkRunner
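(For reference, a rough sketch of profiling that run with linux perf, as suggested above; the pid lookup, sampling rate, and duration below are illustrative assumptions, not from this thread:)

# find the pid of the PerfBenchmarkRunner JVM (assumption: it shows up under this name in jps)
jps -l | grep -i PerfBenchmarkRunner

# sample on-CPU stacks with call graphs for ~60s while the benchmark is running
perf record -F 99 -g -p <pid> -- sleep 60

# inspect the call graph to see what is contributing to the time
perf report

# note: readable Java frames usually also require -XX:+PreserveFramePointer on the JVM
# plus a symbol-mapping tool such as perf-map-agent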
s
So I tried to recreate the above two cases. The only difference is that, since the default time threshold for completing a consuming segment is 6 hrs, the consuming segment moves to completed and hence there is no data in the consuming segment. Not sure why the consuming segment is causing the issue
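(For context, that 6 hr default is the realtime segment flush time threshold set in the table's streamConfigs; a minimal sketch of the relevant key, with an illustrative value, and the exact key name may differ by Pinot version:)

"streamConfigs": {
  ...
  "realtime.segment.flush.threshold.time": "6h"
}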
s
What Shounak has observed seems strange: as per him, whenever there is a consuming segment in memory and different queries are fired, the query latency is very high. Is this how it is supposed to work? Or are there any configurations to counter this?
k
Can we start a channel for this?
@Shounak Kulkarni @srisudha
s
Sure doing it now
s
How many consuming vs offline segments are there?
Did we check from the broker logs that, when the query latencies are high, it is the realtime servers that dominate the response time?
The broker log will print out, per query, the number of offline and consuming segments used to serve the query and the elapsed time per offline and realtime server
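(A rough way to pull those per-query stats out of the broker log; the log file name and the exact fields logged are assumptions and vary by Pinot version/deployment:)

# per-query response stats are logged by the broker; check the segment counts
# and the offline vs realtime server timings for the slow queries
grep "requestId=" pinotBroker.log | tail -n 20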
k
can we please create a channel? the thread feature in Slack is very confusing