# general
s
hey all, I observed this weird behavior: with the same setup, the latency and server CPU usage vary depending on when the test is run.
case 1 - PT done a few minutes after table creation
1000 rps -> mean latency 185 ms
CPU usage by server -> 2.5 cores
case 2 - PT done 8-10 hrs after table creation
1000 rps -> mean latency 20 ms
CPU usage by server -> 850 millicores
Any idea what may be the reason?
k
Same queries?
s
yes
k
only thing I can think of is the JVM has warmed up
and most data is in the system page cache
s
actually before the first case I ran 500 rps for 15 mins to warm up
and there was no activity between the first and second case
k
what's the query?
s
"select MAX(timeSinceEpoch), activityType FROM usermap where timeSinceEpoch >= 1587180605 and userId = '%d' group by activityType" "select MAX(timeSinceEpoch), modelYearCode FROM usermap where timeSinceEpoch >= 1587180605 and userId = '%d' group by modelYearCode top 10" "select MAX(timeSinceEpoch), cCode FROM usermap where timeSinceEpoch >= 1587180605 and userId = '%d' and activityType = 'BMO' group by cCode top 10"
k
yeah, it might just be the warm-up
restart the servers and rerun the benchmark
s
ok
1000 rps -> 17 ms, server CPU usage -> 850 millicores
s
Is it now in the same ballpark across multiple runs?
s
yes same setup
s
I recommend using PerfBenchmarkRunner. Profile each run using YourKit or the linux perf tool. It will generate the call graph and you can see what is contributing to the time
s
ok will look into that tool. thanks!
s
It is part of Pinot... you can attach a profiler to it
sh pinot-tools.sh PerfBenchmarkRunner
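(For reference, a rough sketch of profiling that run with linux perf, as suggested above; the pid lookup, sampling rate, and duration below are illustrative assumptions, not from this thread:)

# find the pid of the PerfBenchmarkRunner JVM (assumption: it shows up under this name in jps)
jps -l | grep -i PerfBenchmarkRunner

# sample on-CPU stacks with call graphs for ~60s while the benchmark is running
perf record -F 99 -g -p <pid> -- sleep 60

# inspect the call graph to see what is contributing to the time
perf report

# note: readable Java frames usually also require -XX:+PreserveFramePointer on the JVM
# plus a symbol-mapping tool such as perf-map-agent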
s
So I tried to recreate the above two cases. The only difference is that, since the default time threshold for completing a consuming segment is 6 hrs, the consuming segment moves to completed and hence there is no data in the consuming segment. Not sure why the consuming segment is causing the issue
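(For context, that 6 hr default is the realtime segment flush time threshold set in the table's streamConfigs; a minimal sketch of the relevant key, with an illustrative value, and the exact key name may differ by Pinot version:)

"streamConfigs": {
  ...
  "realtime.segment.flush.threshold.time": "6h"
}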
s
What Shounak has observed seems strange: as per him, whenever there is a consuming segment in memory and different queries are fired, the query latency is very high. Is this how it is supposed to work? Or are there any configurations to counter this?
k
Can we start a channel for this?
@Shounak Kulkarni @srisudha
s
Sure doing it now
s
How many consuming vs offline segments are there?
Did we check from the broker logs that, when the query latencies are high, it is the realtime servers that dominate the response time?
The broker log will print out, per query, the number of offline and consuming segments used to serve the query and the elapsed time per offline and realtime server
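(A rough way to pull those per-query stats out of the broker log; the log file name and the exact fields logged are assumptions and vary by Pinot version/deployment:)

# per-query response stats are logged by the broker; check the segment counts
# and the offline vs realtime server timings for the slow queries
grep "requestId=" pinotBroker.log | tail -n 20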
k
can we please create a channel? the thread feature in Slack is very confusing