# pinot-perf-tuning
k
I added two more servers to my cluster, and performance has dropped. One theory is that one or both of these new servers is slower than the previous servers, and thus causing the drop in performance. How can I confirm or refute that theory? Are there Pinot metrics I should be examining?
m
You can check the server-side latency metric. That will tell you the latency from individual servers and identify whether the new ones are slow.
Alternatively, the broker also logs the latency it sees from individual servers.
👍 1
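(A minimal sketch of checking latency straight from the broker's query response, for anyone who wants numbers before wiring up metrics. It assumes a broker at localhost:8099 and the /query/sql endpoint; the metadata field names such as timeUsedMs and numServersQueried may vary by Pinot version, so treat them as assumptions.)

```python
# Sketch: compare end-to-end latency via the broker's query endpoint,
# without setting up a full metrics pipeline.
# Assumptions: broker reachable at localhost:8099, /query/sql endpoint,
# and response metadata fields timeUsedMs / numServersQueried /
# numServersResponded (names may differ across Pinot versions).
import json
import urllib.request

BROKER = "http://localhost:8099"        # hypothetical broker address
SQL = "SELECT COUNT(*) FROM myTable"    # hypothetical table/query

def run_query(sql: str) -> dict:
    body = json.dumps({"sql": sql}).encode("utf-8")
    req = urllib.request.Request(
        f"{BROKER}/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    result = run_query(SQL)
    # Timing/coverage metadata the broker attaches to the query response.
    for key in ("timeUsedMs", "numServersQueried", "numServersResponded"):
        print(key, "=", result.get(key))
```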
k
Broker logs are a good option, as I was hoping for something that wouldn’t require me to rig up metrics just yet.
👍 1
s
If you scatter queries to N+2 servers instead of N, that may increase latency due to GC (assuming all servers are of the same capacity): the probability that at least one server is delayed by a GC pause is now higher. Of course, it depends on N. I don't think this would be observable if N is 10, but it may well be if N is 2.
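(A back-of-the-envelope sketch of this fan-out argument: if each server independently has some probability p of being mid-GC when a query arrives, the chance that at least one of the N queried servers is paused is 1 - (1 - p)^N. The value of p below is purely hypothetical.)

```python
# Sketch of the fan-out/GC argument: with N servers in the scatter,
# P(at least one is mid-GC) = 1 - (1 - p)^N.
# p is a purely hypothetical per-server pause probability.
p = 0.02  # hypothetical chance a given server is in a GC pause

for n in (2, 3, 5, 10):
    p_any = 1 - (1 - p) ** n
    print(f"N={n:2d}: P(at least one server paused) = {p_any:.3f}")
```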
k
N is 3 (went from 3 to 5). But seems unlikely to be GC related, as (a) it’s repeatable, and (b) time went from about 300ms to 900ms consistently.
m
Yeah, the broker log will tell you exactly which server took how long, and you can deduce the broker-side time (which will increase slightly due to more work in the 'gather' phase).
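(A small sketch of that deduction: since the broker waits on the slowest server before merging, the broker-side overhead is roughly the end-to-end time minus the slowest per-server time. The server names and timings below are made up.)

```python
# Sketch: estimating broker-side overhead from per-server latencies.
# The broker waits for the slowest server before merging, so roughly:
#   broker overhead ~= end-to-end time - max(per-server time)
# All values below are made-up examples, not real measurements.
per_server_ms = {
    "server-1": 280,
    "server-2": 310,
    "server-3": 295,
    "server-4": 850,  # a suspiciously slow new server
    "server-5": 300,
}
end_to_end_ms = 905  # latency observed at the client / broker log

slowest = max(per_server_ms, key=per_server_ms.get)
broker_overhead_ms = end_to_end_ms - per_server_ms[slowest]

print(f"slowest server: {slowest} ({per_server_ms[slowest]} ms)")
print(f"estimated broker-side overhead: {broker_overhead_ms} ms")
```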
Also, are you adding more nodes to reduce latency or improve throughput?
If the latter, adding more replica groups might be better than adding more servers to a single replica (or not using replica groups).
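(For the throughput case, replica groups are set up in the table config. The fragment below is only an illustrative sketch written as a Python dict; the key names are from memory and may not match a given Pinot version, so verify against the docs before relying on them.)

```python
# Illustrative-only sketch of the replica-group-related parts of a Pinot
# table config, written as a Python dict. Key names are from memory and
# may not match your Pinot version -- verify against the official docs.
import json

table_config_fragment = {
    "instanceAssignmentConfigMap": {
        "OFFLINE": {
            "replicaGroupPartitionConfig": {
                "replicaGroupBased": True,
                "numReplicaGroups": 2,             # hypothetical value
                "numInstancesPerReplicaGroup": 3,  # hypothetical value
            }
        }
    },
    "routing": {
        # Route each query to a single replica group instead of fanning
        # out across every server hosting the table.
        "instanceSelectorType": "replicaGroup"
    },
}

print(json.dumps(table_config_fragment, indent=2))
```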
k
Adding nodes to reduce latency
👍 1
@Mayank the pinotBroker.log file does have the info I’m looking for, but it seems not to be flushed right away. Is there something I can do to force a flush, or change the flush interval?
m
Not sure, perhaps there's a log4j setting?
k
You’re right - there’s an `immediateFlush="false"` flag in the pinot-broker-log4j2.xml file.
m
Oh nice, I actually didn't know.
k
And just FYI, the change in performance was due to a change in the star-tree index that happened to be made at the same time; the net effect was that we wound up with a lot more nodes in the tree.
#toomanymovingparts
🙂 Thanks again for the help.
m
I see. Glad you were able to find the root cause.