# pinot-perf-tuning
k
I added two more servers to my cluster, and performance has dropped. One theory is that one or both of these new servers is slower than the previous servers, and thus causing the drop in performance. How can I confirm or refute that theory? Are there Pinot metrics I should be examining?
m
You can check the server-side latency metric. That will tell you the latency from individual servers and identify whether the new ones are slow.
Alternatively, the broker also logs the latency it sees from individual servers.
👍 1
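(A minimal sketch of checking latency straight from the broker's query response, for anyone who wants numbers before wiring up metrics. It assumes a broker at localhost:8099 and the /query/sql endpoint; the metadata field names such as timeUsedMs and numServersQueried may vary by Pinot version, so treat them as assumptions.)

```python
# Sketch: compare end-to-end latency via the broker's query endpoint,
# without setting up a full metrics pipeline.
# Assumptions: broker reachable at localhost:8099, /query/sql endpoint,
# and response metadata fields timeUsedMs / numServersQueried /
# numServersResponded (names may differ across Pinot versions).
import json
import urllib.request

BROKER = "http://localhost:8099"        # hypothetical broker address
SQL = "SELECT COUNT(*) FROM myTable"    # hypothetical table/query

def run_query(sql: str) -> dict:
    body = json.dumps({"sql": sql}).encode("utf-8")
    req = urllib.request.Request(
        f"{BROKER}/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    result = run_query(SQL)
    # Timing/coverage metadata the broker attaches to the query response.
    for key in ("timeUsedMs", "numServersQueried", "numServersResponded"):
        print(key, "=", result.get(key))
```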
k
Broker logs are a good option, as I was hoping for something that wouldn’t require me to rig up metrics just yet.
👍 1
s
If you scatter queries to N+2 servers instead of N, that may increase latency due to GC (assuming all servers are of the same capacity): the probability that at least one server is delayed by a GC pause is now higher. Of course, it depends on N. I don't think this would be observable if N is 10, but it may well be if N is 2.
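(A back-of-the-envelope sketch of this fan-out argument: if each server independently has some probability p of being mid-GC when a query arrives, the chance that at least one of the N queried servers is paused is 1 - (1 - p)^N. The value of p below is purely hypothetical.)

```python
# Sketch of the fan-out/GC argument: with N servers in the scatter,
# P(at least one is mid-GC) = 1 - (1 - p)^N.
# p is a purely hypothetical per-server pause probability.
p = 0.02  # hypothetical chance a given server is in a GC pause

for n in (2, 3, 5, 10):
    p_any = 1 - (1 - p) ** n
    print(f"N={n:2d}: P(at least one server paused) = {p_any:.3f}")
```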
k
N is 3 (went from 3 to 5). But seems unlikely to be GC related, as (a) it’s repeatable, and (b) time went from about 300ms to 900ms consistently.
m
Yeah, the broker log will tell you exactly which server took how long, and you can deduce the broker-side time (which will increase slightly due to more work in the 'gather' phase).
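(A small sketch of that deduction: since the broker waits on the slowest server before merging, the broker-side overhead is roughly the end-to-end time minus the slowest per-server time. The server names and timings below are made up.)

```python
# Sketch: estimating broker-side overhead from per-server latencies.
# The broker waits for the slowest server before merging, so roughly:
#   broker overhead ~= end-to-end time - max(per-server time)
# All values below are made-up examples, not real measurements.
per_server_ms = {
    "server-1": 280,
    "server-2": 310,
    "server-3": 295,
    "server-4": 850,  # a suspiciously slow new server
    "server-5": 300,
}
end_to_end_ms = 905  # latency observed at the client / broker log

slowest = max(per_server_ms, key=per_server_ms.get)
broker_overhead_ms = end_to_end_ms - per_server_ms[slowest]

print(f"slowest server: {slowest} ({per_server_ms[slowest]} ms)")
print(f"estimated broker-side overhead: {broker_overhead_ms} ms")
```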
Also, are you adding more nodes to reduce latency or improve throughput?
If the latter, adding more replica groups might be better than adding more servers to a single replica (or not using replica groups).
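(For the throughput case, replica groups are set up in the table config. The fragment below is only an illustrative sketch written as a Python dict; the key names are from memory and may not match a given Pinot version, so verify against the docs before relying on them.)

```python
# Illustrative-only sketch of the replica-group-related parts of a Pinot
# table config, written as a Python dict. Key names are from memory and
# may not match your Pinot version -- verify against the official docs.
import json

table_config_fragment = {
    "instanceAssignmentConfigMap": {
        "OFFLINE": {
            "replicaGroupPartitionConfig": {
                "replicaGroupBased": True,
                "numReplicaGroups": 2,             # hypothetical value
                "numInstancesPerReplicaGroup": 3,  # hypothetical value
            }
        }
    },
    "routing": {
        # Route each query to a single replica group instead of fanning
        # out across every server hosting the table.
        "instanceSelectorType": "replicaGroup"
    },
}

print(json.dumps(table_config_fragment, indent=2))
```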
k
Adding nodes to reduce latency
👍 1
@Mayank the pinotBroker.log file does have the info I’m looking for, but it seems not to be flushed right away. Is there something I can do to force a flush, or change the flush interval?
m
Not sure, perhaps there's a log4j setting?
k
You’re right - there’s an `immediateFlush="false"` flag in the pinot-broker-log4j2.xml file.
m
Oh nice, I actually didn't know.
k
And just FYI, the change in performance was due to a change in the star-tree index that happened to be made at the same time; the net effect was that we wound up with a lot more nodes in the tree.
#toomanymovingparts
🙂 Thanks again for the help.
m
I see. Glad you were able to find the root cause.