# troubleshooting
p
Hey, I am observing a peculiar behavior in my Pinot setup. We have 3 offline servers, each with about 366 segments. When I run a single query, all the servers respond fairly quickly and results come back within 100ms or so. When I run the same query in parallel, say about 100 times, one of the servers shows a large scheduling delay (as printed in the server logs) even though its execution time stays much smaller. Because of this, the overall query now takes really long, up to 4 seconds or so. Has anyone seen this behavior? I have tried attaching a profiler to see what is happening, and it indicates that the server showing the larger scheduling delay is busier and doing more work than the others. I am not able to figure out why this is the case. Any insights?
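For context, this is roughly how I generate the parallel load (a sketch, assuming the standard broker /query/sql endpoint on port 8099; the query and broker host here are placeholders):
```
# Rough sketch of the load test: fire ~100 copies of the same query at the
# broker in parallel. QUERY and the broker host are placeholders.
QUERY='SELECT COUNT(*) FROM kf_metrics'
seq 1 100 | xargs -P 100 -I{} \
  curl -s -H "Content-Type: application/json" \
    -d "{\"sql\": \"$QUERY\"}" \
    http://pinot-broker:8099/query/sql > /dev/null
```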
m
This means you are under-provisioned for that load. But before adding more resources, you probably want to check whether you have optimized indexing and other features like partitioning/sorting/replica-groups, etc. These will help with throughput.
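Roughly something like this (a sketch, not verified against your cluster) pulls the table config from the controller so you can eyeball which of those settings are in place; the controller address and table name are placeholders:
```
# Sketch: fetch the table config from the controller REST API and look for
# indexing / partitioning / replica-group settings.
curl -s http://pinot-controller:9000/tables/kf_metrics | python3 -m json.tool \
  | grep -iE 'sortedColumn|invertedIndexColumns|segmentPartitionConfig|replicaGroup'
```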
p
Why would the behavior be different across different servers?
That is what is throwing me off. If all of them showed scheduling delays, I could understand that it's under-provisioned.
m
How is the data partitioned? Is that server doing a lot more work for that query? For example, it hosts most of the data that needs to be scanned/processed for the query?
You can check the server logs for stats about how much processing it had to do
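Something like this pulls those lines per server (a sketch; pod names are placeholders and it assumes the servers log to stdout, so adjust to your setup):
```
# Sketch: grab the last few per-query stats lines from each offline server pod.
for i in 0 1 2; do
  echo "server $i"
  kubectl logs pinot-server-offline-$i | grep 'Processed requestId' | tail -3
done
```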
k
Also I’m assuming you aren’t running the controller or broker processes on these servers, right? Only the Pinot server process?
p
My setup has 3 k8s nodes, so each of them runs 1 offline server, 1 realtime server, 1 Kafka instance, 1 broker, and 1 controller. There is no ingestion going on at this time.
@User what do I check for in the server logs? I am typically looking at this line only: 2022/04/05 19:31:41.008 INFO [QueryScheduler] [pqr-2] Processed requestId=18389,table=kf_metrics_REALTIME,segments(queried/processed/matched/consuming)=53/53/53/-1,schedulerWaitMs=6171,reqDeserMs=1,totalExecMs=549,resSerMs=2,totalTimeMs=6723,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-0.pinot-broker-headless.kfuse.svc.cluster.local_8099,numDocsScanned=1724964,scanInFilter=32807,scanPostFilter=5174892,sched=fcfs,threadCpuTimeNs=0
Data should be evenly distributed and based on segment count it seems so;
m
That is the correct log to look at. For the same requestId, you can compare the numDocsScanned, totalExecMs and schedulerWait times. One of these numbers has to be much higher for the slow server. And depending on which one it is, we will know what’s wrong.
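Something like this (untested; the log path is a placeholder) extracts those fields so the servers can be compared side by side:
```
# Sketch: pull requestId, schedulerWaitMs, totalExecMs and numDocsScanned
# out of the QueryScheduler lines. server.log is a placeholder path.
grep 'Processed requestId' server.log | sed -E \
  's/.*requestId=([0-9]+).*schedulerWaitMs=([0-9]+).*totalExecMs=([0-9]+).*numDocsScanned=([0-9]+).*/requestId=\1 waitMs=\2 execMs=\3 docs=\4/'
```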
Also, I take it that these are identical nodes in terms of cpu/mem/jvm args etc?
k
@User - wondering if your 100 parallel queries are all hitting the same broker, and thus that node is getting overloaded. Can you check the load on the k8s nodes to see if one is getting hammered?
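Something along these lines would show a hot node or pod (assumes metrics-server is installed in the cluster; adjust the namespace to your setup):
```
# Quick check for a hot node or pod while the 100 parallel queries are running.
kubectl top nodes
kubectl top pods | grep pinot
```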
p
yeah, the nodes are identical and the load (in terms of pods) is also identical across the servers; we use k8s, so the broker service round-robins the requests across all brokers
@User that is the issue: numDocsScanned is not much higher. What I have observed is that it differs by maybe a small number of segments (say 19 segments vs. 20 for the loaded server, and even that is not always the case)
I have observed the same behavior when running on GKE in GCP or EKS in AWS
Here are sample output lines from the server logs; in this run server-0 is the slow one:
(⎈ |metrics-perf:kfuse) [scratch/metrics-perf] for i in 0 1 2; do echo $i; cat ss | grep requestId | grep pinot-server-offline1d-$i | tail -1; done
0
pinot-server-offline1d-0 server 2022/04/06 21:34:08.017 INFO [QueryScheduler] [pqr-5] Processed requestId=29426,table=kf_metrics_REALTIME,segments(queried/processed/matched/consuming)=53/53/53/-1,schedulerWaitMs=2154,reqDeserMs=0,totalExecMs=548,resSerMs=1,totalTimeMs=2703,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-1.pinot-broker-headless.kfuse.svc.cluster.local_8099,numDocsScanned=1724964,scanInFilter=32807,scanPostFilter=5174892,sched=fcfs,threadCpuTimeNs=0
1
pinot-server-offline1d-1 server 2022/04/06 21:34:07.958 INFO [QueryScheduler] [pqr-6] Processed requestId=29435,table=kf_metrics_REALTIME,segments(queried/processed/matched/consuming)=49/49/49/-1,schedulerWaitMs=0,reqDeserMs=1,totalExecMs=398,resSerMs=2,totalTimeMs=401,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-1.pinot-broker-headless.kfuse.svc.cluster.local_8099,numDocsScanned=1602085,scanInFilter=24150,scanPostFilter=4806255,sched=fcfs,threadCpuTimeNs=0
2
pinot-server-offline1d-2 server 2022/04/06 21:34:07.911 INFO [QueryScheduler] [pqr-1] Processed requestId=28189,table=kf_metrics_REALTIME,segments(queried/processed/matched/consuming)=53/53/53/-1,schedulerWaitMs=0,reqDeserMs=0,totalExecMs=408,resSerMs=2,totalTimeMs=410,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-2.pinot-broker-headless.kfuse.svc.cluster.local_8099,numDocsScanned=1742947,scanInFilter=0,scanPostFilter=5228841,sched=fcfs,threadCpuTimeNs=0
m
Then the only other thing I can think of is garbage collection. You may also want to look at metrics around memory, etc.
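A rough way to check that while the load test runs (a sketch; the pod name is a placeholder and it assumes the JDK tools jps/jstat exist in the server image):
```
# Sketch: sample GC utilization inside the suspect server pod, once per second
# for 30 samples, while the parallel queries are running.
kubectl exec -it pinot-server-offline-0 -- \
  sh -c 'jstat -gcutil "$(jps | grep -v Jps | cut -d" " -f1)" 1000 30'
```
If the FGC/FGCT columns climb (or old-gen utilization stays pinned) only on the slow server during the burst, GC is the likely culprit.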