# troubleshooting
p
Hey, I am observing a peculiar behavior in my Pinot setup. We have 3 offline servers, each with about 366 segments. When I run a single query, all the servers respond fairly quickly and results come back within 100ms or so. When I run the same query in parallel, say about 100 times, one of the servers shows a large scheduling delay (as printed in the server logs) even though its execution time stays much smaller. Because of this, the overall query now takes really long, up to 4 seconds or so. Has anyone seen this behavior? I have tried attaching a profiler to see what is happening, and it indicates that the server showing the larger scheduling delay is busier and doing more work than the others. I am not able to figure out why this is the case. Any insights?
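For context, this is roughly how I generate the parallel load (a sketch, assuming the standard broker /query/sql endpoint on port 8099; the query and broker host here are placeholders):
```
# Rough sketch of the load test: fire ~100 copies of the same query at the
# broker in parallel. QUERY and the broker host are placeholders.
QUERY='SELECT COUNT(*) FROM kf_metrics'
seq 1 100 | xargs -P 100 -I{} \
  curl -s -H "Content-Type: application/json" \
    -d "{\"sql\": \"$QUERY\"}" \
    http://pinot-broker:8099/query/sql > /dev/null
```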
m
This means you are under-provisioned for that load. But before adding more resources, you probably want to check whether you have optimized indexing and other features like partitioning/sorting/replica-groups, etc. These will help with throughput.
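Roughly something like this (a sketch, not verified against your cluster) pulls the table config from the controller so you can eyeball which of those settings are in place; the controller address and table name are placeholders:
```
# Sketch: fetch the table config from the controller REST API and look for
# indexing / partitioning / replica-group settings.
curl -s http://pinot-controller:9000/tables/kf_metrics | python3 -m json.tool \
  | grep -iE 'sortedColumn|invertedIndexColumns|segmentPartitionConfig|replicaGroup'
```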
p
Why would the behavior be different across different servers?
That is what is throwing me off. If all of them showed scheduling delays, I could understand that it's under-provisioned.
m
How is the data partitioned? Is that server doing a lot more work for that query? For example, it hosts most of the data that needs to be scanned/processed for the query?
You can check the server logs for stats about how much processing it had to do
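Something like this pulls those lines per server (a sketch; pod names are placeholders and it assumes the servers log to stdout, so adjust to your setup):
```
# Sketch: grab the last few per-query stats lines from each offline server pod.
for i in 0 1 2; do
  echo "server $i"
  kubectl logs pinot-server-offline-$i | grep 'Processed requestId' | tail -3
done
```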
k
Also I’m assuming you aren’t running the controller or broker processes on these servers, right? Only the Pinot server process?
p
My setup has 3 k8s nodes, so each of them runs 1 offline server, 1 realtime server, 1 Kafka instance, 1 broker, and 1 controller. There is no ingestion going on at this time.
@User what do I check for in the server logs? I am typically looking at this line only: 2022/04/05 19:31:41.008 INFO [QueryScheduler] [pqr-2] Processed requestId=18389,table=kf_metrics_REALTIME,segments(queried/processed/matched/consuming)=53/53/53/-1,schedulerWaitMs=6171,reqDeserMs=1,totalExecMs=549,resSerMs=2,totalTimeMs=6723,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-0.pinot-broker-headless.kfuse.svc.cluster.local_8099,numDocsScanned=1724964,scanInFilter=32807,scanPostFilter=5174892,sched=fcfs,threadCpuTimeNs=0
Data should be evenly distributed and based on segment count it seems so;
m
That is the correct log to look at. For the same requestId, you can compare the numDocsScanned, totalExecMs and schedulerWait times. One of these numbers has to be much higher for the slow server. And depending on which one it is, we will know what’s wrong.
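Something like this (untested; the log path is a placeholder) extracts those fields so the servers can be compared side by side:
```
# Sketch: pull requestId, schedulerWaitMs, totalExecMs and numDocsScanned
# out of the QueryScheduler lines. server.log is a placeholder path.
grep 'Processed requestId' server.log | sed -E \
  's/.*requestId=([0-9]+).*schedulerWaitMs=([0-9]+).*totalExecMs=([0-9]+).*numDocsScanned=([0-9]+).*/requestId=\1 waitMs=\2 execMs=\3 docs=\4/'
```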
Also, I take it that these are identical nodes in terms of cpu/mem/jvm args etc?
k
@User - wondering if your 100 parallel queries are all hitting the same broker, and thus that node is getting overloaded. Can you check the load on the k8s nodes to see if one is getting hammered?
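Something along these lines would show a hot node or pod (assumes metrics-server is installed in the cluster; adjust the namespace to your setup):
```
# Quick check for a hot node or pod while the 100 parallel queries are running.
kubectl top nodes
kubectl top pods | grep pinot
```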
p
yeah, the nodes are identical and the load (in terms of pods) is also identical across the servers; we use k8s, so the broker service round-robins the requests across all brokers
@User that is the issue: numDocsScanned is not much higher. What I have observed is that it differs by maybe a small number of segments (say 19 segments vs. 20 for the loaded server, and even that is not always the case)
I have observed the same behavior when running on GKE in GCP or EKS in AWS
Here are sample output lines from the server logs; in this run server-0 is the slow one:
(⎈ |metrics-perf:kfuse) [scratch/metrics-perf] for i in 0 1 2; do echo $i; cat ss | grep requestId | grep pinot-server-offline1d-$i | tail -1; done
0
pinot-server-offline1d-0 server 2022/04/06 21:34:08.017 INFO [QueryScheduler] [pqr-5] Processed requestId=29426,table=kf_metrics_REALTIME,segments(queried/processed/matched/consuming)=53/53/53/-1,schedulerWaitMs=2154,reqDeserMs=0,totalExecMs=548,resSerMs=1,totalTimeMs=2703,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-1.pinot-broker-headless.kfuse.svc.cluster.local_8099,numDocsScanned=1724964,scanInFilter=32807,scanPostFilter=5174892,sched=fcfs,threadCpuTimeNs=0
1
pinot-server-offline1d-1 server 2022/04/06 21:34:07.958 INFO [QueryScheduler] [pqr-6] Processed requestId=29435,table=kf_metrics_REALTIME,segments(queried/processed/matched/consuming)=49/49/49/-1,schedulerWaitMs=0,reqDeserMs=1,totalExecMs=398,resSerMs=2,totalTimeMs=401,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-1.pinot-broker-headless.kfuse.svc.cluster.local_8099,numDocsScanned=1602085,scanInFilter=24150,scanPostFilter=4806255,sched=fcfs,threadCpuTimeNs=0
2
pinot-server-offline1d-2 server 2022/04/06 21:34:07.911 INFO [QueryScheduler] [pqr-1] Processed requestId=28189,table=kf_metrics_REALTIME,segments(queried/processed/matched/consuming)=53/53/53/-1,schedulerWaitMs=0,reqDeserMs=0,totalExecMs=408,resSerMs=2,totalTimeMs=410,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-2.pinot-broker-headless.kfuse.svc.cluster.local_8099,numDocsScanned=1742947,scanInFilter=0,scanPostFilter=5228841,sched=fcfs,threadCpuTimeNs=0
m
Then the only other thing I can think of is garbage collection. You may also want to look at metrics around memory, etc.
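A rough way to check that while the load test runs (a sketch; the pod name is a placeholder and it assumes the JDK tools jps/jstat exist in the server image):
```
# Sketch: sample GC utilization inside the suspect server pod, once per second
# for 30 samples, while the parallel queries are running.
kubectl exec -it pinot-server-offline-0 -- \
  sh -c 'jstat -gcutil "$(jps | grep -v Jps | cut -d" " -f1)" 1000 30'
```
If the FGC/FGCT columns climb (or old-gen utilization stays pinned) only on the slow server during the burst, GC is the likely culprit.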