https://pinot.apache.org/ logo
a

Andrew First

08/14/2020, 7:17 PM
where i can find logs for each query? trying to troubleshoot slow queries i tried this but not luck for any of the brokers:
$ kubectl -n pinot logs pinot-broker-6 broker -f
m

Mayank

08/14/2020, 7:19 PM
Broker logs every query along with server level latencies
a

Andrew First

08/14/2020, 7:19 PM
this is all i see
Copy code
2020/08/14 18:32:27.191 INFO [HelixBrokerStarter] [main] Starting Pinot broker
2020/08/14 18:32:27.201 INFO [HelixBrokerStarter] [main] Connecting spectator Helix manager
2020/08/14 18:32:27.686 INFO [HelixBrokerStarter] [main] Setting up broker request handler
2020/08/14 18:32:28.932 INFO [HelixBrokerStarter] [main] Starting broker admin application on port: 8099
Aug 14, 2020 6:32:32 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:8099]
Aug 14, 2020 6:32:32 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
2020/08/14 18:32:37.932 INFO [HelixBrokerStarter] [main] Initializing cluster change mediator
2020/08/14 18:32:37.988 INFO [HelixBrokerStarter] [main] Connecting participant Helix manager
2020/08/14 18:32:38.014 WARN [ParticipantHealthReportTask] [main] ParticipantHealthReportTimerTask already stopped
2020/08/14 18:32:38.381 WARN [CallbackHandler] [main] Callback handler received event in wrong order. Listener: org.apache.helix.messaging.handling.HelixTaskExecutor@576ebd91, path: /pinot/INSTANCES/Broker_pinot-broker-0.pinot-broker-headless.pinot.svc.cluster.local_8099/MESSAGES, expected types: [CALLBACK, FINALIZE] but was INIT
2020/08/14 18:32:38.434 INFO [HelixBrokerStarter] [main] Registering service status handler
2020/08/14 18:32:38.437 INFO [HelixBrokerStarter] [main] Finish starting Pinot broker
2020/08/14 18:54:51.769 WARN [ConfigAccessor] [ZkClient-EventThread-39-pinot-zookeeper:2181] No config found at /pinot/CONFIGS/RESOURCE/brokerResource
for each broker. i have 500 QPS running right now
m

Mayank

08/14/2020, 7:20 PM
It should log queries, perhaps issue in log4j config?
a

Andrew First

08/14/2020, 7:20 PM
i’m running the default version from helm
is there a setting i need to configure?
m

Mayank

08/14/2020, 7:21 PM
@Xiang Fu ^^
x

Xiang Fu

08/14/2020, 7:22 PM
can you look at the logs inside the container
kubectl exec -it pod/pinot-broker-0 -n pinot-quickstart -- tail -1000f /opt/pinot/pinotBroker.log
a

Andrew First

08/14/2020, 7:24 PM
thanks, that works
i found some slow queries. how would i go about debugging them? there doesn’t seem to be much info, e.g.
Copy code
2020/08/14 19:26:18.373 INFO [BaseBrokerRequestHandler] [jersey-server-managed-async-executor-21] requestId=111324,table=events_REALTIME,timeMs=1017,docs=0/96515,entries=4/0,segments(queried/processed/matched/consuming/unavailable):1/1/0/1/0,consumingFreshnessTimeMs=1597433178368,servers=1/1,groupLimitReached=false,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);pinot-server-8_R=0,1017,420,0,query=SELECT COUNT(*) FROM events WHERE timestampMillis > 0 AND userId = '1231910589dc193e1f8e601e' AND eventName = '10943678abb8'
x

Xiang Fu

08/14/2020, 8:39 PM
log tells it’s coming from
pinot-server-8_R=0,1017,420,0
m

Mayank

08/14/2020, 8:40 PM
Yes, then you can check what was happening on that server at that time
a

Andrew First

08/14/2020, 8:48 PM
i just see log lines like this on the server:
Copy code
2020/08/14 20:47:46.717 INFO [QueryScheduler] [pqr-12] Processed requestId=339351,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/0/-1,schedulerWaitMs=0,reqDeserMs=1,totalExecMs=0,resSerMs=0,totalTimeMs=1,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-1.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=2,scanPostFilter=0,sched=fcfs
m

Mayank

08/14/2020, 8:49 PM
Check for segment generation logs around the time
Also match broker requestId to the server requestId to see how much time the server claimed it took
a

Andrew First

08/14/2020, 8:52 PM
here’s one:
Copy code
2020/08/14 20:51:03.187 INFO [BaseBrokerRequestHandler] [jersey-server-managed-async-executor-65] requestId=358079,table=events_REALTIME,timeMs=809,docs=0/290298,entries=3/0,segments(queried/processed/matched/consuming/unavailable):3/3/0/1/0,consumingFreshnessTimeMs=1597438262374,servers=2/2,groupLimitReached=false,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);pinot-server-4_R=0,0,421,0;pinot-server-5_R=0,809,335,0,query=SELECT COUNT(*) FROM events WHERE timestampMillis > 0 AND userId = '10646378194a99a963e07d63' AND eventName = '1001cd3dc8b6'
Copy code
2020/08/14 20:51:02.378 INFO [QueryScheduler] [pqr-8] Processed requestId=358079,table=events_REALTIME,segments(queried/processed/matched/consuming)=2/2/0/1,schedulerWaitMs=0,reqDeserMs=0,totalExecMs=0,resSerMs=0,totalTimeMs=0,minConsumingFreshnessMs=1597438262374,broker=Broker_pinot-broker-0.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=2,scanPostFilter=0,sched=fcfs
so the server thinks it executed in 0 ms
m

Mayank

08/14/2020, 8:53 PM
Note request ids can overlap between brokers
a

Andrew First

08/14/2020, 8:53 PM
the broker ID matches
i started with broker 0
m

Mayank

08/14/2020, 8:53 PM
Ok cool
Submit delay of 809ms
It was sitting in brokers queue
a

Andrew First

08/14/2020, 8:54 PM
oh
why?
m

Mayank

08/14/2020, 8:56 PM
Sorry not broker, server
Ok so broker thinks response delay is 809ms
This should include network and server delay
a

Andrew First

08/14/2020, 8:56 PM
it looks like submit delay was 0 though? am i reading it right?
Server=SubmitDelayMs,ResponseDelayMs
server-5_R=0,809
Wouldn’t 809 be response delay?
m

Mayank

08/14/2020, 8:56 PM
Yeah I was wrong
Corrected above
So it was likely sitting on server queue
How many segments and how many cores
In that server
a

Andrew First

08/14/2020, 8:57 PM
right now there’s 384 segments, 77 in error state for some reason
m

Mayank

08/14/2020, 8:58 PM
Segment size?
a

Andrew First

08/14/2020, 8:58 PM
there’s 8 offline servers and 8 realtime servers, each with 14 CPU
Copy code
"realtime.segment.flush.threshold.time": "1h",
          "realtime.segment.flush.desired.size": "180M",
          "realtime.segment.flush.threshold.size": "0",
m

Mayank

08/14/2020, 8:58 PM
Hmm server only queried 2 segments
a

Andrew First

08/14/2020, 8:59 PM
yes, it’s partitioned by user ID
m

Mayank

08/14/2020, 8:59 PM
Ok, so it is not because of too many segments
Then GC comes to mind
Or thread contention
You should have GC logs on the server
a

Andrew First

08/14/2020, 9:00 PM
where can i find that?