# troubleshooting
e
Happy new year everyone! We are seeing a server that seems to be "stuck": it can process raw server queries, but in QueryScheduler it appears unable to get a permit. We have a rate of 10000 queries/second, and it never enters this block:
if (queryLogRateLimiter.tryAcquire() || forceLog(schedulerWaitMs, numDocsScanned)) {
      LOGGER.info("Processed requestId={},table={},segments(queried/processed/matched/consuming)={}/{}/{}/{},"
              + "schedulerWaitMs={},reqDeserMs={},totalExecMs={},resSerMs={},totalTimeMs={},minConsumingFreshnessMs={},broker={},"
              + "numDocsScanned={},scanInFilter={},scanPostFilter={},sched={}", requestId, tableNameWithType,
Has anyone else experienced this? I am adding some more debug logs to see if we can reproduce it. It only seems to happen on 1 server, after ~1 week of being up.
I.e. only n-1 out of n servers respond, and the stuck server is what holds up the query.
k
Hi Elon, what does the jstack say
@Elon did you say you are seeing this at 10k qps?
we should probably create an issue and log the jstack output..
e
Sure, will do
@Kishore G - I believe I found the issue: we have a thread dump that shows the pqr and pqw threads are blocked due to groovy workers. I see there is an issue with groovy 2.4.8 (which pinot 5 uses). We are testing the fix now (to use a more recent version) and will create a pull request.
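For anyone wanting to check for the same symptom, here is a rough sketch of how the blocked threads could be listed from inside a JVM with the standard `ThreadMXBean` API. It is not code from Pinot: the class name `BlockedThreadReport` and the method `findBlocked` are made up for illustration, and the only details taken from this thread are the `pqr` and `pqw` thread-name prefixes. It only sees threads of the JVM it runs in, so on a live server running `jstack` against the process remains the usual route.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Pinot): lists BLOCKED threads by name prefix.
public class BlockedThreadReport {

  // Returns one line per BLOCKED thread whose name starts with any given prefix,
  // including the lock it is waiting for and the thread currently holding it.
  public static List<String> findBlocked(String... namePrefixes) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    List<String> report = new ArrayList<>();
    // false/false: skip locked-monitor and synchronizer details to keep the dump cheap
    for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
      if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
        continue;
      }
      for (String prefix : namePrefixes) {
        if (info.getThreadName().startsWith(prefix)) {
          report.add(String.format("%s blocked on %s held by %s",
              info.getThreadName(), info.getLockName(), info.getLockOwnerName()));
          break;
        }
      }
    }
    return report;
  }

  public static void main(String[] args) {
    // Prefixes as mentioned in the thread dump discussed above
    findBlocked("pqr", "pqw").forEach(System.out::println);
  }
}
```

If the query runner/worker threads show up here all waiting on a lock held by the same owner, that owner's stack in the full jstack output is the place to look next.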
k
wow, good find
so you are using groovy UDFs?
e
Yep
Thanks:)