
Elon

01/04/2021, 10:48 PM
Happy new year everyone! We are experiencing a server that seems to be "stuck": it can process raw server queries, but in QueryScheduler it appears unable to get a permit. We have a rate of 10000 queries/second, and it never enters this block:
if (queryLogRateLimiter.tryAcquire() || forceLog(schedulerWaitMs, numDocsScanned)) {
      LOGGER.info("Processed requestId={},table={},segments(queried/processed/matched/consuming)={}/{}/{}/{},"
              + "schedulerWaitMs={},reqDeserMs={},totalExecMs={},resSerMs={},totalTimeMs={},minConsumingFreshnessMs={},broker={},"
              + "numDocsScanned={},scanInFilter={},scanPostFilter={},sched={}", requestId, tableNameWithType,
Has anyone else experienced this? I am adding some more debug logs to see if we can reproduce it. It only seems to happen on 1 server, after ~1 week of being up.
I.e., only n-1 out of n servers respond, and the stuck server is what is holding up the query.

Kishore G

01/05/2021, 3:25 AM
Hi Elon, what does the jstack say?
@Elon did you say you are seeing this at 10k qps?
We should probably create an issue and log the jstack output.

Elon

01/05/2021, 5:20 PM
Sure, will do
@Kishore G - I believe I found the issue: we have a thread dump that shows the pqr and pqw threads are blocked due to Groovy workers. I see there is an issue with Groovy 2.4.8 (which pinot 5 uses). We are testing the fix now (to use a more recent version) and will create a pull request.
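
For anyone wanting to check for the same symptom: below is a minimal sketch, not Pinot code, of listing BLOCKED threads by name prefix with the standard `java.lang.management` API. The class name `BlockedThreadReport` and the method `findBlocked` are hypothetical, and the `pqr`/`pqw` prefixes are taken from the message above; how the threads are actually named on a given server is an assumption. It only inspects the JVM it runs in, so it would have to be run inside the server process; from outside, `jstack <pid>` gives the same information.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class BlockedThreadReport {

    // Returns one line per BLOCKED thread whose name starts with any of the
    // given prefixes, including the lock it waits on and the lock's owner.
    static List<String> findBlocked(String... prefixes) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // false/false: do not collect locked monitors or ownable synchronizers
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            for (String prefix : prefixes) {
                if (info.getThreadName().startsWith(prefix)) {
                    report.add(info.getThreadName()
                            + " blocked on " + info.getLockName()
                            + " held by " + info.getLockOwnerName());
                    break;
                }
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Prefixes assumed from the thread names mentioned in this thread.
        findBlocked("pqr", "pqw").forEach(System.out::println);
    }
}
```

If the same owner thread shows up for most of the blocked threads, that owner's stack trace is the place to look first.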

Kishore G

01/08/2021, 1:48 AM
wow, good find
so you are using groovy UDFs?

Elon

01/08/2021, 2:11 AM
Yep
Thanks:)