
Elon

06/03/2021, 4:26 AM
We ran into an issue where a thread was blocked, and it caused the entire cluster to stop processing queries because the worker thread pool (pqw threads in the default scheduler) on one server was blocked. Would it help to make the worker thread pool a cached thread pool while still keeping the query runner thread pool (pqr threads in the default scheduler) fixed? Or do you recommend using one of the other query schedulers? Here is the thread dump:

Mayank

06/03/2021, 4:28 AM
Hmm, could you elaborate on how just one thread being blocked caused the entire cluster to stop processing?
Also, any insights on why it was blocked?

Elon

06/03/2021, 4:29 AM
Yep, it was really strange but happened multiple times, coinciding with a large number of segments scanned for a particular table. Only one pqr thread pool on one worker was blocked, but no queries in that tenant were completing.
We would see messages in the logs like this:
Thread was blocked (or deadlocked?) reading.
Maybe it was deadlocked with a thread ingesting realtime segments?

Mayank

06/03/2021, 4:31 AM
I think the root cause might be something else, I don't see how one thread can block the entire cluster.

Elon

06/03/2021, 4:31 AM
Either way, each time it happened, just bouncing that one server fixed the issue.
I checked threads on all the servers in that tenant, and only one particular server would have pqw threads in BLOCKED state, with a similar thread dump each time.
I did see that direct memory dropped and mmapped memory would suddenly spike each time this happened.
And no queries completed (for tables on that tenant) until the server was restarted.

Mayank

06/03/2021, 4:33 AM
That probably means segment completion

Elon

06/03/2021, 4:33 AM
Any idea what would cause that?

Mayank

06/03/2021, 4:34 AM
The direct mem vs mmap is segment completion.
Are you saying one thread in one server is causing the entire cluster to stop processing?
Or are you saying all/several pqw threads in one server?

Elon

06/03/2021, 4:34 AM
Only for queries hitting tables on that tenant
and only 1 server had blocked pqw threads, always with the same thread dump

Mayank

06/03/2021, 4:35 AM
Ok, multiple blocked pqw threads on 1 server, then?

Elon

06/03/2021, 4:35 AM
From the thread dump I see this code:
public void add(int dictId, int docId) {
    if (_bitmaps.size() == dictId) {
      // Bitmap for the dictionary id does not exist, add a new bitmap into the list
      ThreadSafeMutableRoaringBitmap bitmap = new ThreadSafeMutableRoaringBitmap(docId);
      try {
        _writeLock.lock();
        _bitmaps.add(bitmap);
      } finally {
        _writeLock.unlock();
      }
    } else {
      // Bitmap for the dictionary id already exists, check and add document id into the bitmap
      _bitmaps.get(dictId).add(docId);
    }
  }
Maybe because a segment was added while the unsynchronized code was getting the size or calling get()? (i.e., the first and last lines of the method)
And although it was multiple pqw threads blocked on one server, there were still runnable pqw threads. Each time this happened it was the same scenario: one server, 1-3 threads blocked with that same thread dump.
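For illustration only, a minimal sketch (not Pinot's code or a proposed fix) of one way such a read path could be guarded, assuming the backing list is a plain ArrayList and the lock is a ReentrantReadWriteLock; the GuardedBitmapList name and the Consumer-based update hook are hypothetical:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

// Illustrative only: guards both the read path and the write path, so
// size()/get() cannot race with add() while the backing ArrayList resizes.
class GuardedBitmapList<T> {
  private final List<T> _bitmaps = new ArrayList<>();
  private final ReadWriteLock _lock = new ReentrantReadWriteLock();

  public void addOrUpdate(int dictId, T newEntry, Consumer<T> update) {
    _lock.readLock().lock();
    try {
      if (dictId < _bitmaps.size()) {
        // Existing entry: update it under the read lock (the list itself is not mutated).
        update.accept(_bitmaps.get(dictId));
        return;
      }
    } finally {
      _lock.readLock().unlock();
    }
    _lock.writeLock().lock();
    try {
      // Re-check under the write lock in case another thread appended first.
      if (dictId == _bitmaps.size()) {
        _bitmaps.add(newEntry);
      } else if (dictId < _bitmaps.size()) {
        update.accept(_bitmaps.get(dictId));
      }
    } finally {
      _lock.writeLock().unlock();
    }
  }
}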
Mayank

Are you using text index?
If not, then never mind.

Elon

06/03/2021, 4:37 AM
ok:)
we're not using text indexes on this table at all

Mayank

06/03/2021, 4:38 AM
what release?

Elon

06/03/2021, 4:38 AM
0.6.0
we're testing 0.7.1, do you recommend we upgrade?

Mayank

06/03/2021, 4:39 AM
I am curious to know what is happening here. No race conditions have come up on my radar recently.

Elon

06/03/2021, 4:40 AM
yep, I just noticed that the size() and get() are called unsynchronized, but it doesn't seem like that would lead to any deadlocks...
From the logs it looks like a thread was stuck reading - also not sure why it would block other queries from completing.

Mayank

06/03/2021, 4:41 AM
I can see how multiple pqw threads blocked can cause this. But single one, I am not so sure.

Elon

06/03/2021, 4:42 AM
There were multiple pqw threads blocked (1-3 out of 8), but there were also runnable pqw threads each time.

Mayank

06/03/2021, 4:42 AM
There's a metric for the query's wait time in the scheduler queue.

Elon

06/03/2021, 4:42 AM
Nice, is it in the fcfs code?

Mayank

06/03/2021, 4:43 AM
ServerQueryExecutorV1Impl

Elon

06/03/2021, 4:43 AM
thanks!

Mayank

06/03/2021, 4:43 AM
Check for SCHEDULER_WAIT
see if that spikes
👍 1
If so, then it is building a backlog.
One way to mitigate that is to set a table-level timeout.
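As a rough, JDK-only illustration of the backlog behavior being described (not Pinot code): with a small fixed pool, a couple of blocked workers leave new tasks waiting in the queue, and a caller-side timeout bounds the wait instead of letting queries pile up. The pool size, latch, and class name here are purely illustrative:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BacklogSketch {
  public static void main(String[] args) throws Exception {
    // Two workers stand in for a small pqw-style pool.
    ExecutorService workers = Executors.newFixedThreadPool(2);
    CountDownLatch stuck = new CountDownLatch(1);

    // Occupy both workers with tasks that block until released.
    for (int i = 0; i < 2; i++) {
      workers.submit(() -> { stuck.await(); return null; });
    }

    // A new "query" now waits in the queue; its scheduler wait time grows.
    Future<String> query = workers.submit(() -> "result");
    try {
      // A timeout bounds how long the caller waits instead of piling up forever.
      System.out.println(query.get(500, TimeUnit.MILLISECONDS));
    } catch (TimeoutException e) {
      System.out.println("query timed out while waiting in the queue");
    }

    stuck.countDown();   // release the blocked workers
    workers.shutdown();
  }
}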

Elon

06/03/2021, 4:44 AM
ok, and the other thing that happened each time was a huge (like 75%) drop in direct memory used, and a spike in mmapped memory.
Oh nice, how do we set the timeout?

Mayank

06/03/2021, 4:46 AM
QueryConfig, inside of the table config.
Although, since in Java you cannot forcibly interrupt a thread, I'm not sure if that will preempt the blocked thread.
Perhaps it does
But again, this is mitigation.
👍 1
I'd recommend filing an issue with as much detail as possible. We should get to the root cause.

Elon

06/03/2021, 4:47 AM
Sure, I saved a bunch of logs and metrics...

Mayank

06/03/2021, 4:47 AM
Seems like some race condition might be causing some threads to block, building a backlog (please check the scheduler wait metric)

Elon

06/03/2021, 4:47 AM
Would changing pqw to a cached threadpool be dangerous?
Yep, I'm looking for it now, thanks!

Mayank

06/03/2021, 4:48 AM
Need to think about the repercussions of that. I'd rather get to the bottom of the issue than just try something.
👍 1
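For reference on the trade-off being weighed here, a minimal JDK-only sketch (not Pinot's scheduler code); the pool size of 8 and the variable names are illustrative:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkerPoolSketch {
  public static void main(String[] args) {
    // Fixed pool: a hard cap on concurrent workers. If all of them end up
    // blocked, new tasks just sit in the queue and nothing makes progress.
    ExecutorService fixedWorkers = Executors.newFixedThreadPool(8);

    // Cached pool: spawns a new thread whenever all existing ones are busy,
    // so blocked threads do not starve new tasks, but it is unbounded and can
    // blow up the thread count (and memory) under sustained load.
    ExecutorService cachedWorkers = Executors.newCachedThreadPool();

    fixedWorkers.shutdown();
    cachedWorkers.shutdown();
  }
}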

Elon

06/03/2021, 4:48 AM
yep, makes sense
thanks, I'll look into the metrics

Mayank

06/03/2021, 4:49 AM
Please also include info on whether a segment commit was happening and whether it failed, any special characteristics of the table or Kafka topic where this happens, etc.
This will help get some clues.
👍 1

Elon

06/03/2021, 4:50 AM
Sure, I'll file an issue, thanks for the advice!

Mayank

06/03/2021, 4:50 AM
thanks for reporting.
👍 1

Elon

06/03/2021, 4:57 AM
So I do see spikes in the scheduler wait metric:
(screenshot deleted)

Mayank

06/03/2021, 5:02 AM
Do they correspond to the time when the issue happened? If so, try setting a table-level timeout as a fallback, but we should still debug the issue.
👍 1

Elon

06/03/2021, 5:02 AM
Yep.
Also, another note: using the Trino connector I could query the servers directly with no blocks or delays.
Same table...
It was only broker queries that were blocked - not sure why restarting one server would fix the issue each time.

Mayank

06/03/2021, 5:16 AM
Hmm, broker-to-server connections have been async for a long time now, so not sure why that would happen.
Oh, I think it could be because the Trino-to-server connection uses a different thread pool (not shared with pqw)?
👍 1
Also tagging @Jackie

Jackie

06/03/2021, 5:44 AM
If it is always the same server that is blocking, maybe check for hardware failure?
E.g., blocking on IO because one segment is unreadable.
(Just random guess)

Ken Krugler

06/03/2021, 2:37 PM
Hmm, I had something similar happen when I did a DISTINCT query on a column with very high cardinality. A broker somehow got “wedged” and would no longer process queries - we had to bounce it. IIRC the guess at that time was some issue with the connection pool getting hung when the response from a server was too big.

Mayank

06/03/2021, 3:36 PM
Yeah, so the end result is queries piling up and timing out, and it may be mitigated by choosing a more appropriate table-level query timeout. In your case it was an expensive query, but here I am more concerned about the possible deadlock.

Elon

06/04/2021, 12:17 AM
Yeah, it happened today again - sorry for the delay. I saved thread dumps (all similar). Today it was multiple servers, all in the pqw pool.
And the scheduler wait metric showed pileups again. It seemed like it wasn't one large query; many small ones were blocked...