is there any option to specify to reload segments ...
# troubleshooting
s
Is there any option to reload segments in parallel?
👀 1
k
I haven’t seen any such option. Though each server will be re-loading segments in parallel. Also the low-level code loads segments in response to messages received - but I don’t know if that message handling is done in parallel (threaded). Maybe @Jackie or @Kishore G could comment here? 🙂
k
Segment reload is done in parallel. You can control it dynamically using some low-level Helix config.
j
@Syed Akram Segment reload on each server is sequential, and that is kind of intentional, because loading in parallel can take too many resources while the server still needs to serve queries. Generating indexes on multiple segments in parallel can also cause memory issues.
k
My bad, I thought Helix messages were processed in parallel. @Jackie are we intentionally making it single-threaded?
j
@Kishore G For a whole-table reload, it is a single message per server. We make it single-threaded intentionally because of the risks described above. We can add an option to the Helix message to control the parallelism, but users need to understand its side effects.
k
Hi @Jackie - I see SegmentFetcherAndLoader.addOrReplaceOfflineSegment(), which I thought was how segments got loaded. But that seems to be called by a msg that’s processing a single segment, not all segments for the server.
k
Got it. We should definitely create an issue. Someone might be able to make it multi-threaded, and by default numThreads can still be 1.
👍 1
k
@Kishore G Agreed. For example, our client’s cluster is small (6-8 servers) but they all are 32 core/128GB, so beefy enough to handle multiple downloads in parallel. And we pre-build the segment indexes in a Hadoop job, so that reduces the CPU & memory impact during segment loading.
j
@Ken Krugler I think we are discussing 2 different things here. There are 2 scenarios:
1. Server restart - segments are loaded via the Helix state transition, which happens in parallel and can be configured via Helix config (by default 40 threads)
2. Manually triggered reload when index config is updated in table config - sequential because it requires adding indexes on the fly
So basically we want to add an option to use multiple threads for the second scenario
Created an issue to track this: https://github.com/apache/pinot/issues/7338
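To make the proposal concrete, here is a rough sketch of what a reload with configurable parallelism could look like. This is not Pinot code: the class BoundedSegmentReloader, the reloadAll method, and the reloadOne callback are all hypothetical names made up for illustration, and the only idea taken from the thread is that numThreads would default to 1 so the current sequential behavior stays the default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper (NOT part of Pinot): reloads segments with bounded parallelism.
public class BoundedSegmentReloader {
    private final int numThreads;

    // Default of 1 thread keeps the reload sequential, as it is today.
    public BoundedSegmentReloader() {
        this(1);
    }

    public BoundedSegmentReloader(int numThreads) {
        if (numThreads < 1) {
            throw new IllegalArgumentException("numThreads must be >= 1");
        }
        this.numThreads = numThreads;
    }

    // Runs reloadOne for every segment name using at most numThreads workers,
    // waits for all of them, and returns how many segments were reloaded.
    public int reloadAll(List<String> segmentNames, Consumer<String> reloadOne) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String name : segmentNames) {
                futures.add(pool.submit(() -> reloadOne.accept(name)));
            }
            for (Future<?> future : futures) {
                future.get(); // surfaces the first failure, if any
            }
            return futures.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Reload interrupted", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("Segment reload failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With the no-argument constructor, segments are reloaded one at a time. Passing a larger value, such as new BoundedSegmentReloader(4), trades more CPU and memory pressure on the server for a faster reload, which is the side effect users would need to understand.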
k
@Jackie so when I load, say, 1000 segments for a new offline table (not server restart), I assume that’s another situation where Helix state transition msgs are processed in parallel, right?
j
Yes, new segments are also processed via the Helix state transition
s
Yes
Thanks