# dev
@Gian Merlino @Laksh Singla Something else I'm thinking of adding to the documentation is the OS kernel tuning parameters that teams will likely have to explore when scaling Druid up to thousands of peon tasks. Because of how the Druid architecture works, with the Overlord communicating directly with each Peon, TCP connections on the Overlord skyrocket at immense scale: at 2-2.5K peons it averages around 40-50K open TCP connections. Restarting the Overlord then becomes a single point of failure, since it attempts to reschedule all tasks after the restart, which can cause a massive DDoS-type issue depending on how you have configured tcp_keep_alive/somaxconn and the other OS kernel parameters that reduce TCP socket load.
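For reference, here is a rough sketch of how someone could dump the current values on the Overlord host before tuning anything. It assumes a Linux box with sysctls exposed under /proc/sys. The list of parameters is my guess at the relevant ones, not a recommendation (on Linux the keepalive settings are the net.ipv4.tcp_keepalive_* family), and `read_sysctls` is just a hypothetical helper name.

```python
from pathlib import Path

# Linux sysctls related to TCP connection load. Which of these actually
# matter for a given Druid cluster is an assumption to validate.
SYSCTLS = [
    "net.core.somaxconn",
    "net.ipv4.tcp_max_syn_backlog",
    "net.ipv4.tcp_keepalive_time",
    "net.ipv4.tcp_keepalive_intvl",
    "net.ipv4.tcp_keepalive_probes",
    "net.ipv4.ip_local_port_range",
]


def read_sysctls(names=SYSCTLS, root="/proc/sys"):
    """Return {sysctl name: current value as a string, or None if unreadable}."""
    values = {}
    for name in names:
        # "net.core.somaxconn" maps to <root>/net/core/somaxconn
        path = Path(root, *name.split("."))
        try:
            values[name] = path.read_text().strip()
        except OSError:
            # Missing file or no permission: report it rather than fail.
            values[name] = None
    return values
```

Calling `read_sysctls()` on the Overlord host and printing the dict gives a baseline to compare against after any change.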