Slackbot
05/09/2023, 4:06 AM
Neelesh Sharma
05/09/2023, 4:08 AM
c6gd.4xlarge
vCPUs: 16
Memory (GiB): 32.0
count: 3 instances
jvm.config
-Xms8g
-Xmx8g
-XX:MaxDirectMemorySize=10g
runtime.properties
# HTTP server threads
druid.server.http.numThreads=60
# Processing threads and buffers
druid.processing.buffer.sizeBytes=500MiB
## numMergeBuffers should be around numThreads/4
druid.processing.numMergeBuffers=4
## numThreads should be vCPU-1
druid.processing.numThreads=15
druid.processing.tmpDir=var/druid/processing
# Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=256MiB
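The direct-memory setting can be checked against the rule of thumb in Druid's basic cluster tuning guide: a Historical needs roughly druid.processing.buffer.sizeBytes * (numThreads + numMergeBuffers + 1) of direct memory. Below is a minimal Python sketch that plugs in the values above. The 32 GiB node size comes from the c6gd.4xlarge spec. The "left over" figure is a simple subtraction that ignores JVM overhead outside heap and direct memory.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def direct_memory_needed(buffer_size_bytes: int, num_threads: int, num_merge_buffers: int) -> int:
    """Rule of thumb: one buffer per processing thread, one per merge buffer, plus one spare."""
    return buffer_size_bytes * (num_threads + num_merge_buffers + 1)


# Values from the Historical jvm.config / runtime.properties above.
buffer_size = 500 * MIB      # druid.processing.buffer.sizeBytes=500MiB
num_threads = 15             # druid.processing.numThreads (16 vCPU - 1)
num_merge_buffers = 4        # druid.processing.numMergeBuffers (~15 / 4)
max_direct = 10 * GIB        # -XX:MaxDirectMemorySize=10g
heap = 8 * GIB               # -Xms8g / -Xmx8g
node_memory = 32 * GIB       # c6gd.4xlarge

needed = direct_memory_needed(buffer_size, num_threads, num_merge_buffers)
historical_total = heap + max_direct
left_for_others = node_memory - historical_total

print(f"direct memory needed: {needed / MIB:.0f} MiB of {max_direct / MIB:.0f} MiB allowed")
print(f"Historical heap + direct limit: {historical_total / GIB:.0f} GiB")
print(f"left for other processes and the OS: {left_for_others / GIB:.0f} GiB")
```

With these numbers the buffers need 10000 MiB of the 10240 MiB allowed, so there is little headroom if numThreads or numMergeBuffers is raised. Anything else on the node, plus the OS page cache that Historicals use for memory-mapped segments, has to fit in the remaining 14 GiB.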
Sergio Ferragut
05/09/2023, 7:00 PM
druid.processing.numThreads for the Historical and the druid.worker.capacity of the MiddleManagers, and consider the memory footprint of each such that all Peons, the MM and the Historical fit within the 32g of the node.
Neelesh Sharma
05/10/2023, 3:23 AM
Neelesh Sharma
05/10/2023, 3:30 AM
Neelesh Sharma
05/10/2023, 6:49 AM
Gian Merlino
05/10/2023, 7:31 PM
druid.processing.numThreads then I suggest spending some time figuring out why CPU isn't at 90–100%.
It sounds like your system is at or near max load (based on the timeouts); ordinarily we want CPU to be maxed out when this is the case, not hovering at 50–60%. That could mean some inefficiency somewhere.
Gian Merlino
05/10/2023, 7:32 PM
Gian Merlino
05/10/2023, 7:32 PM
Neelesh Sharma
05/11/2023, 6:27 AM
Neelesh Sharma
05/11/2023, 6:29 AM
Neelesh Sharma
05/11/2023, 6:34 AM
## broker runtime.properties
# HTTP server settings
druid.server.http.numThreads=60
druid.server.http.maxSubqueryRows=600000
# HTTP client settings
druid.broker.http.numConnections=50
druid.broker.http.maxQueuedBytes=10MiB
# Processing threads and buffers
druid.processing.buffer.sizeBytes=500MiB
druid.processing.numMergeBuffers=6
druid.processing.tmpDir=var/druid/processing
# Query cache disabled -- push down caching and merging instead
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false
and broker jvm.config
-server
-Xms18g
-Xmx18g
-XX:MaxDirectMemorySize=6g
Neelesh Sharma
05/11/2023, 10:26 AM
i see... understood. will increase the data node count to 4 and monitor.
looking good so far! thanks @Gian Merlino, @Sergio Ferragut
> I would first try adding more data servers and seeing if that improves your situation with the timeouts. If not then it's likely an inefficiency at the Broker. If it does go up then it's likely an inefficiency at the data servers
will continue monitoring, but i guess early signs point toward "inefficiency at the data servers" 🤔
Neelesh Sharma
05/11/2023, 11:00 AM
Neelesh Sharma
05/15/2023, 12:04 PM
Sergio Ferragut
05/15/2023, 3:45 PM
Sergio Ferragut
05/15/2023, 3:47 PM
Neelesh Sharma
05/16/2023, 5:10 AM
> Is there a particular query that does this
not that i can tell 🤔 the most frequent one looks like a simple query with no join
select avg(ratings) from ratings where user = '' and __time in last month
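For reference, here is a hedged sketch of how that pseudo-query could be written in Druid SQL and sent to the SQL endpoint (POST /druid/v2/sql). It makes two assumptions. First, "last month" is read as the trailing month relative to now. Second, the user id is bound as a dynamic parameter rather than inlined. The column "user" is double-quoted because USER is a reserved word in SQL. The helper name and the sample id are hypothetical.

```python
import json

# Hypothetical Druid SQL version of the pseudo-query; assumes "last month" = trailing month.
QUERY = (
    'SELECT AVG("ratings") AS avg_rating\n'
    'FROM "ratings"\n'
    'WHERE "user" = ?\n'
    "  AND __time >= CURRENT_TIMESTAMP - INTERVAL '1' MONTH"
)


def build_sql_request(user_id: str) -> dict:
    """JSON body for POST /druid/v2/sql, binding the user id as a dynamic parameter."""
    return {
        "query": QUERY,
        "parameters": [{"type": "VARCHAR", "value": user_id}],
    }


if __name__ == "__main__":
    body = build_sql_request("some-user-id")  # placeholder id, not from the thread
    print(json.dumps(body, indent=2))
```

Because the filter is on __time, Druid only queries segments that overlap the last month.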
Neelesh Sharma
05/16/2023, 5:29 AM
> For Historicals, druid.server.http.numThreads should be set to a value slightly higher than the sum of druid.broker.http.numConnections across all the Brokers in the cluster.
> On the Brokers, please ensure that the sum of druid.broker.http.numConnections across all the Brokers is slightly lower than the value of druid.server.http.numThreads on your Historicals and Tasks.
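The quoted rule can be checked with a few lines of Python. This sketch also computes the default druid.server.http.numThreads from the configuration reference formula, max(10, (cores * 17) / 16 + 2) + 30, using integer division as Druid's Java code does. The values come from this thread: Historical numThreads=60 and Broker numConnections=50. The thread does not say how many Brokers run, so the counts of 1 and 2 are hypothetical.

```python
def default_http_num_threads(cores: int) -> int:
    """Default druid.server.http.numThreads: max(10, (cores * 17) / 16 + 2) + 30 (integer math)."""
    return max(10, (cores * 17) // 16 + 2) + 30


def connections_fit(historical_http_threads: int, connections_per_broker: int, num_brokers: int) -> bool:
    """True if the Brokers' combined connection pools stay below the Historical's HTTP threads."""
    return connections_per_broker * num_brokers < historical_http_threads


HISTORICAL_HTTP_THREADS = 60  # druid.server.http.numThreads on the Historicals
BROKER_CONNECTIONS = 50       # druid.broker.http.numConnections on each Broker

# Hypothetical Broker counts: the thread does not state how many Brokers there are.
for brokers in (1, 2):
    total = BROKER_CONNECTIONS * brokers
    ok = connections_fit(HISTORICAL_HTTP_THREADS, BROKER_CONNECTIONS, brokers)
    print(f"{brokers} Broker(s): {total} connections vs {HISTORICAL_HTTP_THREADS} threads, fits={ok}")

# Defaults for the two node shapes discussed: 16 vCPU data servers, 4 vCPU query servers.
print(default_http_num_threads(16), default_http_num_threads(4))  # 49 40
```

The Historicals set druid.server.http.numThreads=60 explicitly, so the computed default of 49 does not apply to them. With numConnections=50 per Broker, adding a second Broker would already push the total past the Historical thread count.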
and on https://druid.apache.org/docs/latest/configuration/index.html
the doc says to use the formula max(10, (Number of cores * 17) / 16 + 2) + 30 (the default for druid.server.http.numThreads) for historical, indexer and broker.
so, since our
• historical nodes have 16 vCPUs, it comes out to be 49, and
• query nodes have 4 vCPUs, it comes out to be 40
Sergio Ferragut
05/16/2023, 6:17 PM
Neelesh Sharma
05/17/2023, 2:48 AM
Sergio Ferragut
05/17/2023, 6:03 PM
Neelesh Sharma
06/12/2023, 2:40 AM
Sergio Ferragut
06/12/2023, 10:47 PM