# troubleshooting
j
my broker runtime.properties
druid.service=druid/broker
# HTTP server threads
druid.broker.http.numConnections=20
druid.broker.http.numMaxThreads=40
druid.server.http.numThreads=40
# Processing threads and buffers
druid.processing.buffer.sizeBytes=2000000000
druid.query.groupBy.maxOnDiskStorage=10000000000
druid.processing.numMergeBuffers=1
druid.processing.numThreads=3
druid.sql.enable=true
druid.broker.http.maxQueuedBytes=1000000000
my historical runtime.properties
druid.service=druid/historical

druid.server.http.numThreads=40
druid.processing.buffer.sizeBytes=1000000000
druid.processing.numMergeBuffers=1
druid.processing.numThreads=5
druid.query.groupBy.maxOnDiskStorage=10000000000
# Segment storage
druid.segmentCache.locations=[{"path":"/druid/data/segments","maxSize":193273528320}]
druid.server.maxSize=193273528320

druid.cache.sizeInBytes=2000000000
druid.cache.type=caffeine
And I have two datasources; I didn’t set rollup.
s
Just some thoughts: how many brokers? With 10 req/s and a 5 s response time, requests could be piling up to more than 50 concurrent requests, saturating http.numThreads and causing the broker to wait before it can send more requests. What are the queries? GroupBy queries require merge buffers, so increasing those a bit could help. See here for details.
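As a rough illustration of that sizing argument, a broker config sketch could look like the lines below; the specific numbers are illustrative assumptions, not confirmed recommendations for this cluster:
# Illustrative broker sizing sketch, not a confirmed recommendation.
# ~10 req/s at ~5 s per query ≈ 50 queries in flight, so keep the HTTP
# server thread count above the expected concurrency.
druid.server.http.numThreads=60
# GroupBy v2 queries each need a merge buffer; with numMergeBuffers=1,
# concurrent groupBy queries queue up waiting for the single buffer.
druid.processing.numMergeBuffers=4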
j
I have just one broker. In the performance test there were two types of query. First, a simple count(*) with a WHERE condition, executed at 10 RPS (requests per second). Second, a groupBy query (PT20M granularity) against the other datasource, which ran four times a minute, but all at once.

So I think I can try adding more brokers and increasing the merge buffers. But I have a question: is it common to see high CPU usage on broker or historical nodes that receive a lot of requests and process a lot of data? I ask because my Druid cluster’s CPU usage has never been that high, even when I run performance tests.
+ The most curious thing is that the timeout error is intermittent, not constant.
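For illustration, the two query shapes described above might look roughly like this in Druid SQL; the datasource and column names are hypothetical, only the shapes come from the description:
-- Hypothetical datasource/column names; only the query shapes are from the thread.

-- Type 1: simple filtered count, issued at ~10 RPS.
SELECT COUNT(*)
FROM "datasource_a"
WHERE "status" = 'ERROR'
  AND __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR;

-- Type 2: groupBy at PT20M granularity on the other datasource,
-- fired four times a minute, all at once.
SELECT TIME_FLOOR(__time, 'PT20M') AS time_bucket,
       "service",
       COUNT(*) AS cnt
FROM "datasource_b"
GROUP BY TIME_FLOOR(__time, 'PT20M'), "service";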
s
One of the things to look out for is a high-cardinality groupBy. The final merge/aggregation of results happens in a single thread on the broker, which becomes a bottleneck if the number of unique values is too large. This might show up as only one CPU being busy on the broker.
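As a concrete illustration (hypothetical names), the cardinality of the grouping dimension is what drives that single-threaded merge on the broker:
-- Hypothetical names, for illustration only.

-- Low cardinality: only a handful of groups reach the broker; the merge is cheap.
SELECT "country", COUNT(*) AS cnt
FROM "datasource_b"
GROUP BY "country";

-- High cardinality: millions of distinct values all funnel through the
-- broker's single merge thread, which can pin one CPU while others stay idle.
SELECT "user_id", COUNT(*) AS cnt
FROM "datasource_b"
GROUP BY "user_id";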
r
@Jo you should change the values of numThreads and numConnections
[each historical will connect back to the broker, so you need to multiply each connection to the broker by the number of historicals]
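One way to read that advice in concrete numbers; this sketch follows the commonly documented guideline that each historical's http.numThreads should exceed the sum of druid.broker.http.numConnections across all brokers, and the values are illustrative only:
# Illustrative sizing only, assuming 1 broker and a few historicals.

# --- Broker runtime.properties ---
# Connection pool from the broker out to historicals; queries beyond this queue.
druid.broker.http.numConnections=25
# HTTP server threads for incoming client connections.
druid.server.http.numThreads=60

# --- Historical runtime.properties ---
# Set above the sum of druid.broker.http.numConnections over all brokers
# (a single broker here, so > 25), plus some headroom.
druid.server.http.numThreads=40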