# troubleshooting
j
my broker runtime.properties
druid.service=druid/broker
# HTTP server threads
druid.broker.http.numConnections=20
druid.broker.http.numMaxThreads=40
druid.server.http.numThreads=40
# Processing threads and buffers
druid.processing.buffer.sizeBytes=2000000000
druid.query.groupBy.maxOnDiskStorage=10000000000
druid.processing.numMergeBuffers=1
druid.processing.numThreads=3
druid.sql.enable=true
druid.broker.http.maxQueuedBytes=1000000000
my historical runtime.properties
druid.service=druid/historical

druid.server.http.numThreads=40
druid.processing.buffer.sizeBytes=1000000000
druid.processing.numMergeBuffers=1
druid.processing.numThreads=5
druid.query.groupBy.maxOnDiskStorage=10000000000
# Segment storage
druid.segmentCache.locations=[{"path":"/druid/data/segments","maxSize":193273528320}]
druid.server.maxSize=193273528320

druid.cache.sizeInBytes=2000000000
druid.cache.type=caffeine
And I have two datasources; I didn’t set rollup.
s
Just some thoughts: how many brokers? With 10 req/s and a 5 s response time, requests could be piling up to more than 50 concurrent requests, saturating http.numThreads and causing the broker to wait before it can send more requests. What are the queries? GroupBy queries require merge buffers, so increasing those a bit could help. See here for details.
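As a rough illustration of that sizing argument, a broker config sketch could look like the lines below; the specific numbers are illustrative assumptions, not confirmed recommendations for this cluster:
# Illustrative broker sizing sketch, not a confirmed recommendation.
# ~10 req/s at ~5 s per query ≈ 50 queries in flight, so keep the HTTP
# server thread count above the expected concurrency.
druid.server.http.numThreads=60
# GroupBy v2 queries each need a merge buffer; with numMergeBuffers=1,
# concurrent groupBy queries queue up waiting for the single buffer.
druid.processing.numMergeBuffers=4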
j
I have just one broker. In the performance test there were two types of query. First, a simple count(*) with a WHERE condition, executed at 10 RPS (requests per second). Second, a groupBy query (PT20M granularity) against the other datasource, which ran four times a minute, but all at once.

So I think I can try adding more brokers and increasing the merge buffers. But I have a question: is it common to see high CPU usage on broker or historical nodes that receive a lot of requests and process a lot of data? I ask because my Druid cluster’s CPU usage has never been that high, even when I run performance tests.
+ The most curious thing is that the timeout error is intermittent, not constant.
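For illustration, the two query shapes described above might look roughly like this in Druid SQL; the datasource and column names are hypothetical, only the shapes come from the description:
-- Hypothetical datasource/column names; only the query shapes are from the thread.

-- Type 1: simple filtered count, issued at ~10 RPS.
SELECT COUNT(*)
FROM "datasource_a"
WHERE "status" = 'ERROR'
  AND __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR;

-- Type 2: groupBy at PT20M granularity on the other datasource,
-- fired four times a minute, all at once.
SELECT TIME_FLOOR(__time, 'PT20M') AS time_bucket,
       "service",
       COUNT(*) AS cnt
FROM "datasource_b"
GROUP BY TIME_FLOOR(__time, 'PT20M'), "service";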
s
One of the things to look out for is a high-cardinality groupBy. The final merge/aggregation of results happens in a single thread on the broker, which becomes a bottleneck if the number of unique values is too large. This might show up as only one CPU being busy on the broker.
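As a concrete illustration (hypothetical names), the cardinality of the grouping dimension is what drives that single-threaded merge on the broker:
-- Hypothetical names, for illustration only.

-- Low cardinality: only a handful of groups reach the broker; the merge is cheap.
SELECT "country", COUNT(*) AS cnt
FROM "datasource_b"
GROUP BY "country";

-- High cardinality: millions of distinct values all funnel through the
-- broker's single merge thread, which can pin one CPU while others stay idle.
SELECT "user_id", COUNT(*) AS cnt
FROM "datasource_b"
GROUP BY "user_id";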
r
@Jo you should change the values of numThreads and numConnections
[each historical will connect back to the broker, so you need to multiply each connection to the broker by the number of historicals]
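One way to read that advice in concrete numbers; this sketch follows the commonly documented guideline that each historical's http.numThreads should exceed the sum of druid.broker.http.numConnections across all brokers, and the values are illustrative only:
# Illustrative sizing only, assuming 1 broker and a few historicals.

# --- Broker runtime.properties ---
# Connection pool from the broker out to historicals; queries beyond this queue.
druid.broker.http.numConnections=25
# HTTP server threads for incoming client connections.
druid.server.http.numThreads=60

# --- Historical runtime.properties ---
# Set above the sum of druid.broker.http.numConnections over all brokers
# (a single broker here, so > 25), plus some headroom.
druid.server.http.numThreads=40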