This message was deleted Apache Druid #dev

# dev

Slackbot

10/23/2023, 6:10 AM

This message was deleted.

Takaaki Nakama

10/23/2023, 6:10 AM

Query

SELECT user_id
FROM "event"
WHERE  __time > '2022-07-01T00:00:00Z' AND __time < '2022-08-31T00:00:00Z'
AND event_name = 'view'
GROUP BY user_id
HAVING COUNT(*) > 5000

Takaaki Nakama

10/23/2023, 6:10 AM

Profile result

Takaaki Nakama

10/23/2023, 6:11 AM

Related function calls:

The RowBasedGrouperHelper accumulator calls grouper.aggregate(new RowBasedKey(key)). https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java#L342

Grouper calculates the hash code. https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/Grouper.java#L82

Then RowBasedKey.hashCode() is called. Arrays.hashCode(key) calls each element's hashCode() internally, so each String instance's hash code must already be cached at this point. https://github.com/takaaki7/druid/blob/9d92a663f8e3964cf23d93259f52d6fb9137d5b9/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java#L708

The key instance is then passed to DynamicDictionaryStringRowBasedKeySerdeHelper.addToDictionary without being copied. (ConcurrentGrouper.aggregate() -> SpillingGrouper.aggregate() -> AbstractBufferHashGrouper.aggregate() -> RowBasedKeySerde.toByteBuffer() -> DynamicDictionaryStringRowBasedKeySerdeHelper.addToDictionary()) https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java#L1713
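To make the hashing step above concrete, here is a minimal standalone sketch (not Druid code). `Key` and `CountingDim` are hypothetical stand-ins: `Key` wraps an `Object[]` and hashes it with `Arrays.hashCode`, as described for RowBasedKey, and `CountingDim` only counts how often its `hashCode()` is invoked. It shows that every hash of the key visits each element's `hashCode()` again. For a real `String` element that per-element call is cheap after the first time, because OpenJDK's `String` stores its computed hash in a private field; that caching is an implementation detail and is assumed here rather than measured.

```java
import java.util.Arrays;

public class KeyHashSketch {
    // Hypothetical stand-in for a grouping key backed by an Object[].
    static final class Key {
        private final Object[] key;

        Key(Object[] key) {
            this.key = key;
        }

        @Override
        public int hashCode() {
            // Delegates to every element's hashCode() on each call.
            return Arrays.hashCode(key);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && Arrays.equals(key, ((Key) o).key);
        }
    }

    // Hypothetical dimension value that counts hashCode() invocations.
    static final class CountingDim {
        final String value;
        int calls = 0;

        CountingDim(String value) {
            this.value = value;
        }

        @Override
        public int hashCode() {
            calls++;
            return value.hashCode();
        }
    }

    // The formula documented for Arrays.hashCode(Object[]), written out by hand.
    static int manualHash(Object[] elements) {
        int result = 1;
        for (Object e : elements) {
            result = 31 * result + (e == null ? 0 : e.hashCode());
        }
        return result;
    }

    public static void main(String[] args) {
        CountingDim dim = new CountingDim("user-42");
        Key key = new Key(new Object[] {dim, "view"});

        key.hashCode();
        key.hashCode();

        // Each key.hashCode() call reached the element once.
        System.out.println("element hashCode() calls: " + dim.calls);

        Object[] plain = {"user-42", "view"};
        System.out.println("matches manual formula: "
            + (Arrays.hashCode(plain) == manualHash(plain)));
    }
}
```

The key itself does not remember its hash in this sketch, so any saving comes only from the elements; whether that is enough for a high-cardinality column is the question raised in this thread.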

Abhishek Agarwal

10/23/2023, 11:30 AM

I am not sure, but depending on the cardinality of your user_id column, that might be the cost even after caching. How many rows did your query run on, and how many user IDs were in those rows?

Takaaki Nakama

10/23/2023, 2:16 PM

The total is 1200M rows, and the user_id cardinality is about 20M.

Takaaki Nakama

10/23/2023, 3:28 PM

Using 24 cores and 23 threads, 80RAM.

Takaaki Nakama

10/24/2023, 1:06 PM

I've created a GitHub issue: https://github.com/apache/druid/issues/15242
