# dev
t
Query
SELECT user_id
FROM "event"
WHERE __time > '2022-07-01T00:00:00Z' AND __time < '2022-08-31T00:00:00Z'
AND event_name = 'view'
GROUP BY user_id
HAVING COUNT(*) > 5000
Profile result
Related function calls:
The RowBasedGrouperHelper accumulator calls grouper.aggregate(new RowBasedKey(key)). https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java#L342
Grouper calculates the hash code. https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/Grouper.java#L82
RowBasedKey.hashCode() is then called. Arrays.hashCode(key) internally calls hashCode() on each element, so each String instance's hash code should be cached at this point. https://github.com/takaaki7/druid/blob/9d92a663f8e3964cf23d93259f52d6fb9137d5b9/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java#L708
The same key instance is then passed to DynamicDictionaryStringRowBasedKeySerdeHelper.addToDictionary without being copied. (ConcurrentGrouper.aggregate() -> SpillingGrouper.aggregate() -> AbstractBufferHashGrouper.aggregate() -> RowBasedKeySerde.toByteBuffer() -> DynamicDictionaryStringRowBasedKeySerdeHelper.addToDictionary()) https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java#L1713
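To make the hashing step concrete, here is a minimal sketch of a key wrapper that hashes an Object[] with Arrays.hashCode. SketchKey and CountingElement are hypothetical stand-ins written for illustration, not Druid classes; the sketch only assumes that the real RowBasedKey.hashCode() delegates to Arrays.hashCode(key), as described above. It shows that every hashCode() call on the key calls hashCode() on each element. For String elements, the JDK stores the computed hash on the instance, so the saving applies only when the same String instance is hashed again.

```java
import java.util.Arrays;

public class KeyHashSketch {
    // Hypothetical stand-in for a row key: wraps an Object[] and hashes it with Arrays.hashCode.
    static final class SketchKey {
        private final Object[] key;

        SketchKey(Object[] key) {
            this.key = key;
        }

        @Override
        public int hashCode() {
            // Delegates to each element's hashCode() on every call.
            return Arrays.hashCode(key);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof SketchKey && Arrays.equals(key, ((SketchKey) o).key);
        }
    }

    // Hypothetical element that counts how often its hashCode() is invoked.
    static final class CountingElement {
        int calls = 0;

        @Override
        public int hashCode() {
            calls++;
            return 42;
        }
    }

    // Hashes the same key keyHashCalls times and reports how often the element was hashed.
    static int elementHashCalls(int keyHashCalls) {
        CountingElement element = new CountingElement();
        SketchKey key = new SketchKey(new Object[] {element});
        for (int i = 0; i < keyHashCalls; i++) {
            key.hashCode();
        }
        return element.calls;
    }

    // For a one-element array, Arrays.hashCode is 31 * 1 + element.hashCode().
    static boolean matchesManualHash(String userId) {
        return new SketchKey(new Object[] {userId}).hashCode() == 31 + userId.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(elementHashCalls(3));
        System.out.println(matchesManualHash("user-123"));
    }
}
```

The key wrapper itself does not cache anything in this sketch, so any caching benefit comes from the String elements and depends on the same instances being reused across calls.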
a
I am not sure, but depending on the cardinality of your user_id column, that might be the cost even after caching. How many rows did your query run on, and how many distinct user IDs were in those rows?
t
Total row count is 1200M; user_id cardinality is about 20M.
Using 24 cores and 23 threads, 80RAM.
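For a sense of scale, a back-of-the-envelope sketch of the numbers above, assuming "1200m" and "20m" mean millions and that both figures describe the rows the query actually scanned (the thread does not say whether they are counted before or after the filters). The class and method names are hypothetical.

```java
public class GroupByScaleSketch {
    // Average number of rows aggregated into each group key.
    static long averageRowsPerKey(long totalRows, long distinctKeys) {
        return totalRows / distinctKeys;
    }

    public static void main(String[] args) {
        long totalRows = 1_200_000_000L;     // "1200m" rows, read as millions
        long distinctUserIds = 20_000_000L;  // "about 20m" user_id values
        System.out.println(averageRowsPerKey(totalRows, distinctUserIds)); // prints 60
    }
}
```

Under these assumptions each user_id is hit about 60 times on average, and the grouper processes on the order of a billion aggregate calls against roughly 20 million distinct keys.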