Jvalant Patel
05/14/2025, 10:42 PM
We are upgrading to the latest Druid version, where legacy mode is not supported. I just wanted to get some help from here on what the best strategy would be to upgrade Druid if we have null and "" strings in the datasources and our queries rely on the legacy behavior. If we want to rewrite queries to handle three-valued logic for null comparisons, what should the strategy be? Is there any generalized way to modify the queries? We are still using the native Druid query language.
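Not an authoritative answer, just an illustration of the kind of rewrite involved, with someDim as a placeholder dimension: under legacy mode a selector on "" matched both empty strings and nulls, so after the upgrade the same intent has to be expressed as an explicit OR of the two cases in the native filter:
{
  "type": "or",
  "fields": [
    { "type": "selector", "dimension": "someDim", "value": "" },
    { "type": "selector", "dimension": "someDim", "value": null }
  ]
}
Negated filters deserve the same review, since rows where the dimension is null behave differently once null handling is SQL-compatible.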
Rohen
05/19/2025, 1:14 PM
Rohen
05/19/2025, 1:14 PM
Udit Sharma
05/19/2025, 1:43 PM
select distinct customer from events where __time BETWEEN TIMESTAMP '2025-03-20 12:30:00'
AND TIMESTAMP '2025-05-19 13:00:00' AND
customer IN (
'2140', '1060', '2207', '1809', '2985',
'3026', '2947', '2955', '2367', '2464',
'899', '355', '3284', '3302', '1034',
'3015', '2127', '2123', '2731', '2109',
'2832', '2479', '2702', '2387', '1804',
'1018', '1364', '3467', '1028', '850'
)
While this seems to return the right results.
select distinct custId from events where __time BETWEEN TIMESTAMP '2025-03-20 12:30:00'
AND TIMESTAMP '2025-05-19 13:00:00' AND
custId IN (
'2140', '1060', '2207', '1809', '2985',
'3026', '2947', '2955', '2367', '2464',
'899', '355', '3284', '3302', '1034',
'3015', '2127', '2123', '2731', '2109',
'2832', '2479', '2702', '2387', '1804',
'1018', '1364', '3467', '1028', '850'
)
Druid Version: 26.0.0
JRob
05/22/2025, 5:53 PM
Cannot construct instance of `org.apache.druid.data.input.protobuf.FileBasedProtobufBytesDecoder`, problem: Cannot read descriptor file: file:/tmp/metrics.desc at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 1090] (through reference chain: org.apache.druid.indexing.kafka.KafkaSamplerSpec["spec"]->org.apache.druid.indexing.kafka.supervisor.KafkaSupervisorSpec["ioConfig"]->org.apache.druid.indexing.kafka.supervisor.KafkaSupervisorIOConfig["inputFormat"]->org.apache.druid.data.input.protobuf.ProtobufInputFormat["protoBytesDecoder"])
I suspect that Druid is trying to download the file over HTTP, but we would never expose /tmp
to the internet. Why doesn't it just grab the file locally?
For example, this works:
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "local",
        "baseDir": "/tmp/",
        "filter": "metrics.desc"
      }
    },
    "tuningConfig": {
      "type": "index_parallel"
    }
  }
}
However, I can't get this working with the inputFormat.
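A sketch of the shape such an inputFormat might take with the file-based decoder, assuming a hypothetical message type Metrics; note the descriptor path has to be readable by whichever service actually parses the data (here the Kafka sampler), not just by the machine submitting the spec:
"inputFormat": {
  "type": "protobuf",
  "protoBytesDecoder": {
    "type": "file",
    "descriptor": "file:///tmp/metrics.desc",
    "protoMessageType": "Metrics"
  }
}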
Utkarsh Chaturvedi
05/23/2025, 10:13 AM
Brindha Ramasamy
05/23/2025, 6:30 PM
Rohen
05/26/2025, 1:44 PM
JRob
05/28/2025, 9:14 PM
1) No implementation for org.apache.druid.server.metrics.TaskCountStatsProvider was bound.
while locating org.apache.druid.server.metrics.TaskCountStatsProvider
for the 1st parameter of org.apache.druid.server.metrics.TaskCountStatsMonitor.<init>(TaskCountStatsMonitor.java:40)
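In case it is the usual cause: this Guice binding error tends to appear when TaskCountStatsMonitor is listed in druid.monitoring.monitors on a service other than the Overlord, which is the service that binds TaskCountStatsProvider. A minimal sketch, assuming the monitor currently sits in a shared common.runtime.properties:
# Put this in the Overlord's runtime.properties only, not in common.runtime.properties
druid.monitoring.monitors=["org.apache.druid.server.metrics.TaskCountStatsMonitor"]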
Hardik Bajaj
05/29/2025, 7:12 PM
Seki Inoue
06/02/2025, 4:47 PM
The file name .coordinator-issued_kil...
was 265 bytes long, which exceeds the XFS limit of 255 bytes.
Do you know any workaround to forcibly kill those segments?
2025-05-30T22:10:42,465 ERROR [qtp214761486-125] org.apache.druid.indexing.worker.WorkerTaskManager - Error while trying to persist assigned task[coordinator-issued_kill_<deducted_long_datasource_name_119_bytes>]
java.nio.file.FileSystemException: var/tmp/persistent/task/workerTaskManagerTmp/.coordinator-issued_kill_<deducted_long_datasource_name_119_bytes>_dfhlgdae_2024-07-10T23:00:00.000Z_2024-07-18T00:00:00.000Z_2025-05-30T22:10:42.417Z.2aababbd-02a6-4002-9b9f-cba30bbea8a7: File name too long
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:181) ~[?:?]
at java.base/java.nio.channels.FileChannel.open(FileChannel.java:298) ~[?:?]
at java.base/java.nio.channels.FileChannel.open(FileChannel.java:357) ~[?:?]
at org.apache.druid.java.util.common.FileUtils.writeAtomically(FileUtils.java:271) ~[druid-processing-33.0.0.jar:33.0.0]
...
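A possible workaround (a sketch, untested): submit the kill task manually to the Overlord at POST /druid/indexer/v1/task with an explicit short id, since the file name that overflowed appears to be derived from the task id; kill-short-001 is a placeholder and the interval is copied from the log line above:
{
  "type": "kill",
  "id": "kill-short-001",
  "dataSource": "<deducted_long_datasource_name_119_bytes>",
  "interval": "2024-07-10T23:00:00.000Z/2024-07-18T00:00:00.000Z"
}
This assumes the segments in that interval are already marked unused, which a coordinator-issued kill implies.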
Asit
06/03/2025, 4:25 AM
Jon Laberge
06/04/2025, 5:35 AM
I'm using kubernetes-overlord-extensions; however, I see this error when the Overlord is trying to start:
Caused by: org.apache.commons.lang3.NotImplementedException: this druid.indexer.logs.type [class org.apache.druid.storage.google.GoogleTaskLogs] does not support managing task payloads yet. You will have to switch to using environment variables
Is there something I should be changing in my task template?
Jimbo Slice
06/06/2025, 9:55 PM
SELECT
COUNT(*) As Entries,
SUM(packets) as Packets,
SUM(bytes) as Bytes,
(SUM(bytes) / SUM(packets)) as AvgPacketSizeBytes,
MIN(__time) as FirstSeen,
MAX(__time) as LastSeen,
TIMESTAMPDIFF(SECOND, MIN(__time), MAX(__time)) as DurationSeconds,
(SUM(bytes) * 8 / TIMESTAMPDIFF(SECOND, MIN(__time), MAX(__time))) as AvgMbps,
"pkt-srcaddr", "pkt-dstaddr", "protocol"
FROM "AWSLogsVPC"
WHERE "log-status"!='NODATA' AND "pkt-srcaddr"!='-' AND "action"='ACCEPT'
GROUP BY "pkt-srcaddr", "pkt-dstaddr", "protocol"
But when I remove the TIMESTAMPDIFF expression from AvgMbps, this does not happen:
SELECT
COUNT(*) As Entries,
SUM(packets) as Packets,
SUM(bytes) as Bytes,
(SUM(bytes) / SUM(packets)) as AvgPacketSizeBytes,
MIN(__time) as FirstSeen,
MAX(__time) as LastSeen,
TIMESTAMPDIFF(SECOND, MIN(__time), MAX(__time)) as DurationSeconds,
(SUM(bytes) * 8) as AvgMbps,
"pkt-srcaddr", "pkt-dstaddr", "protocol"
FROM "AWSLogsVPC"
WHERE "log-status"!='NODATA' AND "pkt-srcaddr"!='-' AND "action"='ACCEPT'
GROUP BY "pkt-srcaddr", "pkt-dstaddr", "protocol"
I've tried removing the WHERE clause because != is bad practice; no difference. I believe there is an issue here with subquerying (druid.server.http.maxSubqueryRows), however this is not a subquery, this is a simple calculation in a simple query.
This query runs perfectly without TIMESTAMPDIFF(SECOND, MIN(__time), MAX(__time))
being called in AvgMbps.
Any ideas on what could be wrong???
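One thing that might be worth ruling out (a guess, since the exact error isn't shown): for groups where MIN(__time) equals MAX(__time), TIMESTAMPDIFF returns 0 and the AvgMbps expression divides by zero. A sketch of that one line guarded with NULLIF so the result becomes NULL instead; all names are unchanged from the query above:
(SUM(bytes) * 8 / NULLIF(TIMESTAMPDIFF(SECOND, MIN(__time), MAX(__time)), 0)) as AvgMbps,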
Ben Krug
06/06/2025, 10:07 PM
venkat
06/07/2025, 8:50 AM
venkat
06/07/2025, 8:54 AM
sandy k
06/09/2025, 4:43 AM
Rohen
06/09/2025, 5:53 AM
Rushikesh Bankar
06/09/2025, 10:32 AM
JRob
06/10/2025, 3:19 PM
Queries against sys.segments
are taking upwards of 60 seconds on average. Likewise, our Datasources tab in the Console takes an agonizingly long time to load. But I can't understand why it's so slow; our DB stats don't show any issues.
The druid_segments table is only 1108 MB in size.
From pg_stat_statements:
query | SELECT payload FROM druid_segments WHERE used=$1
calls | 734969
total_exec_time | 1318567198.0990858
min_exec_time | 733.308662
max_exec_time | 13879.650989
mean_exec_time | 1794.0446441947086
stddev_exec_time | 581.4299142612549
----------------------------------------------
query | SELECT payload FROM druid_segments WHERE used = $1 AND dataSource = $2 AND ((start < $3 AND "end" > $4) OR (start = $7 AND "end" != $8 AND "end" > $5) OR (start != $9 AND "end" = $10 AND start < $6) OR (start = $11 AND "end" = $12))
calls | 4888478
total_exec_time | 31912869.00381691
min_exec_time | 0.007730999999999999
max_exec_time | 2166.647028
mean_exec_time | 6.528180960171064
stddev_exec_time | 25.333075336970094
----------------------------------------------
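A small diagnostic sketch (assumes direct access to the Postgres metadata store, with true standing in for the used=$1 parameter) to see where the time goes and whether the table is carrying dead-tuple bloat:
-- Plan and buffer usage for the slow coordinator poll query
EXPLAIN (ANALYZE, BUFFERS)
SELECT payload FROM druid_segments WHERE used = true;

-- Dead tuples and last autovacuum on the table
SELECT n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'druid_segments';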
Dinesh
06/12/2025, 4:55 AM
Dinesh
06/12/2025, 5:26 AM
Riccardo Sale
06/16/2025, 10:11 AM
druid.audit.manager.maxPayloadSizeBytes
Looking at the coordinator.compaction.config field, we have seen that this JSON payload value has grown to over 30MB and it is still causing slowdowns when queried.
As an example, the following query: SELECT payload FROM druid_segments WHERE used=? takes up to three seconds.
Any suggestion to solve the above issue? How can we reduce the general size of the payload in coordinator.compaction.config? Would it be possible to write a custom extension for this specific use case?
Thanks in advance!
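For what it's worth, a quick way to confirm the stored size from the metadata store side (Postgres assumed; druid_config is the default table name, adjust if you use a custom metadata table prefix):
-- Size in bytes of the stored coordinator compaction config
SELECT name, length(payload) AS payload_bytes
FROM druid_config
WHERE name = 'coordinator.compaction.config';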
Rajesh Gottapu
06/17/2025, 5:19 AM
Nir Bar On
06/17/2025, 11:06 AM
Nir Bar On
06/17/2025, 12:30 PM
Nir Bar On
06/17/2025, 12:48 PM
Cristi Aldulea
06/18/2025, 7:46 AM
I have a column named ingestionTimestamp
to support a deduplication job. Additionally, I have a column named tags, which is a multi-value VARCHAR column.
The deduplication is performed using an MSQ (Multi-Stage Query) like the following:
REPLACE INTO "target-datasource"
OVERWRITE
WHERE "__time" >= TIMESTAMP'__MIN_TIME'
AND "__time" < TIMESTAMP'__MAX_TIME'
SELECT
__time,
LATEST_BY("entityId", MILLIS_TO_TIMESTAMP("ingestionTimestamp")) AS "entityId",
LATEST_BY("entityName", MILLIS_TO_TIMESTAMP("ingestionTimestamp")) AS "entityName",
LATEST_BY("tagSetA", MILLIS_TO_TIMESTAMP("ingestionTimestamp")) AS "tagSetA",
LATEST_BY("tagSetB", MILLIS_TO_TIMESTAMP("ingestionTimestamp")) AS "tagSetB",
MAX("ingestionTimestamp") AS ingestionTimestamp
FROM "target-datasource"
WHERE "__time" >= TIMESTAMP'__MIN_TIME'
AND "__time" < TIMESTAMP'__MAX_TIME'
GROUP BY
__time,
"entityUID"
PARTITIONED BY 'P1M';
Problem:
After running this query, the tags-like columns (tagSetA, tagSetB) are no longer in a multi-value format. This breaks downstream queries that rely on the multi-value nature of these columns.
My understanding:
MSQ might not support preserving multi-value columns directly, especially when using functions like LATEST_BY.
Question:
How can I run this kind of deduplication query while preserving the multi-value format of these columns? Is there a recommended approach or workaround in Druid to handle this scenario?
Can someone help us with this problem, please?
Vaibhav
06/18/2025, 7:16 PM
org.apache.druid.java.util.common.IAE: Asked to add buffers[2,454,942,764] larger than configured max[2,147,483,647]
at org.apache.druid.java.util.common.io.smoosh.FileSmoosher.addWithSmooshedWriter(FileSmoosher.java:168)
• On investigation: compaction produces 430 partitions, but the 430th partition (with end=null) gets an unusually high number of rows (~800M+ rows).
What I found:
- A GROUP BY on the 5 range dimensions for a sample day gives ~11.5k unique combinations, e.g.:
SELECT range_dim1, range_dim2, range_dim3, range_dim4, range_dim5, count(*) as row_count
FROM <datasource>
WHERE __time <1 day interval>
GROUP BY 1,2,3,4,5
ORDER BY 1,2,3,4,5
- However, partition 430 gets all combinations from ~9.5k to ~11.5k in one partition.
- This violates the targetRowsPerSegment: 5M and maxRowsPerSegment: 7.5M config.
Questions:
• Are there better strategies to ensure partitioning respects row count limits?
• Is this behavior a bug or expected?
Any advice or insights appreciated.