Cristian Daniel Gelvis Bermudez
07/10/2025, 6:31 PM
Victoria
07/11/2025, 3:24 AM
sandy k
07/11/2025, 1:24 PM
sandy k
07/13/2025, 8:53 AM
Aqsha Padyani
07/15/2025, 7:41 AM
DAY segment granularity and these example dimensions:
"dimensionsSpec": {
"dimensions": [
{"type": "string", "name": "user_id", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true},
{"type": "string", "name": "phone_number", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true},
{"type": "string", "name": "email_address", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true}
]
}
I'm trying to set up compaction for that datasource that compacts segments into MONTH granularity and stores only the latest entry for each customer in that month:
"dimensionsSpec": {
"dimensions": [
{"type": "string", "name": "user_id", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true}
]
},
"metricsSpec": [
{"type": "stringLast", "name": "phone_number", "fieldName": "phone_number", "timeColumn": "__time", "maxStringBytes": 1024},
{"type": "stringLast", "name": "email_address", "fieldName": "email_address", "timeColumn": "__time", "maxStringBytes": 1024}
]
I found out that the metricsSpec stores the aggregated data in a COMPLEX<serializablePairLongString> type, which is different from the new/un-compacted data:
{
"lhs": 1721882238000,
"rhs": "+6281234567890"
}
Queries with aggregations like LATEST() still work fine, but retrieving the data with something like SELECT * produces an error:
Cannot coerce field [phone_number] from type [java.util.LinkedHashMap] to type [VARCHAR]
I imagine transformSpec.transforms can be used to transform those to strings, but AFAIK that config is not supported in compaction.
Is there any better implementation for this "latest entry of each customer" while keeping the data type the same between newly-ingested and compacted data? Or is this "the best way" to implement it, and I should change the query from SELECT * to something else?
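For illustration, the aggregation-style query described above as still working looks roughly like this - a sketch only, using a hypothetical datasource name customer_contacts and an arbitrary time filter:
-- LATEST() returns the value with the latest __time per group, and per the report above it
-- works against both the raw string columns (new data) and the stringLast COMPLEX columns
-- (compacted data); only reading the complex column directly via SELECT * fails.
SELECT
  user_id,
  LATEST(phone_number, 1024) AS phone_number,
  LATEST(email_address, 1024) AS email_address
FROM "customer_contacts"
WHERE __time >= TIMESTAMP '2025-01-01'
GROUP BY user_id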
JRob
07/15/2025, 2:56 PM
requireTimeCondition.
Sample query:
WITH sample_data AS (
SELECT
TIME_FLOOR("__time", 'PT5M') AS time_bucket,
SUM("count") AS volume
FROM "datasource"
WHERE "__time" > CURRENT_TIMESTAMP - INTERVAL '1' day
GROUP BY 1
)
SELECT
time_bucket AS window_end_time,
TIME_SHIFT(time_bucket, 'PT30M', -1) AS window_start_time,
SUM(volume) OVER (
ORDER BY time_bucket
ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
) AS rolling_volume
FROM sample_data
I would expect that requireTimeCondition should only apply to datasource queries and not all queries, yes? Is the solution to simply abandon requireTimeCondition? What other guards can I put in place for bad queries?
Tanay Maheshwari
07/16/2025, 6:18 AM
Nir Bar On
07/16/2025, 11:08 AM
Konstantinos Chaitas
07/16/2025, 3:17 PM
FROM TABLE(APPEND(...)) approach, but I would prefer to hide that complexity from end users. Also, some of the UI tools we are using request a single datasource as an input. Is there a way to create a view in Druid, or alternatively, to streamline the data from multiple datasources into a single, unified datasource? Thanks
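For reference, the TABLE(APPEND(...)) form mentioned above looks roughly like this - a sketch with hypothetical datasource names ds_2024 and ds_2025 and a hypothetical column user_id:
-- APPEND unions the datasources by column name; columns missing from one datasource come back as NULL.
SELECT "user_id", COUNT(*) AS row_count
FROM TABLE(APPEND('ds_2024', 'ds_2025'))
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
GROUP BY "user_id"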
Victoria
07/16/2025, 5:18 PM
eu-central-1. To make it work, I had to override the aws.region=eu-central-1 via a JVM system property for all services. However, now I cannot seem to ingest data from us-east-1 buckets. It throws the error
Failed to sample data: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: The bucket is in this region: us-east-1. Please use this region to retry the request (Service: Amazon S3; Status Code: 301; Error Code: PermanentRedirect;
I tried to use the endpointConfig in the spec, but still without success. Has anyone run into the same issue? (we're using druid 33.0.0)
"ioConfig": {
"type": "index_parallel",
"inputSource": {
"type": "s3",
"endpointConfig": {
"url": "s3.us-east-1.amazonaws.com",
"signingRegion": "us-east-1"
},
"uris": [
"s3://x-us-east-1-dev-polaris/segment_events/designer/page/data/processing_date_day=2023-01-01/event_date_day=2022-12-31/00000-306-a018ab59-9017-4b34-8a8a-858de89ee6b7-0-00002.parquet"
]
}
Tanay Maheshwari
07/16/2025, 7:43 PM
2025-07-16T19:38:26,278 WARN [qtp1182725120-124] org.apache.druid.query.lookup.LookupUtils - Lookup [os_lookup] could not be serialized properly. Please check its configuration. Error: Cannot construct instance of `org.apache.druid.query.lookup.namespace.JdbcExtractionNamespace`, problem: java.lang.ClassNotFoundException: org.postgresql.Driver
I am using the "postgresql-metadata-storage" and "mysql-metadata-storage" extensions. In the postgresql-metadata-storage extension I have the following jars: checker-qual-3.42.0.jar, postgresql-42.7.2.jar, postgresql-metadata-storage-32.0.0.jar.
After checking online I also added mysql-connector-j-8.2.0.jar to the mysql-metadata-storage extension folder.
I am still getting this error. Any help in debugging would be appreciated.
Nir Bar On
07/17/2025, 11:23 AM
Tanay Maheshwari
07/18/2025, 12:18 PM
jakubmatyszewski
07/21/2025, 7:50 AM
druid.server.http.numThreads=43
druid.segmentCache.numLoadingThreads=20
druid.segmentCache.numBootstrapThreads=40
I wonder whether setting these values so high makes any sense - I see that for numLoadingThreads the default is max(1, number of cores / 6) - in my case it is allowed to have 11 cores.
Do you have any recommendations for a case like this?
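For comparison, plugging 11 cores into that default formula is plain arithmetic (assuming integer division): max(1, 11 / 6) = max(1, 1) = 1, so the configured numLoadingThreads=20 and numBootstrapThreads=40 sit well above what the default would give on this hardware.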
Eyal Yurman
07/21/2025, 10:23 PM
Nir Bar On
07/28/2025, 1:36 PM
druid_query_groupBy_maxResults=500000
druid_query_groupBy_maxIntermediateRows=1000000
druid_query_groupBy_maxMergingDictionarySize=268435456
What can be the cause of the broker crashing, and how can I troubleshoot this to figure out what I need to do to fix it?
Could it be that the broker is not using the direct memory and instead keeps using the heap memory?
Broker status payload:
"memory": {
"maxMemory": 8589934592,
"totalMemory": 8589934592,
"freeMemory": 6974955008,
"usedMemory": 1614979584,
"directMemory": 4294967296
}
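For context, those byte values are straightforward unit conversions: maxMemory and totalMemory of 8589934592 bytes = 8 GiB of heap, directMemory of 4294967296 bytes = 4 GiB, and the maxMergingDictionarySize of 268435456 bytes above = 256 MiB.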
Tanay Maheshwari
07/28/2025, 1:47 PM
2025-07-28T12:48:46,057 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Exception while running task[AbstractTask{id='index_parallel_supply_view_dohnekmm_2025-07-28T12:48:42.004Z', groupId='index_parallel_supply_view_dohnekmm_2025-07-28T12:48:42.004Z', taskResource=TaskResource{availabilityGroup='index_parallel_supply_view_dohnekmm_2025-07-28T12:48:42.004Z', requiredCapacity=1}, dataSource='supply_view', context={forceTimeChunkLock=true, useLineageBasedSegmentAllocation=true}}]
java.lang.ClassCastException: class java.lang.Object cannot be cast to class org.apache.druid.indexing.common.task.batch.parallel.SinglePhaseParallelIndexTaskRunner (java.lang.Object is in module java.base of loader 'bootstrap'; org.apache.druid.indexing.common.task.batch.parallel.SinglePhaseParallelIndexTaskRunner is in unnamed module of loader 'app')
at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.doGetRowStatsAndUnparseableEvents(ParallelIndexSupervisorTask.java:1786) ~[druid-indexing-service-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.getTaskCompletionUnparseableEvents(ParallelIndexSupervisorTask.java:1271) ~[druid-indexing-service-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.buildIngestionStatsTaskReport(AbstractBatchIndexTask.java:985) ~[druid-indexing-service-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.buildIngestionStatsAndContextReport(AbstractBatchIndexTask.java:950) ~[druid-indexing-service-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.getTaskCompletionReports(ParallelIndexSupervisorTask.java:1254) ~[druid-indexing-service-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.updateAndWriteCompletionReports(ParallelIndexSupervisorTask.java:1276) ~[druid-indexing-service-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runSinglePhaseParallel(ParallelIndexSupervisorTask.java:681) ~[druid-indexing-service-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runTask(ParallelIndexSupervisorTask.java:551) ~[druid-indexing-service-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:179) ~[druid-indexing-service-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:478) [druid-indexing-service-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:450) [druid-indexing-service-32.0.0.jar:32.0.0]
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) [guava-32.0.1-jre.jar:?]
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) [guava-32.0.1-jre.jar:?]
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) [guava-32.0.1-jre.jar:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
This was fixed after restarting the Overlord, but I am unable to explain this behaviour. Is anyone aware of this type of issue?
Glenn Huang
07/30/2025, 1:02 PM
TaskSlotCountStatsMonitor on the Overlord node, but I'm not seeing any metrics related to task slots or worker availability (e.g., taskSlot/total/count, taskSlot/used/count, etc.).
Any help is appreciated. Thanks in advance!
Environment:
• Platform: Azure AKS
• Druid Version: 31.0.2
Overlord startup log and configuration (sensitive info masked):
Eyal Yurman
08/04/2025, 7:18 PM
Tanay Maheshwari
08/06/2025, 4:34 AM
ERROR [qtp1115073856-99] com.sun.jersey.spi.container.ContainerResponse - The exception contained within MappableContainerException could not be mapped to a respons
java.lang.NoClassDefFoundError: Could not initialize class com.github.luben.zstd.Zstd
at org.apache.druid.segment.data.CompressionStrategy$ZstdDecompressor.decompress(CompressionStrategy.java:425) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.segment.data.DecompressingByteBufferObjectStrategy.fromByteBuffer(DecompressingByteBufferObjectStrategy.java:74) ~[druid-processing-32.0.0.jar:32.0.0]
Caused by: java.lang.ExceptionInInitializerError: Exception java.lang.ExceptionInInitializerError: Cannot unpack libzstd-jni-1.5.2-3: No such file or directory [in thread "qtp1115073856-12
at java.base/java.io.UnixFileSystem.createFileExclusively(Native Method) ~[?:?]
at java.base/java.io.File.createTempFile(File.java:2170) ~[?:?]
at com.github.luben.zstd.util.Native.load(Native.java:99) ~[zstd-jni-1.5.2-3.jar:1.5.2-3]
at com.github.luben.zstd.util.Native.load(Native.java:55) ~[zstd-jni-1.5.2-3.jar:1.5.2-3]
at com.github.luben.zstd.Zstd.<clinit>(Zstd.java:13) ~[zstd-jni-1.5.2-3.jar:1.5.2-3]
at org.apache.druid.segment.data.CompressionStrategy$ZstdDecompressor.decompress(CompressionStrategy.java:425) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.segment.data.DecompressingByteBufferObjectStrategy.fromByteBuffer(DecompressingByteBufferObjectStrategy.java:74) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.segment.data.DecompressingByteBufferObjectStrategy.fromByteBuffer(DecompressingByteBufferObjectStrategy.java:30) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.segment.data.GenericIndexed$BufferIndexed.get(GenericIndexed.java:593) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.segment.data.BlockLayoutColumnarLongsSupplier$1.loadBuffer(BlockLayoutColumnarLongsSupplier.java:97) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.segment.data.BlockLayoutColumnarLongsSupplier$1.get(BlockLayoutColumnarLongsSupplier.java:84) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.segment.column.LongsColumn.getLongSingleValueRow(LongsColumn.java:77) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.segment.QueryableIndexTimeBoundaryInspector.populateMinMaxTime(QueryableIndexTimeBoundaryInspector.java:91) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.segment.QueryableIndexTimeBoundaryInspector.getMinTime(QueryableIndexTimeBoundaryInspector.java:62) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.segment.TimeBoundaryInspector.getMinMaxInterval(TimeBoundaryInspector.java:53) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.server.coordination.ServerManager.buildAndDecorateQueryRunner(ServerManager.java:304) ~[druid-server-32.0.0.jar:32.0.0]
at org.apache.druid.server.coordination.ServerManager.buildQueryRunnerForSegment(ServerManager.java:257) ~[druid-server-32.0.0.jar:32.0.0]
at org.apache.druid.server.coordination.ServerManager.lambda$getQueryRunnerForSegments$2(ServerManager.java:208) ~[druid-server-32.0.0.jar:32.0.0]
Harsha Vardhan
08/06/2025, 3:50 PM
{
"type": "index_parallel",
"spec": {
"ioConfig": {
"type": "index_parallel",
"inputSource": {
"type": "inline",
"data": "time,session_id,session_duration,country,device_type,timestamp\n2025-08-01T00:00:00,session_0,37,FR,tablet,2025-08-01 00:00:00\n2025-08-01T00:01:00,session_1,240,DE,desktop,2025-08-01 00:01:00\n2025-08-01T00:02:00,session_2,105,BR,tablet,2025-08-01 00:02:00"
},
"inputFormat": {
"type": "csv",
"findColumnsFromHeader": true
},
"appendToExisting": false
},
"tuningConfig": {
"type": "index_parallel",
"partitionsSpec": {
"type": "hashed"
},
"forceGuaranteedRollup": true,
"totalNumMergeTasks": 1
},
"dataSchema": {
"dataSource": "buceket_testing",
"timestampSpec": {
"column": "time",
"format": "iso"
},
"dimensionsSpec": {
"dimensions": [
{
"name": "device_type",
"type": "string"
}
]
},
"granularitySpec": {
"queryGranularity": "hour",
"rollup": true,
"segmentGranularity": "hour"
},
"metricsSpec": [
{
"name": "count",
"type": "count"
},
{
"name": "sessions_bucket",
"type": "fixedBucketsHistogram",
"fieldName": "duration",
"lowerLimit": 0,
"upperLimit": 100,
"numBuckets": 10,
"outlierHandlingMode": "overflow"
},
{
"name": "theta_session_id",
"type": "thetaSketch",
"fieldName": "session_id"
}
],
"transformSpec": {
"transforms": [
{
"type": "expression",
"name": "duration",
"expression": "cast(\"session_duration\" ,'long')"
}
]
}
}
}
}
My use case is something like finding how many sessions fall into each bucket:
0-10: 5 sessions
10-20: 1 session ...etc.
I am unable to query the datasource to achieve this. Can someone help?
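One way to read that histogram back through SQL is the fixed-buckets quantile function - a sketch only, assuming the druid-histogram extension is loaded on the query nodes; note that it returns approximate quantiles rather than the per-bucket counts described above:
SELECT
  APPROX_QUANTILE_FIXED_BUCKETS("sessions_bucket", 0.5, 10, 0, 100) AS p50_duration,
  APPROX_QUANTILE_FIXED_BUCKETS("sessions_bucket", 0.9, 10, 0, 100) AS p90_duration
FROM "buceket_testing"
WHERE "__time" >= TIMESTAMP '2025-08-01'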
Przemek
08/13/2025, 9:09 AM
partial_index_generic_merge tasks - they are unable to load segments, and I see this in the logs:
2025-08-08T15:48:18,234 WARN [Segment-Bootstrap-0] org.apache.druid.segment.loading.StorageLocation - Segment[Golf_Gold_GolfCommentary_2024-05-29T00:00:00.000Z_2024-05-30T00:00:00.000Z_2024-05-30T23:16:44.708Z:92,692] too large for storage[/opt/druid/var/tmp/persistent/task/broadcast/segments:-1]. Check your druid.segmentCache.locations maxSize param
which would mean that availableSizeBytes returns -1. I have druid.segmentCache.locations and druid.server.maxSize set:
druid.segmentCache.locations: '[{"path":"/opt/druid/var/data/segments", "maxSize":1500000000000}]'
druid.server.maxSize: "1500000000000"
but the logs say the segment is too large for storage[/opt/druid/var/tmp/..., which is in the Historical config as
druid.processing.tmpDir: "/opt/druid/var/tmp"
How are these configs correlated?
I also have the same path used for peons:
druid.indexer.fork.property.druid.processing.tmpDir: "/opt/druid/var/tmp"
druid.indexer.fork.property.druid.indexer.task.baseDir: "/opt/druid/var/tmp"
Can anybody suggest what might be missing or misconfigured?
A.Iswariya
08/18/2025, 6:53 AM
A.Iswariya
08/18/2025, 12:28 PM
Mateusz Kalinowski
08/18/2025, 1:45 PM
{
"type": "cachedNamespace",
"extractionNamespace": {
"type": "jdbc",
"pollPeriod": "PT1H",
"connectorConfig": {
"connectURI": "jdbc:mysql://database:3306/table",
"user": {
"type": "environment",
"variable": "MYSQL_USERNAME"
},
"password": {
"type": "environment",
"variable": "MYSQL_PASSWORD"
}
},
"table": "Test",
"keyColumn": "id",
"valueColumn": "name"
}
}
But this gives me:
org.apache.druid.query.lookup.LookupUtils - Lookup [mk_test] could not be serialized properly. Please check its configuration. Error: Cannot deserialize value of type `java.lang.String` from Object value (token `JsonToken.START_OBJECT`)
2025-08-18 14:52:55.818
at [Source: (byte[])":)
This could mean that the configuration is incorrect. If I set the values directly, the lookup works as expected.
Will be grateful for any advice on this.
Utkarsh Chaturvedi
08/19/2025, 10:17 AM
PARTITIONED BY granularity.
This I figure is because the date range is split between month-level segments and day-level segments. So I break the ingestion into 2: before the month-level change and after the month-level change. So I run an ingestion for July 25 - July 31. This works, but only with DAY granularity. So this makes me uncertain about whether or not the earlier ingestion was breaking because of the underlying segment granularity.
3. Now the ingestion for July 25 - July 31 creates 7 day-level segments, but they are not getting compacted. Compaction says 100% compacted except for the last 10 days and is not seeing these uncompacted segments. Shouldn't these segments be relevant for compaction?
If anybody who understands compaction well can help with this, it would be appreciated.
Jesse Tuglu
08/21/2025, 6:54 PM
5.8.0
• ZK server version = 3.5.8
• ZK client version = 3.8.4
Wondering if this ZK client/server version mismatch could be the root cause of things.
Milad
08/21/2025, 8:36 PM
set resultFormat = 'csv'; it has no effect. Does anyone know if that was by design?
Tanay Maheshwari
08/23/2025, 7:23 AM
2025-08-23T06:46:02,369 ERROR [NamespaceExtractionCacheManager-0] org.apache.druid.server.lookup.namespace.cache.CacheScheduler - Failed to update namespace [JdbcExtractionNamespace{connec
java.lang.NoClassDefFoundError: org/postgresql/ssl/LazyKeyManager
at org.postgresql.ssl.LibPQFactory.initPk8(LibPQFactory.java:85) ~[postgresql-42.7.2.jar:42.7.2]
at org.postgresql.ssl.LibPQFactory.<init>(LibPQFactory.java:123) ~[postgresql-42.7.2.jar:42.7.2]
at org.postgresql.core.SocketFactoryFactory.getSslSocketFactory(SocketFactoryFactory.java:61) ~[postgresql-42.7.2.jar:42.7.2]
at org.postgresql.ssl.MakeSSL.convert(MakeSSL.java:34) ~[postgresql-42.7.2.jar:42.7.2]
at org.postgresql.core.v3.ConnectionFactoryImpl.enableSSL(ConnectionFactoryImpl.java:625) ~[postgresql-42.7.2.jar:42.7.2]
at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:195) ~[postgresql-42.7.2.jar:42.7.2]
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:262) ~[postgresql-42.7.2.jar:42.7.2]
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54) ~[postgresql-42.7.2.jar:42.7.2]
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:273) ~[postgresql-42.7.2.jar:42.7.2]
at org.postgresql.Driver.makeConnection(Driver.java:446) ~[postgresql-42.7.2.jar:42.7.2]
at org.postgresql.Driver.connect(Driver.java:298) ~[postgresql-42.7.2.jar:42.7.2]
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:681) ~[java.sql:?]
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:229) ~[java.sql:?]
at org.skife.jdbi.v2.DBI$3.openConnection(DBI.java:140) ~[jdbi-2.63.1.jar:2.63.1]
at org.skife.jdbi.v2.DBI.open(DBI.java:212) ~[jdbi-2.63.1.jar:2.63.1]
at org.skife.jdbi.v2.DBI.withHandle(DBI.java:279) ~[jdbi-2.63.1.jar:2.63.1]
at org.apache.druid.server.lookup.namespace.JdbcCacheGenerator.lastUpdates(JdbcCacheGenerator.java:211) ~[?:?]
at org.apache.druid.server.lookup.namespace.JdbcCacheGenerator.generateCache(JdbcCacheGenerator.java:72) ~[?:?]
at org.apache.druid.server.lookup.namespace.JdbcCacheGenerator.generateCache(JdbcCacheGenerator.java:48) ~[?:?]
at org.apache.druid.server.lookup.namespace.cache.CacheScheduler$EntryImpl.tryUpdateCache(CacheScheduler.java:234) ~[?:?]
at org.apache.druid.server.lookup.namespace.cache.CacheScheduler$EntryImpl.updateCache(CacheScheduler.java:206) ~[?:?]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Thuận Trần Văn
08/28/2025, 4:33 AM