Milad
09/04/2025, 9:24 PM
org.apache.druid.java.util.common.IAE: Unknown loader type[azureStorage]. Known types are [hdfs, file]
with this configuration:
"customer-id-to-customer": {
"version": "v4",
"lookupExtractorFactory": {
"type": "cachedNamespace",
"extractionNamespace": {
"type": "uri",
"uri": "azureStorage://<account>/<container>/<file>.csv",
"namespaceParseSpec": {
"format": "csv",
"hasHeaderRow": true,
"columns": ["Customer ID", "Data"]
},
"pollPeriod": "PT1M"
}
}Eyal Yurman
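One likely cause, offered as a hedged note: the set of known URI loaders comes from whichever extensions the process has loaded, and [hdfs, file] suggests no Azure loader is on the classpath of the service resolving this lookup. A sketch of the relevant common.runtime.properties, assuming druid-azure-extensions supplies the Azure loader (cachedNamespace lookups also need druid-lookups-cached-global):

druid.extensions.loadList=["druid-lookups-cached-global", "druid-azure-extensions"]

Depending on the Druid version, the scheme the extension registers may be azure:// rather than azureStorage://, so the uri may need adjusting to match.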
Eyal Yurman
09/11/2025, 6:18 PM
{
  "id": "index_kafka_mydatasource_8283b64bc55e2b6_pcaifgan",
  "groupId": "index_kafka_mydatasource",
  "type": "index_kafka",
  "createdTime": "2025-09-11T05:31:27.698Z",
  "queueInsertionTime": "1970-01-01T00:00:00.000Z",
  "statusCode": "FAILED",
  "status": "FAILED",
  "runnerStatusCode": "WAITING",
  "duration": -1,
  "location": {
    "host": "druid-my-cluster-prod-middlemanagers-37.druid-my-cluster-prod-middlemanagers.my-cluster-prod.svc.cluster.local",
    "port": 8100,
    "tlsPort": -1
  },
  "dataSource": "mydatasource",
  "errorMsg": "No task in the corresponding pending completion taskGroup[17] succeeded before completion timeout elapsed"
}
task log
2025-09-11T06:01:30,682 INFO [Thread-57] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Stopping forcefully (status: [READING])
2025-09-11T06:01:30,729 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception in run() before persisting.
java.lang.InterruptedException: null
at java.base/java.util.concurrent.locks.ReentrantLock$Sync.lockInterruptibly(ReentrantLock.java:159) ~[?:?]
at java.base/java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:372) ~[?:?]
at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.possiblyPause(SeekableStreamIndexTaskRunner.java:1375) ~[druid-indexing-service-33.0.0.jar:33.0.0]
at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:607) [druid-indexing-service-33.0.0.jar:33.0.0]
at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:295) [druid-indexing-service-33.0.0.jar:33.0.0]
at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.runTask(SeekableStreamIndexTask.java:152) [druid-indexing-service-33.0.0.jar:33.0.0]
at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:179) [druid-indexing-service-33.0.0.jar:33.0.0]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:477) [druid-indexing-service-33.0.0.jar:33.0.0]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:449) [druid-indexing-service-33.0.0.jar:33.0.0]
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) [guava-32.0.1-jre.jar:?]
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) [guava-32.0.1-jre.jar:?]
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) [guava-32.0.1-jre.jar:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
<....>
<....>
<....>
2025-09-11T06:01:31,708 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_kafka_mydatasource_8283b64bc55e2b6_pcaifgan",
  "status" : "SUCCESS",
  "duration" : 1798218,
  "errorMsg" : null,
  "location" : {
    "host" : null,
    "port" : -1,
    "tlsPort" : -1
  }
}
2025-09-11T06:01:31,712 INFO [main] org.apache.druid.cli.CliPeon - Thread [Thread[Thread-57,5,main]] is non daemon.
2025-09-11T06:01:31,712 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Lifecycle [module] already stopped and stop was called. Silently skipping
Cannot remove shutdown hook, already shutting down!
Finished peon task
2025-09-11T06:01:31,716 INFO [Thread-57] org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager - CoordinatorPollingBasicAuthorizerCacheManager is stopping.
2025-09-11T06:01:31,716 INFO [Thread-57] org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager - CoordinatorPollingBasicAuthorizerCacheManager is stopped.
2025-09-11T06:01:31,716 INFO [Thread-57] org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager - CoordinatorPollingBasicAuthenticatorCacheManager is stopping.
2025-09-11T06:01:31,716 INFO [Thread-57] org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager - CoordinatorPollingBasicAuthenticatorCacheManager is stopped.
2025-09-11T06:01:31,718 INFO [LookupExtractorFactoryContainerProvider-MainThread] org.apache.druid.query.lookup.LookupReferencesManager - Lookup Management loop exited. Lookup notices are not handled anymore.
2025-09-11T06:01:31,719 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
2025-09-11T06:01:31,823 INFO [Thread-57] org.apache.zookeeper.ZooKeeper - Session: 0x30062850e464b21 closed
2025-09-11T06:01:31,823 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x30062850e464b21
2025-09-11T06:01:31,829 INFO [Thread-57] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [INIT]
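A hedged reading of the above: the FAILED status with that errorMsg generally means no task in the pending-completion group finished publishing before the supervisor's completionTimeout elapsed; note the task itself logged SUCCESS after being stopped forcefully. The knob lives in the Kafka supervisor's ioConfig (PT30M is the documented default; the values here are illustrative, not a recommendation):

"ioConfig": {
  "topic": "mytopic",
  "taskDuration": "PT30M",
  "completionTimeout": "PT60M"
}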
Sachin G
09/11/2025, 6:38 PM

Rajesh Gottapu
09/13/2025, 3:29 AM
# HTTP server threads
druid.server.http.numThreads=41
# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912 #0.5GB
druid.processing.numThreads=7
druid.processing.numMergeBuffers=2
# Segment storage
druid.segmentCache.locations=[{"path":"/data/druid/segment-cache","maxSize":5000000000000}]
druid.server.maxSize=5000000000000 #5TB
# Query cache
# For performance reasons, QueryCache is enabled on Historical nodes instead of Brokers
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=8589934592 #8GB
druid.segmentCache.numLoadingThreads=5
druid.segmentCache.numBootstrapThreads=5
jvm:
-server
-Xms20g
-Xmx20g
-XX:+UseG1GC
-XX:InitiatingHeapOccupancyPercent=40
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/opt/apche_druid/heapdumps
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-XX:MaxDirectMemorySize=30g
With the settings above, the historical node crashes from time to time with an OOM:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.
and eventually an OutOfMemoryError: Metaspace.
There is no clear documentation on configuring memory settings for the Druid historical. Can someone please suggest a recommended configuration for an i7ie.2xlarge historical instance type?
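A rough memory budget, as a sketch (assuming an i7ie.2xlarge has 64 GiB of RAM; note the caffeine cache is on-heap, so its 8 GiB comes out of the 20 GiB heap):

heap (-Xms/-Xmx):                        20 GiB
direct memory (-XX:MaxDirectMemorySize): 30 GiB
metaspace, code cache, thread stacks:    ~1-2 GiB
JVM ceiling:                             ~52 GiB of 64 GiB

The direct memory processing actually requires is (numThreads + numMergeBuffers + 1) x buffer size = (7 + 2 + 1) x 0.5 GiB = 5 GiB, so the 30 GiB cap is far above need. If direct usage grows toward the cap, only ~12 GiB remains for the OS and the page cache that memory-mapped segments rely on, and native allocations start failing exactly as in the mmap error above; the Metaspace OOM is consistent with overall native-memory exhaustion rather than a metaspace leak.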
A.Iswariya
09/15/2025, 12:06 PM

Daniel Müller
09/17/2025, 12:50 PM
We have segment files in deep storage that are well past dropBeforeByPeriod (PT25D) + druid.coordinator.kill.bufferPeriod (PT1H), and there are no traces of them in the PostgreSQL meta store.
For example, there are index files for 2024-12-16:
2024-12-17 16:01  133M  s3://druid-storage/druid/segments/events/2024-12-16T00:00:00.000Z_2024-12-17T00:00:00.000Z/2024-12-17T15:43:26.113Z/5/index.zip
but the earliest segment in the meta store's druid_segments table is 2025-07-21. Note that the datasource's drop retention is 25 days.
We have had automatic kill tasks from the get-go, and I also run manual kill tasks (see the sketch after the configs below). Here are the kill configs:
druid.coordinator.kill.on=true
druid.coordinator.kill.period=PT20M
druid.coordinator.kill.durationToRetain=PT1H
druid.coordinator.kill.maxSegments=1000
druid.coordinator.kill.ignoreDurationToRetain=true
druid.coordinator.kill.bufferPeriod=PT1H
druid.kill.taskStatusCheckPeriod=PT1M
druid.kill.maxRetries=3
druid.kill.taskTimeout=PT2H
druid.coordinator.kill.maxRunningTasks=20
druid.coordinator.kill.segmentsPerSecond=75
druid.coordinator.kill.taskPriority=10
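For reference, the manual kill tasks mentioned above take this shape (a minimal sketch; the datasource is taken from the S3 path above and the interval is illustrative). One caveat relevant here: a kill task only deletes segments the metadata store still tracks as unused, so files with no druid_segments row at all are invisible to it:

{
  "type": "kill",
  "dataSource": "events",
  "interval": "2024-12-01/2025-01-01"
}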
I have two questions:
1. Is it safe to delete these orphaned segment files as a temporary approach? We are kind of in a disaster.
2. How can I troubleshoot this issue?
Sachin G
09/17/2025, 3:50 PM

Sachin G
09/17/2025, 3:50 PM

Sachin G
09/17/2025, 3:50 PM
"avroBytesDecoder": {
"type": "schema_registry",
"url": "https://<schema-registry-url>",
"druid.dynamic.config.provider": {
"type": "environment",
"variables": {
"BASIC_AUTH_USER_INFO": "SCHEMA_REGISTRY_AUTH"
}
},
"config": {
"basic.auth.credentials.source": "USER_INFO",
"basic.auth.user.info": "${BASIC_AUTH_USER_INFO}"
}
}
"avroBytesDecoder": {
"type": "schema_registry",
"url": "https://<schema-registry-url>",
"config": {
"basic.auth.credentials.source": "USER_INFO",
"basic.auth.user.info": {
"type": "environment",
"variable": "SCHEMA_REGISTRY_AUTH"
}
}
}Sachin G
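With the USER_INFO credentials source, the value resolved from the environment is expected in user:password form; a sketch of the variable the second variant reads (name from the config above, value illustrative):

export SCHEMA_REGISTRY_AUTH="registry-user:registry-password"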
Sachin G
09/17/2025, 3:51 PM

Sachin G
09/17/2025, 4:39 PM

Mike Berg
09/18/2025, 3:06 PM
We are on druid-29. We have an orphaned index_kafka task lock in the metastore (druid_tasklock) from last week. It causes a problem for a restatement due to the interval overlap, and since the index_hadoop job is lower priority than the running Kafka supervisors' tasks, it just spins and fails. The orphaned task lock is from a failed index_kafka task that doesn't look to have fully cleaned up. There is no associated task in the druid_tasks table, so that appears clean.
So, from what I see, it seems like I should just be able to delete the record from the metastore via a psql query. The question is: can I just delete that record and watch for any failures? Or should one stop all ingestion (suspend the supervisor for the datasource), then clean up the metadata, and finally resume the supervisor? Also, I wouldn't think the Overlord keeps any in-memory residue tracking this, but I'm just curious what others have experienced.
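If one does go the psql route, a hypothetical cleanup along these lines (table name as referenced above, columns per the default metadata-store schema; the task id is a placeholder, and backing up the row first is prudent; this is a sketch, not a documented procedure):

-- inspect the orphaned lock first
SELECT id, task_id FROM druid_tasklock WHERE task_id = '<failed-task-id>';
-- then delete only that row
DELETE FROM druid_tasklock WHERE task_id = '<failed-task-id>';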
J K
09/18/2025, 5:22 PM

Akaash B
09/18/2025, 7:02 PM

Sachin G
09/19/2025, 3:38 AM

Sachin G
09/19/2025, 3:42 AM

Sachin G
09/19/2025, 3:42 AM
SELECT substr(start, 1, 7) AS mon, COUNT(*) FROM druid_segments WHERE datasource = 'xxx' AND used = 0 GROUP BY 1;
Akaash B
09/19/2025, 6:41 AM

Akaash B
09/19/2025, 6:41 AM

Eyal Yurman
09/22/2025, 8:22 PM

Sachin G
09/25/2025, 4:30 AM

Sachin G
09/25/2025, 4:30 AM

Sachin G
09/25/2025, 4:31 AM!/bin/bash
# First find the Imply init script
RUNDRUID=$(find /opt/grove -name run-druid | grep -v dist)
# and add the desired environment variable before starting the Imply processes
sed -i '/^exec.*/i export KAFKA_JAAS_CONFIG="org.apache.kafka.common.security.plain.PlainLoginModule required username='\''123434'\'' password='\''123+abcdee'\'';"' "${RUNDRUID}"
Sachin G
09/25/2025, 4:32 AM

Sachin G
09/25/2025, 4:33 AM

Sachin G
#!/bin/bash
# Locate the run-druid init script (skip copies under dist/)
RUNDRUID=$(find /opt/grove -name run-druid | grep -v dist | head -n 1)
if [ -z "$RUNDRUID" ]; then
echo "run-druid script not found."
exit 1
fi
# Insert the JAAS export before the exec line; USERNAME/PASSWORD come from the environment
sed -i "/^exec.*/i export KAFKA_JAAS_CONFIG=\"org.apache.kafka.common.security.plain.PlainLoginModule required username='${USERNAME}' password='${PASSWORD}';\"" "$RUNDRUID"
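Assuming the script above is saved as, say, add-jaas.sh (name hypothetical), it reads the credentials from the environment, for example:

USERNAME='svc-kafka' PASSWORD='s3cr3t' ./add-jaas.sh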
Sachin G
09/25/2025, 4:33 AM

Sachin G
09/25/2025, 4:33 AM

Sachin G
09/25/2025, 4:35 AM

Sachin G
09/25/2025, 4:36 AM