# troubleshooting

    Milad

    09/04/2025, 9:24 PM
Hello: Is it possible to load a lookup from Azure Blob Storage? I've tried enabling the druid-azure-extensions extension and have configured my account and key, but I get this error when I try to load a namespace:
    Copy code
    org.apache.druid.java.util.common.IAE: Unknown loader type[azureStorage].  Known types are [hdfs, file]
    with this configuration:
    Copy code
    "customer-id-to-customer": {
                "version": "v4",
                "lookupExtractorFactory": {
                    "type": "cachedNamespace",
                    "extractionNamespace": {
                        "type": "uri",
                        "uri": "azureStorage://<account>/<container>/<file>.csv",
                        "namespaceParseSpec": {
                            "format": "csv",
                            "hasHeaderRow": true,
                            "columns": ["Customer ID", "Data"]
                        },
                        "pollPeriod": "PT1M"
                    }
            }
    }
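The Unknown loader type[azureStorage] error says that only the hdfs and file URI handlers are registered on the process that resolves the lookup, which usually means the Azure extension is not on that service's extension load list (lookups are hosted by Brokers, Historicals, and Peons, not only the Coordinator). A minimal sketch of common.runtime.properties under that assumption; the exact URI scheme the extension registers for URI lookups (azure:// vs. azureStorage://) can differ by Druid version, so check the extension docs as well:
Copy code
# Both extensions must be on the load list of every process that hosts lookups
# (assumption: the rest of the existing load list is kept as-is)
druid.extensions.loadList=["druid-lookups-cached-global", "druid-azure-extensions"]

# Azure credentials, the same properties used for Azure deep storage / input sources
druid.azure.account=<account>
druid.azure.key=<key>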

    Eyal Yurman

    09/11/2025, 6:18 PM
I noticed kafka_index tasks are failing, and there is an error in the log, but the log ends with a success message. Any suggestions on how to debug further? Task status (web console):
    Copy code
    {
      "id": "index_kafka_mydatasource_8283b64bc55e2b6_pcaifgan",
      "groupId": "index_kafka_mydatasource",
      "type": "index_kafka",
      "createdTime": "2025-09-11T05:31:27.698Z",
      "queueInsertionTime": "1970-01-01T00:00:00.000Z",
      "statusCode": "FAILED",
      "status": "FAILED",
      "runnerStatusCode": "WAITING",
      "duration": -1,
      "location": {
        "host": "druid-my-cluster-prod-middlemanagers-37.druid-my-cluster-prod-middlemanagers.my-cluster-prod.svc.cluster.local",
        "port": 8100,
        "tlsPort": -1
      },
      "dataSource": "mydatasource",
      "errorMsg": "No task in the corresponding pending completion taskGroup[17] succeeded before completion timeout elapsed"
    }
    task log
    Copy code
    2025-09-11T06:01:30,682 INFO [Thread-57] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Stopping forcefully (status: [READING])
    2025-09-11T06:01:30,729 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception in run() before persisting.
    java.lang.InterruptedException: null
    	at java.base/java.util.concurrent.locks.ReentrantLock$Sync.lockInterruptibly(ReentrantLock.java:159) ~[?:?]
    	at java.base/java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:372) ~[?:?]
    	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.possiblyPause(SeekableStreamIndexTaskRunner.java:1375) ~[druid-indexing-service-33.0.0.jar:33.0.0]
    	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:607) [druid-indexing-service-33.0.0.jar:33.0.0]
    	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:295) [druid-indexing-service-33.0.0.jar:33.0.0]
    	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.runTask(SeekableStreamIndexTask.java:152) [druid-indexing-service-33.0.0.jar:33.0.0]
    	at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:179) [druid-indexing-service-33.0.0.jar:33.0.0]
    	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:477) [druid-indexing-service-33.0.0.jar:33.0.0]
    	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:449) [druid-indexing-service-33.0.0.jar:33.0.0]
    	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) [guava-32.0.1-jre.jar:?]
    	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) [guava-32.0.1-jre.jar:?]
    	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) [guava-32.0.1-jre.jar:?]
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
    <....>
    <....>
    <....>
    2025-09-11T06:01:31,708 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
      "id" : "index_kafka_mydatasource_8283b64bc55e2b6_pcaifgan",
      "status" : "SUCCESS",
      "duration" : 1798218,
      "errorMsg" : null,
      "location" : {
        "host" : null,
        "port" : -1,
        "tlsPort" : -1
      }
    }
    2025-09-11T06:01:31,712 INFO [main] org.apache.druid.cli.CliPeon - Thread [Thread[Thread-57,5,main]] is non daemon.
    2025-09-11T06:01:31,712 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Lifecycle [module] already stopped and stop was called. Silently skipping
    Cannot remove shutdown hook, already shutting down!
    Finished peon task
    2025-09-11T06:01:31,716 INFO [Thread-57] org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager - CoordinatorPollingBasicAuthorizerCacheManager is stopping.
    2025-09-11T06:01:31,716 INFO [Thread-57] org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager - CoordinatorPollingBasicAuthorizerCacheManager is stopped.
    2025-09-11T06:01:31,716 INFO [Thread-57] org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager - CoordinatorPollingBasicAuthenticatorCacheManager is stopping.
    2025-09-11T06:01:31,716 INFO [Thread-57] org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager - CoordinatorPollingBasicAuthenticatorCacheManager is stopped.
    2025-09-11T06:01:31,718 INFO [LookupExtractorFactoryContainerProvider-MainThread] org.apache.druid.query.lookup.LookupReferencesManager - Lookup Management loop exited. Lookup notices are not handled anymore.
    2025-09-11T06:01:31,719 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
    2025-09-11T06:01:31,823 INFO [Thread-57] org.apache.zookeeper.ZooKeeper - Session: 0x30062850e464b21 closed
    2025-09-11T06:01:31,823 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x30062850e464b21
    2025-09-11T06:01:31,829 INFO [Thread-57] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [INIT]
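The errorMsg in the task status typically comes from the supervisor rather than the task itself: the task sat in a "pending completion" group whose segment publish did not finish before the supervisor's completion timeout, so the supervisor stopped it forcefully; the peon then logs the InterruptedException and, confusingly, a final SUCCESS status while shutting down. A common mitigation is to give publishing more headroom in the Kafka supervisor's ioConfig, sketched below with illustrative values (topic and taskCount are placeholders; completionTimeout defaults to PT30M), and to look at why handoff/publish is slow in the first place:
Copy code
"ioConfig": {
  "topic": "my-topic",
  "taskCount": 2,
  "taskDuration": "PT30M",
  "completionTimeout": "PT60M"
}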

    Sachin G

    09/11/2025, 6:38 PM
Team, does anyone have the Git location of the Docker/container images used by Imply Druid?

    Rajesh Gottapu

    09/13/2025, 3:29 AM
Hi everyone, we need help adjusting the runtime and memory parameters for our Historical nodes. We are running Historicals on AWS i7ie.2xlarge instances (64GB RAM + 5TB HDD). Current historical runtime.properties:
    Copy code
    # HTTP server threads
    druid.server.http.numThreads=41
    
    # Processing threads and buffers
    druid.processing.buffer.sizeBytes=536870912 #0.5GB
    druid.processing.numThreads=7
    druid.processing.numMergeBuffers=2
    
    # Segment storage
    druid.segmentCache.locations=[{"path":"/data/druid/segment-cache","maxSize":5000000000000}]
    druid.server.maxSize=5000000000000 #5TB
    
    # Query cache
    # For performance reasons, QueryCache is enabled on Historical nodes instead of Brokers
    druid.historical.cache.useCache=true
    druid.historical.cache.populateCache=true
    druid.cache.type=caffeine
    druid.cache.sizeInBytes=8589934592 #8.5GB
    druid.segmentCache.numLoadingThreads=5
    druid.segmentCache.numBootstrapThreads=5
    jvm:
    Copy code
    -server
    -Xms20g
    -Xmx20g
    -XX:+UseG1GC
    -XX:InitiatingHeapOccupancyPercent=40
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/opt/apche_druid/heapdumps
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintGCApplicationStoppedTime
    -XX:MaxDirectMemorySize=30g
With the settings above, the Historical node crashes from time to time with a native OOM:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.
and eventually we also get an OutOfMemoryError: Metaspace. There is no clear documentation on how to size memory for a Druid Historical. Can someone please suggest a recommended configuration for the i7ie.2xlarge Historical instance type?
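For context, a rough memory budget for the configuration above (a sketch of the usual sizing arithmetic, not an official recommendation):
Copy code
# Direct memory the processing pool alone can require:
#   (druid.processing.numThreads + druid.processing.numMergeBuffers + 1)
#     * druid.processing.buffer.sizeBytes
#   = (7 + 2 + 1) * 512 MiB = 5 GiB
#
# Worst case the JVM can reserve with the current flags:
#   heap (-Xmx20g) + direct (-XX:MaxDirectMemorySize=30g) = ~50 GiB,
#   plus metaspace and thread stacks; the caffeine query cache
#   (druid.cache.sizeInBytes, ~8 GiB) also lives inside the 20 GiB heap.
#   That leaves little of the 64 GiB for the OS page cache that memory-maps
#   the segment cache, which is consistent with native mmap failures.
#
# One illustrative rebalance (assumed values, tune for your workload):
-Xms16g
-Xmx16g
-XX:MaxDirectMemorySize=8g
# and/or reduce druid.cache.sizeInBytes so the cache is a smaller share of heap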

    A.Iswariya

    09/15/2025, 12:06 PM
    Hi team, I’m exploring security features in Druid and would like to achieve row-level security. The goal is for users to be able to see only specific rows or certain columns when querying from the Druid shell. Could you please guide me on what configurations or approaches I should consider in Druid to make this possible? Any pointers would be greatly appreciated.

    Daniel Müller

    09/17/2025, 12:50 PM
Hi everyone. We have a cluster with PostgreSQL as the metadata store and Ceph/S3 as deep storage. The Ceph bucket is used exclusively for Druid. A few days ago we noticed our Ceph quota is almost reached (2.8TB out of 3TB), but the Druid web console and the Prometheus exporter showed only 20% of the quota used (600GB out of 3TB). We checked the Ceph bucket and it looks like there are about 2TB of index.zip files that are older than our dropBeforeByPeriod (PT25D) + druid.coordinator.kill.bufferPeriod (PT1H), and there is no trace of them in the PostgreSQL metadata store. For example, there are index files for 2024-12-16:
    2024-12-17 16:01   133M  <s3://druid-storage/druid/segments/events/2024-12-16T00:00:00.000Z_2024-12-17T00:00:00.000Z/2024-12-17T15:43:26.113Z/5/index.zip>
but the earliest segment in the metadata store's druid_segments table is 2025-07-21. Note that the datasource drop retention is 25 days. We have had the automatic kill task enabled from the get-go, and I have also run manual kill tasks. Here are the kill configs:
    Copy code
    druid.coordinator.kill.on=true
    druid.coordinator.kill.period=PT20M
    druid.coordinator.kill.durationToRetain=PT1H
    druid.coordinator.kill.maxSegments=1000
    druid.coordinator.kill.ignoreDurationToRetain=true
    druid.coordinator.kill.bufferPeriod=PT1H
    
    druid.kill.taskStatusCheckPeriod=PT1M
    druid.kill.maxRetries=3
    druid.kill.taskTimeout=PT2H
    druid.coordinator.kill.maxRunningTasks=20
    druid.coordinator.kill.segmentsPerSecond=75
    druid.coordinator.kill.taskPriority=10
I have two questions: 1. Is it safe to delete these orphaned segments as a temporary approach? We are kind of in a disaster. 2. How can I troubleshoot this issue?
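Before deleting anything from deep storage, it is worth confirming the files really have no metadata row at all, used or unused, since the coordinator's kill logic only removes segments it knows about as unused. A sketch against a default PostgreSQL metadata store (table and column names can differ by Druid version; datasource and interval are taken from the example above):
Copy code
-- Any row at all (used or unused) for the suspicious interval?
SELECT id, used, created_date
FROM druid_segments
WHERE datasource = 'events'
  AND start = '2024-12-16T00:00:00.000Z'
  AND "end" = '2024-12-17T00:00:00.000Z';
If there is truly no row, automated kill will never remove those files; whether it is safe to delete them by hand depends on being certain that nothing in druid_segments (used or unused) still points at those paths.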

    Sachin G

    09/17/2025, 3:50 PM
Hello team, has anyone used environment variables for authentication to Schema Registry in a Kafka ingestion job, something like below?

    Sachin G

    09/17/2025, 3:50 PM
I tried the two options below, but neither of them works.

    Sachin G

    09/17/2025, 3:50 PM
    Copy code
    "avroBytesDecoder": {
      "type": "schema_registry",
      "url": "https://<schema-registry-url>",
      "druid.dynamic.config.provider": {
        "type": "environment",
        "variables": {
          "BASIC_AUTH_USER_INFO": "SCHEMA_REGISTRY_AUTH"
        }
      },
      "config": {
        "basic.auth.credentials.source": "USER_INFO",
        "basic.auth.user.info": "${BASIC_AUTH_USER_INFO}"
      }
    }
    
    
    "avroBytesDecoder": {
      "type": "schema_registry",
      "url": "https://<schema-registry-url>",
      "config": {
        "basic.auth.credentials.source": "USER_INFO",
        "basic.auth.user.info": {
          "type": "environment",
          "variable": "SCHEMA_REGISTRY_AUTH"
        }
      }
    }
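For comparison, Druid's environment-variable DynamicConfigProvider maps a property key to the name of the environment variable that holds its value (that is how it is used inside consumerProperties for things like sasl.jaas.config). Whether the schema_registry decoder's config block honors druid.dynamic.config.provider the same way is an assumption here, not something confirmed by the docs quoted in this thread; if it does, the shape would look like this (SCHEMA_REGISTRY_AUTH is the env var from the question and would hold the user:password string):
Copy code
"avroBytesDecoder": {
  "type": "schema_registry",
  "url": "https://<schema-registry-url>",
  "config": {
    "basic.auth.credentials.source": "USER_INFO",
    "druid.dynamic.config.provider": {
      "type": "environment",
      "variables": {
        "basic.auth.user.info": "SCHEMA_REGISTRY_AUTH"
      }
    }
  }
}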

    Sachin G

    09/17/2025, 3:51 PM
When I hardcode the credentials, the job works.

    Sachin G

    09/17/2025, 4:39 PM
    @Hellmar Becker any suggestion here..

    Mike Berg

    09/18/2025, 3:06 PM
Hello! I have what I hope is a simple issue to remedy on druid-29. We have an orphaned index_kafka task lock in the metadata store (druid_tasklock) from last week. It causes a problem for a restatement with an overlapping interval: the index_hadoop job runs at a lower priority than the Kafka supervisors, so it just spins and fails. The orphaned task lock is from a failed index_kafka task that doesn't look to have fully cleaned up. There is no associated task in the druid_tasks table, so that appears clean. From what I can see, it seems like I should just be able to delete the record from the metadata store via a psql query. The question is: can I just delete that record and watch for any failures, or should one stop all ingestion / suspend the supervisor for the datasource, clean up the metadata, and then resume the supervisor? Also, I wouldn't think the Overlord keeps some in-memory state tracking this, but I'm curious what others have experienced.
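A minimal sketch of the cleanup being described, against a default PostgreSQL metadata store (placeholder task id; column names can vary by Druid version, so inspect the table first):
Copy code
-- Look at the orphaned lock row first
SELECT * FROM druid_tasklock WHERE task_id = '<orphaned_task_id>';

-- Confirm the task itself is gone
SELECT id FROM druid_tasks WHERE id = '<orphaned_task_id>';

-- Remove the stale lock inside a transaction so it can be rolled back if needed
BEGIN;
DELETE FROM druid_tasklock WHERE task_id = '<orphaned_task_id>';
COMMIT;
Since the Overlord syncs lock state from the metadata store when it gains leadership, the conservative path is to suspend the supervisor (or roll the Overlord) around the delete in case the lock is still held in memory.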

    J K

    09/18/2025, 5:22 PM
I upgraded Druid from version 27 to 28 and now I'm getting BOM characters in the response in Grafana when I build a link from a value returned by a Druid query. I can see the BOM because of the URL encoding, but I can't see it in the UI or in the Druid query console. Is this a known issue, or are there any fixes in Druid for it?

    Akaash B

    09/18/2025, 7:02 PM
Druid tasks have been failing for a long time and the MySQL table now holds several GB of data. How do I do a manual cleanup?

    Sachin G

    09/19/2025, 3:38 AM
You can check the druid_segments table.

    Sachin G

    09/19/2025, 3:42 AM
Check the retention of your tables; segment metadata rows with the "used" flag = 0 are eligible for cleanup.

    Sachin G

    09/19/2025, 3:42 AM
    Copy code
    Select substr(start,1,7) mon, count(*) from druid_segments where datasource = 'xxx' and used = 0 group by 1;

    Akaash B

    09/19/2025, 6:41 AM
druid_tasks | 41988.47 — there is no retention enabled for Druid tasks. Is there a way to do a MySQL cleanup for the tasks table? That was my question.
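For the task table specifically (as opposed to segment metadata), Druid has an Overlord-side cleanup that prunes old task logs together with the corresponding entries in the task metadata tables; a sketch with illustrative retention values (check the property names and units against the metadata-management docs for your Druid version; these durations are in milliseconds):
Copy code
# Enable periodic deletion of old task logs and task metadata entries
druid.indexer.logs.kill.enabled=true
# Keep roughly 7 days of task history
druid.indexer.logs.kill.durationToRetain=604800000
# First cleanup 5 minutes after startup, then every 6 hours
druid.indexer.logs.kill.initialDelay=300000
druid.indexer.logs.kill.delay=21600000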

    Akaash B

    09/19/2025, 6:41 AM
    @Sachin G

    Eyal Yurman

    09/22/2025, 8:22 PM
Has anyone seen OOM issues when building many HLL sketch metrics with streaming ingestion, and been able to resolve them? If so, please take a look at "Frequent OutOfMemoryError failures with Kafka ingestion when building multiple HLLSketchBuild metrics" #18560

    Sachin G

    09/25/2025, 4:30 AM
Has anyone used env variables in a user-init script? URL

    Sachin G

    09/25/2025, 4:30 AM
For example (the password is a dummy):

    Sachin G

    09/25/2025, 4:31 AM
    Copy code
#!/bin/bash
    # First find the Imply init script 
    RUNDRUID=$(find /opt/grove -name run-druid | grep -v dist)
    # and add the desired environment variable before starting the Imply processes
    sed -i '/^exec.*/i export\ KAFKA_JAAS_CONFIG="org.apache.kafka.common.security.plain.PlainLoginModule  required username='\'123434\'' password='\'123\+abcdee\'';"' ${RUNDRUID}

    Sachin G

    09/25/2025, 4:32 AM
When I use hard-coded credentials in the user-init script like this, I am able to use KAFKA_JAAS_CONFIG as a variable in my Kafka ingestion job.

    Sachin G

    09/25/2025, 4:33 AM
But instead of hard-coded credentials I want to use env variables, something like below (I have tried various scripts but no luck so far; this is just one example):

    Sachin G

    09/25/2025, 4:33 AM
    Copy code
    #!/bin/bash
    
    RUNDRUID=$(find /opt/grove -name run-druid | grep -v dist | head -n 1)
    if [ -z "$RUNDRUID" ]; then
      echo "run-druid script not found."
      exit 1
    fi
    
    
    sed -i "/^exec.*/i export KAFKA_JAAS_CONFIG=\"org.apache.kafka.common.security.plain.PlainLoginModule required username='${USERNAME}' password='${PASSWORD}';\"" "$RUNDRUID"

    Sachin G

    09/25/2025, 4:33 AM
    Note: Druid is running on Kubernetes (EKS)

    Sachin G

    09/25/2025, 4:33 AM
A sample snippet of the Kafka job using this variable is below.

    Sachin G

    09/25/2025, 4:35 AM
    image.png

    Sachin G

    09/25/2025, 4:36 AM
I have defined these variables (USERNAME and PASSWORD) in the Imply Manager pod and also in the Druid pods, and restarted the cluster.
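One possible failure mode here (an assumption, since the user-init script may not run with the same environment as the Druid pods) is that USERNAME and PASSWORD are simply unset when the init script executes, so sed writes an export line with empty values. A sketch that fails fast on missing variables before patching run-druid (same variable names and path as above):
Copy code
#!/bin/bash
set -euo pipefail

# Abort immediately if the credentials were not injected into this environment
: "${USERNAME:?USERNAME is not set in the init-script environment}"
: "${PASSWORD:?PASSWORD is not set in the init-script environment}"

RUNDRUID=$(find /opt/grove -name run-druid 2>/dev/null | grep -v dist | head -n 1 || true)
if [ -z "$RUNDRUID" ]; then
  echo "run-druid script not found."
  exit 1
fi

# Same insertion as the hard-coded version, with the values expanded by the shell
# at init time. Passwords containing characters special to sed's insert text
# (backslashes) or to the JAAS string (single quotes) may still need escaping.
sed -i "/^exec.*/i export KAFKA_JAAS_CONFIG=\"org.apache.kafka.common.security.plain.PlainLoginModule required username='${USERNAME}' password='${PASSWORD}';\"" "$RUNDRUID"

# Confirm the line landed (avoid printing the actual secret in shared logs)
grep -c 'KAFKA_JAAS_CONFIG' "$RUNDRUID"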