# troubleshooting
  • a

    Akaash B

    09/19/2025, 6:41 AM
    druid_tasks | 41988.47. There is no retention enabled for Druid tasks; is there a way to do the MySQL cleanup for tasks? That was my query.
  • a

    Akaash B

    09/19/2025, 6:41 AM
    @Sachin G
  • e

    Eyal Yurman

    09/22/2025, 8:22 PM
    Has anyone seen OOM issues when building many HLL sketch metrics with streaming ingestion, and been able to resolve them? If so, please take a look at: Frequent OutOfMemoryError failures with Kafka ingestion when building multiple HLLSketchBuild metrics #18560
  • s

    Sachin G

    09/25/2025, 4:30 AM
    Has anyone used environment variables in a user-init script? URL
  • s

    Sachin G

    09/25/2025, 4:30 AM
    For example (the password is a dummy):
  • s

    Sachin G

    09/25/2025, 4:31 AM
    Copy code
    #!/bin/bash
    # First find the Imply init script
    RUNDRUID=$(find /opt/grove -name run-druid | grep -v dist)
    # and add the desired environment variable before starting the Imply processes
    sed -i '/^exec.*/i export\ KAFKA_JAAS_CONFIG="org.apache.kafka.common.security.plain.PlainLoginModule required username='\'123434\'' password='\'123\+abcdee\'';"' "${RUNDRUID}"
  • s

    Sachin G

    09/25/2025, 4:32 AM
    When I use these hard-coded credentials in the user-init script, I am able to use KAFKA_JAAS_CONFIG as a variable in my Kafka ingestion job.
  • s

    Sachin G

    09/25/2025, 4:33 AM
    But instead of hard-coded credentials I want to use env variables, something like below (I tried various scripts but no luck so far; this is just one example).
  • s

    Sachin G

    09/25/2025, 4:33 AM
    Copy code
    #!/bin/bash
    
    RUNDRUID=$(find /opt/grove -name run-druid | grep -v dist | head -n 1)
    if [ -z "$RUNDRUID" ]; then
      echo "run-druid script not found."
      exit 1
    fi
    
    
    sed -i "/^exec.*/i export KAFKA_JAAS_CONFIG=\"org.apache.kafka.common.security.plain.PlainLoginModule required username='${USERNAME}' password='${PASSWORD}';\"" "$RUNDRUID"
  • s

    Sachin G

    09/25/2025, 4:33 AM
    Note: Druid is running on Kubernetes (EKS)
  • s

    Sachin G

    09/25/2025, 4:33 AM
    A sample snippet of the Kafka job with this variable is below.
  • s

    Sachin G

    09/25/2025, 4:35 AM
    image.png
  • s

    Sachin G

    09/25/2025, 4:36 AM
    I have defined these variables (USERNAME and PASSWORD) in the Imply Manager pod and also the Druid pods, and restarted the cluster.
  • s

    Sachin G

    09/25/2025, 4:36 AM
    But I get an empty value:
  • s

    Sachin G

    09/25/2025, 4:36 AM
    Copy code
    sudo -u grove cat /proc/71938/environ | tr '\0' '\n' | grep KAFKA_JAAS_CONFIG
    KAFKA_JAAS_CONFIG=org.apache.kafka.common.security.plain.PlainLoginModule required username='' password='';
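    Another hedged option, if the variables are not visible to the shell that runs the user-init script: mount the credentials as files (for example from a Kubernetes Secret; the mount path below is hypothetical) and read them explicitly before the sed runs:
    #!/bin/bash
    # Sketch only: read credentials from files mounted into the pod
    # (hypothetical mount path /mnt/kafka-creds provided by a Kubernetes Secret).
    USERNAME=$(cat /mnt/kafka-creds/username)
    PASSWORD=$(cat /mnt/kafka-creds/password)
    RUNDRUID=$(find /opt/grove -name run-druid | grep -v dist | head -n 1)
    sed -i "/^exec.*/i export KAFKA_JAAS_CONFIG=\"org.apache.kafka.common.security.plain.PlainLoginModule required username='${USERNAME}' password='${PASSWORD}';\"" "$RUNDRUID"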
  • d

    Danny Wilkins

    09/25/2025, 5:27 PM
    Hey y'all, potentially silly question. I've been trying to play with the task autoscaler, but when I look at the supervisor payload in the Druid console I don't see the autoscaler config. Am I supposed to be seeing it in there? I'm also not seeing it scale tasks based on lag, but if I can just verify that the config exists in the console, that'll be an easy first step to know I might've fixed it.
  • d

    Danny Wilkins

    09/25/2025, 5:27 PM
    Rather than artificially inducing lag.
  • t

    Taoufiq Bahalla

    09/26/2025, 11:40 AM
    Hello all, I’m new to Druid and have a question about Kafka ingestion. We’re trying to set up Kafka ingestion in Druid so that all fields from both the Kafka key and value are included in the datasource. Right now, only the first field of the key is being ingested. I found this note in the documentation:
    “The input format to parse the Kafka key only processes the first entry of the inputFormat field. If your key values are simple strings, you can use the tsv format to parse them. Note that for tsv, csv, and regex formats, you need to provide a columns array to make a valid input format. Only the first one is used, and its name will be ignored in favor of keyColumnName.”
    Did I miss something? Is there a way to ingest *all Kafka key fields as columns in the Druid datasource*, without copying them into the value? Thanks in advance!
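    For reference, a sketch of the kafka input format the quoted passage refers to (the column names here are hypothetical). The key is parsed into the single column named by keyColumnName, which is why only one key field shows up; fields that need to become regular datasource columns generally have to live in the record value:
    "inputFormat": {
      "type": "kafka",
      "keyColumnName": "kafka_key",
      "keyFormat": {
        "type": "tsv",
        "findColumnsFromHeader": false,
        "columns": ["key_col"]
      },
      "valueFormat": {
        "type": "json"
      }
    }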
  • r

    Richard Vernon

    09/30/2025, 1:25 PM
    Hello guys, having had our Druid cluster up and running without downtime for years, it's long overdue an upgrade and reconfiguration in terms of data storage/query efficiency. As it stands there are just over 2.3B rows of event data, and I must say it's been handling the interactive analytics demands exceptionally well. But I would like to improve the storage/querying efficiency by segmenting using secondary partitioning on our tenant ID/account_number column. We have data flowing in via a Kinesis Data Stream, and I remember in older versions of Druid (0.16) the approach would be to run an index_parallel task regularly with secondary partitioning. With Druid 34.0.0, however, I believe this can be done using auto-compaction? I tried setting up hashed partitioning on the account_number column, but it seems to be skipping every segment for some reason. Just for debugging I set skipOffsetFromLatest to 10M, and even with new segments being written out every 1000 rows, it still seems to be skipping them. The logs don't seem to indicate why exactly the segments aren't compactible:
    ./coordinator-overlord.log:6527:2025-09-30T13:01:14,507 WARN [Coordinator-Exec-IndexingServiceDuties-0] org.apache.druid.server.compaction.DataSourceCompactibleSegmentIterator - Skipping compaction for datasource[r3-event-stream] as it has no compactible segments.
    curl -s http://localhost:8081/druid/coordinator/v1/compaction/status?dataSource=r3-event-stream | jq .
    {
      "latestStatus": [
        {
          "dataSource": "r3-event-stream",
          "scheduleStatus": "RUNNING",
          "message": null,
          "bytesAwaitingCompaction": 0,
          "bytesCompacted": 0,
          "bytesSkipped": 5587645,
          "segmentCountAwaitingCompaction": 0,
          "segmentCountCompacted": 0,
          "segmentCountSkipped": 46,
          "intervalCountAwaitingCompaction": 0,
          "intervalCountCompacted": 0,
          "intervalCountSkipped": 1
        }
      ]
    }
    Hoping it's something simple I'm missing, thanks!
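    One thing worth double-checking is what the Coordinator actually stored for the datasource's compaction config (partitionsSpec and skipOffsetFromLatest). A hedged sketch of submitting and reading it back through the Coordinator API, with only the datasource name taken from above and the other values illustrative:
    # Submit an auto-compaction config with hashed secondary partitioning (illustrative values)
    curl -s -X POST http://localhost:8081/druid/coordinator/v1/config/compaction \
      -H 'Content-Type: application/json' \
      -d '{
            "dataSource": "r3-event-stream",
            "skipOffsetFromLatest": "PT10M",
            "tuningConfig": {
              "partitionsSpec": {
                "type": "hashed",
                "partitionDimensions": ["account_number"],
                "targetRowsPerSegment": 5000000
              }
            }
          }'
    # Read back what is actually stored
    curl -s http://localhost:8081/druid/coordinator/v1/config/compaction/r3-event-stream | jq .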
  • j

    JRob

    10/01/2025, 1:39 PM
    Besides hilo query laning, is there any other way to prevent really expensive queries from impacting the cluster?
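    A few other commonly used guardrails, as a hedged sketch (the property names are standard Broker settings, the values are illustrative): per-query timeouts, scatter/gather byte limits and subquery row limits, set either as Broker defaults or per query via the query context (timeout, maxScatterGatherBytes, priority).
    # Broker runtime.properties (illustrative values)
    druid.server.http.defaultQueryTimeout=60000
    druid.server.http.maxScatterGatherBytes=134217728
    druid.server.http.maxSubqueryRows=100000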
  • s

    Satya Kuppam

    10/02/2025, 1:39 PM
    Hello folks, I am having trouble optimising Dart queries on the latest 34.0.0 version:
    • I have a query with a single JOIN and I keep running into "Not enough memory" issues (see 🧵 for the query, datasource and task run detail).
    • The query fails in the sortMergeJoin phase. We have two Historical pods with 64 vCPUs and 512 GB of memory, with -Xmx=107g.
    • From the Dart documentation it's not clear how I can capacity plan for this query, or if it's possible to run this query successfully at all.
    ◦ Does Dart spill to disk in the join phase? Would that potentially be the problem here?
  • j

    Jvalant Patel

    10/03/2025, 12:58 AM
    Hi, is there an easy way to add the org.apache.druid.server.metrics.QueryCountStatsMonitor monitor for Peon ingestion processes running on MiddleManager nodes?
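    One hedged approach: the MiddleManager forwards any druid.indexer.fork.property.* settings to the Peons it forks, so something like the following in the MiddleManager runtime.properties should add the monitor to each Peon (a sketch, not verified for this setup):
    # MiddleManager runtime.properties (sketch)
    druid.indexer.fork.property.druid.monitoring.monitors=["org.apache.druid.server.metrics.QueryCountStatsMonitor"]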
  • s

    Soman Ullah

    10/03/2025, 5:23 PM
    Is there a way to overwrite the last day of data using MSQ semantics? I tried this:
    REPLACE INTO "test-ds" OVERWRITE WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
    but it gave the following error:
    Copy code
    Invalid OVERWRITE WHERE clause [`__time` >= CURRENT_TIMESTAMP - INTERVAL '1' DAY]: Cannot get a timestamp from sql expression [CURRENT_TIMESTAMP - INTERVAL '1' DAY]
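    The error suggests the OVERWRITE WHERE bounds have to reduce to literal timestamps. A hedged workaround is to compute the day boundaries outside the query and pass them in as literals (the dates and the source SELECT below are placeholders):
    REPLACE INTO "test-ds"
      OVERWRITE WHERE "__time" >= TIMESTAMP '2025-10-02 00:00:00' AND "__time" < TIMESTAMP '2025-10-03 00:00:00'
    SELECT *
    FROM "test-ds" -- illustrative source; substitute the actual ingest source
    WHERE "__time" >= TIMESTAMP '2025-10-02 00:00:00' AND "__time" < TIMESTAMP '2025-10-03 00:00:00'
    PARTITIONED BY DAY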
  • d

    Danny Wilkins

    10/06/2025, 3:26 PM
    Hey y'all, I'm tearing my hair out over this issue. I'm running on Druid 28 and I tried enabling autoscaling; however, after enabling autoscaling I have an ingestion topic which refuses to scale beyond 1 task. I've removed the autoscaling config, tried to manually set the task count, restarted the instances, replaced the instances, knocked out all of the instances and brought them back up, and nothing's letting this get past 1 task. Should I be looking in Zookeeper or something?
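    A hedged first step: pull the supervisor's stored spec from the Overlord and confirm what ioConfig.taskCount (and any autoScalerConfig) actually says, then resubmit the corrected spec (host and supervisor id below are placeholders):
    # Inspect the stored supervisor spec
    curl -s http://<overlord-host>:8081/druid/indexer/v1/supervisor/<supervisor-id> | jq .
    # Resubmit the full, corrected spec to replace the stored one
    curl -s -X POST -H 'Content-Type: application/json' \
      -d @supervisor-spec.json \
      http://<overlord-host>:8081/druid/indexer/v1/supervisor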
  • j

    JRob

    10/07/2025, 7:55 PM
    Has anyone gotten Protobuf to work with Schema Registry? I'm struggling to get past the following error:
    Copy code
    Cannot construct instance of `org.apache.druid.data.input.protobuf.SchemaRegistryBasedProtobufBytesDecoder`, problem: io/confluent/kafka/schemaregistry/protobuf/ProtobufSchemaProvider
    The instructions here seem to be wrong: https://druid.apache.org/docs/latest/development/extensions-core/protobuf/#when-using-schema-registry (there is no extension-core folder, and creating it didn't fix the issue). I also tried placing the jars in extensions/protobuf-extensions and extensions/druid-protobuf-extensions but still no luck...
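    For what it's worth, that error is the class io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaProvider missing from the classpath. One hedged approach is to drop the Confluent schema-registry jars into the protobuf extension directory that Druid actually loads (artifact names below are the usual Confluent ones; the versions and paths are illustrative and should match your Confluent Platform version):
    # Sketch: add the Confluent schema-registry protobuf jars to the loaded extension directory
    cd /opt/druid/extensions/druid-protobuf-extensions
    cp /tmp/kafka-schema-registry-client-6.2.1.jar .
    cp /tmp/kafka-protobuf-provider-6.2.1.jar .
    # Restart the services that run ingestion tasks so the new jars are picked up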
  • s

    Sanjay Dowerah

    10/08/2025, 8:06 AM
    Hello Druid community, apologies for repeating this, just wanted to keep the loop alive. I am running Druid on an OpenShift cluster and using the Druid Delta Lake extension (https://github.com/apache/druid/tree/master/extensions-contrib/druid-deltalake-extensions) to connect and load Delta tables. However, I am running into the following issues:
    • Error while loading with the Delta connector: only 1024 records of each constituent Parquet file (each partition of the Delta table) are loaded.
    • There is also an error on the UI as soon as the load is over: ERROR: Request failed with status code 404
    For your reference, here is the query I am using to load:
    REPLACE INTO "table" OVERWRITE ALL
    WITH "ext" AS (
      SELECT *
      FROM TABLE(
        EXTERN(
          '{"type":"delta","tablePath":"path"}',
          '{"type":"parquet"}'
        )
      ) EXTEND ("col1" VARCHAR, "col2" VARCHAR, "col3" VARCHAR, "col4" BIGINT, "col5" VARCHAR, "col6" VARCHAR, "col7" BIGINT, "col8" VARCHAR, "col9" VARCHAR)
    )
    SELECT
      MILLIS_TO_TIMESTAMP("dop" * 1000) AS "__time",
      "col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8"
    FROM "ext"
    PARTITIONED BY DAY
  • m

    Maytas Monsereenusorn

    10/11/2025, 8:42 PM
    Is there a way in MSQE to cluster by all dimensions (without listing all the dimensions), similar to how in an ingestionSpec we can leave partitionDimensions null for hash-based partitioning?
  • a

    Adithya Shetty

    10/13/2025, 6:30 PM
    Hi, we are trying to set up MSQ queries. We are using Druid version 29.
  • a

    Adithya Shetty

    10/13/2025, 6:37 PM
    When running MSQ queries we are not seeing any issues in the various stages, including exportResults, but we notice that the files exported to S3 have truncated rows. When the same MSQ queries are configured to export results locally, we are able to see all rows in all files. Below are the Druid properties configured for the MSQ setup. Please check and provide pointers to fix the issue. Thanks in advance.
    Copy code
    druid.extensions.loadList:  '["/opt/druid/extensions/maha-druid-lookups","/opt/druid/extensions/druid-datasketches", "/opt/druid/extensions/druid-avro-extensions","/opt/druid/extensions/druid-s3-extensions", "/opt/druid/extensions/mysql-metadata-storage", "/opt/druid/extensions/druid-kafka-indexing-service","/opt/druid/extensions/druid-orc-extensions","/opt/druid/extensions/druid-multi-stage-query","/opt/druid/extensions/statsd-emitter"]'
      "druid.msq.intermediate.storage.enable": "true",
      "druid.msq.intermediate.storage.type": "s3",
      "druid.msq.intermediate.storage.bucket": "{{aws_account_id}}-demand-reporting-druid-prod-{{ aws_region}}",
      "druid.msq.intermediate.storage.prefix": "reports",
      druid.msq.intermediate.storage.tempDir: '/data/msq'
      druid.export.storage.s3.tempLocalDir: '/tmp/msq'
      druid.export.storage.s3.allowedExportPaths: '["s3://demand-reporting-asyncreports-druid-prod/export/"]'
      druid.export.storage.s3.chunkSize: 500MiB
  • r

    Richard Vernon

    10/14/2025, 4:17 PM
    Hello guys, would anyone know how best to perform a whole datasource/segment migration from a 0.16.0 cluster (local storage) to 34.0.0 (S3 storage)? I thought of directly loading the segments + metadata from old to new; however, the metadata format and compression encoding have changed significantly since then (or at least as far as I'm aware). Thanks!