# troubleshooting

    Salvador Pardiñas

    01/29/2025, 4:53 PM
    Hi all! I have a quick question but I'm not sure if this is the right place. We have a query that looks something like this (tables and parameters renamed for confidentiality):
    Copy code
    with info_1 as (select distinct spm."CODE" as "CODE", spm."ID" as ID
                           from d_object_info spm
                           where spm.SIGNUP_DATE is not null and spm.STATUS = 'ACTIVE'),
         info_2 as (select *
                               from "d_object_metric"
                               where "CATEGORY" = 'SOME CATEGORY'
                                 and TIME_EXTRACT(__time, 'year') = 2024
                                 and TIME_EXTRACT(__time, 'month') = 06)
    select spm.CODE,
           (select coalesce(sum(METRIC), 0)
            from info_2 vma
            where vma.CODE = spm.CODE) as total_metric
    from info_1 spm
    This query worked correctly on Druid 28. After updating to Druid 31, we suddenly get the following error:
    Copy code
    Calcite assertion violated: [Cannot add expression of different type to set: set type is RecordType(VARCHAR COD, VARCHAR ID, DOUBLE EXPR$0) NOT NULL expression type is RecordType(VARCHAR CODE, VARCHAR ID, DOUBLE NOT NULL EXPR$0) NOT NULL
    I've looked through the release notes and can't find any breaking changes that might be affecting this query. Any ideas?

    Salvador Pardiñas

    01/29/2025, 6:08 PM
    Is there any way (in Druid SQL) to strip the NOT NULL from an expression type? The output of COALESCE is a NOT NULL type, and apparently that's breaking a couple of queries for us.
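    One workaround sketch (untested; whether an explicit CAST relaxes the inferred NOT NULL depends on the Calcite version Druid ships): wrap the COALESCE in a CAST, or make the expression nullable again with NULLIF. Using the query from the message above:
    Copy code
    -- Sketch only: CAST may relax the NOT NULL inference;
    -- NULLIF(x, <sentinel>) is nullable by definition.
    select spm.CODE,
           cast((select coalesce(sum(METRIC), 0)
                 from info_2 vma
                 where vma.CODE = spm.CODE) as double) as total_metric
    from info_1 spm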

    info Advisionary

    01/30/2025, 11:09 AM
    I am working with Apache Druid 31.0.1 and I need to apply a spatial filter using a polygon shape. Specifically, I want to filter data based on whether a point falls within a given polygon. Can anyone provide an example of how to set up and use spatial filters with polygons in Druid? I've read through the documentation and tried various filter options, but I'm having trouble with the correct syntax for defining the polygon in a spatial filter. I would appreciate any examples or pointers on how to structure this in Druid 31.0.1. There is no example of using a polygon in spatial filters in Druid's official documentation; any help will be appreciated.
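    For reference, the native spatial filter's bound object has a polygon type whose vertices are given as parallel abscissa/ordinate arrays; a minimal sketch, assuming a dimension named coordinates that was ingested as a spatial dimension:
    Copy code
    {
      "filter": {
        "type": "spatial",
        "dimension": "coordinates",
        "bound": {
          "type": "polygon",
          "abscissa": [0.0, 1.0, 1.0, 0.0],
          "ordinate": [0.0, 0.0, 1.0, 1.0]
        }
      }
    }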

    Lionel Mena

    01/30/2025, 2:39 PM
    Hello!! Ever since I upgraded from Druid v29.0.1 to v30.0.1, I have one or two Kafka realtime tasks failing on a daily basis, but I'm not able to pinpoint the root cause. Wondering if someone can give me more insight into what might be happening. I haven't changed anything across Druid versions, so this might be a Druid issue. The task exception message:
    Copy code
    2025-01-30T14:10:21,505 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception in run() before persisting.
    java.lang.InterruptedException: null
    	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1640) ~[?:?]
    	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.possiblyPause(SeekableStreamIndexTaskRunner.java:1356) ~[druid-indexing-service-30.0.1.jar:30.0.1]
    	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:595) [druid-indexing-service-30.0.1.jar:30.0.1]
    	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:277) [druid-indexing-service-30.0.1.jar:30.0.1]
    	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.runTask(SeekableStreamIndexTask.java:153) [druid-indexing-service-30.0.1.jar:30.0.1]
    	at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:179) [druid-indexing-service-30.0.1.jar:30.0.1]
    	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:478) [druid-indexing-service-30.0.1.jar:30.0.1]
    	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:450) [druid-indexing-service-30.0.1.jar:30.0.1]
    	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) [guava-32.0.1-jre.jar:?]
    	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) [guava-32.0.1-jre.jar:?]
    	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) [guava-32.0.1-jre.jar:?]
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
    Overlord error message:
    Copy code
    2025-01-30T14:10:34,804 ERROR [TaskQueue-OnComplete-0] org.apache.druid.indexing.overlord.TaskQueue - Ignoring notification for already-complete task: {class=org.apache.druid.indexing.overlord.TaskQueue, task=index_kafka_sdk-americas-realtime_f26b8ca42fa849f_pfaamibg}
    You can find attached the: • Task's full logs • Associated Overlord and MiddleManager logs. • Supervisor spec. • Datasource auto compaction specs.
    auto_compaction-spec.json, realtime_task_logs.txt, overlord_logs.txt, supervisor_spec.json

    Maytas Monsereenusorn

    01/31/2025, 1:24 AM
    Does anyone use Hadoop range partitioning with maxRowsPerSegment? I am running into this issue and wondering if my understanding is correct: https://github.com/apache/druid/pull/147#issuecomment-2620044467 Thanks!

    Miguel Vieira Colombo

    02/03/2025, 4:04 PM
    Hello Team, Druid v30. We have a huge delay attaching tasks to the overlord: more than 30s per task every time we send a request to /druid/indexer/v1/task. This is a cluster with an overlord with 16 vCPU and 32GB RAM (EC2 c6.4xlarge) and an RDS Postgres with 2 vCPU and 2GB RAM (db.t4g.small). We tried running manual INSERT queries on druid_tasks, druid_audit, and druid_tasklocks, and all of them execute in under 50ms, so I don't think it's the insert operation on the metadata store, but I have no idea what the bottleneck is. At no time do we have CPU or memory spikes on the overlord: CPU is below 50% at all times and memory is steady at 48%. These are our JVM and runtime configurations:
    Copy code
    -server
    -Xms12g
    -Xmx12g
    -XX:+ExitOnOutOfMemoryError
    -XX:+UseG1GC
    -XX:+UseStringDeduplication
    -XX:ParallelGCThreads=5
    -XX:ConcGCThreads=2
    -Duser.timezone=UTC
    -Dfile.encoding=UTF-8
    -Djava.io.tmpdir=var/tmp
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    -Dderby.stream.error.file=var/druid/derby.log
    -Daws.region=us-east-1
    -Djute.maxbuffer=15728640
    Copy code
    druid.service=druid/coordinator
    druid.plaintextPort=8081
    
    druid.coordinator.startDelay=PT10S
    druid.coordinator.period=PT5S
    druid.manager.segments.pollDuration=PT5S
    
    druid.indexer.logs.kill.enabled=true
    druid.indexer.logs.kill.delay=14400000
    druid.indexer.logs.kill.durationToRetain=86400000
    
    # Run the overlord service in the coordinator process
    druid.coordinator.asOverlord.enabled=true
    druid.coordinator.asOverlord.overlordService=druid/overlord
    
    druid.indexer.storage.recentlyFinishedThreshold=PT12H
    druid.manager.config.pollDuration=PT10M
    druid.indexer.runner.maxZnodeBytes=15728640
    
    druid.metadata.storage.connector.createTables=false
    
    druid.indexer.queue.startDelay=PT5S
    
    druid.indexer.storage.type=metadata
    Any suggestion, idea, or help is really appreciated. Thanks!

    Venugopal Vupparaboina

    02/05/2025, 8:43 AM
    Hello everyone, We are getting the following
    java.lang.OutOfMemoryError: GC overhead limit exceeded
    exception while running an MSQ with broadcast joins:
    Copy code
    org.apache.druid.java.util.common.ISE: worker sketch fetch failed
    	at org.apache.druid.msq.exec.ControllerImpl$RunQueryUntilDone.checkForErrorsInSketchFetcher(ControllerImpl.java:2747)
    	at org.apache.druid.msq.exec.ControllerImpl$RunQueryUntilDone.run(ControllerImpl.java:2730)
    	at org.apache.druid.msq.exec.ControllerImpl$RunQueryUntilDone.access$000(ControllerImpl.java:2681)
    	at org.apache.druid.msq.exec.ControllerImpl.runTask(ControllerImpl.java:433)
    	at org.apache.druid.msq.exec.ControllerImpl.run(ControllerImpl.java:372)
    	at org.apache.druid.msq.indexing.MSQControllerTask.runTask(MSQControllerTask.java:258)
    	at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:179)
    	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:478)
    	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:450)
    	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
    	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75)
    	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	at java.lang.Thread.run(Thread.java:750)
    Caused by: org.apache.druid.msq.indexing.error.MSQException: WorkerRpcFailed: RPC call to task failed unrecoverably: [query-f0eee42a-a191-4b75-af72-a59254c00669-worker0_0]
    	at org.apache.druid.msq.exec.ExceptionWrappingWorkerClient$1.onFailure(ExceptionWrappingWorkerClient.java:160)
    	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1119)
    	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
    	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1286)
    	at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1055)
    	at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:807)
    	at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:127)
    	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
    	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1286)
    	at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1055)
    	at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:782)
    	at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:49)
    	at org.apache.druid.rpc.ServiceClientImpl$1.handleResultValue(ServiceClientImpl.java:277)
    	at org.apache.druid.rpc.ServiceClientImpl$1.onSuccess(ServiceClientImpl.java:190)
    	at org.apache.druid.rpc.ServiceClientImpl$1.onSuccess(ServiceClientImpl.java:183)
    	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1133)
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    	... 3 more
    Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    	at com.fasterxml.jackson.core.util.ByteArrayBuilder.toByteArray(ByteArrayBuilder.java:163)
    	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeBase64(UTF8StreamJsonParser.java:3648)
    	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getBinaryValue(UTF8StreamJsonParser.java:526)
    	at com.fasterxml.jackson.databind.deser.std.PrimitiveArrayDeserializers$ByteDeser.deserialize(PrimitiveArrayDeserializers.java:469)
    	at com.fasterxml.jackson.databind.deser.std.PrimitiveArrayDeserializers$ByteDeser.deserialize(PrimitiveArrayDeserializers.java:432)
    	at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromString(BeanDeserializerBase.java:1488)
    	at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:208)
    	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:198)
    	at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:542)
    	at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:566)
    	at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:450)
    	at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1405)
    Has anyone else faced this OutOfMemoryError issue?
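    The OOM is thrown while the controller deserializes worker key statistics (the Jackson ByteDeser frames above), so one knob worth trying, as a sketch, is a larger peon heap via the MiddleManager's peon JVM options (sizes below are placeholders to tune for your hosts):
    Copy code
    # middleManager runtime.properties (sketch)
    druid.indexer.runner.javaOptsArray=["-server","-Xms2g","-Xmx2g","-XX:MaxDirectMemorySize=2g"]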

    Mahesha Subrahamanya

    02/05/2025, 4:54 PM
    Hello Team, when "maxNumConcurrentSubTasks" is set to 5 and an S3 parallel index ingestion starts, it initially runs with 5 single-phase tasks + 1 index_parallel task; however, by the end of the ingestion, 12 single-phase tasks had been spawned to complete it. Is there a property in the ingestion template to control how many single-phase subtasks are spawned, and under what conditions does it start spinning up new tasks? Could anybody help us understand how to manage this setup and what's going on here? Thanks.
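    For what it's worth, maxNumConcurrentSubTasks caps how many subtasks run at the same time, not the total number spawned over the life of the job (that depends on how the input is split), which would explain 12 subtasks overall with at most 5 running concurrently. A sketch of where the cap sits in an index_parallel spec:
    Copy code
    {
      "type": "index_parallel",
      "spec": {
        "tuningConfig": {
          "type": "index_parallel",
          "maxNumConcurrentSubTasks": 5
        }
      }
    }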

    Animesh Gupta

    02/06/2025, 4:26 AM
    Hello Team, we are not able to access the Druid web console over HTTP (default port 8888) on Chrome and Firefox. What could be the possible reason?
    WhatsApp Video 2025-02-05 at 8.37.10 PM.mp4

    Suraj Goel

    02/10/2025, 2:16 PM
    Hi Team, we recently upgraded from Druid 25 to Druid 30. We are ingesting data from Kafka, and occasionally one of the tasks fails, which leads to increased consumer lag. It happens about once a day. There is no replication. The task fails even after its status is SUCCESS in the logs. Logs (in chronological order):
    Copy code
    Received pause command, pausing ingestion until resumed.
    
    Updating status of task [index_kafka_task] to [TaskStatus{id=index_kafka_task, status=FAILED, duration=-1, errorMsg=An exception occurred while waiting for task [index_kafka_task...}].
    
    Shutdown [index_kafka_task] because: [An exception occurred while waiting for task [index_kafka_task] to pause: [org.apache.druid.rpc.HttpResponseException: Server error [409 Conflict]; body: Can't pause, task is not in a pausable state (state: [PAUSED])]]
    
    Got shutdown request for task[index_kafka_task]. Asking worker[indexer] to kill it.
    
    Stopping thread for task: index_kafka_task
    
    Shutdown [index_kafka_task] because: [shut down request via HTTP endpoint]
    
    Sent shutdown message to worker: indexer, status 200 OK, response: {"task":"index_kafka_task"}
    
    Task [index_kafka_task] status changed to [SUCCESS].
    
    Setting task[index_kafka_task] work item state from [RUNNING] to [COMPLETE].
    Exception:
    java.lang.RuntimeException: Stacktrace...
    	at org.apache.druid.indexing.overlord.hrtr.HttpRemoteTaskRunner$HttpRemoteTaskRunnerWorkItem.setStateUnconditionally(HttpRemoteTaskRunner.java:1913)
    	at org.apache.druid.indexing.overlord.hrtr.HttpRemoteTaskRunner$HttpRemoteTaskRunnerWorkItem.setState(HttpRemoteTaskRunner.java:1894)
    	at org.apache.druid.indexing.overlord.hrtr.HttpRemoteTaskRunner$HttpRemoteTaskRunnerWorkItem.setResult(HttpRemoteTaskRunner.java:1881)
    	at org.apache.druid.indexing.overlord.hrtr.HttpRemoteTaskRunner.taskComplete(HttpRemoteTaskRunner.java:506)
    	at org.apache.druid.indexing.overlord.hrtr.HttpRemoteTaskRunner.taskAddedOrUpdated(HttpRemoteTaskRunner.java:1695)
    	at org.apache.druid.indexing.overlord.hrtr.WorkerHolder$2.notifyListener(WorkerHolder.java:432)
    	at org.apache.druid.indexing.overlord.hrtr.WorkerHolder$2.deltaSync(WorkerHolder.java:425)
    	at org.apache.druid.server.coordination.ChangeRequestHttpSyncer$1.onSuccess(ChangeRequestHttpSyncer.java:293)
    	at org.apache.druid.server.coordination.ChangeRequestHttpSyncer$1.onSuccess(ChangeRequestHttpSyncer.java:259)
    	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1133)
    	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    	at java.base/java.lang.Thread.run(Thread.java:840)
    • Is this a known issue?
    • What can be the problem here?
    • Is there a recent change causing this? The issue was not there in Druid 25.

    Kiarash Norouzi

    02/11/2025, 5:33 AM
    Hey everyone, I’m running Druid v30.0.1 and started increasing manual compactions plus adding more task slots for auto-compactions about a week ago. Now I’m facing a critical issue: drop-segment tasks are stuck in a pending state, and there’s a large backlog of segments waiting to be dropped. Because our storage is self-hosted, it’s difficult and time-consuming to expand capacity. I’ve tried issuing kill tasks for unused segments, but none of them actually get dropped. Has anyone run into this problem or have any advice? Thanks!
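    One hedged thing to check: a kill task only deletes segments that are already marked unused, so if the backlog segments are still marked used, kill tasks will find nothing to remove. A sketch of the native kill task payload (datasource and interval are placeholders):
    Copy code
    {
      "type": "kill",
      "dataSource": "my_datasource",
      "interval": "2024-01-01/2024-02-01"
    }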

    Dinesh

    02/12/2025, 7:01 PM
    Hello everyone, how is the heap different here? I mean DRUID_XMX vs. -Xmx in the JAVA_OPTS array: are they different or the same? This is the coordinator configuration I am sharing:
    Copy code
    - name: DRUID_XMS
      value: 6G
    - name: DRUID_XMX
      value: 12G
    - name: JAVA_OPTS
      value: -server -Xmx12g -Xms3g -Xlog:gc* -XX:+UseG1GC -Duser.timezone=UTC -Dfile.encoding=UTF-8

    Dinesh

    02/13/2025, 9:20 AM
    I had one more doubt related to compaction: can I compact or reingest (through batch ingestion) data that is past the retention period and stored in S3? Or do the segments have to be marked as used for that first?
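    On the second part of the question: segments dropped by retention rules are marked unused, and the Coordinator API can mark them used again before compaction or reindexing. A sketch, assuming the standard Coordinator endpoint (host, datasource, and interval are placeholders):
    Copy code
    curl -X POST "http://COORDINATOR:8081/druid/coordinator/v1/datasources/my_datasource/markUsed" \
      -H 'Content-Type: application/json' \
      -d '{"interval": "2024-01-01/2024-02-01"}'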

    Mateusz Kalinowski

    02/13/2025, 3:58 PM
    Hello Guys! Is there a way to set the default value that will be returned for a lookup? Something like
    replaceMissingValueWith
    ? Example configuration:
    Copy code
    {
      "type": "map",
      "replaceMissingValueWith": "2025-01-01 00:00:00",
      "map": {
        "1": "1000-01-01 00:00:00",
        "2": "1000-01-01 00:00:00"
      }
    }
    I know there is a way to change the SQL to have the 3rd parameter as described here, but I want to change this behaviour just for the dev environment:
    Copy code
    The LOOKUP function also accepts a third argument called replaceMissingValueWith as a constant string. If the lookup does not contain a value for the provided key, then the LOOKUP function returns this replaceMissingValueWith value rather than NULL, just like COALESCE. For example, LOOKUP(store, 'store_to_country', 'NA') is equivalent to COALESCE(LOOKUP(store, 'store_to_country'), 'NA').
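    Applied to the example map above, the query-time form from the quoted docs would look like this (lookup, column, and table names are hypothetical):
    Copy code
    -- Returns '2025-01-01 00:00:00' when "some_id" has no entry in the lookup.
    SELECT LOOKUP("some_id", 'my_lookup', '2025-01-01 00:00:00') AS mapped_time
    FROM my_table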

    Jimbo Slice

    02/13/2025, 9:30 PM
    Hi, does anyone here know if the Cassandra deep storage component still uses Thrift? It would seem to be the case, as I am seeing this when trying to write out to a local Cassandra 4 instance:
    Copy code
    2025-02-13T20:59:43,731 WARN [task-runner-0-priority-0] org.apache.druid.msq.exec.ControllerImpl - Controller: Work failed; task query-001d88f1-1190-4c72-a0d6-0e492783c8e6; host localhost:8100 WorkerFailed: Worker task failed: [query-001d88f1-1190-4c72-a0d6-0e492783c8e6-worker0_0] (org.apache.druid.msq.indexing.error.MSQException: WorkerFailed: Worker task failed: [query-001d88f1-1190-4c72-a0d6-0e492783c8e6-worker0_0])
    2025-02-13T20:59:43,731 WARN [task-runner-0-priority-0] org.apache.druid.msq.exec.ControllerImpl - Worker: Work failed; stage 1; task query-001d88f1-1190-4c72-a0d6-0e492783c8e6-worker0_0; host localhost:8101 UnknownError: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: com.netflix.astyanax.connectionpool.exceptions.TransportException: TransportException: [host=127.0.0.1(127.0.0.1):9042, latency=2(2), attempts=1] org.apache.thrift.transport.TTransportException: Read a negative frame size (-2063597568)!
    Thrift is deprecated and so is Astyanax. I'm using the v30 version of the Cassandra component, and I also had to manually add the high-scale-lib jar to get the thing to work (it is missing from the official build). What is going on with this Cassandra connector? There is no real information about it anywhere; I've had to scrape around the internet and compare differing things, some from a long time ago. Do I need to use Cassandra 3 with Thrift support and instead connect to the Thrift socket? There are no official guidelines available on the web. I've managed to get this far and I'm thinking of creating a wiki page or something detailing my journey. Don't want to give up now :(.

    Julian Reyes

    02/14/2025, 4:36 PM
    I am trying to submit this to the supervisor
    Copy code
    "transformSpec": {
        "filter": {
            "type": "not",
            "field": {
                "type": "in",
                "dimension": "user_id",
                "value": [
                    "1234"
                ]
            }
        },
        "transforms": [
          {
            "type": "expression",
            "name": "isSubRequest",
            "expression": "if(\"batch_id\" > 0, 1, 0)"
          }
        ]
      }
    however I am getting
    Copy code
    Failed to submit supervisor: Cannot construct instance of `org.apache.druid.query.filter.InDimFilter`, 
      problem: values cannot be null at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 2038] 
      (through reference chain: org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorSpec["spec"]
      ->org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisorIngestionSpec["dataSchema"]
      ->org.apache.druid.segment.indexing.DataSchema["transformSpec"]
      ->org.apache.druid.segment.transform.TransformSpec["filter"]
      ->org.apache.druid.query.filter.NotDimFilter["field"])
    Not sure if I am missing something or the filter is badly configured.
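    The error message itself points at a likely fix: InDimFilter complains that values cannot be null, and the spec above uses "value" where Druid's "in" filter expects "values". A sketch of the corrected filter under that reading:
    Copy code
    "filter": {
      "type": "not",
      "field": {
        "type": "in",
        "dimension": "user_id",
        "values": ["1234"]
      }
    }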

    Dinesh

    02/17/2025, 4:40 AM
    Can someone please clarify what exactly this is? DRUID_XMS/DRUID_XMX vs. -Xms and -Xmx in the JAVA_OPTS array in Druid components: how are they different, and which should be kept?
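    A sketch of what is likely going on, assuming the official apache/druid Docker image, whose entrypoint renders DRUID_XMS/DRUID_XMX into JVM flags before appending JAVA_OPTS (verify the final flags with ps inside the container):
    Copy code
    # DRUID_XMS=6G and DRUID_XMX=12G are rendered by the entrypoint roughly as:
    #   -Xms6g -Xmx12g
    # JAVA_OPTS is then passed through as-is, so the command line carries both sets:
    #   java -Xms6g -Xmx12g ... -server -Xmx12g -Xms3g ...
    # When a flag is repeated, the JVM honors the last occurrence, so configure the
    # heap through one mechanism only (the env vars or explicit -Xms/-Xmx), not both.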

    Carlos M

    02/17/2025, 5:51 PM
    Is there any replacement for the Druid monitors
    org.apache.druid.server.metrics.TaskSlotCountStatsMonitor
    and
    org.apache.druid.server.metrics.TaskCountStatsMonitor
    in Druid 30.x and above? In both cases the MiddleManagers refuse to start with:
    Exception in thread "main" java.lang.RuntimeException: com.google.inject.CreationException: Unable to create injector, see the following errors
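    One hedged pointer: both of these are Overlord-side monitors, which would explain the injection failure when they are listed in the MiddleManagers' monitor config; for worker-side task counts there is a separate org.apache.druid.server.metrics.WorkerTaskCountStatsMonitor. A sketch of the split:
    Copy code
    # overlord runtime.properties (sketch)
    druid.monitoring.monitors=["org.apache.druid.server.metrics.TaskCountStatsMonitor","org.apache.druid.server.metrics.TaskSlotCountStatsMonitor"]
    # middleManager runtime.properties (sketch)
    druid.monitoring.monitors=["org.apache.druid.server.metrics.WorkerTaskCountStatsMonitor"]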

    Sarang Vadali

    02/17/2025, 11:32 PM
    Hi team, I am trying to set up the Druid AWS RDS module to allow Druid to connect to RDS PostgreSQL via an AWS IAM temporary token. I was wondering how to actually pass in the following properties as mentioned in the doc:
    Copy code
    { "type": "aws-rds-token", "user": "USER", "host": "HOST", "port": PORT, "region": "AWS_REGION" }
    What druid property should these values be passed into? An example for this would be very helpful!
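    In case it helps: Druid password fields generally accept a password provider object, so a plausible wiring is the metadata connector's password property. A sketch, not verified against the druid-aws-rds-extensions docs, with placeholder values:
    Copy code
    # runtime.properties (sketch; assumes "aws-rds-token" acts as a password provider)
    druid.metadata.storage.type=postgresql
    druid.metadata.storage.connector.connectURI=jdbc:postgresql://HOST:5432/druid
    druid.metadata.storage.connector.user=USER
    druid.metadata.storage.connector.password={"type":"aws-rds-token","user":"USER","host":"HOST","port":5432,"region":"AWS_REGION"}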

    massimo zorer

    02/18/2025, 6:26 PM
    Hi team, I am trying to use druid (32.0.0) to consume protobuf messages from an EventHub topic. However, the native
    druid-protobuf-extensions
    fails to flatten the messages if they contain repeated fields. My goal is to flatten a message that contains repeated fields, so I cloned the git repo of the current extension and modified the parser, the reader, the InputFormat, etc., so that, given a message with repeated fields, the Cartesian product over the repeated fields is returned. Tests pass in my new extension. In addition, I also want fields that are not set in the message but are declared in the protobuf descriptor to be returned. I loaded my extension into the Docker Compose setup, but when I try to use it from the UI I get an error on `org.apache.druid.indexing.overlord.sampler.InputSourceSampler.Sample`:
    Size of rawColumnsList([[{...}]]) does not correspond to size of inputRows([[{}]])
    Do you have any suggestions?

    Rushil

    02/19/2025, 1:37 PM
    Hi team, I wanted to learn about how Druid optimizes filtering and group-by for low- vs. high-cardinality columns, and for which of the two it is faster. Also, is there any documentation on how to use Trino with Druid to increase performance?

    tania manhas

    02/20/2025, 1:28 AM
    Hi everyone, I have a query regarding metrics emission by Druid. We have enabled
    org.apache.druid.server.metrics.QueryCountStatsMonitor
    to capture query metrics in Druid. However, the logs indicate that
    query/count
    remains at
    0
    even though our queries are being processed. I did some research online, and it seems this was a bug in earlier versions that has been fixed. We are currently using version 30.0.1. Can anyone please help me with this? Thank you!
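    For anyone comparing configs: the monitor has to be enabled on each service whose queries you want counted (typically the Broker), and the metric is emitted on the monitoring emission period. A sketch of the relevant properties:
    Copy code
    # broker runtime.properties (sketch)
    druid.monitoring.monitors=["org.apache.druid.server.metrics.QueryCountStatsMonitor"]
    druid.monitoring.emissionPeriod=PT1M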

    Carlos M

    02/21/2025, 6:56 PM
    Is anyone from DataInfra here (
    druid-operator
    )? The chart repository at
    https://charts.datainfra.io
    seems to be pointing to some parking site.

    Dinesh

    02/26/2025, 9:04 AM
    Hi everyone, we recently upgraded to Druid 31.0.1. On the previous version (Druid 28.x) we used to get an occasional error while querying data: [ERROR] CACHE_NOT_INITIALIZED. However, post-upgrade this error has become very frequent, something like the below. Can anyone please help me understand what exactly this issue is? org.apache.druid.server.lookup.namespace.cache.CacheScheduler$EntryImpl@2936ffa3: CACHE_NOT_INITIALIZED, extractorID = namespace-factory-UriExtractionNamespace{uri=null,

    Dinesh

    02/26/2025, 9:07 AM
    One of the example lookup configs looks like this:
    Copy code
    {
      "extractionNamespace": {
        "fileRegex": "some_file_name.json",
        "namespaceParseSpec": {
          "format": "customJson",
          "keyFieldName": "cid",
          "valueFieldName": "distName"
        },
        "pollPeriod": "PT30M",
        "type": "uri",
        "uriPrefix": "<lookup_location_in_s3>"
      },
      "firstCacheTimeout": 0,
      "type": "cachedNamespace"
    }

    Hagen Rother

    02/28/2025, 4:54 PM
    I am hunting an unparseable JSON event in my Kafka stream. Does anybody know which log4j config to set to get the source event logged? Fishing it out of Kafka is close to impossible given its throughput.
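    One hedged alternative to log4j hunting: the Kafka supervisor's tuningConfig can log and retain parse failures directly, which usually surfaces the offending event without trawling the topic. A sketch:
    Copy code
    "tuningConfig": {
      "type": "kafka",
      "logParseExceptions": true,
      "maxSavedParseExceptions": 10
    }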

    Jimbo Slice

    03/02/2025, 11:45 AM
    Hi, I think I've found a bug. Max tasks is set to 8, using the default minimum of 2 tasks for each job. I submit 7 jobs to import some files from disk via curl; what happens is 7 new workers are created, and 7 other workers are spawned after. Because there are only 8 slots, only 1 of the workers is importing data at a time, while the other 6 workers wait for a slot to become available. Other observations: if I submit 4 tasks, they all run fine and import at the same time (4x2=8). If I submit 8 tasks, the newly spawned workers take up all 8 slots, and only when one of the workers times out does another pick up and start importing, but by then timeouts start being hit on the tasks that have been waiting a while. This is quite frustrating because it presents a deadlock situation. Fair enough, the timeout may occur, but this is what I think I should be seeing:
    1. Job is submitted with maxNumTasks=2.
    2. This number of slots is immediately allocated from druid.worker.capacity (8-2).
    3. The remaining slots in my situation should then be 6.
    4. Any subsequent jobs submitted follow the same process.
    5. If there are no free workers, the jobs are queued and await execution rather than timing out.
    I don't think I'm missing much here: the allocation of worker slots seems to be flawed, and maxNumTasks should be respected earlier in the process to avoid this timeout deadlock.

    Dinesh

    03/03/2025, 7:03 AM
    Hello, I am running index_parallel tasks where the source is S3. I set maxNumConcurrentTasks to 5, but it launches only 3 subtasks. How can I launch more subtasks to speed up the ingestion, or is there another way to achieve the same? We have enough resources.

    Chetan Patidar

    03/03/2025, 8:32 AM
    Hey folks, can I please get a review on this PR? Thanks! cc @kfaraz @Abhishek Agarwal @Gian Merlino

    Keith Byrd

    03/03/2025, 2:45 PM
    Help! We are researching the potential of Druid as our platform for a large project, and I cannot get Druid to ingest large numbers of messages from Kafka. If I run an application looping through hundreds of thousands of messages, the ingestion task status changes to UNHEALTHY_TASKS and just keeps restarting. If I send just a few thousand messages it seems to work OK, but it doesn't want to complete the segment. At one point it looked like I had to run several iterations of smaller groups of messages, after which I could send millions and it would ingest them without issue, but now even that process seems not to work. WSL2 on Win11, Druid 31.0.1, kafka_2.13-3.9.0.