Apache Druid #general

JRob

05/01/2025, 6:46 PM

Does Druid support read locks? My scenario: I have a supervisor that ingests from Kafka (hourly tasks). As soon as the data is published, I want to ingest that Druid datasource into a new Druid datasource. Today, I have to estimate when the real-time ingestion task will be done (i.e. has written everything to the Historicals) and leave a large enough time gap to ensure that everything has been written. This is error-prone. If we could somehow tell an ingestion task "do not start until your source datasource has all its data", our second ingestion could be significantly improved.

Nick M

05/07/2025, 12:01 PM

What’s the latest thinking around Coordinator scaling and the number of segments per time chunk? A while back, there was some guidance that around 2000 segments per time chunk was the ideal number, and we’d sized our ingestion granularity accordingly. But I know the Coordinator is much more performant in newer releases and wondered if that guidance still holds.

Daniel Rubio Bonilla

05/09/2025, 10:59 AM

Hello community! We are looking to improve the monitoring of our Druid cluster. We have already enabled the Prometheus emitter in all the Druid services and the metrics are correctly scraped. We were wondering if anybody knows of a good Grafana dashboard and/or Prometheus alerts. We did not find anything useful on Google. We would prefer not to start from scratch, and we are happy to collaborate and share our improvements on previous work. Thanks!


mehrdadbn9

05/11/2025, 10:35 AM

Hi, is there any way to specify how much data for a specific datasource should be kept on the Historicals, with the rest remaining only in deep storage?
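A minimal sketch of one common approach, assuming period-based retention is what's wanted: Coordinator load rules that keep only recent segments on Historicals. The Coordinator address, datasource name, 30-day period and replica count below are hypothetical. Rules are evaluated top to bottom and the first matching rule applies; segments dropped this way are no longer served by Historicals, but their files remain in deep storage unless a kill task removes them.

```python
import json

# Hypothetical values for illustration only.
COORDINATOR = "http://coordinator:8081"
DATASOURCE = "my_datasource"


def retention_rules(period: str, replicas: int) -> list:
    """Keep the most recent `period` of data on Historicals; drop the rest."""
    return [
        {
            # Segments overlapping the last `period` (and any future data)
            # are loaded onto Historicals in the default tier.
            "type": "loadByPeriod",
            "period": period,
            "includeFuture": True,
            "tieredReplicants": {"_default_tier": replicas},
        },
        # Older segments are unloaded from Historicals. Their files stay in
        # deep storage unless a kill task later deletes them.
        {"type": "dropForever"},
    ]


rules = retention_rules("P30D", 2)
url = f"{COORDINATOR}/druid/coordinator/v1/rules/{DATASOURCE}"
body = json.dumps(rules)  # POST this JSON to `url` to set the rules
print(url)
print(body)
```

The order matters: if dropForever came first, it would match every segment and nothing would stay loaded.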


JRob

05/12/2025, 5:07 PM

Are there any good tools that can analyze slow queries on a Druid cluster? We have enabled logging of slow queries, but there are gaps:
1. We need to log into the Druid host to get the query logs.
2. It's not easy to see which queries are the actual culprits, since one bad query can slow everything down.
3. There is no alerting capability.

Yotam Bagam

05/13/2025, 6:56 AM

Hey community, we wanted to know if anyone else is facing the issues we've hit in our recent Druid work. We have been using Druid for years (version 30 at the moment) and are currently adding MSQ ingestion tasks. We have a very high load, and we find that when there are too many tasks for the MiddleManager to handle, it launches only query_controller tasks and leaves no capacity for query_worker tasks, which means the queries aren't being processed at all. The obvious solution is scaling the MiddleManagers, but we wanted to know if there is a way of prioritizing or splitting the workloads. Has anyone encountered this when using MSQ? Any suggestions or ideas?

JRob

05/13/2025, 11:20 PM

I recently noticed a behaviour in our Kafka ingestion tasks: when Kafka retention kicks in and the data Druid is trying to read is no longer available, Druid gets stuck in a loop like this:


2025-05-13T23:09:59,308 INFO [task-runner-0-priority-0] org.apache.kafka.clients.consumer.internals.AbstractFetch - [Consumer clientId=consumer-kafka-supervisor-bmcfdhbl-1, groupId=kafka-supervisor-bmcfdhbl] Fetch position FetchPosition{offset=1189300635, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[kafka-broker:9092 (id: 1691 rack: null)], epoch=absent}} is out of range for partition datasource-124, raising error to the application since no reset policy is configured
2025-05-13T23:09:59,308 WARN [task-runner-0-priority-0] org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner - OffsetOutOfRangeException with message [Fetch position FetchPosition{offset=1189300635, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[kafka-broker:9092 (id: 1691 rack: null)], epoch=absent}} is out of range for partition datasource-124]
2025-05-13T23:09:59,309 WARN [task-runner-0-priority-0] org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner - Retrying in 30000ms

What I'm wondering is why isn't the Supervisor reporting the error? The job just gets stuck in an infinite loop...

Jvalant Patel

05/14/2025, 10:45 PM

We are moving from the legacy way of handling null to the latest Druid version, where legacy mode is not supported. We'd like some help on the best strategy for upgrading Druid when we have both null and "" strings in our datasources and our queries rely on the legacy behavior. If we want to rewrite queries to handle three-valued logic for null comparisons, what should the strategy be? Is there any generalized way to modify the queries? We are still using the native Druid query language.
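One possible generalized approach, sketched under assumptions: walk each native filter tree and rewrite the patterns whose meaning changed. The helper rewrite_legacy_filter below is hypothetical. It only handles selector filters (and and/or/not around them), uses the null and equals filter types available in recent Druid releases, and would need extending for in, bound, like, expression filters and others. It also assumes empty strings were not already converted to null at ingestion time.

```python
from typing import Any, Dict

Filter = Dict[str, Any]


def _is_null_or_empty_selector(f: Filter) -> bool:
    # In legacy mode, a selector on null or "" matched both nulls and "".
    return f.get("type") == "selector" and f.get("value") in (None, "")


def _null_or_empty(column: str) -> Filter:
    # Explicitly match both nulls and empty strings.
    return {
        "type": "or",
        "fields": [
            {"type": "null", "column": column},
            {"type": "equals", "column": column,
             "matchValueType": "STRING", "matchValue": ""},
        ],
    }


def rewrite_legacy_filter(f: Filter) -> Filter:
    """Rewrite a native filter tree to approximate legacy null semantics."""
    ftype = f.get("type")
    if _is_null_or_empty_selector(f):
        return _null_or_empty(f["dimension"])
    if ftype in ("and", "or"):
        return {**f, "fields": [rewrite_legacy_filter(x) for x in f["fields"]]}
    if ftype == "not":
        inner = f["field"]
        if inner.get("type") == "selector" and not _is_null_or_empty_selector(inner):
            # Legacy NOT(x = 'a') also matched rows where x is null; under
            # three-valued logic it does not, so add those rows back.
            return {
                "type": "or",
                "fields": [
                    {"type": "not", "field": inner},
                    {"type": "null", "column": inner["dimension"]},
                ],
            }
        return {"type": "not", "field": rewrite_legacy_filter(inner)}
    # Other filter types are left unchanged in this sketch.
    return f


legacy = {"type": "not",
          "field": {"type": "selector", "dimension": "country", "value": "US"}}
print(rewrite_legacy_filter(legacy))
```

Note that NOT around a null/"" selector needs no extra null branch: the rewritten inner OR already matches null rows, so negating it excludes them, as legacy mode did.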

khajjiar trip

05/15/2025, 6:53 AM

Hi, is there some sort of rate limiting that can be applied to a supervisor for log ingestion via Kafka? The goal would be to consume at a controlled rate so as not to overwhelm the consumer, and also to control resource (memory/CPU) usage.

Sivakumar Karthikesan

05/16/2025, 8:31 AM

Hi Team, has anyone tried, or come across, migrating an existing Druid deployment to the Druid operator without data loss?

Shubham Pratik

05/19/2025, 1:07 PM

Hi everyone 👋, Has anyone successfully connected Apache Druid with AWS MSK (Kafka) using IAM authentication? Thanks in advance!

Rohen

05/19/2025, 1:13 PM

Hi, we're using Kafka with Druid. After some period of time, segments go missing on their own, which results in missing data. What might be the root cause?

송재명

05/20/2025, 3:06 AM

Hi team~ May I ask if there’s any timeline for when the Projection feature will become stable? I saw the feature mentioned here, but it’s not yet documented in detail in the official docs. I found some guidance in this issue, and I’m thinking of trying it out based on that. Would that be okay?

Suraj Goel

05/20/2025, 7:12 AM

Hi Team, what is the recommended way to implement a count-distinct use case in native queries? We need the exact count. Is a groupBy query recommended?
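For reference, an exact count distinct is typically expressed in native queries as a nested groupBy: the inner query produces one row per distinct value, and the outer query counts those rows. This minimal sketch only builds the query JSON; the datasource, column, interval and the wide ETERNITY interval used for the outer query are assumptions. Memory use grows with the number of distinct values, so very high-cardinality columns may run into groupBy limits.

```python
import json

# Wide interval so the outer query does not filter out the subquery's rows.
ETERNITY = "1000-01-01/3000-01-01"


def exact_count_distinct(datasource: str, column: str, interval: str,
                         name: str = "distinct_count") -> dict:
    """Nested groupBy: the inner query yields one row per distinct value,
    and the outer query counts those rows."""
    inner = {
        "queryType": "groupBy",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": "all",
        "dimensions": [column],
        "aggregations": [],
    }
    return {
        "queryType": "groupBy",
        "dataSource": {"type": "query", "query": inner},
        "intervals": [ETERNITY],
        "granularity": "all",
        "dimensions": [],
        # Like SQL COUNT(DISTINCT ...), do not count the null group.
        "filter": {"type": "not", "field": {"type": "null", "column": column}},
        "aggregations": [{"type": "count", "name": name}],
    }


query = exact_count_distinct("events", "user_id", "2025-05-01/2025-05-08")
print(json.dumps(query, indent=2))
```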

Krishna

05/22/2025, 2:59 AM

Hi, has anyone integrated Druid metrics with OpenTelemetry?

linu das

05/22/2025, 7:14 AM

I already have a __time column, which is not the ingestion time but a different date attribute. I want to add the ingestion time as a column. How can I do that with Kafka ingestion?

Utkarsh Chaturvedi

05/23/2025, 10:14 AM

Hi folks. Our team is routinely getting 504s when submitting ingestion tasks. Our cluster is set up on k8s using Helm. What we're observing is that the task actually gets registered in Druid, but the response is delayed beyond the nginx/Cloudflare timeout. So when we retrigger the ingestion, it fails because the overlapping segments are locked. Is there any way to resolve the underlying issue of the task submission not responding with the registered task ID in time? We could increase the timeouts, but we would prefer to tackle the main problem.
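A sketch of one way to make a retry safe regardless of the proxy timeout, assuming the task JSON accepts a client-supplied "id" field: derive the task ID from a stable key (here a hypothetical datasource name and interval string), and after a 504 poll the task's status URL instead of resubmitting. A resubmission with the same ID should then be rejected as a duplicate rather than creating a second task that competes for the same locks; it's worth verifying that behaviour on your Druid version. The Overlord address, the ingest_ prefix and the empty spec are placeholders.

```python
import hashlib

# Hypothetical Overlord address for illustration.
OVERLORD = "http://overlord:8090"


def with_deterministic_id(task: dict, datasource: str, run_key: str) -> dict:
    """Attach a client-chosen task ID derived from a stable run key
    (e.g. the interval being ingested), so a retry reuses the same ID."""
    digest = hashlib.sha256(run_key.encode("utf-8")).hexdigest()[:16]
    return {**task, "id": f"ingest_{datasource}_{digest}"}


def status_url(task_id: str) -> str:
    # Poll this before resubmitting, instead of blindly retrying the POST.
    return f"{OVERLORD}/druid/indexer/v1/task/{task_id}/status"


task = with_deterministic_id({"type": "index_parallel", "spec": {}},
                             "events", "2025-05-23T00/2025-05-23T01")
print(task["id"])
print(status_url(task["id"]))
```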

Brindha Ramasamy

05/23/2025, 6:34 PM

Hi, we are not configuring the connection pool details explicitly in common.runtime.properties (Druid 30.0). What are the default values, and where can I find that config?

Lev Zelkind

05/26/2025, 4:15 PM

Heya, we created a new Druid (v32) cluster and would like to insert old data that is present on an older Druid (v0.22) cluster. Both clusters use the same datasource name and both use S3 for segment storage, but the data is stored in two different buckets. I tried syncing the buckets but couldn’t update the metadata SQL table to find the old segments. Can anybody share insights on how to achieve the data merge?

Giom

05/28/2025, 1:06 AM

Hi, we'd like to change our admin user's password, which hasn't changed in years (and may have leaked...). It seems pretty straightforward to do using the Druid API. But we have a question: since the admin user may be used for other API calls, it would be better not to fail at updating its password. But what if something goes wrong (I don't know how, maybe a mistyped password)? We could have the admin user locked out of the cluster. Is there any way to recover it? Maybe manually edit the database in some way and reinitialize one node so it populates the password with the initial value provided in the config? What are the best practices for admin password rotation (besides keeping it safe, of course)?
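For illustration, the password change mentioned above goes through the druid-basic-security "set user credentials" endpoint on the Coordinator. The sketch below only builds the request and does not send it; the Coordinator address, user names and passwords are placeholders, and MyBasicMetadataAuthenticator is an assumed authenticator name.

```python
import json
import urllib.request
from base64 import b64encode

# Hypothetical Coordinator address and credentials for illustration.
COORDINATOR = "http://coordinator:8081"


def password_update_request(authenticator: str, user: str, new_password: str,
                            auth_user: str, auth_password: str) -> urllib.request.Request:
    """Build (but do not send) the basic-security 'set credentials' call."""
    url = (f"{COORDINATOR}/druid-ext/basic-security/authentication/db/"
           f"{authenticator}/users/{user}/credentials")
    body = json.dumps({"password": new_password}).encode("utf-8")
    token = b64encode(f"{auth_user}:{auth_password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )


req = password_update_request("MyBasicMetadataAuthenticator", "admin",
                              "new-secret", "admin", "old-secret")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; left out so nothing changes by accident.
```

One cautious pattern is to send the change as a second user with admin permissions and to confirm the new credentials work before retiring the old ones, so there is still a way back if the update goes wrong.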

William Montgomery

05/29/2025, 6:50 PM

I have a question re: metrics. Does query/failed/count include timed-out queries that are also counted in query/timeout/count?

Eyal Yurman

05/30/2025, 4:42 PM

Hi, I was looking at the state of ingestion in Apache Druid: https://druid.apache.org/docs/latest/ingestion/
I noticed there seem to be quite a lot of options, which is normal for open source, but I was wondering whether we are thinking about some convergence. Here are some duplicated/parallel options across different tracks:
• Ingestion methods: 4-5
  ◦ Streaming (Kafka and Kinesis)
  ◦ Native batch
  ◦ SQL (MSQ)
  ◦ Hadoop-based (deprecated)
• Task orchestrators/managers: 2-3 (at least)
  ◦ MiddleManager
  ◦ Kubernetes-based (MM-less)
  ◦ To some extent, YARN (in the case of Hadoop), although this has been deprecated.
• Schedulers: 2 (OK, actually 1 + null)
  ◦ Supervisors for streaming
  ◦ No scheduler for batch (the user relies on an external scheduler)
I was wondering if there is a roadmap or any thoughts on where this goes long-term and whether it converges:
• Should native batch be replaced with MSQ?
• Should streaming use SQL (INSERT) as well, instead of a native spec?
• Should a batch scheduler be introduced, and should it share API similarity with the streaming scheduler (Supervisor)?

Nick Marsh

06/02/2025, 4:20 AM

Hello all. I’m trying to figure out how to migrate a supervisor from one Kafka topic to another. I tried updating the supervisor to read from both topics, using a flag on the Kafka payload to signal to the indexer which topic to use, but when the indexer tried to do the handoff I got an “Inconsistent metadata state” error. The only fix I’ve seen is to reset the supervisor, but that would lead to processing the same Kafka payloads multiple times. Is there a way to change a supervisor’s topic without resetting it? Alternatively, is there any other way to migrate to another topic without the risk of double-processing the Kafka payloads?

Asit

06/03/2025, 4:05 AM

Hi all,

Asit

06/03/2025, 4:06 AM

We are looking for a Druid consultant who can help us with scaling and managing workloads. If anyone is interested, please DM me.

Sachit Swaroop NB

06/03/2025, 5:40 AM

Hi, while setting up Druid on EKS using Helm, we want to enable authentication with the druid-basic-security extension. Following the documentation, we set the configuration below, but the pods do not accept it:
druid.auth.authenticatorChain: '["MyBasicMetadataAuthenticator"]'
druid.auth.authenticator.MyBasicMetadataAuthenticator.type: "basic"
druid.auth.authorizers: '["MyBasicMetadataAuthorizer"]'
druid.auth.authorizer.MyBasicMetadataAuthorizer.type: "basic"
druid.escalator.type: "basic"
druid.escalator.internalClientUsername: "druid_system"
druid.escalator.internalClientPassword: "your_internal_password"
druid.escalator.authorizerName: "MyBasicMetadataAuthorizer"
Is there any specific format we need to follow? Ref - https://github.com/asdf2014/druid-helm

06/03/2025, 5:54 AM

Hi Team, what is the impact of having swap enabled on the Historical servers? In our org, completely disabling swap is not allowed, so we have set "swappiness=1", which is the smallest possible value. But we see that swap is 100% utilized even though there is free memory on the host, and this is raising alerts at the hardware level (as 100% swap utilization usually indicates that the host may be running out of memory). Does Druid use swap to read the memory-mapped files in addition to the free main memory? Or could this be happening because the OS is swapping in all the files that Druid reads during query execution? Is the swap actually getting thrashed? We're trying to figure out whether we need to take any action or whether this can be safely ignored. Thanks, AR.

Daniel Augusto

06/03/2025, 10:04 AM

We are having trouble using Pod Identity in EKS by setting AWS_CONTAINER_CREDENTIALS_FULL_URI so that Druid can use S3. The minimum required AWS SDK version is 1.12.746, but even the latest Druid release uses 1.12.638.


AWS_CONTAINER_CREDENTIALS_FULL_URI has an invalid host. Host should resolve to a loopback address or have the full URI be HTTPS

I've searched Slack for similar errors, but I couldn't find any. Has anyone else seen this? Are there plans to bump the AWS SDK in the next release?


sandy k

06/04/2025, 1:02 AM

We are running a cluster with data, master, and broker nodes, and we're running into frequent crashes of the Broker and Overlord. Of late, ZooKeeper seems to be unresponsive due to connection issues. The UI doesn't show the segments and just hangs. The Overlord crashes 3-4 times a day. How can we improve this?

Kevin C.S

06/04/2025, 3:11 PM

Hi team, we recently migrated our data from JSON to Avro in Apache Druid (v31) and integrated Schema Registry with the supervisors. Since we wanted schema evolution, we set useSchemaDiscovery to true. After that, we noticed that Druid started editing dimensionExclusions, adding values to it on its own. Is this behaviour documented somewhere? It caused a lot of data to go missing from ingestion.