# general
  • Suraj Goel

    05/20/2025, 7:12 AM
    Hi team, what is the recommended way to implement a count-distinct use case in native queries? We need the exact count. Is a GroupBy query recommended?
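
    For anyone landing here later, a minimal sketch of one common approach: go through the SQL API and disable approximation, which Druid plans as a nested GroupBy under the hood (so GroupBy is indeed the native-query equivalent). Datasource and column names below are placeholders:

    ```json
    {
      "query": "SELECT COUNT(DISTINCT user_id) AS exact_users FROM my_datasource WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY",
      "context": {
        "useApproximateCountDistinct": false
      }
    }
    ```

    POST this to `/druid/v2/sql`. Exact distinct counts get memory-hungry on high-cardinality columns, which is why approximation is the default.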
  • Krishna

    05/22/2025, 2:59 AM
    Hi, did anyone integrate Druid metrics with OpenTelemetry?
  • linu das

    05/22/2025, 7:14 AM
    I already have a __time column, which is not the ingestion time but a different date attribute, and I want to add the ingestion time as a column. How can I do that with Kafka ingestion?
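
    One option, sketched under the assumption that the Kafka record timestamp is close enough to ingestion time for your purposes: the `kafka` inputFormat in the supervisor spec exposes the record timestamp as a `kafka.timestamp` column, which you can keep as a regular long dimension while `__time` stays on your own date attribute. The value format below is a placeholder for whatever you use today:

    ```json
    "inputFormat": {
      "type": "kafka",
      "timestampColumnName": "kafka.timestamp",
      "valueFormat": { "type": "json" }
    }
    ```

    Then add `kafka.timestamp` to your dimensionsSpec (or let schema discovery pick it up).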
  • Utkarsh Chaturvedi

    05/23/2025, 10:14 AM
    Hi folks. Our team is routinely facing 504s when submitting ingestion tasks. Our cluster is set up on k8s using Helm. What we're observing is that the task actually gets registered on Druid, but the response is delayed beyond the nginx/Cloudflare timeout. So when we re-trigger the ingestion, it fails due to overlapping segments being locked. Any way to resolve the main problem of the task not responding with the registered task ID in time? We can increase timeouts, but we'd prefer tackling the root cause.
  • Brindha Ramasamy

    05/23/2025, 6:34 PM
    Hi, we are not configuring connection pool details explicitly in common.runtime.properties (Druid 30.0). What are the default values, and where can I find that config?
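
    In case it helps: the defaults aren't written to any file, they live in the configuration reference (https://druid.apache.org/docs/latest/configuration/). Assuming the question is about the HTTP connection pools (the metadata-store pool is separate, under `druid.metadata.storage.connector.*`), making the documented defaults explicit would look roughly like the sketch below; double-check the values against the 30.0 docs:

    ```properties
    # Broker -> data server connection pool (documented default: 20)
    druid.broker.http.numConnections=20
    # Server-side HTTP threads; the default is derived from core count,
    # roughly max(10, (cores * 17) / 16 + 2), with extra headroom on the Broker
    druid.server.http.numThreads=40
    ```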
  • Lev Zelkind

    05/26/2025, 4:15 PM
    Heya, we created a new Druid (v32) cluster and would like to insert old data that is present on an older Druid (v0.22) cluster. Both Druids use the same datasource name and both use S3 for segment storage, but the data is stored in two different buckets. I tried syncing the buckets, but I couldn't update the metadata SQL table to find the old segments. Can anybody share insights on how to achieve the data merge?
  • Giom

    05/28/2025, 1:06 AM
    Hi, we'd like to change our admin user's password, which hasn't changed for years (and since it may have leaked...). It seems pretty straightforward to do it using the Druid API, but we have a question about that: since the admin user may be used for other API calls, it would be better not to fail while updating its password. But what if something goes wrong (I don't know how, maybe a mistyped password)? We could have the admin user locked out of the cluster. Is there any way to recover it? Maybe manually edit the database in some way and reinit one node so it populates the password with the initial value provided in the config? What are the best practices regarding admin password rotation (besides keeping it safe, of course)?
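
    For reference, the basic-security extension exposes a credentials endpoint on the Coordinator, so rotation is a single call: `POST /druid-ext/basic-security/authentication/db/<authenticatorName>/users/admin/credentials` with a body like the sketch below. A cheap insurance against lock-out is to first create a second user with admin-equivalent permissions and verify you can log in with it before touching `admin`:

    ```json
    {
      "password": "new-password-here"
    }
    ```

    On recovery: the `initialAdminPassword` property only seeds the admin user when it is first created in the metadata store, so once the user exists, manually editing the metadata database really is the last resort; verify that behaviour against the basic-security docs for your version.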
  • William Montgomery

    05/29/2025, 6:50 PM
    I have a question re: metrics. Does `query/failed/count` include timeout queries that are also counted in `query/timeout/count`?
  • Eyal Yurman

    05/30/2025, 4:42 PM
    Hi, I was looking at the state of ingestion in Apache Druid (https://druid.apache.org/docs/latest/ingestion/). I noticed there are quite a lot of options, which is normal for open source, but I was wondering whether any convergence is planned. Here are some duplicated/parallel options in different tracks:
    • Ingestion methods: 4-5
      ◦ Streaming (Kafka and Kinesis)
      ◦ Native batch
      ◦ SQL (MSQ)
      ◦ Hadoop-based (deprecated)
    • Task orchestrators/managers: 2-3 (at least)
      ◦ MiddleManager
      ◦ Kubernetes-based (MM-less)
      ◦ To some extent YARN (in the Hadoop case), although this has been deprecated
    • Schedulers: 2 (ok, actually 1 + null)
      ◦ Supervisors for streaming
      ◦ No scheduler for batch (the user relies on an external scheduler)
    I was wondering if there is a roadmap or any thoughts on where this goes long-term and perhaps converges:
    • Should native batch be replaced with MSQ? (See the sketch after this message.)
    • Should streaming use SQL (INSERT) as well, instead of a native spec?
    • Should a batch scheduler be introduced, and should it share API similarity with the streaming scheduler (Supervisor)?
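
    For context on the first question, a batch load expressed as MSQ SQL already looks like the sketch below (bucket, schema, and datasource names are placeholders), submitted to `/druid/v2/sql/task`:

    ```sql
    REPLACE INTO my_datasource OVERWRITE ALL
    SELECT
      TIME_PARSE("timestamp") AS __time,
      page,
      user_id
    FROM TABLE(
      EXTERN(
        '{"type": "s3", "uris": ["s3://my-bucket/data.json"]}',
        '{"type": "json"}',
        '[{"name": "timestamp", "type": "string"}, {"name": "page", "type": "string"}, {"name": "user_id", "type": "string"}]'
      )
    )
    PARTITIONED BY DAY
    ```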
  • Nick Marsh

    06/02/2025, 4:20 AM
    Hello all. I'm trying to figure out how to migrate a supervisor from one Kafka topic to another. I tried updating the supervisor to read from both topics, using a flag on the Kafka payload to signal to the indexer which topic to use, but when the indexer tries to do the handoff I get the "Inconsistent metadata state" error. The only fix I've seen is to reset the supervisor, but that would lead to processing the same Kafka payloads multiple times. Is there a way to change a supervisor's topic without resetting it? Alternatively, is there any other way to migrate to another topic without the risk of double-processing the Kafka payloads?
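
    One avenue worth checking on newer Druid versions: the supervisor API has a targeted reset, `POST /druid/indexer/v1/supervisor/<id>/resetOffsets`, which clears stored offsets only for the partitions you name instead of wiping everything like a full reset does. The payload shape below is from memory and the topic/offsets are placeholders, so verify it against the supervisor API docs for your version before relying on it:

    ```json
    {
      "type": "kafka",
      "partitions": {
        "type": "end",
        "stream": "new_topic",
        "partitionOffsetMap": { "0": 0, "1": 0 }
      }
    }
    ```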
  • Asit

    06/03/2025, 4:05 AM
    Hi all,
  • Asit

    06/03/2025, 4:06 AM
    We are looking for a Druid consultant who can help us with scaling and managing workloads. If anyone is interested, please DM me.
  • Sachit Swaroop NB

    06/03/2025, 5:40 AM
    Hi, while setting up Druid on EKS using Helm, we want to enable authentication using the druid-basic-security extension. As per the documentation we set the following, but it is not accepted by the pods:

    ```
    druid.auth.authenticatorChain: '["MyBasicMetadataAuthenticator"]'
    druid.auth.authenticator.MyBasicMetadataAuthenticator.type: "basic"
    druid.auth.authorizers: '["MyBasicMetadataAuthorizer"]'
    druid.auth.authorizer.MyBasicMetadataAuthorizer.type: "basic"
    druid.escalator.type: "basic"
    druid.escalator.internalClientUsername: "druid_system"
    druid.escalator.internalClientPassword: "your_internal_password"
    druid.escalator.authorizerName: "MyBasicMetadataAuthorizer"
    ```

    Is there any specific format we need to maintain? Ref - https://github.com/asdf2014/druid-helm
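
    In case it helps whoever finds this later: charts built on the Apache Druid Docker image usually inject runtime properties as environment variables, where dots become underscores (the image's entrypoint converts `druid_foo_bar` back into `druid.foo.bar`). Whether asdf2014/druid-helm follows this convention is an assumption worth checking in its values.yaml, but the shape would be:

    ```yaml
    # env-style runtime properties: dots replaced by underscores
    druid_auth_authenticatorChain: '["MyBasicMetadataAuthenticator"]'
    druid_auth_authenticator_MyBasicMetadataAuthenticator_type: basic
    druid_escalator_type: basic
    druid_escalator_internalClientUsername: druid_system
    ```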
  • AR

    06/03/2025, 5:54 AM
    Hi team, what is the impact of having swap enabled on the Historical servers? In our org completely disabling swap is not allowed, so we have set swappiness=1, the smallest possible value. But we see that swap is 100% utilized even though there is free memory on the host, and this raises hardware-level alerts (as 100% swap utilization usually indicates that the host may be running out of memory). Does Druid use swap to read the memory-mapped files in addition to free main memory? Or could this be happening because the OS is swapping in the files Druid reads during query execution? Is the swap actually getting thrashed? Trying to figure out whether we need to take any action or whether this can be safely ignored. Thanks, AR.
  • Daniel Augusto

    06/03/2025, 10:04 AM
    We are having trouble using Pod Identity in EKS by setting `AWS_CONTAINER_CREDENTIALS_FULL_URI` so Druid can use S3. The minimum AWS SDK version for it is 1.12.746, and Druid uses 1.12.638 in latest:

    ```
    AWS_CONTAINER_CREDENTIALS_FULL_URI has an invalid host. Host should resolve to a loopback address or have the full URI be HTTPS
    ```

    I've searched Slack for similar errors but could find none. Has anyone seen this too? Do we plan to bump the AWS SDK for the next release?
  • sandy k

    06/04/2025, 1:02 AM
    We are running a cluster with data, master, and broker nodes and are running into frequent crashes of the Broker and Overlord. Of late, ZooKeeper seems to be unresponsive due to connection issues. The UI doesn't show the segments and just hangs, and the Overlord crashes 3-4 times a day. How can we improve on this?
  • Kevin C.S

    06/04/2025, 3:11 PM
    Hi team, we recently migrated JSON data to Avro in Apache Druid (v31) and integrated a schema registry with the supervisors. Since we wanted schema evolution, we set useSchemaDiscovery to true, after which we noticed Druid started editing dimensionExclusions and adding values there on its own. Is this behaviour documented somewhere? It caused a lot of data to be missing from what was ingested.
  • kn3jox

    06/06/2025, 10:34 AM
    Hello. I have time-series data in Druid and am using Superset to chart it. Some charts look at the entire series, but for some I only want the last day's data. I can't configure the time range to be "Last Day" or any of the other time-range filters, because I may not have data from "yesterday" relative to "today". Can someone help me with custom SQL that would get the last day's data (in the data, not in wall-clock time) for my chart?
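
    A sketch of one way to do it, anchoring the one-day window to the newest row in the data rather than to the current time (table and column names are placeholders, and scalar subqueries in WHERE need a reasonably recent Druid, so verify on your version):

    ```sql
    SELECT *
    FROM my_table
    WHERE __time >= TIMESTAMPADD(
      DAY,
      -1,
      (SELECT MAX(__time) FROM my_table)
    )
    ```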
  • Cristina Munteanu

    06/10/2025, 1:18 PM
    Hey everyone! 👋 Join us for a Real-Time Analytics & AI at Scale meetup in New York City on June 18! It's a casual, in-person gathering for devs working on big data, distributed systems, or AI infra, or just curious about how modern stacks scale real-time analytics.
    📍 New York City
    🗓️ Wednesday, June 18th
    🍕 Talks + food + networking
    https://www.pingcap.com/event/real-time-analytics-and-ai-at-scale-meetup/
    No fluff — just solid tech talks, cool people, and hands-on lessons from the field. Hope to see you there! 🙂
  • Juan Pablo Egido

    06/10/2025, 8:54 PM
    Hi everyone, does it make sense to create an MCP Server for Druid? Has anyone done it?
  • Aryan Mullick

    06/17/2025, 6:54 AM
    Hey everyone, I've been using Druid to run queries on a large datasource, but recently, when I tried to run them on a datasource of 5.2 GB, it started showing Bad Gateway. I should mention I'm using a single-server small setup. If anyone can help me out, I'd really appreciate it.
  • Asit

    06/17/2025, 7:10 PM
    Hi everyone, we have a use case where we suspend and resume supervisors continuously to manage the resources used for Kafka ingestion. Lately we have seen that the Overlord uses a lot of memory (90 GB plus), so a quick question: do suspended supervisors add to Overlord memory?
  • anish

    06/18/2025, 6:09 AM
    Hi everyone, getting this issue with MSQ:

    ```
    RowTooLarge: Encountered row that cannot fit in a single frame (max frame size = 1,000,000)
    ```

    How does one increase the maxFrameSize? I tried setting it in the context params, but to no avail.
  • Mirza Munawar

    06/19/2025, 5:17 AM
    Getting `Exception while seeking to the [earliest] offset of partitions in topic [schema_history_remote_postgres]: Timeout expired while fetching topic metadata` while trying to fetch data from Kafka.
  • Mirza Munawar

    06/19/2025, 5:26 AM
    idk much about Druid, I just started learning it
  • Mirza Munawar

    06/19/2025, 5:26 AM
    Can anyone tell me what I'm doing wrong here?
  • Mirza Munawar

    06/19/2025, 6:52 AM
    Ohh, it worked with the Docker container name instead of localhost
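
    For anyone hitting the same "Timeout expired while fetching topic metadata": when Druid runs in Docker or Kubernetes, `localhost` in the consumer properties points at the Druid container itself, so the supervisor needs the broker's reachable hostname. A sketch, with the broker name as a placeholder:

    ```json
    "consumerProperties": {
      "bootstrap.servers": "kafka:9092"
    }
    ```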
  • Eyal Yurman

    06/20/2025, 11:01 PM
    I'm looking to assign different run priorities to different ingestion tasks (meaning which task gets to run, not task-locking priority). We're ingesting data in a Lambda fashion: new data is ingested via streaming and then re-ingested after a week with batch. Since the batch re-ingestion has a much more relaxed SLA, we were thinking of letting both workloads share the same hardware, but protecting streaming by giving it a higher run priority. I couldn't find anything about such a configuration. Does this mechanism exist? I wonder if MM-less can support it, since Kubernetes has these mechanisms built in.
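
    On the MM-less thought: the Kubernetes task runner supports per-task-type pod templates, so one hedged sketch is to give streaming task pods a higher `priorityClassName` than batch task pods and let the Kubernetes scheduler do the protecting. The wiring below is an assumption from memory (check the kubernetes-overlord-extensions docs for the exact property keys, e.g. `druid.indexer.runner.k8s.podTemplate.index_kafka` pointing at this file):

    ```yaml
    # streaming pod template; assumes a "high-priority" PriorityClass exists
    apiVersion: v1
    kind: PodTemplate
    template:
      metadata:
        labels:
          workload: streaming-ingestion
      spec:
        priorityClassName: high-priority
        # ... peon container spec as in the extension's docs ...
    ```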
  • Abdul Ahad Munaf

    06/24/2025, 5:22 PM
    Hi guys, hope you're doing well. Is there any way to ingest data from 3 Kafka servers into 1 datasource? The docs say we can only have 1 Kafka supervisor per datasource, and submitting another spec file would just replace the previous one. Is there any other way? Any help would be really appreciated (beginner on Druid). Thank you
  • Abdul Ahad Munaf

    06/24/2025, 5:24 PM
    Any other alternatives would help as well, where we can ingest data from multiple Kafka servers into 1 datasource 🙂
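
    If the "3 Kafka servers" are brokers of a single cluster, this is just a comma-separated bootstrap list in the supervisor's consumer properties, as sketched below (hostnames are placeholders). If they are three separate clusters, a common workaround is to mirror the topics into one cluster (e.g. with Kafka MirrorMaker) and run a single supervisor against that:

    ```json
    "consumerProperties": {
      "bootstrap.servers": "kafka1:9092,kafka2:9092,kafka3:9092"
    }
    ```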