# general
  • Sean Fulton
    08/25/2025, 8:12 PM
    I can't find anything documented on how to set the MySQL version.
  • Shivam Choudhary
    08/27/2025, 1:00 AM
    Hi, I have a bunch of files (10,000+, 50 MB each) that my system writes into S3 every minute. What would be the recommendation for near-real-time ingestion of this data? I can't use Kafka because my files are big. Should I go with batch ingestion every 5 minutes or so?
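    One batch-oriented sketch, hedged: Druid's native batch ingestion has no built-in schedule, so an external scheduler (cron, Airflow, etc.) would submit an `index_parallel` task every few minutes, each run pointing only at the newly arrived objects. Bucket, path, and field names below are hypothetical:
    ```json
    {
      "type": "index_parallel",
      "spec": {
        "dataSchema": {
          "dataSource": "events",
          "timestampSpec": { "column": "ts", "format": "iso" },
          "dimensionsSpec": { "useSchemaDiscovery": true },
          "granularitySpec": { "segmentGranularity": "hour", "queryGranularity": "none" }
        },
        "ioConfig": {
          "type": "index_parallel",
          "inputSource": { "type": "s3", "prefixes": ["s3://my-bucket/incoming/2025-08-27T00-55/"] },
          "inputFormat": { "type": "json" },
          "appendToExisting": true
        },
        "tuningConfig": { "type": "index_parallel", "maxNumConcurrentSubTasks": 4 }
      }
    }
    ```
    `appendToExisting: true` keeps each 5-minute run from overwriting earlier batches in the same interval.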
  • Cristi Aldulea
    08/28/2025, 4:39 AM
    Hi everyone, I’m currently working on enabling basic authentication for a Druid instance running within a Kubernetes cluster. Secrets are being generated and managed via HashiCorp Vault. The initial user creation process is functioning correctly, but I’m encountering challenges with secret rotation. Specifically, I’m looking for a way to rotate the admin user credentials other than using the API, as I no longer have access to the previous secret once it’s regenerated. Would anyone be able to share insights or suggestions on how to approach this? Any help would be greatly appreciated. Thank you in advance! Best regards
  • Suraj Goel
    09/03/2025, 3:34 PM
    Hi Team, is there a feature in Druid's supervisor to allow pre-ingestion Kafka header-based filtering? Druid's `transformSpec` filtering operates after row deserialization and data ingestion, so such a feature could save a lot of computation.
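    Until a pre-deserialization filter exists, the closest workaround appears to be exposing headers as columns via the `kafka` input format and filtering on them in `transformSpec` (which still pays the full read cost). A rough sketch with hypothetical header and value names; the two fragments live under the supervisor spec's `ioConfig` and `dataSchema` respectively:
    ```json
    {
      "inputFormat": {
        "type": "kafka",
        "headerFormat": { "type": "string" },
        "headerColumnPrefix": "kafka.header.",
        "valueFormat": { "type": "json" }
      },
      "transformSpec": {
        "filter": { "type": "selector", "dimension": "kafka.header.event-type", "value": "click" }
      }
    }
    ```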
  • Yotam Bagam
    09/08/2025, 8:25 AM
    Is there going to be a 2025 Druid Summit???
  • Vivek M
    09/08/2025, 10:12 AM
    Hi Team, Issue: Data Length Mismatch During S3 Ingestion in Apache Druid
    Overview
    We are facing an issue while ingesting a large dataset from S3 into Apache Druid. The ingestion process fails during the segment building phase with a data length mismatch error.
    Error Message
    java.lang.IllegalStateException: java.io.IOException: com.amazonaws.SdkClientException: Data read has a different length than the expected: dataLength=9404416; expectedLength=1020242891; includeSkipped=true; in.getClass()=class com.amazonaws.services.s3.AmazonS3Client$2; markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; resetCount=0
    at org.apache.commons.io.LineIterator.hasNext(LineIterator.java:108)
    at org.apache.druid.data.input.TextReader$1.hasNext(TextReader.java:73)
    at org.apache.druid.data.input.IntermediateRowParsingReader$1.hasNext(IntermediateRowParsingReader.java:60)
    at org.apache.druid.java.util.common.parsers.CloseableIterator$2.findNextIteratorIfNecessary(CloseableIterator.java:74)
    at org.apache.druid.java.util.common.parsers.CloseableIterator$2.next(CloseableIterator.java:108)
    at org.apache.druid.java.util.common.parsers.CloseableIterator$1.next(CloseableIterator.java:52)
    at org.apache.druid.indexing.common.task.FilteringCloseableInputRowIterator.hasNext(FilteringCloseableInputRowIterator.java:68)
    at org.apache.druid.data.input.HandlingInputRowIterator.hasNext(HandlingInputRowIterator.java:63)
    at org.apache.druid.indexing.common.task.InputSourceProcessor.process(InputSourceProcessor.java:95)
    at org.apache.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:891)
    at org.apache.druid.indexing.common.task.IndexTask.runTask(IndexTask.java:500)
    Context
    • The error occurs while ingesting a large JSON file from S3.
    • Data read has a length of 9,404,416 bytes, while the expected length is 1,020,242,891 bytes.
    • The error happens in the BUILD_SEGMENTS phase.
    • The same large dataset is ingested successfully on our local Druid setup without any issues.
    • Other, smaller datasets are ingested successfully.
    Questions / Request for Support
    We are looking for guidance and support on the following points:
    1. Is this a known issue when ingesting large files from S3 into Druid?
    2. Are there recommended configurations or best practices to handle such issues?
    3. Should we consider splitting files, adjusting timeouts, or configuring retries to better handle large-file ingestion?
    4. Are there troubleshooting steps, patches, or workarounds that can help resolve this problem?
    Additional Information
    • Druid version, ingestion spec, and sample files can be provided upon request.
    • We are happy to share more logs and configuration details as needed.
    Thank you for your support!
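    On question 3, two hedged pointers: the SDK error means the S3 stream ended well before the advertised content length, and parallel-task subtasks are already retried up to `maxRetry` (default 3), so persistent failures suggest something systematic in the S3 path (a proxy or custom endpoint, for instance) rather than one flaky read. Note also that `splitHintSpec` only groups whole objects into per-subtask splits; it cannot split a single ~1 GB JSON file, so producing smaller objects upstream is likely the more reliable fix. A sketch, sizes hypothetical (512 MiB):
    ```json
    {
      "tuningConfig": {
        "type": "index_parallel",
        "maxNumConcurrentSubTasks": 4,
        "maxRetry": 5,
        "splitHintSpec": { "type": "maxSize", "maxSplitSize": 536870912 }
      }
    }
    ```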
  • Suraj Goel
    09/13/2025, 2:07 PM
    Hi Team, please review this PR to add Kafka header-based filtering. Thanks!
  • Aryan Mullick
    09/19/2025, 7:26 AM
    Can someone help me? How do I prevent the default gateway timeout at 5 minutes in my queries?
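    If the 5-minute cutoff is Druid itself rather than a gateway, a likely suspect is the default server-side query timeout, `druid.server.http.defaultQueryTimeout` = 300000 ms (5 minutes). It can be raised per query via the query context; a sketch against the SQL endpoint (`POST /druid/v2/sql`), with a hypothetical query:
    ```json
    {
      "query": "SELECT channel, COUNT(*) AS cnt FROM wikipedia GROUP BY channel",
      "context": { "timeout": 1800000 }
    }
    ```
    `druid.server.http.maxQueryTimeout` caps the value, and any proxy or ingress in front of Druid enforces its own timeout that has to be raised separately.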
  • Adheip Singh
    10/02/2025, 9:02 PM
    Hi, I was looking into the broadcast rules feature in Druid. It's been there for quite some time; has anyone used it in production? (I'm aware the documentation states it's not production-ready; I just want to hear about anyone's experience.)
  • Krishna Singh
    10/04/2025, 6:43 PM
    Hi everyone, good evening. I am using this Helm chart to install Druid: https://asdf2014.github.io/druid-helm/. I am facing a CORS issue when trying to submit a query using the Druid API. How can I get rid of the CORS error?
  • Sanjay Dowerah
    10/08/2025, 10:21 AM
    Hi All, I see that Druid JDBC ingestion is not available anymore, at least in the docs. Is there a reason for that? https://druid.apache.org/docs/latest/ingestion/jdbc.html
  • Calum Miller
    10/14/2025, 10:53 AM
    Hi All,
  • Calum Miller
    10/14/2025, 10:56 AM
    Hi All, we extended the pydruid connector to support Druid's MSQ (Multi-Stage Query) engine, which means Superset dashboards and charts can now use the MSQ engine. In addition, Superset can finally cancel running Druid queries, preventing wasted resources and speeding up the user experience. These enhancements make Superset more responsive for analysts and more efficient for operators. We've written a blog post on the changes here: https://millersoft.co/blog. Please reach out if you want to try the new driver. Calum Miller
  • PANKAJ KUMAR
    10/15/2025, 5:39 AM
    Hi, Can someone please review this PR: https://github.com/apache/druid/pull/18634. This is to add a new task distribution strategy based on supervisor affinity. Thanks
  • lnault
    10/15/2025, 1:26 PM
    Hi everyone! I'm wondering if there are any plans to support native UUID types in Druid. Currently, we have to store UUIDs as strings, which makes querying them inefficient, and we have to rely on workarounds like lookups or mapping them to Long values. I couldn't find any existing feature request about this on GitHub. Am I the only one encountering this issue, or is this a common limitation?
  • taka hayase
    10/28/2025, 12:49 PM
    I am looking for a new position as a full-stack developer now.
  • Julien Blondeau
    10/29/2025, 1:34 PM
    Hi, how do you run unit tests with Druid? I'm currently using a Docker Compose file with Testcontainers, spawning the entire stack for the whole test suite. It's working fine, but it's slow to start and very slow to clean data between tests...
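    One pattern that may cut the cleanup cost, assuming each test writes to its own datasource: keep the stack up for the whole suite and drop per-test datasources instead, by marking their segments unused (`DELETE /druid/coordinator/v1/datasources/<name>`) and, only where disk actually needs reclaiming, submitting a kill task to the overlord. A minimal kill spec with a hypothetical datasource name:
    ```json
    {
      "type": "kill",
      "dataSource": "test_events",
      "interval": "1000-01-01/3000-01-01"
    }
    ```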
  • Utkarsh Chaturvedi
    10/31/2025, 4:26 AM
    Hi everyone, I'm trying to understand the exact behavior when `tieredReplicants` is set higher than the number of historicals in a tier.
    Setup example:
    • Tier has 3 historicals
    • Datasource configured with `tieredReplicants: 5`
    Question: What actually happens in this case?
    1. Does Druid cap the replicas at 3 (one per historical)?
    2. Can a single historical load multiple copies of the same segment to satisfy the replication factor?
    3. Does it fail, warn, or queue the additional replicas?
    I couldn't find explicit documentation about this edge case. The architecture seems designed to distribute segments across different historicals, but I want to confirm the actual behavior when requested replicas exceed available nodes. Has anyone tested this scenario, or can you point me to the relevant code/docs that clarify it? Thanks!
  • Utkarsh Chaturvedi
    11/18/2025, 7:20 AM
    Hi team. I was wondering if there is any reason why editing load rules does not allow deleting a datasource's entry? Our use case has a high throughput of datasources being added and deleted with specific load rules, so the load rules get updated often. Since there is no delete option for load rules, the rule set gets bloated with "deleted-datasource": [] type entries. Let me know if I've missed something here.
  • Etisha Jain
    11/18/2025, 7:33 AM
    Hello everyone. Has anyone worked on reading Protobuf data from Kafka into Druid? I'm getting multiple errors. Can someone get on a call to help me fix it? It's a bit urgent.
  • Renato CRON
    11/19/2025, 3:33 PM
    Hi team, I've been using Druid for about 5 years. I'm running Druid 27 and trying to separate the coordinator from the overlord (previously running in combined mode with `druid.coordinator.asOverlord.enabled=true`).
    Problem: After applying the new configuration with separate coordinator and overlord StatefulSets, the coordinator keeps crashing with OOM errors.
    What I've done:
    1. Updated the Druid CR to have separate `coordinators` and `overlords` sections
    2. Deleted the existing coordinator StatefulSet (required due to immutable field changes like port)
    3. Manually created the missing task tables (`druid_tasks`, `druid_tasklogs`, `druid_tasklocks`) since they didn't exist; my metadata DB only had the 7 base tables (druid_audit, druid_config, druid_datasource, druid_pendingsegments, druid_rules, druid_segments, druid_supervisors)
    4. Used this schema: https://gist.github.com/renatocron/8056649b67cc53b02a44a6f98fd30d5b (generated via Claude from server/src/main/java/org/apache/druid/metadata/SQLMetadataStorageActionHandler.java; that file no longer exists in 35)
    Current state:
    • Tables are created and there are no more "relation does not exist" errors, but the coordinator is now OOM-crashing. I have reverted everything and dropped the tables. I set druid.metadata.storage.connector.createTables=true on the coordinator, but even then it would not create the tables. Was this a bug in the older version?
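    On the `createTables` question: the task tables (`druid_tasks`, `druid_tasklogs`, `druid_tasklocks`) are created by the Overlord on startup, not by the Coordinator, which is presumably why setting `druid.metadata.storage.connector.createTables=true` on a coordinator that no longer embeds the overlord did nothing. A minimal sketch of the split configuration, assuming default ports:
    ```properties
    # coordinator runtime.properties: stop embedding the overlord
    druid.service=druid/coordinator
    druid.plaintextPort=8081
    druid.coordinator.asOverlord.enabled=false

    # overlord runtime.properties: the overlord creates the task tables
    # on startup when createTables is enabled
    druid.service=druid/overlord
    druid.plaintextPort=8090
    druid.metadata.storage.connector.createTables=true
    ```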
  • Lee Schumacher
    11/20/2025, 12:14 AM
    Anyone here actively working on the druid-operator (https://github.com/datainfrahq/druid-operator)?
  • Razin Bouzar
    11/20/2025, 6:52 PM
    Can an admin create a druid-operator slack channel in this workspace?
  • 吴花露
    11/22/2025, 8:38 AM
    Could someone help take a look at this PR? Our company is very eager to have it merged into master as soon as possible. https://github.com/apache/druid/pull/18750
  • D S
    11/25/2025, 1:09 AM
    Hi group! I wanted to check whether Druid can query Iceberg tables directly without ingesting the data first. Are there any plans to support this capability in the future? Thank you.
  • 吴花露
    11/25/2025, 2:02 AM
    Could someone help take a look at this PR? Our company is very eager to have it merged into master as soon as possible. https://github.com/apache/druid/pull/18750
  • Akaash B
    12/02/2025, 2:58 PM
    I am trying to secure the communication between Druid and ZooKeeper using SASL + digest ACLs. However, I want to confirm the correct and recommended way to do this, because Druid creates a large number of znodes dynamically under /druid (discovery, announcements, indexer tasks, segment metadata, etc.), and the structure changes frequently during normal cluster operation. What's the best way forward?
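    A hedged sketch of the usual setup: Druid exposes `druid.zk.service.aclEnabled`, which makes Druid apply ACLs to the znodes it creates (so the dynamic structure under /druid takes care of itself), and SASL is configured the standard ZooKeeper-client way, via a JAAS file passed to each Druid JVM. Paths and credentials below are placeholders:
    ```properties
    # common.runtime.properties
    druid.zk.service.aclEnabled=true

    # jvm.config on every Druid service
    -Djava.security.auth.login.config=/opt/druid/conf/jaas.conf

    # jaas.conf (DIGEST-MD5 client login)
    Client {
      org.apache.zookeeper.server.auth.DigestLoginModule required
      username="druid"
      password="CHANGE_ME";
    };
    ```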
  • Ashish Kumar
    12/03/2025, 9:52 AM
    Hi team, I have a question regarding MSQE ingestion in Druid. While checking the segment metadata for one of our experiment tables (ingested via MSQE), I noticed that all columns are being treated as dimensions, including numeric fields that logically should be metrics. By Druid's design, dimensions get dictionary/bitmap indexes, whereas metrics are stored as plain numeric columns, so in theory this could add extra overhead. We haven't tested the performance impact yet; this is just based on observation of the segment metadata and Druid's architecture. So I just wanted to confirm: does ingesting everything as dimensions (via MSQE) have any known performance impact? Has anyone faced similar issues, especially on large datasources?
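    On the metrics side specifically: per the MSQ documentation, rollup ingestion is expressed with GROUP BY plus aggregate functions, with `finalizeAggregations: false` (and `groupByEnableMultiValueUnnesting: false`) in the query context, so aggregated columns are stored as real metric columns rather than dimensions. A sketch with hypothetical table and column names:
    ```sql
    -- query context: { "finalizeAggregations": false,
    --                  "groupByEnableMultiValueUnnesting": false }
    REPLACE INTO "events_rollup" OVERWRITE ALL
    SELECT
      TIME_FLOOR("__time", 'PT1H') AS "__time",
      "country",                 -- remains a dimension
      SUM("clicks") AS "clicks"  -- stored as a metric column
    FROM "events"
    GROUP BY 1, 2
    PARTITIONED BY DAY
    ```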
  • Yotam Bagam
    12/03/2025, 11:43 AM
    Has anyone ever thought of using S3 as Druid deep storage and applying the standard-IA S3 storage class to the segment files when they are uploaded to S3?
  • Danny Wilkins
    12/03/2025, 3:40 PM
    Hey y'all, does Druid publish any metrics specifically for ingestion autoscaling? I'd like to build an alert around (for example) autoscaling having increased the number of tasks more than 5 times over the last 24 hours. I could just go by the number of tasks, but that would cause redundant alerts if manual action scaled them.