# general
  • e

    Eyal Yurman

    07/11/2025, 8:57 PM
    Hello, has anyone else noticed that druid.struct.ai hasn't been available for the past few days?
    ➕ 1
  • k

    kn3jox

    07/15/2025, 9:31 AM
    hi all. how do you configure a data source so that it monitors a directory for new files and ingests them? thanks!
  • k

    kn3jox

    07/16/2025, 4:51 AM
    i added this bit to the data source's spec file, but it doesn't seem to work.
    "spec": {
        "ioConfig": {
            "watcher": {
              "type": "file",
              "pollPeriod": "PT10M"
            }
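    For comparison, a minimal sketch of a native-batch ioConfig that reads all matching files from a directory using the documented local input source (the baseDir and filter here are placeholder values; each task reads the directory once rather than watching it):
    "ioConfig": {
        "type": "index_parallel",
        "inputSource": {
            "type": "local",
            "baseDir": "/data/incoming",
            "filter": "*.json"
        },
        "inputFormat": { "type": "json" }
    }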
  • a

    Ashi Bhardwaj

    07/16/2025, 9:11 AM
    Hi folks, please review this PR to upgrade the pac4j extension: https://github.com/apache/druid/pull/18259. This major upgrade is needed to fix CVE-2023-52428, which requires upgrading nimbus-jose-jwt to 9.37.2, a version that is not compatible with pac4j v4.
  • t

    Tim Frey

    07/16/2025, 3:34 PM
    Druid combined with AI: natural-language queries then become possible with Druid.

    https://www.youtube.com/watch?v=BqCEWRZbRjU&t=345s

    😎 2
  • d

    Doaa Deeb

    07/17/2025, 11:06 PM
    Are there any plans around supporting histogram ingestion in Druid? Specifically, I'm referring to a way to ingest and store histograms as a native metric type, similar to Prometheus histograms. The idea is to push pre-aggregated histogram data into Druid rather than computing histograms at query time. Thanks
  • s

    sezur work

    07/21/2025, 5:13 PM
    Hello everyone
  • s

    sezur work

    07/21/2025, 5:14 PM
    We are hiring a Druid developer from Asia or Africa. It is a fully remote position.
  • s

    sezur work

    07/21/2025, 5:14 PM
    Roles & Responsibilities At least 3 years of experience on Druid. · Experience in design and architecture of druid for large scale streaming data process systems. · Experience in setup and configuration of druid for production environments. · Experience in working with druid data modelling · Experience in performance tuning of druid configuration, query optimization · Experience in working with Kafka, Pulsar and druid for real time streaming data ingestion · Hands-on experience in monitor and troubleshooting of the issues Experience in working with cloud platforms and Kubernetes
  • s

    sezur work

    07/21/2025, 5:15 PM
    Please respond to me if anyone is interested.
  • e

    Eyal Yurman

    07/24/2025, 10:13 PM
    Hello, this is about JSON parsing during ingestion. We have a large Kafka topic that is split into multiple data sources on the Druid side using filter transformations. This means there is compute overhead from parsing the same events in every supervisor (in addition, of course, to the network overhead). Before I go and benchmark it, does anyone know how likely it is to be a large overhead?
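    For context, the per-datasource filter described above sits in each supervisor's transformSpec; a minimal sketch (the dimension name and value are placeholders):
    "transformSpec": {
        "filter": {
            "type": "selector",
            "dimension": "eventType",
            "value": "click"
        }
    }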
  • j

    Julien Blondeau

    08/01/2025, 6:31 AM
    Hello, a question about timeseries vs. timeBoundary query performance. I have a table containing emails and customer IDs. I want to get the oldest time for an email, and the oldest time for the same email filtered on a customer ID. My app would have to run two timeBoundary minTime queries (in parallel) to get those results; they should be fast because they use the metadata. Alternatively, I can get both results in a single timeseries query, but that should be slower, correct? Which strategy is the fastest?
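    For reference, the single-query variant could be expressed as a native timeseries query with a filtered aggregator, roughly like this sketch (the datasource, dimensions, and values are placeholders):
    {
        "queryType": "timeseries",
        "dataSource": "emails",
        "granularity": "all",
        "intervals": ["1000-01-01/3000-01-01"],
        "filter": { "type": "selector", "dimension": "email", "value": "user@example.com" },
        "aggregations": [
            { "type": "longMin", "name": "earliest_overall", "fieldName": "__time" },
            {
                "type": "filtered",
                "filter": { "type": "selector", "dimension": "customer_id", "value": "C123" },
                "aggregator": { "type": "longMin", "name": "earliest_for_customer", "fieldName": "__time" }
            }
        ]
    }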
  • a

    Aryan Mullick

    08/01/2025, 9:31 AM
    hello, can someone tell me how to create users and roles in Druid with specific permissions?
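    For reference, with the druid-basic-security extension, users, roles, and role assignments are managed through the Coordinator's /druid-ext/basic-security/authorization endpoints, and a role's permissions are posted as a JSON array; a minimal sketch (the datasource name is a placeholder):
    [
        { "resource": { "type": "DATASOURCE", "name": "wikipedia" }, "action": "READ" },
        { "resource": { "type": "STATE", "name": ".*" }, "action": "READ" }
    ]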
  • h

    Hemanth Rao

    08/04/2025, 9:31 AM
    hello, can someone help me increase the query timeout from 1 minute to xyz minutes? I tried the parameters below: druid_server_http_maxQueryTimeout=120000 and druid_server_http_defaultQueryTimeout=120000. I also tried adding connectionTimeout, but I still have the same issue.
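    For reference, the runtime.properties equivalents of those environment variables (values in milliseconds; 120000 = 2 minutes) would be roughly:
    # Server-side HTTP query timeouts
    druid.server.http.defaultQueryTimeout=120000
    druid.server.http.maxQueryTimeout=120000
    # Individual queries can also pass a "timeout" value in the query context, as long as it does not exceed maxQueryTimeout.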
  • i

    INNOCENT BOY

    08/19/2025, 9:49 AM
    Hi all, Could someone help to review this fix PR: https://github.com/apache/druid/pull/18415 The handling of null objects in QueryCache is problematic.
  • u

    Utkarsh Chaturvedi

    08/19/2025, 10:17 AM
    Hi team. I want to understand how compactions and ingestions into compacted datasources are expected to work.
    1. I have a datasource with data from 2024 and granularity set to Day. I set up its auto-compaction with granularity set to Month and the offset set to 10 days (see the config sketch below). Even so, on Aug 19 the segments are at month granularity up to July and then at day granularity through Aug 19. Is this expected behaviour? Will these day-level segments remain until the end of August?
    2. I then run an ingestion for the period July 25 - Aug 5. This ingestion breaks because the OVERWRITE WHERE clause identified an interval that is not aligned with the PARTITIONED BY granularity. I figure this is because the date range is split between month-level and day-level segments, so I split the ingestion in two, before and after the month-level boundary. An ingestion for July 25 - July 31 then works, but only with DAY granularity, which leaves me uncertain whether the earlier ingestion was really breaking because of the underlying segment granularity.
    3. The July 25 - July 31 ingestion creates 7 day-level segments, but they are not getting compacted: compaction reports 100% compacted except for the last 10 days and does not see these uncompacted segments. Shouldn't these segments be eligible for compaction?
    If anybody who understands compaction well can help with this, it would be appreciated.
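    A minimal sketch of the auto-compaction config described in point 1 above, assuming the 10-day offset maps to skipOffsetFromLatest (the datasource name is a placeholder):
    {
        "dataSource": "my_datasource",
        "skipOffsetFromLatest": "P10D",
        "granularitySpec": {
            "segmentGranularity": "MONTH"
        }
    }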
  • l

    Luke Foskey

    08/21/2025, 3:52 AM
    Hi team, I'm trying to work out whether there is any real difference in processing time between (a) a compaction job that changes a datasource from hour granularity to day granularity (for a hash-partitioned dataset) and (b) an index job that re-ingests the data, using the Druid datasource as the input for a new datasource. For reference, we want to test reducing from hourly to daily granularity because we currently store 560 segments per day for 26 GB of data, and the high segment count appears to be significantly hurting caching effectiveness and therefore making queries slower. From testing, re-ingesting a day's data from a Druid datasource takes much too long, and I'm wondering whether that differs at all from doing the same thing in a compaction job. We don't want to affect the current data in the table by running a compaction job, so I'm limited in my ability to test this right now.
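    For what it's worth, a manually submitted compaction task that rewrites one interval at DAY granularity would look roughly like this sketch (the datasource and interval are placeholders):
    {
        "type": "compact",
        "dataSource": "my_datasource",
        "ioConfig": {
            "type": "compact",
            "inputSpec": {
                "type": "interval",
                "interval": "2025-07-01/2025-08-01"
            }
        },
        "granularitySpec": {
            "segmentGranularity": "DAY"
        }
    }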
  • s

    sarthak

    08/21/2025, 6:23 AM
    Hi Team, I am using Druid v25 and trying to connect to an SSL-enabled Postgres for metadata storage. I have a ca-cert.crt file and added the configs below to the common.runtime.properties file:
    # Metadata storage configuration
    druid.extensions.loadList=["postgresql-metadata-storage"]
    druid.metadata.storage.type=postgresql
    druid.metadata.storage.connector.connectURI=jdbc:postgresql://your-db-host:5432/your-db-name
    druid.metadata.storage.connector.user=your-db-user
    druid.metadata.storage.connector.password=your-db-password
     
    # SSL-specific configuration
    druid.metadata.postgres.ssl.useSSL=true
    druid.metadata.postgres.ssl.sslMode=verify-full
    druid.metadata.postgres.ssl.sslRootCert=/path/to/ca-cert.crt
    It's not working; the errors are in the screenshots. If I connect to Postgres with SSL disabled (all SSL-related configs commented out), then Druid runs fine.
  • l

    Lionel Mena

    08/21/2025, 2:44 PM
    Hello everybody, I'm looking for a bit of help with Druid SQL query stats. Is there any way to get the amount of data scanned by a SQL query? I couldn't find anything in the documentation.
  • a

    Alex Niremov

    08/22/2025, 7:08 AM
    Hello guys, we are thinking of building a real-time BI system that could be used by any of our employees. Currently we have a Power BI system that is linked directly to our production database, so any complex queries significantly increase the resource usage on the server, and we would like to separate the BI side from ops. We are thinking of using Apache Druid for that, but the only problem is that the db schema is very decoupled because it is auto-generated by our ERP and often changes, so it would be hard to use Debezium with CDC for our db (SQL Server) as source. Maybe someone faced such a problem and can recommend something? The architecture we currently have in mind is: SQL Server -> Airbyte -> dbt -> Apache Druid -> Metabase
  • s

    Siva praneeth Alli

    08/24/2025, 12:57 AM
    Hello, I am learning about flattenSpec, and it looks like it supports most JSONPath features, such as applying filters, maps, etc., but I found that the JSON_QUERY function supports only a limited set of JSONPath features. Is that intentional (maybe so as not to affect query performance), or is there a plan to support them in the future?
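    For context, a flattenSpec of the kind mentioned above looks roughly like this sketch (the field names and expressions are placeholders):
    "flattenSpec": {
        "useFieldDiscovery": true,
        "fields": [
            { "type": "path", "name": "firstTag", "expr": "$.tags[0]" },
            { "type": "jq", "name": "tagCount", "expr": ".tags | length" }
        ]
    }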
  • t

    Tanay Maheshwari

    08/24/2025, 6:38 AM
    Can anyone help me understand the difference between the query/time metric on the broker/historical and the sqlQuery/time entry in the request log? https://druid.apache.org/docs/latest/operations/request-logging#:~:text=The%20following%20shows%20an%20example%20log%20emitter%20output%3A
  • s

    Sean Fulton

    08/25/2025, 8:12 PM
    I am trying to get this working with MySQL but am getting stuck with: org.skife.jdbi.v2.exceptions.UnableToObtainConnectionException: java.sql.SQLException: Cannot create PoolableConnectionFactory (Unknown system variable 'query_cache_size'). We are using MySQL 8.0.42, Druid 34, and MySQL Connector/J 9.4.
  • s

    Sean Fulton

    08/25/2025, 8:12 PM
    I can't find anything documented on how to set the mysql version
  • s

    Shivam Choudhary

    08/27/2025, 1:00 AM
    Hi, I have a bunch of files (10,000+, 50 MB each) that my system writes into S3 every minute. What would be the recommended way to get near-real-time ingestion for this? I can't use Kafka because my files are big. Should I go with batch ingestion every 5 minutes or so?
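    For reference, a periodically scheduled native-batch job of that kind would typically point its ioConfig at the relevant S3 prefix, roughly like this sketch (the bucket and prefix are placeholders; appendToExisting avoids overwriting earlier batches):
    "ioConfig": {
        "type": "index_parallel",
        "inputSource": {
            "type": "s3",
            "prefixes": ["s3://my-bucket/events/2025-08-27/"]
        },
        "inputFormat": { "type": "json" },
        "appendToExisting": true
    }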
  • c

    Cristi Aldulea

    08/28/2025, 4:39 AM
    Hi everyone, I’m currently working on enabling basic authentication for a Druid instance running within a Kubernetes cluster. Secrets are being generated and managed via HashiCorp Vault. The initial user creation process is functioning correctly, but I’m encountering challenges with secret rotation. Specifically, I’m looking for a way to rotate the admin user credentials other than using the API, as I no longer have access to the previous secret once it’s regenerated. Would anyone be able to share insights or suggestions on how to approach this? Any help would be greatly appreciated. Thank you in advance! Best regards
  • s

    Suraj Goel

    09/03/2025, 3:34 PM
    Hi Team, is there a feature in the Druid supervisor that allows pre-ingestion, Kafka-header-based filtering? Druid's transformSpec filtering operates after row deserialization and ingestion, so the feature above could save a lot of computation.
  • y

    Yotam Bagam

    09/08/2025, 8:25 AM
    Is there going to be a 2025 Druid summit???
  • v

    Vivek M

    09/08/2025, 10:12 AM
    Hi Team, Issue: Data Length Mismatch During S3 Ingestion in Apache Druid
    Overview
    We are facing an issue while ingesting a large dataset from S3 into Apache Druid. The ingestion process fails during the segment building phase with a data length mismatch error.
    Error Message
    java.lang.IllegalStateException: java.io.IOException: com.amazonaws.SdkClientException: Data read has a different length than the expected: dataLength=9404416; expectedLength=1020242891; includeSkipped=true; in.getClass()=class com.amazonaws.services.s3.AmazonS3Client$2; markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; resetCount=0
    at org.apache.commons.io.LineIterator.hasNext(LineIterator.java:108)
    at org.apache.druid.data.input.TextReader$1.hasNext(TextReader.java:73)
    at org.apache.druid.data.input.IntermediateRowParsingReader$1.hasNext(IntermediateRowParsingReader.java:60)
    at org.apache.druid.java.util.common.parsers.CloseableIterator$2.findNextIteratorIfNecessary(CloseableIterator.java:74)
    at org.apache.druid.java.util.common.parsers.CloseableIterator$2.next(CloseableIterator.java:108)
    at org.apache.druid.java.util.common.parsers.CloseableIterator$1.next(CloseableIterator.java:52)
    at org.apache.druid.indexing.common.task.FilteringCloseableInputRowIterator.hasNext(FilteringCloseableInputRowIterator.java:68)
    at org.apache.druid.data.input.HandlingInputRowIterator.hasNext(HandlingInputRowIterator.java:63)
    at org.apache.druid.indexing.common.task.InputSourceProcessor.process(InputSourceProcessor.java:95)
    at org.apache.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:891)
    at org.apache.druid.indexing.common.task.IndexTask.runTask(IndexTask.java:500)
    Context
    • The error occurs while ingesting a large JSON file from S3.
    • The data read has a length of 9,404,416 bytes, while the expected length is 1,020,242,891 bytes.
    • The error happens in the BUILD_SEGMENTS phase.
    • The same large dataset is ingested successfully on our local Druid setup without any issues.
    • Other, smaller datasets are being ingested successfully.
    Questions / Request for Support
    We are looking for guidance and support on the following points:
    1. Is this a known issue when ingesting large files from S3 into Druid?
    2. Are there recommended configurations or best practices for handling such issues?
    3. Should we consider splitting files, adjusting timeouts, or configuring retries to better handle large-file ingestion?
    4. Are there troubleshooting steps, patches, or workarounds that can help resolve this problem?
    Additional Information
    • The Druid version, ingestion spec, and sample files can be provided upon request.
    • We are happy to share more logs and configuration details as needed.
    Thank you for your support!
  • s

    Suraj Goel

    09/13/2025, 2:07 PM
    Hi Team, please review this PR to add Kafka header-based filtering functionality. Thanks