# dev
  • a

    Abhishek Balaji Radhakrishnan

    05/07/2025, 12:31 AM
    A fix for the json_merge() function when someone gets a chance: https://github.com/apache/druid/pull/17983. Thanks!
    ✅ 2
    g
    • 2
    • 1
  • a

    Abhishek Balaji Radhakrishnan

    05/12/2025, 9:09 PM
    Could someone take a look at this fix https://github.com/apache/druid/pull/17997 for the linked issue? Thanks!
    ✅ 1
    k
    • 2
    • 1
  • a

    Abhishek Balaji Radhakrishnan

    05/23/2025, 1:42 AM
    Could someone take a look at this fix for a segment unavailability related bug: https://github.com/apache/druid/pull/18025
    • 1
    • 1
  • m

    Maytas Monsereenusorn

    05/28/2025, 2:03 AM
    Thinking about the Threshold prioritization strategy (https://druid.apache.org/docs/latest/configuration/#threshold-prioritization-strategy) and was wondering if the following changes would make sense or not (haven’t tested any of these and don’t know if they will be useful in practice or not) • What if we could stack the violatesThreshold checks and penalize a query more if it violates multiple thresholds, i.e.
    Copy code
    int toAdjust = 0;
    if (violatesPeriodThreshold) {
      toAdjust += adjustment;
    }
    if (violatesDurationThreshold) {
      toAdjust += adjustment;
    }
    if (violatesSegmentThreshold) {
      toAdjust += adjustment;
    }
    if (violatesSegmentRangeThreshold) {
      toAdjust += adjustment;
    }
    if (toAdjust != 0) {
      final int adjustedPriority = theQuery.context().getPriority() - toAdjust;
      return Optional.of(adjustedPriority);
    }
    • What if we could set the adjustment value for each threshold separately? i.e.
    Copy code
    int toAdjust = 0;
    if (violatesPeriodThreshold) {
      toAdjust += periodThresholdAdjustment;
    }
    if (violatesDurationThreshold) {
      toAdjust += durationThresholdAdjustment;
    }
    if (violatesSegmentThreshold) {
      toAdjust += segmentThresholdAdjustment;
    }
    if (violatesSegmentRangeThreshold) {
      toAdjust += segmentRangeThresholdAdjustment;
    }
    if (toAdjust != 0) {
      final int adjustedPriority = theQuery.context().getPriority() - toAdjust;
      return Optional.of(adjustedPriority);
    }
    The motivation for the first change is that a query that violates N thresholds should be penalized more than (not equal to) a query that violates N-1 thresholds. The motivation for the second change is that some violations are worse than others, i.e. violating periodThreshold is not as bad as violating segmentRangeThreshold. The prioritization value would then carry over to the Historicals and can help with resource prioritization on the Historical processing threadpool (related to this discussion https://apachedruidworkspace.slack.com/archives/C030CMF6B70/p1745436989786489). CC: @Gian Merlino @Clint Wylie
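    For illustration, a minimal sketch of how both ideas could combine, assuming hypothetical per-threshold adjustment settings (periodThresholdAdjustment, durationThresholdAdjustment, etc.) were added to the strategy's configuration; this is not the existing Druid strategy code, just a standalone example of a stacked, per-threshold penalty:
    Copy code
    import java.util.Optional;

    // Hypothetical helper: stacks a separate, configurable penalty for each
    // violated threshold and subtracts the total from the query's base priority.
    public class StackedThresholdPenalty
    {
      private final int periodThresholdAdjustment;
      private final int durationThresholdAdjustment;
      private final int segmentThresholdAdjustment;
      private final int segmentRangeThresholdAdjustment;

      public StackedThresholdPenalty(int period, int duration, int segment, int segmentRange)
      {
        this.periodThresholdAdjustment = period;
        this.durationThresholdAdjustment = duration;
        this.segmentThresholdAdjustment = segment;
        this.segmentRangeThresholdAdjustment = segmentRange;
      }

      public Optional<Integer> adjustedPriority(
          int basePriority,
          boolean violatesPeriodThreshold,
          boolean violatesDurationThreshold,
          boolean violatesSegmentThreshold,
          boolean violatesSegmentRangeThreshold
      )
      {
        int toAdjust = 0;
        if (violatesPeriodThreshold) {
          toAdjust += periodThresholdAdjustment;
        }
        if (violatesDurationThreshold) {
          toAdjust += durationThresholdAdjustment;
        }
        if (violatesSegmentThreshold) {
          toAdjust += segmentThresholdAdjustment;
        }
        if (violatesSegmentRangeThreshold) {
          toAdjust += segmentRangeThresholdAdjustment;
        }
        // No violations: leave the priority untouched.
        return toAdjust == 0 ? Optional.empty() : Optional.of(basePriority - toAdjust);
      }
    }
    Under this sketch, a query violating both the segment and segment-range thresholds is penalized by segmentThresholdAdjustment + segmentRangeThresholdAdjustment, so violating more (or worse) thresholds always yields a lower priority.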
    g
    • 2
    • 8
  • a

    Abhishek Balaji Radhakrishnan

    05/28/2025, 3:02 AM
    Was looking into this feature: https://github.com/apache/druid/pull/13967. I just commented there, but I'm wondering if there are any plans to revive this PR?
    k
    • 2
    • 4
  • s

    Soman Ullah

    05/28/2025, 7:15 PM
    Hello, does disabling lookup loading work for Kafka tasks? I see it works for MSQ tasks.
    a
    • 2
    • 6
  • j

    Jesse Tuglu

    06/03/2025, 11:11 PM
    Hi, wanted to open a thread on what folks think about storing commit metadata for multiple supervisors in the same datasource (e.g. the operation here). Currently, Druid stores commit metadata 1:1 with datasource. This update is done either in a shared tx with segment publishing, or in an isolated commit. From what I can see, implementors of DataSourceMetadata are solely supervisor-based (either materialized view or seekable stream). ObjectMetadata seems to only be used in tests. The way I see it there are ≥ 2 options:
    • Commit a datasource metadata row per supervisor (likely the easiest, but will take some re-working of the SegmentTransactionalInsertAction API and others, which assume these rows are keyed by datasource) – I'm currently doing this and it seems to work fine.
    • Commit a single row per datasource, storing partitions per supervisor ID and doing merges in the plus/minus methods
      ◦ Something like the payload being: ▪︎ map[supervisor_id] = SeekableStreamSequenceNumbers
      ◦ This might suffer from write contention, since N supervisors * M tasks per supervisor will be attempting to write new updates in the commit payload to this row in the DB.
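    For what it's worth, a rough standalone sketch of what option 2's payload and merge semantics could look like; the class, method names, and generic SequenceNumbers placeholder are illustrative assumptions, not the actual DataSourceMetadata / SeekableStreamSequenceNumbers implementations:
    Copy code
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch only: per-supervisor commit metadata held in a single
    // datasource row. "SequenceNumbers" stands in for the real per-stream offsets;
    // merge semantics are deliberately simplified.
    public class MultiSupervisorMetadataSketch<SequenceNumbers>
    {
      // map[supervisor_id] = sequence numbers committed by that supervisor's tasks
      private final Map<String, SequenceNumbers> payload;

      public MultiSupervisorMetadataSketch(Map<String, SequenceNumbers> payload)
      {
        this.payload = new HashMap<>(payload);
      }

      // plus: other supervisors' entries are kept; the committing supervisor's
      // entry is overwritten with its latest offsets.
      public MultiSupervisorMetadataSketch<SequenceNumbers> plus(String supervisorId, SequenceNumbers latest)
      {
        final Map<String, SequenceNumbers> merged = new HashMap<>(payload);
        merged.put(supervisorId, latest);
        return new MultiSupervisorMetadataSketch<>(merged);
      }

      // minus: drop the entry for a supervisor, e.g. on reset or teardown.
      public MultiSupervisorMetadataSketch<SequenceNumbers> minus(String supervisorId)
      {
        final Map<String, SequenceNumbers> merged = new HashMap<>(payload);
        merged.remove(supervisorId);
        return new MultiSupervisorMetadataSketch<>(merged);
      }

      public Map<String, SequenceNumbers> getPayload()
      {
        return payload;
      }
    }
    The write-contention concern in the last bullet still applies: every task commit rewrites this one row, regardless of how the map is merged.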
    g
    • 2
    • 11
  • a

    Allen Madsen

    06/10/2025, 9:39 PM
    I could use some help thinking about how to tackle this problem. We have a table that's starting to become too big for a global lookup. We attempted to use a loading lookup; however, the initial queries are too slow. In a query I'm using to test, a fetch for a single record takes ~40ms. The problem is that there are about 1000 distinct values, which equates to ~30s of total runtime, because the loading lookup looks up each value independently. When I query all 1000 values together, the total time for the query is ~80ms. At first, I noticed applyAll could be overridden on the LookupExtractor and thought that may be a way to have it batch query lookups. However, I noticed that applyAll is never called. Abstractly, I'd like Druid to tell the lookup to prefetch all the values it needs before joining, or have the joinCursor iterate in batches and be able to make calls to the database. What are y'all's thoughts on the best way to approach this problem?
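    One possible shape for the batching idea, sketched with plain JDBC and a hypothetical batchApply helper; the table and column names are made up, and none of this maps to Druid's actual LookupExtractor API. The point is just that all pending keys can be fetched in a single IN query instead of one query per key:
    Copy code
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Hypothetical batched lookup fetch: one round trip for N keys instead of N round trips.
    public class BatchedLookupFetcher
    {
      private final Connection connection;

      public BatchedLookupFetcher(Connection connection)
      {
        this.connection = connection;
      }

      public Map<String, String> batchApply(List<String> keys) throws SQLException
      {
        final Map<String, String> result = new HashMap<>();
        if (keys.isEmpty()) {
          return result;
        }
        // Build "?, ?, ?, ..." placeholders for the IN clause.
        final String placeholders = keys.stream().map(k -> "?").collect(Collectors.joining(", "));
        final String sql =
            "SELECT lookup_key, lookup_value FROM my_lookup_table WHERE lookup_key IN (" + placeholders + ")";
        try (PreparedStatement stmt = connection.prepareStatement(sql)) {
          for (int i = 0; i < keys.size(); i++) {
            stmt.setString(i + 1, keys.get(i));
          }
          try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
              result.put(rs.getString(1), rs.getString(2));
            }
          }
        }
        return result;
      }
    }
    Whether this hooks in via applyAll, a prefetch step before the join, or batching in the join cursor is exactly the open question above.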
    g
    • 2
    • 4
  • j

    Jesse Tuglu

    06/13/2025, 1:46 AM
    @Clint Wylie 👋 Do you folks still run Druid historical nodes with transparent huge pages disabled? (Tagging you as I see you were the author of this documentation commit)
    c
    b
    • 3
    • 8
  • j

    Jesse Tuglu

    06/17/2025, 7:31 PM
    👋 @Zoltan Haindrich, I noticed in the current build that this line points to a personal repo of yours. It seems like during the build, Maven actually scans that repo for other modules (not just quidem):
    Copy code
    [INFO] ------------------< org.apache.druid:druid-quidem-ut >------------------
    [INFO] Building druid-quidem-ut 34.0.0-SNAPSHOT                         [80/80]
    [INFO]   from quidem-ut/pom.xml
    [INFO] --------------------------------[ jar ]---------------------------------
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-multi-stage-query/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-datasketches/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-orc-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-parquet-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-avro-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-protobuf-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-s3-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-kinesis-indexing-service/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-azure-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-google-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-hdfs-storage/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-histogram/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-aws-common/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-processing/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-sql/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-indexing-service/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-indexing-hadoop/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/mysql-metadata-storage/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-kafka-indexing-service/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-basic-security/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-lookups-cached-global/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-testing-tools/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/simple-client-sslcontext/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-services/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-server/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-gcp-common/34.0.0-SNAPSHOT/maven-metadata.xml>
    Wondering if you knew about this, and whether this was intentional. cc @Gian Merlino
    g
    z
    • 3
    • 6
  • s

    Soman Ullah

    07/08/2025, 9:52 PM
    Does MSQ replace always create tombstones? Is there a way to get rid of them if they don't have eternity timestamps?
    b
    a
    • 3
    • 8
  • c

    Cristian Daniel Gelvis Bermudez

    07/09/2025, 4:00 PM
    Hello everyone, I'm trying to extract data from deep storage with a query to the /druid/v2/sql/statements/ endpoint. The task runs fine, but at the end the following error occurs, preventing me from extracting the query response:
    Copy code
    {
      "error": "druidException",
      "errorCode": "notFound",
      "persona": "USER",
      "category": "NOT_FOUND",
      "errorMessage": "Query [query-9578562a-94f0-452d-998a-e66e0f7d0ff5] was not found. The query details are no longer present or might not be of the type [query_controller]. Verify that the id is correct.",
      "context": {}
    }
    Does anyone know why this happens?
    g
    • 2
    • 3
  • j

    Jesse Tuglu

    07/15/2025, 1:53 AM
    Hey folks, question: is ListFilteredDimensionSpec not allowed to have null as an element in its values array? Or is this a bug? Example:
    Copy code
    {
      "dimensions": [
        {
          "type": "listFiltered",
          "delegate": "based_on",
          "values": [
            null,
            "A",
            "B"
          ]
        }
      ],
      "aggregations": [
        {
          "type": "doubleSum",
          "fieldName": "value",
          "name": "value"
        }
      ],
      "intervals": [
        "2025-01-01T00:00:00.000Z/2025-07-10T23:59:59.999Z"
      ],
      "queryType": "groupBy",
      "granularity": "all",
      "dataSource": "datasource_A"
    }
    will fail with
    Copy code
    Error: RUNTIME_FAILURE (OPERATOR)
    
    Cannot invoke "String.getBytes(String)" because "string" is null
    
    java.lang.NullPointerException
    The line in question is this. Passing in an empty string in place of the NULL returns null values, so this is a partial work-around for now.
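    For context, the failure is an unguarded getBytes call on a null element; a null check along these lines (plain Java for illustration, not the actual Druid code path) would avoid the NPE:
    Copy code
    import java.nio.charset.StandardCharsets;

    // Illustration of guarding the conversion of a filter value to bytes,
    // instead of calling value.getBytes(...) directly on a null element.
    public class NullSafeBytesSketch
    {
      static byte[] toUtf8OrNull(String value)
      {
        return value == null ? null : value.getBytes(StandardCharsets.UTF_8);
      }

      public static void main(String[] args)
      {
        System.out.println(toUtf8OrNull(null));       // null, no exception
        System.out.println(toUtf8OrNull("A").length); // 1
      }
    }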
    g
    • 2
    • 8
  • j

    Jesse Tuglu

    07/17/2025, 6:20 PM
    @Gian Merlino @Clint Wylie Wanted to get some clarification on whether ingesting empty strings in both batch/streaming ingests causes the value to be inserted as NULL into the segment when druid.generic.useDefaultValueForNull=true. Is this the expected behavior? See the data loader photo attached, where it appears to show parsing row 3's string_value as an empty string. However, post-segment creation, I took a dump of the segment:
    Copy code
    {"__time":1704070860000,"title":"example_1","string_value":"some_value","long_value":1,"double_value":0.1,"float_value":0.2,"multi_value":["a","b","c"],"count":1,"double_value_doubleSum":0.1,"float_value_floatSum":0.2,"long_value_longSum":1}
    {"__time":1704070920000,"title":"example_2","string_value":"another_value","long_value":2,"double_value":0.2,"float_value":0.3,"multi_value":["d","e","f"],"count":1,"double_value_doubleSum":0.2,"float_value_floatSum":0.3,"long_value_longSum":2}
    {"__time":1704070980000,"title":"example_3","string_value":null,"long_value":0,"double_value":0.0,"float_value":0.0,"multi_value":null,"count":1,"double_value_doubleSum":0.0,"float_value_floatSum":0.0,"long_value_longSum":0}
    {"__time":1704071040000,"title":"example_4","string_value":null,"long_value":0,"double_value":0.0,"float_value":0.0,"multi_value":null,"count":1,"double_value_doubleSum":0.0,"float_value_floatSum":0.0,"long_value_longSum":0}
    You can see that in row 3, the string_value column has replaced what I'd expect to be an "" with a null. This is running on version v31, with druid.generic.useDefaultValueForNull=true. I've tested that running on v33 with druid.generic.useDefaultValueForNull=false produces the expected result ("" stored instead of null).
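    For illustration only, a minimal sketch of the legacy "default value" coercion that would explain the dump above; this is an assumption about what useDefaultValueForNull=true implies for string dimensions, not code taken from Druid:
    Copy code
    // Hypothetical illustration: under legacy null handling, empty strings are
    // treated as null/default for string dimensions, so row 3's "" is stored as null.
    public class LegacyNullHandlingSketch
    {
      static String emptyToNullIfLegacy(String value, boolean useDefaultValueForNull)
      {
        if (useDefaultValueForNull && value != null && value.isEmpty()) {
          return null;
        }
        return value;
      }

      public static void main(String[] args)
      {
        System.out.println(emptyToNullIfLegacy("", true));  // null (matches the v31 dump above)
        System.out.println(emptyToNullIfLegacy("", false)); // "" (SQL-compatible null handling)
      }
    }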
    g
    • 2
    • 3
  • c

    Chris Warren

    07/24/2025, 4:23 PM
    When creating extensions that depend on other extensions... is there a way to actually load those dependencies reliably? Specifically, I'm trying to implement the RoleProvider interface that is defined in the druid-basic-security extension, but if I try to make my extension a standalone extension (using the provided scope for the druid-basic-security dependency in my pom.xml) I get Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/druid/security/basic/authorization/RoleProvider ... however if I shade the jar, then it does work. Also if I build it into druid-basic-security, that also works. Am I missing something with how the interface might be implemented outside of this core extension package? Anyone have experience with this?
    a
    g
    • 3
    • 5
  • a

    Ashi Bhardwaj

    07/31/2025, 2:01 PM
    Hey, one check on my PR is failing due to the use of a potentially broken or risky cryptographic algorithm in processing/src/main/java/org/apache/druid/crypto/CryptoService.java. I have only made changes to the pac4j extension in this PR and didn't change the algorithm; fixing this unrelated check would require significant changes and testing in the other module. How can I get my PR merged in such a scenario? CC: @Lucas Capistrant (since you've been reviewing the PR)
    g
    • 2
    • 6
  • s

    Soman Ullah

    08/07/2025, 6:34 PM
    Is it normal for balancing to take 15s on a v32.0 Druid cluster with 30k segments? Smart loading mode is enabled on this cluster.
  • j

    Jesse Tuglu

    08/12/2025, 9:49 PM
    👋 v34 question: any reason why this kill-datasource handler in the UI does not call the same endpoint prefix as the docs? I'm seeing 404 errors in the UI when deleting unused segments from the UI:
    Copy code
    curl '<router-url>/druid/indexer/v1/datasources/<datasource>/intervals/1000-01-01_2025-08-11' -v -X DELETE
    vs. this, which succeeds (the documented URL):
    Copy code
    curl '<router-url>/druid/coordinator/v1/datasources/<datasource>/intervals/1000-01-01_2025-08-11' -v -X DELETE
    commit
    g
    • 2
    • 3
  • a

    Atul Mohan

    08/13/2025, 1:47 AM
    Hello, centralized datasource schema was introduced a few versions ago and I see that it is still marked as experimental. Are there any known issues with this feature that are preventing us from marking it production-ready? FWIW I've been running it on a couple of clusters and it has been looking good.
    g
    k
    • 3
    • 2
  • e

    Eyal Yurman

    08/14/2025, 5:56 PM
    Has anyone tried renaming a datasource? It sounds easy enough, especially with some downtime (renaming folders in HDFS, updating metadata, deleting ZK nodes), but is there anything hard-coded in the segment files themselves which would prevent it?
    g
    • 2
    • 2
  • a

    Atul Mohan

    08/14/2025, 10:04 PM
    RowSignature currently does an order-sensitive equality check here. Do we actually need to preserve column order, or is it sufficient to check that both objects have all the columns regardless of order? I've observed instances where BrokerSegmentMetadataCache frequently updates the row signature, with the only difference being a change in column order. These frequent updates cause signature changes during the query planning phase. In rare cases, the broker generates the projection based on the old signature and applies it to the new signature, leading to unexpected query results.
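    A small sketch of the difference between the two checks, written against a made-up columnName → type map rather than Druid's actual RowSignature accessors (all names here are assumptions):
    Copy code
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Illustration only: two "row signatures" modeled as ordered columnName -> type maps.
    public class SignatureComparisonSketch
    {
      // Order-sensitive check, analogous to comparing the column lists positionally.
      static boolean equalsOrderSensitive(Map<String, String> a, Map<String, String> b)
      {
        return List.copyOf(a.entrySet()).equals(List.copyOf(b.entrySet()));
      }

      // Order-insensitive check: same columns and types, regardless of position.
      static boolean equalsOrderInsensitive(Map<String, String> a, Map<String, String> b)
      {
        return a.equals(b); // Map.equals ignores iteration order
      }

      public static void main(String[] args)
      {
        final Map<String, String> first = new LinkedHashMap<>();
        first.put("__time", "LONG");
        first.put("country", "STRING");
        first.put("clicks", "LONG");

        final Map<String, String> second = new LinkedHashMap<>();
        second.put("__time", "LONG");
        second.put("clicks", "LONG");
        second.put("country", "STRING");

        System.out.println(equalsOrderSensitive(first, second));   // false: only column order differs
        System.out.println(equalsOrderInsensitive(first, second)); // true: same columns and types
      }
    }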
    g
    • 2
    • 7
  • m

    Maytas Monsereenusorn

    08/21/2025, 10:14 PM
    Why is the default for druid.announcer.skipSegmentAnnouncementOnZk false, but the default for druid.serverview.type is http? If we are using http, why announce segments on ZK? Is it still needed for something?
    g
    • 2
    • 3
  • s

    Shekhar Rajak

    08/26/2025, 6:06 PM
    Hello team, are you also eagerly waiting for Kafka 4 feature support in Druid Kafka ingestion? https://github.com/apache/druid/issues/18439
    g
    s
    • 3
    • 4
  • a

    Abhishek Balaji Radhakrishnan

    08/28/2025, 3:32 PM
    Fix up compilation failures on master: https://github.com/apache/druid/pull/18449
    ✅ 1
  • s

    Suraj Goel

    09/03/2025, 3:56 PM
    Hi team, is there a feature in the Druid supervisor to allow pre-ingestion Kafka header-based filtering? Druid's transformSpec filtering operates after row deserialization and ingestion, so filtering on headers up front could save a lot of computation.
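    A minimal sketch of the idea using the plain Kafka consumer API, dropping records by header before the value is ever deserialized; the header name and the surrounding wiring are assumptions, and this is not an existing supervisor option:
    Copy code
    import java.nio.charset.StandardCharsets;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.header.Header;

    // Illustration: decide whether to parse a record at all based on a Kafka header,
    // before the (potentially expensive) value deserialization and transformSpec filtering.
    public class HeaderPrefilterSketch
    {
      private final String headerKey;
      private final String requiredValue;

      public HeaderPrefilterSketch(String headerKey, String requiredValue)
      {
        this.headerKey = headerKey;
        this.requiredValue = requiredValue;
      }

      public boolean shouldIngest(ConsumerRecord<byte[], byte[]> record)
      {
        final Header header = record.headers().lastHeader(headerKey);
        if (header == null || header.value() == null) {
          return false; // no header present: skip without touching the payload
        }
        final String value = new String(header.value(), StandardCharsets.UTF_8);
        return requiredValue.equals(value);
      }
    }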
    g
    • 2
    • 2
  • s

    Satya Kuppam

    09/25/2025, 8:34 PM
    Hello folks, not sure if I am missing any docs, but I am looking for an example of creating an embedded projection with MSQ instead of the JSON ingestion spec. Any pointers?
  • k

    Karan Kumar

    09/26/2025, 3:27 AM
    You would have to use the catalog.
    s
    • 2
    • 6
  • a

    Antoine Boyer

    09/30/2025, 6:10 PM
    Hey team 👋, I have a feature request for the web console: when sending parameters in the query workbench, I'd like to be able to send an array of values. This is useful for queries using SCALAR_IN_ARRAY, for example. I opened a PR before opening an issue since I felt it was a relatively straightforward ask, but happy to open an issue first if needed.
  • p

    Pranav

    10/01/2025, 11:24 PM
    Question about BigDecimal: I’m planning to add support for variable-scale BigDecimal with inline storage optimization. Would you recommend implementing this as a native column type or as a complex column?
  • j

    Jesse Tuglu

    10/06/2025, 9:03 PM
    Hi folks, wondering what the code cutoff date for the v35 release will be? cc @kfaraz
    k
    • 2
    • 1