# dev
  • Ashi Bhardwaj (01/05/2025, 9:38 AM)
    Hi folks 👋 Can someone please review this PR? Thanks!
  • Maytas Monsereenusorn (01/07/2025, 8:38 PM)
    Hi! Could I get reviews on this PR when folks have the time? https://github.com/apache/druid/pull/17439 Would be great to have it in v32. Thanks!
  • Maytas Monsereenusorn (01/18/2025, 1:18 AM)
    What do people think about emitting histograms for Druid metric emission? I think currently all the metric values are just single numbers.
  • Hazmi (01/20/2025, 10:03 AM)
    Hey! Could someone please review this PR when you have the time? https://github.com/apache/druid/pull/17646
  • Suraj Goel (01/27/2025, 5:10 PM)
    Hi Team, can someone please review this PR? TIA!
  • Suraj Goel (01/28/2025, 7:02 AM)
    Hi Team, Please review this PR to improve S3 upload speed.
  • info Advisionary (01/30/2025, 11:08 AM)
    I am working with Apache Druid 31.0.1 and I need to apply a spatial filter using a polygon shape. Specifically, I want to filter data based on whether a point falls within a given polygon. Can anyone provide an example of how to set up and use spatial filters with polygons in Druid? I've read through the documentation and tried various filter options, but I'm having trouble with the correct syntax for defining the polygon in a spatial filter. I would appreciate any examples or pointers on how to structure this in Druid 31.0.1. There is no example of using a polygon in spatial filters in Druid's official documentation. Any help will be appreciated.
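    Since the thread asks for a concrete shape, here is a minimal sketch of a polygon bound, following the bound types described in Druid's spatial-filter documentation. The dimension name and vertex coordinates are placeholders; abscissa and ordinate hold the first and second coordinate of each polygon vertex, respectively:

    ```json
    {
      "filter": {
        "type": "spatial",
        "dimension": "coordinates",
        "bound": {
          "type": "polygon",
          "abscissa": [35.0, 35.0, 37.0, 37.0],
          "ordinate": [-122.0, -120.0, -120.0, -122.0]
        }
      }
    }
    ```

    This filter would match rows whose spatial dimension point falls inside the quadrilateral defined by the four (abscissa, ordinate) vertex pairs.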
  • Maytas Monsereenusorn (01/31/2025, 6:05 AM)
    Can I get a quick review on https://github.com/apache/druid/pull/17652? Thanks!
  • Suraj Goel (02/13/2025, 10:13 AM)
    Hi Team, please review this PR for a bug fix - https://github.com/apache/druid/issues/17722 Related thread - https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1739196991316389 Thanks!
  • Ashwin Tumma (02/21/2025, 1:10 AM)
    Hi, Can someone please help review these two PRs: https://github.com/apache/druid/pull/17744 https://github.com/apache/druid/pull/17745 Thanks!
  • Jamie Chapman-Brown (02/27/2025, 6:23 PM)
    Hi there! I'm the guy who added RabbitMQ superstream ingestion to Druid. It's been working well for us; thanks for the help getting it integrated! We've been using druid-exporter to get statistics out to Prometheus, and we used to have RabbitMQ ingest stats to use in monitoring, like druid_emitted_metrics{metric_name="ingest-rabbit-lag"}. We've recently switched to using the Prometheus plugin, but I can't find any rabbit ingest stats. Am I missing anything? Can anyone point me to what I would need to change to get these stats back?
  • Mikhail Sviatahorau (03/05/2025, 3:51 PM)
    Hey all! Looking for input on how to prevent compaction tasks from publishing overlapping segments. Here's what happened:
    • Auto-compaction was running every 30 minutes, compacting daily-partitioned data.
    • A manual compaction reindexed the same period with a different granularity.
    • Once the manual compaction finished, some previously scheduled auto-compaction tasks (created before the reindexing) started running.
    • These tasks compacted data with the old granularity, causing a mix of granularities and overlapping segments (daily segments were created inside a monthly one).
    • After the data was compacted again, things stabilized.
    For a fix, we need to prevent auto-compaction from publishing overlapping segments. We only know which segments will be published at the InputSourceProcessor level on the worker. The indexing task has a coordinator-issued prefix, which helps identify auto-compaction and could be used to check and reject publishing segments that don't match the current cluster state. The problem is that this prefix isn't accessible at the level where segments are being chosen. The options are to add some preprocessing of the input source in generateAndPublishSegments in IndexTask, or to pass a flag to the InputSourceProcessor.process() method, but both feel like a last resort. Curious to hear your thoughts: if not there, where do you think this could best be handled?
  • Maytas Monsereenusorn (03/24/2025, 7:07 PM)
    We can't include a dependency on a GNU General Public License v2.0 library, right?
  • Maytas Monsereenusorn (04/16/2025, 9:38 PM)
    Would it be possible to have SegmentMetadata queries return the distinct values of a column? Similar to how they already support cardinality, this would just be looking at the keys in the string column dictionaries?
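    For reference, the existing cardinality support being compared against is requested through a segmentMetadata query's analysisTypes; a minimal example (datasource name and interval are placeholders):

    ```json
    {
      "queryType": "segmentMetadata",
      "dataSource": "my_datasource",
      "intervals": ["2025-01-01/2025-02-01"],
      "analysisTypes": ["cardinality"]
    }
    ```

    The proposal would presumably add an analysis type alongside cardinality that returns the dictionary keys themselves rather than just their count.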
  • Maytas Monsereenusorn (04/23/2025, 7:36 PM)
    Do we have any plan to extend query lanes to Historicals and realtime ingestion tasks (i.e. Peons)? Having query lanes only on the Broker makes them less useful. For example, a single very expensive query (which would run in a low lane on the Broker) could still take up all the processing threads on a Historical and block other queries from running there. CC: @Clint Wylie (as author of the query lanes on the Broker). Thanks!!
  • Abhishek Balaji Radhakrishnan (05/07/2025, 12:31 AM)
    A fix for the json_merge() function when someone gets a chance: https://github.com/apache/druid/pull/17983. Thanks!
  • Abhishek Balaji Radhakrishnan (05/12/2025, 9:09 PM)
    Could someone take a look at this fix https://github.com/apache/druid/pull/17997 for the linked issue? Thanks!
  • Abhishek Balaji Radhakrishnan (05/23/2025, 1:42 AM)
    Could someone take a look at this fix for a segment unavailability related bug: https://github.com/apache/druid/pull/18025
  • Maytas Monsereenusorn (05/28/2025, 2:03 AM)
    Thinking about the Threshold prioritization strategy (https://druid.apache.org/docs/latest/configuration/#threshold-prioritization-strategy) and wondering whether the following changes would make sense (I haven't tested any of these and don't know if they would be useful in practice):
    • What if we could stack the violatesThreshold checks and penalize a query more if it violates multiple thresholds? i.e.
    int toAdjust = 0;
    if (violatesPeriodThreshold) {
      toAdjust += adjustment;
    }
    if (violatesDurationThreshold) {
      toAdjust += adjustment;
    }
    if (violatesSegmentThreshold) {
      toAdjust += adjustment;
    }
    if (violatesSegmentRangeThreshold) {
      toAdjust += adjustment;
    }
    if (toAdjust != 0) {
      final int adjustedPriority = theQuery.context().getPriority() - toAdjust;
      return Optional.of(adjustedPriority);
    }
    • What if we could set the adjustment value for each threshold separately? i.e.
    int toAdjust = 0;
    if (violatesPeriodThreshold) {
      toAdjust += periodThresholdAdjustment;
    }
    if (violatesDurationThreshold) {
      toAdjust += durationThresholdAdjustment;
    }
    if (violatesSegmentThreshold) {
      toAdjust += segmentThresholdAdjustment;
    }
    if (violatesSegmentRangeThreshold) {
      toAdjust += segmentRangeThresholdAdjustment;
    }
    if (toAdjust != 0) {
      final int adjustedPriority = theQuery.context().getPriority() - toAdjust;
      return Optional.of(adjustedPriority);
    }
    The motivation for the first change is that a query that violates N thresholds should be penalized more than (not equally to) a query that violates N-1 thresholds. The motivation for the second change is that some violations are worse than others; e.g. violating periodThreshold is not as bad as violating segmentRangeThreshold. The prioritization value would then carry over to the Historicals and could help with resource prioritization on the Historical processing thread pool (related to this discussion https://apachedruidworkspace.slack.com/archives/C030CMF6B70/p1745436989786489). CC: @Gian Merlino @Clint Wylie
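    The two ideas above can be combined into one runnable sketch: per-threshold adjustments that stack, so violating more thresholds always lowers priority further. The method and parameter names here are invented for the sketch and are not Druid's actual prioritization-strategy API:

    ```java
    import java.util.OptionalInt;

    public class StackedThresholdPenalty {
      // Each violated threshold contributes its own configured penalty, so a
      // query violating more thresholds is deprioritized strictly more than
      // one violating fewer.
      static OptionalInt adjustedPriority(
          int basePriority,
          boolean violatesPeriod, int periodAdjustment,
          boolean violatesDuration, int durationAdjustment,
          boolean violatesSegment, int segmentAdjustment,
          boolean violatesSegmentRange, int segmentRangeAdjustment) {
        int toAdjust = 0;
        if (violatesPeriod) {
          toAdjust += periodAdjustment;
        }
        if (violatesDuration) {
          toAdjust += durationAdjustment;
        }
        if (violatesSegment) {
          toAdjust += segmentAdjustment;
        }
        if (violatesSegmentRange) {
          toAdjust += segmentRangeAdjustment;
        }
        // No violation: leave the priority untouched, matching the existing
        // strategy's behavior of returning an empty Optional.
        return toAdjust == 0
            ? OptionalInt.empty()
            : OptionalInt.of(basePriority - toAdjust);
      }

      public static void main(String[] args) {
        // Two violations stack: priority 0 drops by 10 + 40 = 50.
        System.out.println(adjustedPriority(0, true, 10, false, 20, false, 30, true, 40));
      }
    }
    ```

    With a single shared adjustment value this reduces to the first proposal; with distinct values it expresses the second (e.g. a small periodAdjustment and a large segmentRangeAdjustment).
    
    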
  • Abhishek Balaji Radhakrishnan (05/28/2025, 3:02 AM)
    Was looking into this feature: https://github.com/apache/druid/pull/13967. I just commented there, but I'm wondering if there are any plans to revive this PR?
  • Soman Ullah (05/28/2025, 7:15 PM)
    Hello, does lookup loading disabling work for Kafka tasks? I see it works for MSQ tasks.
  • Jesse Tuglu (06/03/2025, 11:11 PM)
    Hi, wanted to open a thread on what folks think about storing commit metadata for multiple supervisors in the same datasource (e.g. the operation here). Currently, Druid stores commit metadata 1:1 with datasource. This update is done either in a shared tx with segment publishing, or in an isolated commit. From what I can see, implementors of DataSourceMetadata are solely supervisor-based (either materialized view or seekable stream); ObjectMetadata seems to only be used in tests. The way I see it there are ≥ 2 options:
    • Commit a datasource metadata row per supervisor: likely the easiest, but will take some re-working of the SegmentTransactionalInsertAction API and others, which assume these rows are keyed by datasource. I'm currently doing this and it seems to work fine.
    • Commit a single row per datasource, storing partitions per supervisor ID and doing merges in the plus/minus methods, with the payload being something like map[supervisor_id] = SeekableStreamSequenceNumbers. This might suffer from write contention, since N supervisors * M tasks per supervisor will be attempting to write new updates in the commit payload to this row in the DB.
  • Allen Madsen (06/10/2025, 9:39 PM)
    I could use some help thinking about how to tackle this problem. We have a table that's starting to become too big for a global lookup. We attempted to use a loading lookup; however, the initial queries are too slow. In a query I'm using to test, a fetch for a single record takes ~40ms. The problem is that there are about 1000 distinct values, which equates to ~30s of total runtime, because the loading lookup looks up each value independently. When I query all 1000 values together, the total time for the query is ~80ms. At first, I noticed applyAll could be overridden on the LookupExtractor and thought that might be a way to have it batch query lookups. However, I noticed that applyAll is never called. Abstractly, I'd like Druid to tell the lookup to prefetch all the values it needs before joining, or have the joinCursor iterate in batches and be able to make calls to the database. What are y'all's thoughts on the best way to approach this problem?
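    The batched access pattern being asked for (one bulk fetch instead of N independent point lookups) can be sketched generically; this is illustrative only, not Druid's LookupExtractor API:

    ```java
    import java.util.HashMap;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.function.Function;

    public class BatchedLookup {
      // Resolve many keys with a single call to a bulk fetcher, instead of
      // one round trip per key.
      static Map<String, String> applyAll(
          Iterable<String> keys, Function<Set<String>, Map<String, String>> bulkFetch) {
        // Dedupe first: many input rows typically share a small set of keys.
        Set<String> distinct = new LinkedHashSet<>();
        keys.forEach(distinct::add);
        // Single batched call (e.g. one SQL query with an IN clause).
        return bulkFetch.apply(distinct);
      }

      public static void main(String[] args) {
        Map<String, String> table = Map.of("a", "1", "b", "2");
        Map<String, String> resolved = applyAll(
            List.of("a", "b", "a", "b"),
            distinctKeys -> {
              Map<String, String> result = new HashMap<>();
              for (String k : distinctKeys) {
                if (table.containsKey(k)) {
                  result.put(k, table.get(k));
                }
              }
              return result;
            });
        System.out.println(resolved); // resolves 2 distinct keys with one bulk call
      }
    }
    ```

    At ~40ms per round trip, 1000 point lookups cost tens of seconds, while one batched query over the same keys is on the order of the single-query latency, which matches the ~80ms figure above.
    
    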
  • Jesse Tuglu (06/13/2025, 1:46 AM)
    @Clint Wylie 👋 Do you folks still run Druid historical nodes with transparent huge pages disabled? (Tagging you as I see you were the author of this documentation commit)
  • Jesse Tuglu (06/17/2025, 7:31 PM)
    👋 @Zoltan Haindrich, I noticed in the current build that this line points to a personal repo of yours. It seems like during the build, Maven actually scans that repo for other modules (not just quidem):
    [INFO] ------------------< org.apache.druid:druid-quidem-ut >------------------
    [INFO] Building druid-quidem-ut 34.0.0-SNAPSHOT                         [80/80]
    [INFO]   from quidem-ut/pom.xml
    [INFO] --------------------------------[ jar ]---------------------------------
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-multi-stage-query/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-datasketches/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-orc-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-parquet-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-avro-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-protobuf-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-s3-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-kinesis-indexing-service/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-azure-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-google-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-hdfs-storage/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-histogram/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-aws-common/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-processing/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-sql/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-indexing-service/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-indexing-hadoop/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/mysql-metadata-storage/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-kafka-indexing-service/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-basic-security/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-lookups-cached-global/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-testing-tools/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/simple-client-sslcontext/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-services/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-server/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-gcp-common/34.0.0-SNAPSHOT/maven-metadata.xml>
    Wondering if you knew about this, and whether this was intentional cc @Gian Merlino
  • Soman Ullah (07/08/2025, 9:52 PM)
    Does MSQ replace always create tombstones? Is there a way to get rid of them if they don't have eternity timestamps?
  • Cristian Daniel Gelvis Bermudez (07/09/2025, 4:00 PM)
    Hello everyone, I'm trying to extract data from deep storage with a query to the /druid/v2/sql/statements/ endpoint. The task runs fine, but at the end the following error occurs, preventing me from extracting the query response:
    {
      "error": "druidException",
      "errorCode": "notFound",
      "persona": "USER",
      "category": "NOT_FOUND",
      "errorMessage": "Query [query-9578562a-94f0-452d-998a-e66e0f7d0ff5] was not found. The query details are no longer present or might not be of the type [query_controller]. Verify that the id is correct.",
      "context": {}
    }
    Does anyone know why this happens?
  • Jesse Tuglu (07/15/2025, 1:53 AM)
    Hey folks, question: is ListFilteredDimensionSpec not allowed to have null as an element in its values array? Or is this a bug? Example:
    {
      "dimensions": [
        {
          "type": "listFiltered",
          "delegate": "based_on",
          "values": [
            null,
            "A",
            "B"
          ]
        }
      ],
      "aggregations": [
        {
          "type": "doubleSum",
          "fieldName": "value",
          "name": "value"
        }
      ],
      "intervals": [
        "2025-01-01T00:00:00.000Z/2025-07-10T23:59:59.999Z"
      ],
      "queryType": "groupBy",
      "granularity": "all",
      "dataSource": "datasource_A"
    }
    will fail with
    Error: RUNTIME_FAILURE (OPERATOR)
    
    Cannot invoke "String.getBytes(String)" because "string" is null
    
    java.lang.NullPointerException
    The line in question is this. Passing in an empty string in place of the null returns null values, so this is a partial work-around for now.
  • Jesse Tuglu (07/17/2025, 6:20 PM)
    @Gian Merlino @Clint Wylie Wanted to get some clarification on whether ingesting empty strings in both batch/streaming ingests causes the value to be inserted as null into the segment when druid.generic.useDefaultValueForNull=true. Is this the expected behavior? See the data loader photo attached, where it appears to show parsing row 3's string_value as an empty string. However, post-segment creation, I took a dump of the segment:
    {"__time":1704070860000,"title":"example_1","string_value":"some_value","long_value":1,"double_value":0.1,"float_value":0.2,"multi_value":["a","b","c"],"count":1,"double_value_doubleSum":0.1,"float_value_floatSum":0.2,"long_value_longSum":1}
    {"__time":1704070920000,"title":"example_2","string_value":"another_value","long_value":2,"double_value":0.2,"float_value":0.3,"multi_value":["d","e","f"],"count":1,"double_value_doubleSum":0.2,"float_value_floatSum":0.3,"long_value_longSum":2}
    {"__time":1704070980000,"title":"example_3","string_value":null,"long_value":0,"double_value":0.0,"float_value":0.0,"multi_value":null,"count":1,"double_value_doubleSum":0.0,"float_value_floatSum":0.0,"long_value_longSum":0}
    {"__time":1704071040000,"title":"example_4","string_value":null,"long_value":0,"double_value":0.0,"float_value":0.0,"multi_value":null,"count":1,"double_value_doubleSum":0.0,"float_value_floatSum":0.0,"long_value_longSum":0}
    You can see that in row 3, the string_value column has replaced what I'd expect to be an empty string ("") with a null. This is running on v31, with druid.generic.useDefaultValueForNull=true. I've tested that running on v33 with druid.generic.useDefaultValueForNull=false produces the expected result ("" stored instead of null).
  • Chris Warren (07/24/2025, 4:23 PM)
    When creating extensions that depend on other extensions... is there a way to actually load those dependencies reliably? Specifically, I'm trying to implement the RoleProvider interface that is defined in the druid-basic-security extension, but if I try to make my extension a standalone extension (using the provided scope for the druid-basic-security dependency in my pom.xml) I get Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/druid/security/basic/authorization/RoleProvider. However, if I shade the jar, then it does work. Also, if I build it into druid-basic-security, that also works. Am I missing something about how the interface might be implemented outside of this core extension package? Anyone have experience with this?
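    For reference, the provided-scope setup described above would look roughly like this in the extension's pom.xml (the version property is a placeholder). provided keeps the druid-basic-security classes out of the extension jar, so at runtime they must already be visible to whichever classloader loads the extension; the NoClassDefFoundError is consistent with those classes not being on that path:

    ```xml
    <dependency>
      <groupId>org.apache.druid.extensions</groupId>
      <artifactId>druid-basic-security</artifactId>
      <version>${druid.version}</version>
      <scope>provided</scope>
    </dependency>
    ```

    Shading works because it copies the needed classes into the extension jar itself, at the cost of duplicating them.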