# dev
  • a

    Abhishek Balaji Radhakrishnan

    05/07/2025, 12:31 AM
    A fix for the json_merge() function when someone gets a chance: https://github.com/apache/druid/pull/17983. Thanks!
    ✅ 2
    g
    • 2
    • 1
  • a

    Abhishek Balaji Radhakrishnan

    05/12/2025, 9:09 PM
    Could someone take a look at this fix https://github.com/apache/druid/pull/17997 for the linked issue? Thanks!
    ✅ 1
    k
    • 2
    • 1
  • a

    Abhishek Balaji Radhakrishnan

    05/23/2025, 1:42 AM
    Could someone take a look at this fix for a segment unavailability related bug: https://github.com/apache/druid/pull/18025
    • 1
    • 1
  • m

    Maytas Monsereenusorn

    05/28/2025, 2:03 AM
    Thinking about the Threshold prioritization strategy (https://druid.apache.org/docs/latest/configuration/#threshold-prioritization-strategy) and was wondering if the following changes would make sense or not (haven’t tested any of these and don’t know if they will be useful in practice or not) • What if we could stack the violatesThreshold checks and penalize a query more if it violates multiple thresholds, i.e.
    Copy code
    int toAdjust = 0;
    if (violatesPeriodThreshold) {
      toAdjust += adjustment;
    }
    if (violatesDurationThreshold) {
      toAdjust += adjustment;
    }
    if (violatesSegmentThreshold) {
      toAdjust += adjustment;
    }
    if (violatesSegmentRangeThreshold) {
      toAdjust += adjustment;
    }
    if (toAdjust != 0) {
      final int adjustedPriority = theQuery.context().getPriority() - toAdjust;
      return Optional.of(adjustedPriority);
    }
    • What if we could set the adjustment value for each threshold separately? i.e.
    Copy code
    int toAdjust = 0;
    if (violatesPeriodThreshold) {
      toAdjust += periodThresholdAdjustment;
    }
    if (violatesDurationThreshold) {
      toAdjust += durationThresholdAdjustment;
    }
    if (violatesSegmentThreshold) {
      toAdjust += segmentThresholdAdjustment;
    }
    if (violatesSegmentRangeThreshold) {
      toAdjust += segmentRangeThresholdAdjustment;
    }
    if (toAdjust != 0) {
      final int adjustedPriority = theQuery.context().getPriority() - toAdjust;
      return Optional.of(adjustedPriority);
    }
    The motivation for the first change is that a query that violates N thresholds should be penalized more than (not equal to) a query that violates N-1 thresholds. The motivation for the second change is that some violations are worse than others, i.e. violating periodThreshold is not as bad as violating segmentRangeThreshold. The prioritization value would then carry over to the Historicals and can help with resource prioritization on the Historical processing threadpool (related to this discussion https://apachedruidworkspace.slack.com/archives/C030CMF6B70/p1745436989786489). CC: @Gian Merlino @Clint Wylie
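    For illustration, a minimal sketch of how both ideas could combine, assuming hypothetical per-threshold adjustment settings (periodThresholdAdjustment, durationThresholdAdjustment, etc.) were added to the strategy's configuration; this is not the existing Druid strategy code, just a standalone example of a stacked, per-threshold penalty:
    Copy code
    import java.util.Optional;

    // Hypothetical helper: stacks a separate, configurable penalty for each
    // violated threshold and subtracts the total from the query's base priority.
    public class StackedThresholdPenalty
    {
      private final int periodThresholdAdjustment;
      private final int durationThresholdAdjustment;
      private final int segmentThresholdAdjustment;
      private final int segmentRangeThresholdAdjustment;

      public StackedThresholdPenalty(int period, int duration, int segment, int segmentRange)
      {
        this.periodThresholdAdjustment = period;
        this.durationThresholdAdjustment = duration;
        this.segmentThresholdAdjustment = segment;
        this.segmentRangeThresholdAdjustment = segmentRange;
      }

      public Optional<Integer> adjustedPriority(
          int basePriority,
          boolean violatesPeriodThreshold,
          boolean violatesDurationThreshold,
          boolean violatesSegmentThreshold,
          boolean violatesSegmentRangeThreshold
      )
      {
        int toAdjust = 0;
        if (violatesPeriodThreshold) {
          toAdjust += periodThresholdAdjustment;
        }
        if (violatesDurationThreshold) {
          toAdjust += durationThresholdAdjustment;
        }
        if (violatesSegmentThreshold) {
          toAdjust += segmentThresholdAdjustment;
        }
        if (violatesSegmentRangeThreshold) {
          toAdjust += segmentRangeThresholdAdjustment;
        }
        // No violations: leave the priority untouched.
        return toAdjust == 0 ? Optional.empty() : Optional.of(basePriority - toAdjust);
      }
    }
    Under this sketch, a query violating both the segment and segment-range thresholds is penalized by segmentThresholdAdjustment + segmentRangeThresholdAdjustment, so violating more (or worse) thresholds always yields a lower priority.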
    g
    • 2
    • 8
  • a

    Abhishek Balaji Radhakrishnan

    05/28/2025, 3:02 AM
    Was looking into this feature: https://github.com/apache/druid/pull/13967. I just commented there, but I'm wondering if there are any plans to revive this PR?
    k
    • 2
    • 4
  • s

    Soman Ullah

    05/28/2025, 7:15 PM
    Hello, does disabling lookup loading work for Kafka tasks? I see it works for MSQ tasks.
    a
    • 2
    • 6
  • j

    Jesse Tuglu

    06/03/2025, 11:11 PM
    Hi, wanted to open a thread on what folks think about storing commit metadata for multiple supervisors in the same datasource (e.g. the operation here). Currently, Druid stores commit metadata 1:1 with datasource. This update is done either in a shared tx with segment publishing, or in an isolated commit. From what I can see, implementors of DataSourceMetadata are solely supervisor-based (either materialized view or seekable stream). ObjectMetadata seems to only be used in tests. The way I see it there are ≥ 2 options:
    • Commit a datasource metadata row per supervisor (likely the easiest, but will take some re-working of the SegmentTransactionalInsertAction API and others, which assume these rows are keyed by datasource) – I'm currently doing this and it seems to work fine.
    • Commit a single row per datasource, storing partitions per supervisor ID and doing merges in the plus/minus methods
      ◦ Something like the payload being: ▪︎ map[supervisor_id] = SeekableStreamSequenceNumbers
      ◦ This might suffer from write contention, since N supervisors * M tasks per supervisor will be attempting to write new updates in the commit payload to this row in the DB.
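    For what it's worth, a rough standalone sketch of what option 2's payload and merge semantics could look like; the class, method names, and generic SequenceNumbers placeholder are illustrative assumptions, not the actual DataSourceMetadata / SeekableStreamSequenceNumbers implementations:
    Copy code
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch only: per-supervisor commit metadata held in a single
    // datasource row. "SequenceNumbers" stands in for the real per-stream offsets;
    // merge semantics are deliberately simplified.
    public class MultiSupervisorMetadataSketch<SequenceNumbers>
    {
      // map[supervisor_id] = sequence numbers committed by that supervisor's tasks
      private final Map<String, SequenceNumbers> payload;

      public MultiSupervisorMetadataSketch(Map<String, SequenceNumbers> payload)
      {
        this.payload = new HashMap<>(payload);
      }

      // plus: other supervisors' entries are kept; the committing supervisor's
      // entry is overwritten with its latest offsets.
      public MultiSupervisorMetadataSketch<SequenceNumbers> plus(String supervisorId, SequenceNumbers latest)
      {
        final Map<String, SequenceNumbers> merged = new HashMap<>(payload);
        merged.put(supervisorId, latest);
        return new MultiSupervisorMetadataSketch<>(merged);
      }

      // minus: drop the entry for a supervisor, e.g. on reset or teardown.
      public MultiSupervisorMetadataSketch<SequenceNumbers> minus(String supervisorId)
      {
        final Map<String, SequenceNumbers> merged = new HashMap<>(payload);
        merged.remove(supervisorId);
        return new MultiSupervisorMetadataSketch<>(merged);
      }

      public Map<String, SequenceNumbers> getPayload()
      {
        return payload;
      }
    }
    The write-contention concern in the last bullet still applies: every task commit rewrites this one row, regardless of how the map is merged.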
    g
    • 2
    • 11
  • a

    Allen Madsen

    06/10/2025, 9:39 PM
    I could use some help thinking about how to tackle this problem. We have a table that's starting to become too big for a global lookup. We attempted to use a loading lookup; however, the initial queries are too slow. In a query I'm using to test, a fetch for a single record takes ~40ms. The problem is that there are about 1000 distinct values, which equates to ~30s of total runtime, because the loading lookup looks up each value independently. When I query all 1000 values together, the total time for the query is ~80ms. At first, I noticed applyAll could be overridden on the LookupExtractor and thought that may be a way to have it batch query lookups. However, I noticed that applyAll is never called. Abstractly, I'd like Druid to tell the lookup to prefetch all the values it needs before joining, or have the joinCursor iterate in batches and be able to make calls to the database. What are y'all's thoughts on the best way to approach this problem?
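    One possible shape for the batching idea, sketched with plain JDBC and a hypothetical batchApply helper; the table and column names are made up, and none of this maps to Druid's actual LookupExtractor API. The point is just that all pending keys can be fetched in a single IN query instead of one query per key:
    Copy code
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Hypothetical batched lookup fetch: one round trip for N keys instead of N round trips.
    public class BatchedLookupFetcher
    {
      private final Connection connection;

      public BatchedLookupFetcher(Connection connection)
      {
        this.connection = connection;
      }

      public Map<String, String> batchApply(List<String> keys) throws SQLException
      {
        final Map<String, String> result = new HashMap<>();
        if (keys.isEmpty()) {
          return result;
        }
        // Build "?, ?, ?, ..." placeholders for the IN clause.
        final String placeholders = keys.stream().map(k -> "?").collect(Collectors.joining(", "));
        final String sql =
            "SELECT lookup_key, lookup_value FROM my_lookup_table WHERE lookup_key IN (" + placeholders + ")";
        try (PreparedStatement stmt = connection.prepareStatement(sql)) {
          for (int i = 0; i < keys.size(); i++) {
            stmt.setString(i + 1, keys.get(i));
          }
          try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
              result.put(rs.getString(1), rs.getString(2));
            }
          }
        }
        return result;
      }
    }
    Whether this hooks in via applyAll, a prefetch step before the join, or batching in the join cursor is exactly the open question above.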
    g
    • 2
    • 4
  • j

    Jesse Tuglu

    06/13/2025, 1:46 AM
    @Clint Wylie 👋 Do you folks still run Druid historical nodes with transparent huge pages disabled? (Tagging you as I see you were the author of this documentation commit)
    c
    b
    • 3
    • 8
  • j

    Jesse Tuglu

    06/17/2025, 7:31 PM
    👋 @Zoltan Haindrich, I noticed in the current build that this line points to a personal repo of yours. It seems like during the build, Maven actually scans that repo for other modules (not just quidem):
    Copy code
    [INFO] ------------------< org.apache.druid:druid-quidem-ut >------------------
    [INFO] Building druid-quidem-ut 34.0.0-SNAPSHOT                         [80/80]
    [INFO]   from quidem-ut/pom.xml
    [INFO] --------------------------------[ jar ]---------------------------------
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-multi-stage-query/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-datasketches/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-orc-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-parquet-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-avro-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-protobuf-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-s3-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-kinesis-indexing-service/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-azure-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-google-extensions/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-hdfs-storage/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-histogram/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-aws-common/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-processing/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-sql/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-indexing-service/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-indexing-hadoop/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/mysql-metadata-storage/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-kafka-indexing-service/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-basic-security/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-lookups-cached-global/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/druid-testing-tools/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/extensions/simple-client-sslcontext/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-services/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-server/34.0.0-SNAPSHOT/maven-metadata.xml>
    Downloading from datasets: <https://raw.githubusercontent.com/kgyrtkirk/datasets/repo/org/apache/druid/druid-gcp-common/34.0.0-SNAPSHOT/maven-metadata.xml>
    Wondering if you knew about this, and whether this was intentional. cc @Gian Merlino
    g
    z
    • 3
    • 6
  • s

    Soman Ullah

    07/08/2025, 9:52 PM
    Does MSQ replace always create tombstones? Is there a way to get rid of them if they don't have eternity timestamps?
    b
    a
    • 3
    • 8
  • c

    Cristian Daniel Gelvis Bermudez

    07/09/2025, 4:00 PM
    Hello everyone, I'm trying to extract data from deep storage with a query to the /druid/v2/sql/statements/ endpoint. The task runs fine, but at the end the following error occurs, preventing me from extracting the query response:
    Copy code
    {
      "error": "druidException",
      "errorCode": "notFound",
      "persona": "USER",
      "category": "NOT_FOUND",
      "errorMessage": "Query [query-9578562a-94f0-452d-998a-e66e0f7d0ff5] was not found. The query details are no longer present or might not be of the type [query_controller]. Verify that the id is correct.",
      "context": {}
    }
    Does anyone know why this happens?
    g
    • 2
    • 3
  • j

    Jesse Tuglu

    07/15/2025, 1:53 AM
    Hey folks, question: is ListFilteredDimensionSpec not allowed to have null as an element in its values array? Or is this a bug? Example:
    Copy code
    {
      "dimensions": [
        {
          "type": "listFiltered",
          "delegate": "based_on",
          "values": [
            null,
            "A",
            "B"
          ]
        }
      ],
      "aggregations": [
        {
          "type": "doubleSum",
          "fieldName": "value",
          "name": "value"
        }
      ],
      "intervals": [
        "2025-01-01T00:00:00.000Z/2025-07-10T23:59:59.999Z"
      ],
      "queryType": "groupBy",
      "granularity": "all",
      "dataSource": "datasource_A"
    }
    will fail with
    Copy code
    Error: RUNTIME_FAILURE (OPERATOR)
    
    Cannot invoke "String.getBytes(String)" because "string" is null
    
    java.lang.NullPointerException
    The line in question is this. Passing in an empty string in place of the NULL returns null values, so this is a partial work-around for now.
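    For context, the failure is an unguarded getBytes call on a null element; a null check along these lines (plain Java for illustration, not the actual Druid code path) would avoid the NPE:
    Copy code
    import java.nio.charset.StandardCharsets;

    // Illustration of guarding the conversion of a filter value to bytes,
    // instead of calling value.getBytes(...) directly on a null element.
    public class NullSafeBytesSketch
    {
      static byte[] toUtf8OrNull(String value)
      {
        return value == null ? null : value.getBytes(StandardCharsets.UTF_8);
      }

      public static void main(String[] args)
      {
        System.out.println(toUtf8OrNull(null));       // null, no exception
        System.out.println(toUtf8OrNull("A").length); // 1
      }
    }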
    g
    • 2
    • 8
  • j

    Jesse Tuglu

    07/17/2025, 6:20 PM
    @Gian Merlino @Clint Wylie Wanted to get some clarification on whether ingesting empty strings in both batch/streaming ingests causes the value to be inserted as NULL into the segment when druid.generic.useDefaultValueForNull=true. Is this the expected behavior? See the data loader photo attached, where it appears to show parsing row 3's string_value as an empty string. However, post-segment creation, I took a dump of the segment:
    Copy code
    {"__time":1704070860000,"title":"example_1","string_value":"some_value","long_value":1,"double_value":0.1,"float_value":0.2,"multi_value":["a","b","c"],"count":1,"double_value_doubleSum":0.1,"float_value_floatSum":0.2,"long_value_longSum":1}
    {"__time":1704070920000,"title":"example_2","string_value":"another_value","long_value":2,"double_value":0.2,"float_value":0.3,"multi_value":["d","e","f"],"count":1,"double_value_doubleSum":0.2,"float_value_floatSum":0.3,"long_value_longSum":2}
    {"__time":1704070980000,"title":"example_3","string_value":null,"long_value":0,"double_value":0.0,"float_value":0.0,"multi_value":null,"count":1,"double_value_doubleSum":0.0,"float_value_floatSum":0.0,"long_value_longSum":0}
    {"__time":1704071040000,"title":"example_4","string_value":null,"long_value":0,"double_value":0.0,"float_value":0.0,"multi_value":null,"count":1,"double_value_doubleSum":0.0,"float_value_floatSum":0.0,"long_value_longSum":0}
    You can see that in row 3, the string_value column has replaced what I'd expect to be an "" with a null. This is running on version v31, with druid.generic.useDefaultValueForNull=true. I've tested that running on v33 with druid.generic.useDefaultValueForNull=false produces the expected result ("" stored instead of null).
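    For illustration only, a minimal sketch of the legacy "default value" coercion that would explain the dump above; this is an assumption about what useDefaultValueForNull=true implies for string dimensions, not code taken from Druid:
    Copy code
    // Hypothetical illustration: under legacy null handling, empty strings are
    // treated as null/default for string dimensions, so row 3's "" is stored as null.
    public class LegacyNullHandlingSketch
    {
      static String emptyToNullIfLegacy(String value, boolean useDefaultValueForNull)
      {
        if (useDefaultValueForNull && value != null && value.isEmpty()) {
          return null;
        }
        return value;
      }

      public static void main(String[] args)
      {
        System.out.println(emptyToNullIfLegacy("", true));  // null (matches the v31 dump above)
        System.out.println(emptyToNullIfLegacy("", false)); // "" (SQL-compatible null handling)
      }
    }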
    g
    • 2
    • 3
  • c

    Chris Warren

    07/24/2025, 4:23 PM
    When creating extensions that depend on other extensions... is there a way to actually load those dependencies reliably? Specifically, I'm trying to implement the RoleProvider interface that is defined in the druid-basic-security extension, but if I try to make my extension a standalone extension (using the provided scope for the druid-basic-security dependency in my pom.xml) I get Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/druid/security/basic/authorization/RoleProvider ... however if I shade the jar, then it does work. Also if I build it into druid-basic-security, that also works. Am I missing something with how the interface might be implemented outside of this core extension package? Anyone have experience with this?
    a
    g
    • 3
    • 5
  • a

    Ashi Bhardwaj

    07/31/2025, 2:01 PM
    Hey, one check on my PR is failing due to the use of a potentially broken or risky cryptographic algorithm in processing/src/main/java/org/apache/druid/crypto/CryptoService.java. I have only made changes to the pac4j extension in this PR and didn't change the algorithm; fixing this unrelated check would require significant changes and testing in the other module. How can I get my PR merged in such a scenario? CC: @Lucas Capistrant (since you've been reviewing the PR)
    g
    • 2
    • 6
  • s

    Soman Ullah

    08/07/2025, 6:34 PM
    Is it normal for balancing to take 15s on a v32.0 Druid cluster with 30k segments? Smart loading mode is enabled on this cluster.
  • j

    Jesse Tuglu

    08/12/2025, 9:49 PM
    👋 v34 question: any reason why this kill-datasource handler in the UI does not call the same endpoint prefix as the docs? I'm seeing 404 errors in the UI when deleting unused segments from the UI:
    Copy code
    curl '<router-url>/druid/indexer/v1/datasources/<datasource>/intervals/1000-01-01_2025-08-11' -v -X DELETE
    vs. this, which succeeds (the documented URL):
    Copy code
    curl '<router-url>/druid/coordinator/v1/datasources/<datasource>/intervals/1000-01-01_2025-08-11' -v -X DELETE
    commit
    g
    • 2
    • 3
  • a

    Atul Mohan

    08/13/2025, 1:47 AM
    Hello, centralized datasource schema was introduced a few versions ago and I see that it is still marked as experimental. Are there any known issues with this feature that are preventing us from marking it production-ready? FWIW I've been running it on a couple of clusters and it has been looking good.
    g
    k
    • 3
    • 2
  • e

    Eyal Yurman

    08/14/2025, 5:56 PM
    Has anyone tried renaming a datasource? It sounds easy enough, especially with some downtime (renaming folders in HDFS, updating metadata, deleting ZK nodes), but is there anything hard-coded in the segment files themselves which would prevent it?
    g
    • 2
    • 2
  • a

    Atul Mohan

    08/14/2025, 10:04 PM
    RowSignature currently does an order-sensitive equality check here. Do we actually need to preserve column order, or is it sufficient to check that both objects have all the columns regardless of order? I've observed instances where BrokerSegmentMetadataCache frequently updates the row signature, with the only difference being a change in column order. These frequent updates cause signature changes during the query planning phase. In rare cases, the broker generates the projection based on the old signature and applies it to the new signature, leading to unexpected query results.
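    A small sketch of the difference between the two checks, written against a made-up columnName → type map rather than Druid's actual RowSignature accessors (all names here are assumptions):
    Copy code
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Illustration only: two "row signatures" modeled as ordered columnName -> type maps.
    public class SignatureComparisonSketch
    {
      // Order-sensitive check, analogous to comparing the column lists positionally.
      static boolean equalsOrderSensitive(Map<String, String> a, Map<String, String> b)
      {
        return List.copyOf(a.entrySet()).equals(List.copyOf(b.entrySet()));
      }

      // Order-insensitive check: same columns and types, regardless of position.
      static boolean equalsOrderInsensitive(Map<String, String> a, Map<String, String> b)
      {
        return a.equals(b); // Map.equals ignores iteration order
      }

      public static void main(String[] args)
      {
        final Map<String, String> first = new LinkedHashMap<>();
        first.put("__time", "LONG");
        first.put("country", "STRING");
        first.put("clicks", "LONG");

        final Map<String, String> second = new LinkedHashMap<>();
        second.put("__time", "LONG");
        second.put("clicks", "LONG");
        second.put("country", "STRING");

        System.out.println(equalsOrderSensitive(first, second));   // false: only column order differs
        System.out.println(equalsOrderInsensitive(first, second)); // true: same columns and types
      }
    }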
    g
    • 2
    • 7
  • m

    Maytas Monsereenusorn

    08/21/2025, 10:14 PM
    Why is the default for druid.announcer.skipSegmentAnnouncementOnZk false, but the default for druid.serverview.type is http? If we are using http, why announce segments on ZK? Is it still needed for something?
    g
    • 2
    • 3
  • s

    Shekhar Rajak

    08/26/2025, 6:06 PM
    Hello team, are you also eagerly waiting for Kafka 4 feature support in Druid Kafka ingestion? https://github.com/apache/druid/issues/18439
    g
    s
    • 3
    • 4
  • a

    Abhishek Balaji Radhakrishnan

    08/28/2025, 3:32 PM
    Fix up compilation failures on master: https://github.com/apache/druid/pull/18449
    ✅ 1
  • s

    Suraj Goel

    09/03/2025, 3:56 PM
    Hi team, is there a feature in the Druid supervisor to allow pre-ingestion Kafka header-based filtering? Druid's transformSpec filtering operates after row deserialization and ingestion, so filtering on headers up front could save a lot of computation.
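    A minimal sketch of the idea using the plain Kafka consumer API, dropping records by header before the value is ever deserialized; the header name and the surrounding wiring are assumptions, and this is not an existing supervisor option:
    Copy code
    import java.nio.charset.StandardCharsets;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.header.Header;

    // Illustration: decide whether to parse a record at all based on a Kafka header,
    // before the (potentially expensive) value deserialization and transformSpec filtering.
    public class HeaderPrefilterSketch
    {
      private final String headerKey;
      private final String requiredValue;

      public HeaderPrefilterSketch(String headerKey, String requiredValue)
      {
        this.headerKey = headerKey;
        this.requiredValue = requiredValue;
      }

      public boolean shouldIngest(ConsumerRecord<byte[], byte[]> record)
      {
        final Header header = record.headers().lastHeader(headerKey);
        if (header == null || header.value() == null) {
          return false; // no header present: skip without touching the payload
        }
        final String value = new String(header.value(), StandardCharsets.UTF_8);
        return requiredValue.equals(value);
      }
    }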
    g
    • 2
    • 2
  • s

    Satya Kuppam

    09/25/2025, 8:34 PM
    Hello folks, not sure if I am missing any docs, but I am looking for an example of creating an embedded projection with MSQ instead of the JSON ingestion spec. Any pointers?
  • k

    Karan Kumar

    09/26/2025, 3:27 AM
    You would have to use the catalog.
    s
    • 2
    • 6
  • a

    Antoine Boyer

    09/30/2025, 6:10 PM
    Hey team 👋, I have a feature request for the web console: when sending parameters in the query workbench, I'd like to be able to send an array of values. This is useful for queries using SCALAR_IN_ARRAY, for example. I opened a PR before opening an issue since I felt it was a relatively straightforward ask, but happy to open an issue first if needed.
  • p

    Pranav

    10/01/2025, 11:24 PM
    Question about BigDecimal: I’m planning to add support for variable-scale BigDecimal with inline storage optimization. Would you recommend implementing this as a native column type or as a complex column?
  • j

    Jesse Tuglu

    10/06/2025, 9:03 PM
    Hi folks, wondering what the code cutoff date for the v35 release will be? cc @kfaraz
    k
    • 2
    • 1