Apache Druid #dev

Maytas Monsereenusorn

06/15/2024, 1:01 AM

Would having more GC metrics be useful for other people here? For example:
• Max size of the old generation memory pool
• Size of the old generation memory pool after a full GC
• GC promotion rate (incremented for any positive increase in the size of the old generation memory pool from before a GC to after it)
• GC allocation rate (incremented for the increase in the size of the young generation memory pool from after one GC to before the next)
• Pause time due to GC events
• Time spent in concurrent phases of GC
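The promotion and allocation definitions above can be sketched as two running counters. This is a minimal illustration, not Druid code: the class name `GcRateSketch` and its `onGc` method are hypothetical, and the four pool sizes are assumed to be supplied by the caller (in a real JVM they could plausibly be read from the before/after memory usage maps that `com.sun.management.GcInfo` exposes for each collection).

```java
// Hypothetical helper: accumulates promoted and allocated bytes from
// per-GC snapshots of the young and old generation pool sizes.
final class GcRateSketch {
    private long promotedBytes;
    private long allocatedBytes;
    // Young generation usage right after the previous GC; -1 means no GC seen yet.
    private long youngUsedAfterLastGc = -1;

    // Call once per GC event with pool usage (in bytes) before and after the collection.
    void onGc(long youngBefore, long youngAfter, long oldBefore, long oldAfter) {
        // Allocation: growth of the young generation since the end of the previous GC.
        // The first event has no baseline, so it contributes nothing.
        if (youngUsedAfterLastGc >= 0 && youngBefore > youngUsedAfterLastGc) {
            allocatedBytes += youngBefore - youngUsedAfterLastGc;
        }
        // Promotion: only positive growth of the old generation across this GC counts.
        if (oldAfter > oldBefore) {
            promotedBytes += oldAfter - oldBefore;
        }
        youngUsedAfterLastGc = youngAfter;
    }

    long promotedBytes() {
        return promotedBytes;
    }

    long allocatedBytes() {
        return allocatedBytes;
    }
}
```

Both values are monotonically increasing counters; a rate would come from dividing the delta between two emissions by the emission period.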

Kumar Basapuram

07/02/2024, 10:00 AM

Hello team, does the latest Druid version support compiling the source code with JDK 11?

Atul Mohan

07/02/2024, 10:00 PM

I think I see an issue while running segment metadata queries against realtime segments containing an HLLSketch column. The `errorMessage` field in the query result gives:

cannot_merge_diff_types: [HLLSketch] and [HLLSketchBuild]

The complex type for HLL is `HLLSketchBuild` in IncrementalIndexStorageAdapter, but for persisted segments the complex type is `HLLSketch`, and this causes a mismatch during the column analysis merge phase. Any ideas on how this can be fixed?

Soman Ullah

07/09/2024, 2:56 AM

Is Druid's HTTP API more resilient than the JDBC driver? At higher QPS, I often see the following failure from JDBC:

2024-07-01T10:01:05,378 ERROR [qtp13497839-316] org.apache.druid.sql.avatica.DruidMeta - No such connection: eqq122-cvbbge-5564-lkjh8-4ffljfrsds
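For comparison, here is a minimal sketch of the HTTP path. Druid's SQL API accepts a `POST` to `/druid/v2/sql` with a JSON body containing a `query` field, and each request is self-contained, whereas the Avatica JDBC path keeps per-connection state on the server (which is what the "No such connection" message refers to). The class name `DruidSqlHttpSketch`, its methods, and the base URL in the usage note are assumptions for illustration; the request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a stateless Druid SQL request over HTTP.
final class DruidSqlHttpSketch {
    // Wraps the SQL text in {"query": "..."} with JSON string escaping.
    static String toJsonBody(String sql) {
        StringBuilder sb = new StringBuilder("{\"query\":\"");
        for (char c : sql.toCharArray()) {
            if (c == '"') {
                sb.append("\\\"");
            } else if (c == '\\') {
                sb.append("\\\\");
            } else if (c < 0x20) {
                // Control characters (newline, tab, ...) become \\uXXXX escapes.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append("\"}").toString();
    }

    // baseUrl is assumed to point at a Router or Broker, e.g. http://localhost:8888.
    static HttpRequest buildRequest(String baseUrl, String sql) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/druid/v2/sql"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(toJsonBody(sql), StandardCharsets.UTF_8))
            .build();
    }
}
```

The built request can be passed to `java.net.http.HttpClient#send`. Because no server-side connection ID is involved, any Broker behind a load balancer can serve it.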

Krishna Thirumalasetty

07/18/2024, 10:53 PM

In the Druid Basic Cluster Tuning guide (https://druid.apache.org/docs/latest/operations/basic-cluster-tuning/#total-memory-usage), under the “Total Memory Usage” section, there is a statement:

The Historical will use any available free system memory (i.e., memory not used by the Historical JVM and heap/direct memory buffers or other processes on the system) for memory-mapping of segments on disk.

What does “Free System Memory” correlate to in the output of the `free -m` command? Is “Free System Memory” => “FREE” or “Cache/Buffer Memory”?

Soman Ullah

07/23/2024, 9:30 PM

Hello, I followed Step 2 from this Imply blog: https://imply.io/blog/upserts-and-data-deduplication-with-druid/ and found that the `latest` SQL command is slow (5-6 seconds). Any ideas on how to improve its performance?

Abhishek Agarwal

07/26/2024, 8:13 AM

Cross-posting. Also, if you have an idea for a talk and need help in shaping the proposal, I will be happy to help.

Jakob Riebe

08/02/2024, 2:10 PM

Hi Druid Team, are there any plans to allow using the multiphase segment merging strategy (`IndexMergerV9.multiphaseMerge` - src) when publishing segments from stream ingestion (e.g. Kafka)? This strategy can be configured in batch ingestion and compaction by setting `maxColumnsToMerge != -1`, but not for stream ingestion. I took a look at the relevant code section (StreamAppenderator.mergeAndPush) and it appears that this is already prepared:

mergedFile = indexMerger.mergeQueryableIndex(
    indexes,
    schema.getGranularitySpec().isRollup(),
    schema.getAggregators(),
    schema.getDimensionsSpec(),
    mergedTarget,
    tuningConfig.getIndexSpec(),
    tuningConfig.getIndexSpecForIntermediatePersists(),
    new BaseProgressIndicator(),
    tuningConfig.getSegmentWriteOutMediumFactory(),
    // Always -1 for stream ingestion tasks: the default implementation
    // in `AppenderatorConfig` is never overridden.
    tuningConfig.getMaxColumnsToMerge()
);

So basically this would "only" require making `maxColumnsToMerge` configurable in the respective `xxxTaskTuningConfig` for kafka/kinesis/rabbitmq/etc. and updating the UI (API/WebConsole). Are there any reasons against using multiphase merge in stream ingestion at all, or is this simply not (yet) implemented? Thanks in advance!
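To make the idea of a column budget concrete, here is a rough sketch of how a limit like `maxColumnsToMerge` could split a merge into phases. It is an assumption-laden illustration and not Druid's `IndexMergerV9` code: the class `MergePhaseSketch`, the method `planPhases`, and the greedy grouping rule are all hypothetical. Each input index is represented only by its column count, and `-1` is treated as "no limit, single phase" as described in the message above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner: groups index positions into merge phases under a column budget.
final class MergePhaseSketch {
    // columnCounts.get(i) is the number of columns in index i.
    // Returns a list of phases, each a list of index positions to merge together.
    static List<List<Integer>> planPhases(List<Integer> columnCounts, int maxColumnsToMerge) {
        List<List<Integer>> phases = new ArrayList<>();
        if (maxColumnsToMerge == -1) {
            // No limit: everything is merged in one phase.
            List<Integer> all = new ArrayList<>();
            for (int i = 0; i < columnCounts.size(); i++) {
                all.add(i);
            }
            phases.add(all);
            return phases;
        }
        List<Integer> current = new ArrayList<>();
        int columnsInCurrent = 0;
        for (int i = 0; i < columnCounts.size(); i++) {
            int cols = columnCounts.get(i);
            // A phase is closed only once it holds at least two indexes,
            // so a tiny budget cannot produce phases that merge nothing.
            if (current.size() >= 2 && columnsInCurrent + cols > maxColumnsToMerge) {
                phases.add(current);
                current = new ArrayList<>();
                columnsInCurrent = 0;
            }
            current.add(i);
            columnsInCurrent += cols;
        }
        if (!current.isEmpty()) {
            phases.add(current);
        }
        return phases;
    }
}
```

In a scheme like this, the outputs of one round of phases would be fed back in as inputs until a single merged index remains, trading extra passes for a lower peak number of open columns.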

Gian Merlino

08/07/2024, 8:51 PM

Registration is open for Druid Summit 2024!! 🚀 It will be in-person this year, on October 22 in Redwood City, CA in the SF Bay Area. Registration is open here: https://druidsummit.org/. There is a ticket price, but I have some complimentary tickets to offer for folks here on Slack… DM me for a code 😄 I hope to see many of you there!

🙌 1

Hugh Evans

08/14/2024, 3:47 PM

Hi folks, would anyone working on pydruid be up for me picking their brains about how we could contribute some of the features we've developed for nice Jupyter notebook integration into pydruid? We've got our own version of the API internally within dev rel, and it seems a shame not to contribute some of it to OSS. Just wanted to check in, as it looks like things might be quiet on the project at the moment.

Eyal Yurman

08/15/2024, 4:22 PM

The AWS SDK package we use (1.x) is deprecated; we need to migrate to the new package (2.x). More details here: https://github.com/apache/druid/issues/16903

Eyal Yurman

08/20/2024, 8:22 PM

Before I go through each set of release notes: do you know whether version 0.12 segments in deep storage can be read seamlessly by version 30.0?

Hardik Bajaj

09/09/2024, 5:49 AM

Hey Druid Team! I have some questions about `indexing-service` that could help me fix this issue. I can't find how Druid ensures that replica tasks consuming from the same partitions are not scheduled on the same worker. I don't see anything preventing it in TaskMaster, TaskRunner or WorkerSelectStrategy. I'm assuming that, because replica tasks are next to each other in the TaskQueue, they don't get scheduled on the same worker (I might be wrong). I'm asking because the issue I pointed to is triggered when, say, on a worker, Task A of Task Group G moves to PUBLISHING and a new actively reading Task B consuming from the same Task Group G gets scheduled on the same worker: Task A's StreamAppenderator threads `-appenderator-abandon` and `TASK]-publish` are not able to terminate. Do these appenderator threads have any affinity to the TaskGroup or partitions that prevents them from terminating? The probability of the issue occurring increases when replicas are increased and workers are decreased, and the active task fails when we reach this state. Any help on the open questions above would be appreciated. Thanks!

Utkarsh Chaturvedi

09/13/2024, 5:24 AM

Hey Druids! Which profile should I build the project with so that it also packages the community extensions?

Samarth Jain

09/13/2024, 11:54 PM

I was thinking of adding an emitter of sorts that would publish a mapping of datasource -> [columns used] for every Druid query executed. The idea is to figure out which dimensions and metrics are not used, so that we can tell users to remove them and thereby reduce data size and improve query performance. I can see how this could be extended to also include the query granularity, filter columns, etc. What would be a good place to capture and publish this kind of information? We obviously want to do this after the query has been validated.
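The downstream use of such events could be sketched as follows. This is only an illustration of the datasource -> [columns used] idea, not a proposal for where it would hook into Druid: the class `ColumnUsageSketch` and its methods are hypothetical, and the list of schema columns is assumed to come from elsewhere (for example, segment metadata).

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical aggregator: remembers which columns each datasource's queries touched.
final class ColumnUsageSketch {
    private final Map<String, Set<String>> usedByDatasource = new ConcurrentHashMap<>();

    // Called once per validated query with the columns it referenced.
    void record(String dataSource, Collection<String> columns) {
        usedByDatasource
            .computeIfAbsent(dataSource, k -> new ConcurrentSkipListSet<>())
            .addAll(columns);
    }

    // Columns present in the schema that no recorded query has referenced, sorted by name.
    Set<String> unusedColumns(String dataSource, Collection<String> schemaColumns) {
        Set<String> unused = new TreeSet<>(schemaColumns);
        unused.removeAll(usedByDatasource.getOrDefault(dataSource, Set.of()));
        return unused;
    }
}
```

Keeping only a set per datasource makes the state small; counting references per column instead would be a straightforward variation if usage frequency matters.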

Vaibhav Kumar

09/16/2024, 6:32 PM

Hello, I am new to the Druid community and looking for a good starter issue to contribute to. I can see a few at https://github.com/apache/druid/issues?q=is%3Aopen+is%3Aissue+label%3AStarter but I'm not sure which would be the right one for me to pick up. Can someone help me with it?

Courage Noko

09/16/2024, 7:14 PM

Hey Druid team! Imply and Spotify worked on a gRPC extension, we would like committers to review this PR. What is the general process for such requests?

Hardik Bajaj

09/24/2024, 6:27 PM

Hey! Can someone please review this PR that is a potential fix for https://github.com/apache/druid/issues/16783 cc: @Amatya Avadhanula @kfaraz @Abhishek Agarwal

Evan Rusackas

09/27/2024, 7:43 PM

Wondering if any Druid PMC members (or anyone particularly knowledgeable regarding the history/use-cases/community/roadmaps around here) might be interested in joining a couple of us from the Apache Superset project on a "Designated Driver" podcast where we talk about databases and BI over a beer. DM me if interested!

Karan Kumar

10/02/2024, 2:40 AM

@kfaraz Starting 🧵 to discuss https://github.com/apache/druid/pull/16889#discussion_r1783774321

Shekhar Rajak

10/14/2024, 6:15 PM

Hi team, I was looking into https://github.com/druid-io/pydruid. Do we have more documentation on connecting to Flink as a source and running analytics with Druid in Python?

Ashwin Tumma

10/17/2024, 5:02 AM

Hi @Adarsh Sanjeev, can you kindly help me review this PR: https://github.com/apache/druid/pull/17362? It is a small addition to code coverage and fixes a code smell in the Prometheus Emitter. Thanks!

✅ 1

Shekhar Rajak

10/19/2024, 12:34 AM

Hi team, has anyone come across a similar error while building? https://github.com/apache/druid/issues/17375

Shekhar Rajak

10/22/2024, 4:37 AM

Hi team, I am exploring the druid-iceberg extension using a Hive catalog: https://druid.apache.org/docs/latest/development/extensions-contrib/iceberg. Can anyone point me to Hive catalog examples and explain how we can query and load an Iceberg table by configuring a Hive (or REST) catalog? Thanks!

Shekhar Rajak

10/22/2024, 6:50 PM

Hi team, please help test the PR for Glue catalog support: https://github.com/apache/druid/pull/17392

victor regalado

10/24/2024, 2:53 AM

Hey, I have a PR with a small change to the pydruid client that enables support for the MSQ engine in the DB API. Can someone take a look? Thank you

Suraj Goel

10/24/2024, 3:03 PM

Hi Team. I've had a PR open for some days. Can someone please review it? TIA!

Ashwin Tumma

10/25/2024, 2:56 AM

Hi, I have a small PR for a documentation update: https://github.com/apache/druid/pull/17409. Can someone help review it? Thanks!

Kumar Basapuram

11/05/2024, 4:54 AM

Do we support using `druid.emitter` of type `composing` with `parametrized` and `dropwizard` together?

Shekhar Rajak

11/11/2024, 3:57 AM

Hi team, do we have a direct way to connect to Flink as a source? I usually see examples of Flink -> Kafka -> Druid. Please share any references or discussions on this.