Sachin G
09/25/2025, 4:31 AM
#!/bin/bash
# First find the Imply init script
RUNDRUID=$(find /opt/grove -name run-druid | grep -v dist)
# and add the desired environment variable before starting the Imply processes
sed -i '/^exec.*/i export\ KAFKA_JAAS_CONFIG="org.apache.kafka.common.security.plain.PlainLoginModule required username='\'123434\'' password='\'123\+abcdee\'';"' ${RUNDRUID}
Sachin G
09/25/2025, 4:33 AM
#!/bin/bash
RUNDRUID=$(find /opt/grove -name run-druid | grep -v dist | head -n 1)
if [ -z "$RUNDRUID" ]; then
echo "run-druid script not found."
exit 1
fi
sed -i "/^exec.*/i export KAFKA_JAAS_CONFIG=\"org.apache.kafka.common.security.plain.PlainLoginModule required username='${USERNAME}' password='${PASSWORD}';\"" "$RUNDRUID"Sachin G
09/25/2025, 4:36 AM
sudo -u grove cat /proc/71938/environ | tr '\0' '\n' | grep KAFKA_JAAS_CONFIG
KAFKA_JAAS_CONFIG=org.apache.kafka.common.security.plain.PlainLoginModule required username='' password='';
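A likely explanation for the empty values: because the sed expression in the second script is double-quoted, ${USERNAME} and ${PASSWORD} are expanded at the moment sed runs, so if they are unset in that shell the export written into run-druid carries empty strings. A minimal guard sketch, assuming the same variable names as the script above:

#!/bin/bash
# Abort before editing run-druid if the credentials are not set in this shell,
# since their current values are written into the file verbatim at sed time.
: "${USERNAME:?USERNAME must be set before patching run-druid}"
: "${PASSWORD:?PASSWORD must be set before patching run-druid}"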
Taoufiq Bahalla
09/26/2025, 11:40 AM
The docs say: "The input format to parse the Kafka key only processes the first entry of the inputFormat field. If your key values are simple strings, you can use the tsv format to parse them. Note that for tsv, csv, and regex formats, you need to provide a columns array to make a valid input format. Only the first one is used, and its name will be ignored in favor of keyColumnName."
Did I miss something? Is there a way to ingest *all Kafka key fields as columns in the Druid datasource*, without copying them into the value? Thanks in advance!
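For reference, a minimal sketch of the config those docs describe, assuming the standard kafka inputFormat fields (topic and column names here are illustrative): the whole key is parsed by keyFormat, and only its first column lands in the single column named by keyColumnName.

"inputFormat": {
  "type": "kafka",
  "valueFormat": { "type": "json" },
  "keyFormat": {
    "type": "tsv",
    "findColumnsFromHeader": false,
    "columns": ["k"]
  },
  "keyColumnName": "kafka_key",
  "headerFormat": { "type": "string" },
  "headerColumnPrefix": "kafka.header."
}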
Satya Kuppam
10/02/2025, 1:39 PM
34.0.0 version:
• I have a query with a single JOIN and I keep running into "Not enough memory" issues (see 🧵 for the query, datasource, and task run detail).
• The query fails in the sortMergeJoin phase. We have two historical pods with 64 vCPUs and 512 GB of memory, with -Xmx=107g.
• From the Dart documentation it's not clear how I can capacity plan for this query, or whether it's possible to run this query successfully at all.
◦ Does Dart spill to disk in the join phase? Would that potentially be the problem here?
Jvalant Patel
10/03/2025, 12:58 AM
Is there a way to enable the org.apache.druid.server.metrics.QueryCountStatsMonitor monitor for Peon ingestion processes running on MiddleManager nodes?
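A sketch of how this is commonly wired up (assuming the standard fork-property passthrough, not verified on this cluster): MiddleManager properties prefixed with druid.indexer.fork.property. are handed to the peons it forks, so the monitor can be listed in the MiddleManager runtime.properties:

# Passed through to every peon forked by this MiddleManager
druid.indexer.fork.property.druid.monitoring.monitors=["org.apache.druid.server.metrics.QueryCountStatsMonitor"]
druid.indexer.fork.property.druid.monitoring.emissionPeriod=PT1M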
Soman Ullah
10/03/2025, 5:23 PM
I ran REPLACE INTO "test-ds" OVERWRITE WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY but it gave the following error:
Invalid OVERWRITE WHERE clause [`__time` >= CURRENT_TIMESTAMP - INTERVAL '1' DAY]: Cannot get a timestamp from sql expression [CURRENT_TIMESTAMP - INTERVAL '1' DAY]
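The error suggests the OVERWRITE WHERE bound has to reduce to a literal timestamp rather than an expression on CURRENT_TIMESTAMP. A sketch of the accepted shape, where the cutoff is computed by the client and inlined (the SELECT and partitioning here are only illustrative):

SQL
REPLACE INTO "test-ds"
OVERWRITE WHERE __time >= TIMESTAMP '2025-10-02 00:00:00'
SELECT *
FROM "test-ds"
WHERE __time >= TIMESTAMP '2025-10-02 00:00:00'
PARTITIONED BY DAY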
JRob
10/07/2025, 7:55 PM
Cannot construct instance of `org.apache.druid.data.input.protobuf.SchemaRegistryBasedProtobufBytesDecoder`, problem: io/confluent/kafka/schemaregistry/protobuf/ProtobufSchemaProvider
The instructions here seem to be wrong: https://druid.apache.org/docs/latest/development/extensions-core/protobuf/#when-using-schema-registry
(there is no extension-core folder and creating it didn't fix the issue)
I also tried placing the jars in extensions/protobuf-extensions and extensions/druid-protobuf-extensions, but still no luck...
Adithya Shetty
10/13/2025, 6:37 PM
We are running MSQ queries with exportResults, but notice that files exported to S3 have truncated rows. When the same MSQ queries are configured to export results locally, we are able to see all rows in all files.
Below are the Druid properties configured for the MSQ setup. Please check and provide pointers to fix the issue. Thanks in advance.
druid.extensions.loadList: '["/opt/druid/extensions/maha-druid-lookups", "/opt/druid/extensions/druid-datasketches", "/opt/druid/extensions/druid-avro-extensions", "/opt/druid/extensions/druid-s3-extensions", "/opt/druid/extensions/mysql-metadata-storage", "/opt/druid/extensions/druid-kafka-indexing-service", "/opt/druid/extensions/druid-orc-extensions", "/opt/druid/extensions/druid-multi-stage-query", "/opt/druid/extensions/statsd-emitter"]'
druid.msq.intermediate.storage.enable: 'true'
druid.msq.intermediate.storage.type: 's3'
druid.msq.intermediate.storage.bucket: '{{ aws_account_id }}-demand-reporting-druid-prod-{{ aws_region }}'
druid.msq.intermediate.storage.prefix: 'reports'
druid.msq.intermediate.storage.tempDir: '/data/msq'
druid.export.storage.s3.tempLocalDir: '/tmp/msq'
druid.export.storage.s3.allowedExportPaths: '["s3://demand-reporting-asyncreports-druid-prod/export/"]'
druid.export.storage.s3.chunkSize: '500MiB'
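For context, a minimal sketch of the kind of export statement these properties back (bucket taken from allowedExportPaths above; datasource and columns are illustrative):

SQL
INSERT INTO
  EXTERN(S3(bucket => 'demand-reporting-asyncreports-druid-prod', prefix => 'export/report1'))
AS CSV
SELECT __time, dim1, metric1
FROM "some_datasource"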
Wony
10/28/2025, 3:06 PM
Adding an ORDER BY clause to my GROUP BY aggregation query is incorrectly changing the calculated results.
The Scenario:
I am running a query to count total, skipped, and responded answers for a set of questions.
Query without ORDER BY (Correct Results):
When I run the aggregation without sorting, I get the expected results.
SQL
SELECT
question_id AS QUESTION_ID,
COUNT(*) count_row,
SUM(count_responses) AS TOTAL_RESPONSES,
SUM(
CASE
WHEN option_id IS NULL OR option_id = '' THEN count_responses
ELSE 0
END
) AS SKIPPED,
SUM(
CASE
WHEN option_id IS NOT NULL AND option_id != '' THEN count_responses
ELSE 0
END
) AS RESPONDED
FROM daily_ceu_question_response
WHERE
account_id = 'dffdc481-a01f-4051-8d3b-971a925bae14'
AND event_date >= '2025-09-01'
AND event_date <= '2025-09-30'
GROUP BY
question_id
For a specific question_id, the output is correct:
• `count_row`: 6
• `TOTAL_RESPONSES`: 6
• `SKIPPED`: 3
• `RESPONDED`: 3
Query with ORDER BY (Incorrect Results):
However, when I add ORDER BY SKIPPED DESC to the end of the exact same query, the results for that specific question_id become incorrect:
SQL
-- Same query as above, with this line added at the end:
ORDER BY SKIPPED DESC
The output for the same question_id (066a8c94-...-bac7d1) changes to:
• `count_row`: 5
• `TOTAL_RESPONSES`: 5
• `SKIPPED`: 3
• `RESPONDED`: 2
One of the "responded" rows seems to disappear from the aggregation, causing the counts to be wrong.
This behavior seems like a bug, as an ORDER BY clause should only sort the final result set.
Is this a known issue, or something I should open a bug report for on GitHub? Any guidance would be much appreciated ❤️.
I am using Druid version 28.0.1
I've attached the three screenshots showing the raw data and the different query results.
Thanks for your help!
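One way to narrow this down (a sketch, not a confirmed fix): wrap the same aggregation in a subquery and apply the sort on the outer level, so the ordering runs over the already-grouped rows and the inner results can be compared against the unsorted query:

SQL
SELECT *
FROM (
  SELECT
    question_id AS QUESTION_ID,
    COUNT(*) AS count_row,
    SUM(count_responses) AS TOTAL_RESPONSES,
    SUM(CASE WHEN option_id IS NULL OR option_id = '' THEN count_responses ELSE 0 END) AS SKIPPED,
    SUM(CASE WHEN option_id IS NOT NULL AND option_id != '' THEN count_responses ELSE 0 END) AS RESPONDED
  FROM daily_ceu_question_response
  WHERE account_id = 'dffdc481-a01f-4051-8d3b-971a925bae14'
    AND event_date >= '2025-09-01'
    AND event_date <= '2025-09-30'
  GROUP BY question_id
)
ORDER BY SKIPPED DESC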
Utkarsh Chaturvedi
10/31/2025, 4:26 AM
I have a question about what happens when tieredReplicants is set higher than the number of historicals in a tier.
Setup example:
• Tier has 3 historicals
• Datasource configured with tieredReplicants: 5
Question: What actually happens in this case?
1. Does Druid cap the replicas at 3 (one per historical)?
2. Can a single historical load multiple copies of the same segment to satisfy the replication factor?
3. Does it fail/warn/queue the additional replicas?
I couldn't find explicit documentation about this edge case. The architecture seems designed to distribute segments across different historicals, but I want to confirm the actual behavior when requested replicas exceed available nodes.
Has anyone tested this scenario, or can you point me to the relevant code/docs that clarify this?
Thanks!
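For concreteness, the setup described above corresponds to a retention/load rule along these lines (tier name is illustrative):

{
  "type": "loadForever",
  "tieredReplicants": {
    "hot": 5
  }
}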