# general
  • d

    Doaa Deeb

    03/04/2025, 1:20 AM
We are planning to migrate from Azure to S3 for deep storage. After transferring the data from Azure to S3, will Druid be able to load the data from S3 if we configure druid.storage.type to s3?
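For reference, the S3 side of that switch looks roughly like this (a sketch; bucket and prefixes are placeholders). One caveat worth verifying before cutover: each existing segment's loadSpec in the metadata store still records its Azure location, so the copied segments may also need those metadata entries updated before Historicals can fetch them from S3.
Copy code
# Load the S3 extension (exact loadList depends on what else the cluster uses)
druid.extensions.loadList=["druid-s3-extensions"]

# Deep storage
druid.storage.type=s3
druid.storage.bucket=your-bucket          # placeholder
druid.storage.baseKey=druid/segments      # placeholder

# Task logs usually move along with deep storage
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=your-bucket
druid.indexer.logs.s3Prefix=druid/indexing-logs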
  • m

    Maytas Monsereenusorn

    03/04/2025, 6:10 PM
Does Druid have 'alias' support for datasource names? We are facing something very similar to this issue https://groups.google.com/g/druid-user/c/_rKQofO4QK8 AFAIK, Druid doesn't support renaming tables. Is it possible to create/update an alias for a Druid datasource? Also, the link above mentions view support in Druid SQL (e.g. create view data_src as select * from data_src_v1). Does this exist in Druid? I've never heard of this…
    ➕ 1
  • k

    Kevin C.S

    03/07/2025, 4:32 AM
Hey guys, I'm using Druid 0.17 and I'm trying to connect to S3. My VM has role access to S3, so I don't require an accessKey or secretKey, but I get this error: java.lang.RuntimeException: com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain.
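For comparison, a role-based setup typically leaves the static keys out entirely and lets the AWS SDK's default chain find the instance-profile credentials; a sketch with placeholder names. If the chain still fails, it is worth checking that the instance metadata service is reachable from the Druid processes and that no empty druid.s3.accessKey / druid.s3.secretKey values are set anywhere in the runtime properties.
Copy code
# S3 deep storage with IAM-role credentials: note no druid.s3.accessKey / druid.s3.secretKey
druid.extensions.loadList=["druid-s3-extensions"]
druid.storage.type=s3
druid.storage.bucket=your-bucket        # placeholder
druid.storage.baseKey=druid/segments    # placeholder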
  • а

    Алексей Ясинский

    03/11/2025, 10:56 AM
    Hi, I want to restart an Apache Druid cluster consisting of 200 Historical nodes (approximately 800TB of data). How can I ensure that after the restart, the Historical nodes will continue using the segments stored in their local caches rather than reloading them from Deep Storage? Is there a specific time window within which I must restart each Historical node to avoid triggering segment reloads from Deep Storage? Could pausing the Coordinator help prevent unnecessary segment reloads? Additionally, the Main nodes will need to be restarted as well—the entire cluster will be restarted sequentially: Main nodes first, then Query nodes, and finally Data nodes.
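On the pausing idea: newer Druid versions expose a pauseCoordination flag in the Coordinator dynamic configuration (POST /druid/coordinator/v1/config), which suspends load/drop and balancing decisions during maintenance; a sketch, assuming a version that has it:
Copy code
{
  "pauseCoordination": true
}
As for the caches themselves, a Historical that restarts with an intact druid.segmentCache.locations directory should re-announce the segments it finds there rather than re-pulling them from deep storage, so pausing coordination mainly protects against re-replication being triggered while nodes are down.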
  • h

    Hagen Rother

    03/12/2025, 11:11 AM
Is there any mechanism to allow a dynamic segment granularity yet? I.e., specify the beginning and the end of a segment, rather than letting Druid decide based on the segmentGranularity parameter? Since the meta table already has start and end as timestamps, such segments should just work ™️ but I don't see how I could create one.
  • m

    Maytas Monsereenusorn

    03/13/2025, 8:40 PM
Does anyone here have a strategy for collecting Druid query lineage? Should we support OpenLineage and build something similar to https://trino.io/docs/current/admin/event-listeners-openlineage.html ?
  • s

    Sam

    03/14/2025, 3:47 PM
Hi team, I want to confirm the ability of bitmaps to save space for a nested (JSON) column. Say we have a field called attributes of JSON type, and the JSON value has three keys holding an identical long string, for instance
    Copy code
    {
      "key1": "same long string",
      "key2": "same long string",
      "key3": "same long string"
    }
Would Druid only store the same long string once instead of three times to save space? My assumption is yes, as Druid should use dictionary encoding and bitmap indexes to represent the string compactly, per https://druid.apache.org/docs/latest/design/segments#segment-file-structure. But I am not sure whether that also holds for nested columns.
  • c

    Corwin Lester

    03/18/2025, 3:08 PM
Is there anyone here who works at Imply, or at another company that provides support for Druid? We've tried contacting Imply multiple times but haven't gotten a response.
    ✅ 3
  • a

    Andrea Licata

    03/21/2025, 3:03 PM
Hi, does anybody know whether Druid's RAM usage scales linearly with the value of maxSubqueryRows?
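For context, maxSubqueryRows is a query-context parameter that caps how many rows the Broker will materialize for inlined subqueries; heap cost tracks the rows (and their width) actually materialized, so the cap is an upper bound rather than a strict linear predictor. A sketch of where the knob sits in the SQL API (the query text is a placeholder):
Copy code
{
  "query": "SELECT t1.dim, COUNT(*) FROM t1 JOIN (SELECT dim FROM t2) s ON t1.dim = s.dim GROUP BY 1",
  "context": {
    "maxSubqueryRows": 500000
  }
}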
  • s

    Sivakumar Karthikesan

    03/23/2025, 9:25 AM
Team, in one of our prod clusters we are seeing a latency issue; it takes 4 to 5 s to get the result for the query below. Other datasources work fine and don't have any latency. Any suggestions, please?
    Copy code
select tenantId, systemId, TIMESTAMP_TO_MILLIS(__time) as "timestamp",
       sum(iops_pref_pct) as iops_pref_pct
from (
  select DISTINCT(__time), *
  from "xyzdatasource"
  where systemId = 'aaajjjjccccc'
    and __time >= MILLIS_TO_TIMESTAMP(1742252400000)
    and __time <= MILLIS_TO_TIMESTAMP(1742338800000)
)
group by __time, tenantId, systemId
order by __time asc
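If useful as a first diagnostic, Druid SQL can show how this statement gets planned; the DISTINCT subquery is a plausible suspect, since it may be materialized and re-aggregated on the Broker before the outer aggregation. The same query, prefixed:
Copy code
EXPLAIN PLAN FOR
select tenantId, systemId, TIMESTAMP_TO_MILLIS(__time) as "timestamp",
       sum(iops_pref_pct) as iops_pref_pct
from (
  select DISTINCT(__time), *
  from "xyzdatasource"
  where systemId = 'aaajjjjccccc'
    and __time >= MILLIS_TO_TIMESTAMP(1742252400000)
    and __time <= MILLIS_TO_TIMESTAMP(1742338800000)
)
group by __time, tenantId, systemId
order by __time asc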
  • u

    Utkarsh Chaturvedi

    03/24/2025, 8:44 AM
Hi team. We're trying to test a Druid hot/cold tiering setup with 2 hot Historical nodes and 1 cold node, using the same Broker for all tiers. Currently we're seeing a strange behaviour where only 1 hot Historical node is responding to queries. This seems incorrect, as both hot Historicals have data to serve. Can anyone help out with this, please?
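One thing worth double-checking is the datasource's load rules (web console, or POST /druid/coordinator/v1/rules/{dataSourceName}): with tieredReplicants of 1 on the hot tier, each segment lives on only one of the two hot nodes, so which node answers depends on which segments a query touches. A sketch of a 2-replica hot / 1-replica cold rule chain, with placeholder tier names and period:
Copy code
[
  {
    "type": "loadByPeriod",
    "period": "P30D",
    "includeFuture": true,
    "tieredReplicants": { "hot": 2 }
  },
  {
    "type": "loadForever",
    "tieredReplicants": { "cold": 1 }
  }
]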
  • j

    jose abadi

    04/01/2025, 3:21 AM
Hello, I'm looking to learn how to extract data from SAP HANA and SQL using Druid. Anyone with experience? Paid teaching sessions.
  • a

    Abdullah Ömer Yamaç

    04/02/2025, 7:13 PM
Hello everyone, I am a newbie in Apache Druid and big data, and I am worried about poor performance. I have 21 columns; most are strings, and one is geospatial data. All columns are indexed. My system specs are 8 cores and 32 GB RAM, and the Druid setup is a single-server micro-quickstart. I have 5.6 billion rows in the datasource, and the segment granularity is HOUR. The total number of segments is 26,200. Here is my query, and it takes 4-5 min.
    Copy code
    {
      "queryType": "scan",
      "dataSource": "mobility",
      "intervals": [
        "2024-01-01T00:00:00.000Z/2025-01-01T00:00:00.000Z"
      ],
      "columns": [
        "advertiserid"
      ],
      "filter": {
        "type": "selector",
        "dimension": "advertiserid",
        "value": "0104c0fe-b9b0-6e03-1b7f-f186d7f16b3e"
      }
    }
    Is it normal to take this much time?
  • h

    HEPBO3AH

    04/03/2025, 9:55 PM
I have a very loaded question that will be hard to answer. I know Druid excels as analytical storage. Our use case is that we need a time-series database. This will be the primary database and will be treated as the source of truth. We have a regulatory requirement to ensure data correctness, integrity, and durability. It will host billions of rows per day, for up to 10 years. Most solutions require a very deep dive to uncover some of their limitations, which has a (very large) cost. Given this scenario, where would the major pain points be if this were Druid? For example, my understanding is that there is no WAL, so how does Druid recover from a crash?
  • a

    AR

    04/06/2025, 3:00 PM
Hi All, We have been facing issues due to join performance and the update limitation in Druid, so we experimented with lookups, excluding them from the realtime ingestion tasks. We used a lookup with ~65K rows, and the join performance is promising. The idea is to store a delimited string as the lookup value and split it at query time to apply the necessary filters and fetch the required values by index. We plan to have around 10-15 lookups varying in size from 50K to 3MM rows. The largest lookup would be ~2GB. Memory is not a big concern, as we can bump up the heap or direct-memory size as necessary. Also, the lookups will only be loaded on demand by a manual trigger (no polling or periodic loading).

There are two flavours of lookups in the documentation: Globally Cached Lookups and Single Cached Lookup. Which one should be preferred in this scenario? We tested using the Globally Cached Lookups.

Globally Cached Lookups: We liked that this type allows filtering records while loading, so we can load data from a single source table into multiple lookups. The implementation seems to be a single Java concurrent map containing a concurrent map for each lookup. Is this understanding correct? There seems to be a limit of 10% of heap on lookup size; is this per lookup, or the total size of all lookups in the "cachedNamespace"? We were also looking at the off-heap option. In that case, would all the lookups in the "cachedNamespace" be off heap? Would druid.lookup.namespace.numBufferedEntries apply per lookup in the "cachedNamespace", i.e. would it create a buffer with 100K entries for each lookup?

Single Cached Lookup: There isn't much info on how to use this. Would the lookup APIs work for this type of lookup as well?

Finally, the max heap size for the Historical process is specified as 24GB in the documentation. Is this a hard limit? As we plan to have some large lookups, can we set the max heap size above 24GB? Thanks, AR.
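For reference, a sketch of the globally cached (cachedNamespace) flavour discussed above, as posted to the lookup config API (/druid/coordinator/v1/lookups/config); the connection details, table, and filter are placeholders, and pollPeriod is omitted on the assumption that no polling approximates the load-once behaviour described:
Copy code
{
  "type": "cachedNamespace",
  "extractionNamespace": {
    "type": "jdbc",
    "connectorConfig": {
      "connectURI": "jdbc:mysql://lookup-db:3306/dims",
      "user": "druid",
      "password": "secret"
    },
    "table": "dim_source",
    "keyColumn": "id",
    "valueColumn": "packed_value",
    "filter": "lookup_group = 'a'"
  },
  "firstCacheTimeout": 120000
}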
  • a

    akshat

    04/07/2025, 4:30 AM
    🚀 We’ve hit a major milestone! 🎉 Our open-source project, nanoservice-ts, has officially crossed 1000 stars on GitHub! 🌟 nanoservice-ts is empowering developers to build lightweight, modular, and scalable backend applications with nanoservices. Whether you're simplifying development, optimizing resource usage, or enhancing workflow flexibility, our framework is designed to make your backend projects easier and more efficient. A big THANK YOU to everyone who has starred, contributed, and supported us along the way! 🙌 If you haven’t checked it out yet, now’s the time: ⭐ Star our GitHub repo: https://github.com/deskree-inc/nanoservice-ts 🐦 Follow us on Twitter: https://x.com/nanoservice_ts 💬 Join the conversation on Discord: https://discord.gg/c4D5uHBn Let’s keep growing the community and pushing the boundaries of nanoservices! 🚀 #opensource #nanoservices #TypeScript #backenddevelopment #developercommunity
  • a

    AR

    04/08/2025, 4:45 AM
    Hi All, Is there any way to not index a String field when ingesting using MSQ? We have a few fields which have long string values but they will not be used for filtering. They are only needed in the query response. For such fields, is there a way to specify that they should only be stored but not indexed? Thanks, AR.
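For the classic/native ingestion specs there is a per-dimension createBitmapIndex flag (sketch below, with a made-up field name); whether MSQ exposes an equivalent is precisely the open question here, so treat this as the native-spec shape only:
Copy code
"dimensionsSpec": {
  "dimensions": [
    "normal_dim",
    { "type": "string", "name": "long_text_field", "createBitmapIndex": false }
  ]
}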
  • p

    PHP Dev

    04/09/2025, 12:33 PM
    Hi All, is it possible to ingest data from several Kafka clusters into one dataSource using Kafka supervisors?
  • s

    Sivakumar Karthikesan

    04/10/2025, 7:10 PM
Hello Team, has anyone set up Druid web console access via an ingress, without the port-forward approach? Also, the apps require read-only access.
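A minimal sketch of such an ingress, assuming the console is reached through the Router service on its default HTTP port (service name, host, and ingress class are placeholders); read-only access would be handled separately, e.g. via Druid's basic-security authorizer, rather than at the ingress:
Copy code
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: druid-console
spec:
  ingressClassName: nginx            # placeholder ingress class
  rules:
    - host: druid.example.internal   # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: druid-router   # depends on your chart/release
                port:
                  number: 8888       # Router default plaintext port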
  • m

    Master Chatchai

    04/15/2025, 6:52 PM
    👋 Hi everyone!
    🙌 3
    druid 2
  • a

    ahmed grati

    04/23/2025, 4:45 PM
Hi, what's the difference between toolchestMergeBuffersHolders and mergingQueryRunnerMergeBuffersHolders?
  • p

    Pooja Shrivastava

    04/24/2025, 2:41 AM
    Hi All
  • p

    Pooja Shrivastava

    04/24/2025, 2:44 AM
    Copy code
## Druid Emitting Metrics. ref: https://druid.apache.org/docs/latest/configuration/index.html#emitting-metrics
      druid_emitter: http
      #druid_emitter_composing_emitters: '["prometheus","kafka"]'
      #druid_monitoring_emissionPeriod: PT1M
      #druid_emitter_prometheus_strategy: "exporter"
      #druid_emitter_prometheus_port: "9200"
      #druid_emitter_logging_logLevel: debug
druid_emitter_http_recipientBaseUrl: http://iptv-druid-exporter.prd.adl.internal/metrics
#druid_emitter_http_recipientBaseUrl: http://druid_exporter_url:druid_exporter_port/druid
    
      #kafka-emitter config
      druid_emitter_kafka_bootstrap_servers: "10.X.X.X:9092,10.X.X.X:9092,10.X.X.X:9092"
      druid_kafka_security_protocol: "SASL_PLAINTEXT"  # Use "SASL_SSL" if you also need TLS
      druid_emitter_kafka_metric_topic: druid-metric
      druid_emitter_kafka_alert_topic: druid-alert
      druid_emitter_kafka_request_topic: druid-query
      druid_emitter_kafka_clusterName: prd-druid
      # SASL configuration
      druid_emitter_kafka_sasl_mechanism: "PLAIN"
      druid_emitter_kafka_sasl_jaas_config: "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"appuser\" password=\"uJ5551Ax\";"
    
      #druid_emitter_kafka_sasl_jaas_config: "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"admin\" password=\"8I444444z\";"
    
      druid_request_logging_setMDC: "true"
      druid_request_logging_setContextMDC: "true"
      druid_request_logging_nativeQueryLogger: "true"
      druid_request_logging_sqlQueryLogger: "true"
    
    
      #changing logging Details
      #DRUID_LOG_DIR: /var/log/
      org_apache_druid_jetty_RequestLog: DEBUG
      druid_startup_logging_logProperties: "true"
      druid_request_logging_type: emitter #slf4j
      druid_request_logging_feed: feed
      #druid_request_logging_type: file #slf4j
      druid_request_logging_dir: /opt/druid/log/request/
      druid_request_logging_durationToRetain: P2D
      druid_request_logging_filePattern: "yyyy-MM-dd'.log'"
  • p

    Pooja Shrivastava

    04/24/2025, 2:45 AM
I am setting up emitter metrics for Druid. Druid is installed through the Helm chart, version 31.0.2. I am using druid-exporter (quay.io/opstree/druid-exporter v0.11) to export the metrics. I am facing an issue scraping the emitter metrics; does anyone know what I am missing?
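One detail that stands out in the config pasted above: the active druid_emitter_http_recipientBaseUrl posts to the exporter's /metrics path, while the commented-out example posts to a /druid endpoint. The usual pattern with the opstree exporter is that Druid's HTTP emitter POSTs metric JSON to the exporter's /druid endpoint and Prometheus scrapes the exporter's own /metrics; a sketch of that split, keeping the exporter hostname from above:
Copy code
druid_emitter: http
# Druid POSTs metric JSON to the exporter's ingest endpoint:
druid_emitter_http_recipientBaseUrl: http://iptv-druid-exporter.prd.adl.internal/druid
# Prometheus then scrapes the exporter itself at /metrics, not Druid.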
  • p

    Pooja Shrivastava

    04/24/2025, 3:03 AM
(attachment: image.png)
  • u

    Udit Sharma

    04/30/2025, 4:34 AM
Hi, questions around projections:
• Do I have to define the projection as part of the ingestion spec, and if so, why? I was wondering why it isn't more like SQL views, so that it would be available for older segments as well, with older segments building it at load time. Maybe there are some cons to doing that which I have missed.
• If I create a projection and later want to delete it, how do I do that? Do I have to re-compact with the projection removed from the spec?
• I could not find documentation for trying this out; is it not recommended for use at this moment?
  • a

    AR

    04/30/2025, 1:20 PM
Hi All, In our Druid cluster we are seeing some odd exceptions when submitting MSQ jobs through the API /druid/v2/sql/task. Sometimes we see a "504 Gateway Timeout" exception; other times we see a "Task [] already exists" exception. We can see the TimeoutException in the Router logs as well, but we are unable to see any issue in any of the other services that would give a pointer to why this is happening. Can someone suggest what the issue could be? Druid version: 27.0.0. Thanks, AR.
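If the 504 originates at the Router itself (rather than a load balancer in front of it), the Router's HTTP client timeouts are configurable; a sketch with illustrative values. The "Task [] already exists" error is also consistent with a client retry after a timed-out POST that had actually succeeded, so idempotent submission handling on the client side may help:
Copy code
# Router-to-backend HTTP client tuning (values illustrative, not recommendations)
druid.router.http.readTimeout=PT15M    # default per docs; raise if submissions legitimately run long
druid.router.http.numMaxThreads=100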
  • c

    Cristina Munteanu

    04/30/2025, 7:57 PM
    🎤 Got something exciting to share? The OSACon 2025 CFP is now officially open! 🚀 We're going online Nov 4–5, and we want YOU to be a part of it! Submit your proposal and be a speaker at the leading event for open-source analytics. 👉 Submit here: https://sessionize.com/osacon-2025/
  • j

    JRob

    05/01/2025, 6:46 PM
    Does Druid support read locks? My scenario is I have a supervisor that ingests from kafka (hourly tasks). As soon as data is published, I want to ingest that druid datasource into a new druid datasource. Today, I have to estimate when the realtime ingestion task will be done (i.e. have written everything to historical) and leave enough of a time gap to ensure that everything is written. This is error prone. But if we could somehow tell an ingest task to "not start unless your source datasource has all its data" then our 2nd ingest could be significantly improved.
  • n

    Nick M

    05/07/2025, 12:01 PM
    What’s the latest thinking around the coordinator scaling and the number of segments per time chunk? A while back, there was some guidance that around 2000 segments per timechunk was the ideal number and we’d sized our ingest granularity accordingly. But I know the coordinator is so much more performant in newer releases and wondered if that guidance still held.