# general
  • s

    Suraj Goel

    01/17/2025, 5:11 AM
    Hi Team, We are planning to upgrade from Druid 25 to Druid 30. Is it safe to upgrade directly to 30? Is there any recommendation for the upgrade process?
  • m

    Mohit Dhingra

    01/21/2025, 8:42 AM
    Hi Team, org.apache.druid.server.metrics.TaskSlotCountStatsMonitor is already configured in the runtime properties of the Overlord, but the taskSlot/used/count metrics are still not being emitted by Druid. Any suggestions?
  • s

    Sam

    01/21/2025, 10:56 PM
    Hi team, I want to know when to use nested columns and when to flatten them instead. From the Druid documentation, https://druid.apache.org/docs/latest/querying/nested-columns/
    An optimized virtual column allows Druid to read and filter these values at speeds consistent with standard Druid LONG, DOUBLE, and STRING columns.
    It looks like we don't need to flatten the nested columns, since the performance may be similar. Is there a case where flattening the nested fields is better than using nested columns?
  • a

    AR

    01/22/2025, 4:30 PM
    Team - We have several dictionaries but don't want to load all of them in each realtime ingestion task as they take up unnecessary memory and we need to increase the heap size every time a new dictionary is added. Is there a way to avoid loading dictionaries or load them selectively in realtime ingestion tasks? The batch ingestion tasks (classic or MSQ) do not load the dictionaries since they don't support querying. Is this understanding correct? Thanks, AR.
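    If the "dictionaries" here are Druid lookups, one hedged option (names below are placeholders, not from the thread) is lookup tiers: a process only loads the lookups registered in the tier named by its `druid.lookup.lookupTier` property, so large lookups can be registered only in a tier that the realtime tasks do not use:
    ```
    # Sketch: register "big_lookup" only in a hypothetical "query-only" tier via
    # the Coordinator; processes whose druid.lookup.lookupTier names a different
    # tier will not load it into their heap.
    curl -X POST "http://COORDINATOR_HOST:8081/druid/coordinator/v1/lookups/config" \
      -H 'Content-Type: application/json' \
      -d '{
        "query-only": {
          "big_lookup": {
            "version": "v1",
            "lookupExtractorFactory": { "type": "map", "map": { "key": "value" } }
          }
        }
      }'
    ```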
  • g

    Giorgio Pellero

    01/22/2025, 7:06 PM
    hey team! very new to Druid - we're testing it out and I'm trying to get up to speed with it. so far it's been great! 🙂 I've got a question I haven't been able to find an answer for in the official docs: is it possible to `UNNEST` a nested JSON column at ingestion time?
    to be more precise, I'm using streaming ingestion from Kafka and each source row has a column that looks like this (simplified):
    {"items": [{"key": "item1", "value": {...}}, {"key": "item2", "value": {...}}, ..., {"key": "itemN", "value": {...}}]}
    - that is, `items` is not constant-sized. to make the data easier to work with at query time I'd like to `UNNEST` it so that it results in something like this:
    ```
    __time,item_key,item_value
    1,item1,{...}
    2,item2,{...}
    ...
    N,itemN,{...}
    ```
    where `item_value` is a `COMPLEX<json>`. I can easily do this when batch ingesting using SQL, for example:
    ```sql
    INSERT INTO "my_new_table"
    SELECT
      __time,
      JSON_VALUE(item, '$.key') AS item_key,
      JSON_QUERY(item, '$.value') AS item_value
    FROM "original_table"
    CROSS JOIN UNNEST(JSON_QUERY_ARRAY(original_column, '$.items')) AS u(item)
    PARTITIONED BY ...
    ```
    so is this sort of unnesting possible for streaming ingestion?
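    A hedged sketch of one workaround (not from the thread; the table name is illustrative): Kafka ingestion specs don't accept row-generating SQL transforms like this, so one option may be to ingest the raw column as a `COMPLEX<json>` dimension during streaming and do the unnesting at query time instead:
    ```sql
    -- Sketch, assuming the streaming datasource "stream_table" kept the nested
    -- column "original_column" as COMPLEX<json> at ingestion time.
    SELECT
      __time,
      JSON_VALUE(item, '$.key') AS item_key,
      JSON_QUERY(item, '$.value') AS item_value
    FROM "stream_table"
    CROSS JOIN UNNEST(JSON_QUERY_ARRAY(original_column, '$.items')) AS u(item)
    ```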
  • d

    David Adams

    01/23/2025, 4:54 AM
    The document speaks of caching on Historicals and on Brokers and how these are kept in the heap. Easy enough, since it's the process heap. However, when doing streaming ingestion on Middle Managers, I'm a bit confused about where it logically resides. Does the realtime cache exist:
    1. On the central process heap (one cache per node, requiring `cacheSize` space)
    2. On the task heap (one cache per task, requiring `cacheSize * tasks` space)
    I suspect it's on the task heap, but I want to double-check with someone more familiar before diving into the source code for a verdict.
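    For reference, a minimal sketch of the properties involved (values are illustrative). Since each ingestion task runs as its own Peon JVM, the per-task reading is the plausible one, but treat the sizing comment as an assumption rather than a verdict:
    ```
    # Sketch: realtime caching is enabled via the druid.realtime.cache.*
    # properties. Each ingestion task (Peon) is a separate JVM, so the cache
    # below would exist once per task, i.e. roughly cacheSize * tasks of heap
    # across a MiddleManager node.
    druid.realtime.cache.useCache=true
    druid.realtime.cache.populateCache=true
    druid.cache.type=caffeine
    druid.cache.sizeInBytes=134217728
    ```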
  • s

    Slackbot

    01/23/2025, 2:03 PM
    This message was deleted.
  • o

    oscar de la cruz

    01/24/2025, 10:34 PM
    hello everybody, I just started to use Druid and cannot initialize postgresql-metadata-storage. Can somebody help with this issue? I'm trying to deploy Druid using Docker.
  • a

    Andrew Ho

    01/27/2025, 10:39 PM
    Hi Druid experts! Wanted to get some feedback around a potential behavior change for the /druid/indexer/v1/supervisor endpoint. The current behavior when an ingestion spec is submitted is to stop then start the supervisor. Regardless of whether the underlying spec actually changed, the supervisor is always restarted. The behavior change we wanted to introduce is to first check if the spec has changed. If it hasn't, then do nothing. Otherwise just proceed with the existing behavior. Our use case is that we have some automation which hits the supervisor endpoint to update the schema in the ingestion spec. It's often unclear which specs have changed, so we would like to be able to just submit all of them to the supervisor endpoint and have it be restarted only if the spec has changed. Please let me know your thoughts, and happy to contribute if we think this is valid behavior
  • s

    Siva praneeth Alli

    01/29/2025, 6:04 PM
    Hi Druid experts, for the Kafka indexing task, when intermediateHandoffPeriod < taskDuration, the task hands off segments early. During handoff, does ingestion pause until the handoff is complete?
  • s

    Siva praneeth Alli

    01/30/2025, 5:30 PM
    Good morning folks, a question regarding MVCC. When a SQL ingestion spec uses `REPLACE <table> OVERWRITE WHERE`, and given that my table already has data for a time chunk for which there is no data in my input source (but there is data for other time chunks), then when (1) new segments are built and (2) old segments are deleted (since I don't have data in my input source for some existing time chunk): is the visibility of new segments and the deletion of old segments atomic? Or, for some time, do both the old segments (to be deleted) and the new segments return data, since the deleted segments and new segments are for different time chunks? TL;DR: does MVCC apply only to data updates, or to deletions as well?
  • c

    Carlos M

    02/03/2025, 7:34 PM
    Hello, I was reading the documentation and noticed the entry in the `basic-cluster-tuning` page saying:
    Having a heap that is too large can result in excessively long GC collection pauses, the ~24GiB upper limit is imposed to avoid this.
    I remember that line was there from really early versions of Druid, when only Java 8 was supported; is that still the case for Java 17?
  • k

    Kiran Kumar Puttakota

    02/10/2025, 9:22 AM
    Hello all,
  • k

    Kiran Kumar Puttakota

    02/10/2025, 9:22 AM
    Can anyone please answer my question: how do I remove role access from a specific user using the API? Thanks
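    With the `druid-basic-security` extension, the Coordinator's authorization API has a role-unassignment endpoint; a hedged sketch (authorizer, user, role names, and credentials are placeholders):
    ```
    # Removes role "readRole" from user "someUser" under the authorizer
    # "MyBasicAuthorizer" (all three names are placeholders).
    curl -u admin:password -X DELETE \
      "http://COORDINATOR_HOST:8081/druid-ext/basic-security/authorization/db/MyBasicAuthorizer/users/someUser/roles/readRole"
    ```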
  • k

    Kiran Kumar Puttakota

    02/10/2025, 10:14 AM
    Ok, but you sent how to assign a role to a user; I want to remove a role from a user. How can I do that?
  • j

    JRob

    02/11/2025, 6:49 PM
    We're running into deep storage as a limitation for growing our cluster. Has anyone else run into this and/or have some solutions? Our deep storage is a 17 TB mount and our turnover is ~ 3 TB / day.
  • n

    Nimrod Lahav

    02/12/2025, 9:38 AM
    Hello, we're trying to adopt Druid in my org, but the security team is blocking us from installing it since image scans are returning the following CVEs in the latest images (and older versions are even worse):
    ```
    LOW       7
    MEDIUM    51
    HIGH      44
    CRITICAL  9
    ```
    See the attached CVE report. Did anyone face this issue? Does anyone have a security report I can share that has some explanations / suppression lists?
    druid-32.0.0.txt
  • t

    Test-Bibek

    02/12/2025, 11:02 AM
    added an integration to this channel: Test-Bibek
  • t

    Test-Bibek

    02/12/2025, 11:04 AM
    This is a test notification for "Ingestion-test" Alert summary would go here Bibek Sahoo
  • t

    Test-Bibek

    02/12/2025, 11:15 AM
    removed an integration from this channel: Test-Bibek
  • s

    Sam

    02/13/2025, 2:01 AM
    Hi team, is there a way to let Druid mimic the `rate` function from Prometheus, to calculate the per-minute rate of a counter value while handling counter resets?
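    One possible sketch (not from the thread, and the column/table names are made up): approximate it with Druid SQL window functions (available in recent versions, and may need to be enabled via query context), treating any decrease in the counter value as a reset, the way Prometheus `rate()` does:
    ```sql
    -- Sketch, assuming a counter column "counter_value"; a decrease relative
    -- to the previous row is treated as a counter reset.
    SELECT
      __time,
      CASE
        WHEN counter_value >= LAG(counter_value) OVER (ORDER BY __time)
          THEN counter_value - LAG(counter_value) OVER (ORDER BY __time)
        ELSE counter_value
      END AS counter_delta
    FROM "metrics_table"
    ```
    The per-row deltas can then be bucketed with `TIME_FLOOR` and summed to get a per-minute rate.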
  • s

    Sam

    02/13/2025, 2:03 AM
    I didn't find a native method that can do this. Does this mean Druid favors delta data (like gauges) instead of counters?
  • y

    ymcao

    02/17/2025, 3:29 AM
    Hi team, when deploying Druid on AWS, is it common practice to run both Historical and MiddleManager services on the same I-series machine? Additionally, I’ve noticed that the Bottlerocket OS is not supported on I-series (instance-store) models. Are there any solutions or workarounds for this? Any insights would be greatly appreciated. Thank you! ❤️
  • s

    Shubham Pratik

    02/17/2025, 7:53 AM
    Hi Team, has anyone done a Keycloak integration with Druid?
  • s

    Shubham Pratik

    02/17/2025, 7:55 AM
    Hi Team, has anyone tried AWS Secrets Manager for Postgres credentials?
  • j

    Julian Larralde

    02/17/2025, 11:12 AM
    Is there any analysis of Druid native-batch ingestion performance that compares JSON (zipped) / Parquet / Avro? Which is the most performant?
  • t

    Tarun Kancherla Chowdary

    02/24/2025, 7:20 AM
    Hi Team, I'm Tarun. I'm new here. Looking forward to the interactions and valuable insights.
  • s

    Samarth Jain

    02/25/2025, 8:54 AM
    It looks like results of the SCAN query type, both the whole-query cache and intermediate segment-level results, are not cached. Looking at the `getCacheStrategy()` method, it doesn't have its own implementation, so it returns null from the default implementation. As a result, `isQueryCacheable()` always returns false for this query type. Is there a reason why scan query results are not cached?
  • a

    AR

    02/25/2025, 1:38 PM
    Hi All, If we set a broadcast forever rule for a table, does it push the table down to the Historicals as well? The documentation mentions Brokers but not Historicals. If the retention for a table is set as "broadcast forever", would a direct join with it run directly on the Historical rather than pulling the data into the Broker to perform the join? We are looking to improve join performance in native queries, as opposed to repeated re-ingestion to write the updated data. Separately, I chanced upon the below article on "indexed tables" in Druid. Is this something that has been abandoned? https://support.imply.io/hc/en-us/articles/360051201993-Druid-indexed-tables-alpha Thanks, AR.
  • j

    JRob

    02/25/2025, 9:14 PM
    Is it possible for Druid to generate multiple rows per event? I have a JSON structure like:
    ```json
    {
      "Timestamp": "2025-02-24T14:13",
      "Results": [
        {
          "Key": "AAA",
          "Value": 35
        },
        {
          "Key": "BBB",
          "Value": 44
        }
      ]
    }
    ```
    I'd like to end up with rows aggregated hourly like:
    ```
    __time,           Key, sum_Value
    2025-02-24T14:00, AAA,     33675
    2025-02-24T14:00, BBB,     44876
    ```
    Or at the very least be able to run queries like:
    ```sql
    SELECT Key, SUM(sum_Value)
    FROM datasource
    WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
    GROUP BY 1
    ```
    The important point is there is a json list containing dimensions and metrics. I would like to group by the dimensions in that list while aggregating the associated metrics. I'm expecting peak load around 600 K events / second. Storing the metrics without aggregation is not ideal.
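    One hedged sketch (not from the thread): if `Results` is ingested as a `COMPLEX<json>` column, the per-key hourly aggregation could be done at query time with `UNNEST`; the column and datasource names follow the example above:
    ```sql
    -- Sketch: unnest the "Results" array and aggregate hourly per Key.
    SELECT
      TIME_FLOOR(__time, 'PT1H') AS "hour",
      JSON_VALUE(r, '$.Key') AS "Key",
      SUM(JSON_VALUE(r, '$.Value' RETURNING BIGINT)) AS sum_Value
    FROM "datasource"
    CROSS JOIN UNNEST(JSON_QUERY_ARRAY("Results", '$')) AS u(r)
    GROUP BY 1, 2
    ```
    At ~600 K events/s this pushes all the work to query time, so flattening upstream (one row per Results entry before ingestion) combined with rollup may still be preferable.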