# kubernetes-druid
  • Carlos M
    09/03/2024, 5:59 PM
    Hello! Is there a way to disable the use of kubexit and args injection when multiple containers are used? Some of the other containers' args are being modified by the `kubernetes-overlord-extensions` extension, but not the image, which makes them fail to start.
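    A possible knob, for what it's worth: in the druid-kubernetes-overlord-extensions docs, kubexit injection is tied to the (deprecated) sidecar-support mode, and the primary container can be named explicitly so only the Druid container is rewritten. A minimal sketch, assuming those documented options apply to this setup (values are illustrative):
    ```
    # kubexit and command rewriting come with sidecar support; turning it off
    # (or naming the primary container) limits what the extension touches.
    druid.indexer.runner.sidecarSupport=false
    druid.indexer.runner.primaryContainerName=druid
    ```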
  • Rahul Sharma
    09/04/2024, 10:24 AM
    🚀 Introducing MIDAS: The Intelligent Autoscaler for Druid's MiddleManager! 🚀

    Hey Druid community! 🌟 Ever felt the pain of wrestling with scaling Druid's MiddleManager service on Kubernetes? I did too, until I built MIDAS (Middle Manager Intelligent Dynamic Autoscaler). 💡

    If you have tried using HPA (Horizontal Pod Autoscaler) to autoscale your MiddleManagers, you might have faced issues like scaling down even when tasks are running, or spinning up more MiddleManagers than necessary. 😅 With MIDAS, you can autoscale your MiddleManagers without any ingestion failures, ensuring that only the resources required to fulfill ingestion requests are used. 🕺🏻

    Using MIDAS we have cut our Druid infrastructure costs by 30% 🤘🏻. Now we can scale Druid's MiddleManagers without worrying about unexpected ingestion failures, and we're paying only for the resources we actually use. 🥳

    Curious how MIDAS can turn your chaotic infrastructure into a well-oiled machine? Check out my latest article on Druid autoscaling: Medium 🔗 The best part? MIDAS is open-sourced! 🎉 Feel free to explore, contribute, or just marvel at how it can revolutionize your Druid deployments. Let's make Druid even more powerful together! 💪 Repository 🔗

    Looking forward to your thoughts and feedback! Feel free to connect with me on LinkedIn 🔗 Cheers, Rahul
    👏 3
    ⭐ 1
  • Tapajit Chandra Paul
    09/11/2024, 12:14 PM
    Hello everyone! I am trying to configure TLS in Druid. I have provided the following TLS configuration in `common.runtime.properties`, following this example: https://druid.apache.org/docs/latest/operations/security-overview/#update-druid-tls-configurations
    ```
    druid.enablePlaintextPort=false
    druid.enableTlsPort=true
    
    druid.client.https.protocol=TLSv1.2
    druid.client.https.trustStorePassword={"type": "environment", "variable": "DRUID_KEY_STORE_PASSWORD"}
    druid.client.https.trustStorePath=/opt/druid/ssl/client/truststore.jks
    druid.client.https.trustStoreType=jks
    
    druid.server.https.certAlias=druid
    druid.server.https.keyStorePassword={"type": "environment", "variable": "DRUID_KEY_STORE_PASSWORD"}
    druid.server.https.keyStorePath=/opt/druid/ssl/server/keystore.jks
    druid.server.https.keyStoreType=jks
    ```
    But when I access the router on port 9088, I get the error shown in the screenshot. It shows that the broker is not running, yet all the services are actually running, including the broker. I think the router cannot reach the broker or the other services, so there must be some issue with my configuration. Can anyone kindly tell me what I am doing wrong? FYI, when I remove the TLS config above from `common.runtime.properties`, there is no error and the Druid cluster runs seamlessly. This is what the router's log looks like:
    ```
    2024-09-11T09:44:28,513 WARN [CoordinatorRuleManager-Exec--0] org.apache.druid.discovery.DruidLeaderClient - Request[<https://10.244.0.146:8281/druid/coordinator/v1/rules>] failed.
    org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
    	at org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:134) ~[druid-processing-30.0.0.jar:30.0.0]
    	at org.apache.druid.java.util.http.client.AbstractHttpClient.go(AbstractHttpClient.java:33) ~[druid-processing-30.0.0.jar:30.0.0]
    	at org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:158) ~[druid-server-30.0.0.jar:30.0.0]
    	at org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:133) ~[druid-server-30.0.0.jar:30.0.0]
    	at org.apache.druid.server.router.CoordinatorRuleManager.poll(CoordinatorRuleManager.java:137) ~[druid-services-30.0.0.jar:30.0.0]
    	at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$1.call(ScheduledExecutors.java:55) [druid-processing-30.0.0.jar:30.0.0]
    	at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$1.call(ScheduledExecutors.java:51) [druid-processing-30.0.0.jar:30.0.0]
    	at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$2.run(ScheduledExecutors.java:97) [druid-processing-30.0.0.jar:30.0.0]
    	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
    	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
    Caused by: org.jboss.netty.channel.ChannelException: Failed to handshake with host[<https://10.244.0.146:8281>]
    	at org.apache.druid.java.util.http.client.pool.ChannelResourceFactory$2$1.operationComplete(ChannelResourceFactory.java:245) ~[druid-processing-30.0.0.jar:30.0.0]
    	at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:409) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:395) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:362) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1461) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1315) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[netty-3.10.6.Final.jar:?]
    	... 3 more
    Caused by: javax.net.ssl.SSLHandshakeException: No subject alternative names present
    	at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:?]
    	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:378) ~[?:?]
    	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321) ~[?:?]
    	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:316) ~[?:?]
    	at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:654) ~[?:?]
    	at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(CertificateMessage.java:473) ~[?:?]
    	at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(CertificateMessage.java:369) ~[?:?]
    	at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396) ~[?:?]
    	at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480) ~[?:?]
    	at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1277) ~[?:?]
    	at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1264) ~[?:?]
    	at java.base/java.security.AccessController.doPrivileged(AccessController.java:712) ~[?:?]
    	at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:1209) ~[?:?]
    	at org.jboss.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1393) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1256) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[netty-3.10.6.Final.jar:?]
    	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[netty-3.10.6.Final.jar:?]
    	... 3 more
    
    Caused by: java.security.cert.CertificateException: No subject alternative names present
    	at java.base/sun.security.util.HostnameChecker.matchIP(HostnameChecker.java:142) ~[?:?]
    	at java.base/sun.security.util.HostnameChecker.match(HostnameChecker.java:101) ~[?:?]
    	at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:458) ~[?:?]
    	at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:432) ~[?:?]
    	at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:292) ~[?:?]
    	at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:144) ~[?:?]
    	at org.apache.druid.server.security.DefaultTLSCertificateChecker.checkServer(DefaultTLSCertificateChecker.java:52) ~[druid-server-30.0.0.jar:30.0.0]
    	at org.apache.druid.server.security.CustomCheckX509TrustManager.checkServerTrusted(CustomCheckX509TrustManager.java:109) ~[druid-server-30.0.0.jar:30.0.0]
    	at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:632) ~[?:?]
    	... 23 more
    ```
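    The root cause is in the last frames: `CertificateException: No subject alternative names present` means the server certificate carries no SANs at all, while the router dials peers by pod IP (`10.244.0.146`) and the JVM checks that IP against the certificate's SANs. A hedged sketch of regenerating the server keystore with SAN entries via keytool (alias, DNS names, and IPs below are placeholders for however your pods reach each other):
    ```
    # Issue the server cert with SANs that cover the names/IPs peers actually use.
    keytool -genkeypair -alias druid -keyalg RSA -keysize 2048 -validity 365 \
      -keystore keystore.jks -storepass changeit \
      -dname "CN=druid" \
      -ext "SAN=dns:druid-brokers,dns:*.druid-headless.demo.svc.cluster.local,ip:10.244.0.146"
    ```
    Alternatively, if your build's simple-client-sslcontext extension supports it, `druid.client.https.validateHostnames=false` skips the hostname/SAN check on internal client connections; that weakens security but is a quick way to confirm the diagnosis.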
  • rishikesh
    09/22/2024, 7:11 AM
    Hello everyone! When I click on a supervisor and then the Refresh button after logging in to Druid, we encounter the error messages "Could not get status" and "Could not get stats" during API calls to the Druid backend. This issue appears to be caused by a 5-second timeout on responses, which was introduced in the newer Druid 30.0.0. https://druid.apache.org/docs/latest/api-reference/supervisor-api/
  • Carlos M
    10/02/2024, 6:54 PM
    Hello, while using the Kubernetes extension with MiddleManagers enabled, is it OK/relevant to set `druid.zk.service.compress` to false? Sometimes the MMs would disappear, and when they return the failed task has the message `The worker that this task was assigned disappeared and did not report cleanup within timeout[PT15M]....`
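    For reference, both knobs involved here are plain runtime properties; a hedged sketch (the PT30M value is arbitrary, and the PT15M in the error is the remote task runner's default cleanup timeout):
    ```
    # ZK payload compression (default true); disabling only changes how znode
    # data is stored and won't by itself fix disappearing workers.
    druid.zk.service.compress=false
    # The timeout named in the failure message; can be stretched while debugging.
    druid.indexer.runner.taskCleanupTimeout=PT30M
    ```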
  • Carlos M
    10/04/2024, 10:29 PM
    Where can I find an API map of the functions from `druid-kubernetes-extensions`?
  • Yurii I
    10/16/2024, 9:51 AM
    Hi folks, I would very much appreciate it if you could point me to how to properly enable the PostgreSQL driver in Druid via the druid-operator.
    ✅ 1
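    A hedged sketch of the usual shape, assuming the operator's common runtime properties are used (host, database, and credentials are placeholders). The PostgreSQL connector ships as the core `postgresql-metadata-storage` extension, so no extra driver jar is needed:
    ```
    druid.extensions.loadList=["postgresql-metadata-storage"]
    druid.metadata.storage.type=postgresql
    druid.metadata.storage.connector.connectURI=jdbc:postgresql://postgres.default.svc:5432/druid
    druid.metadata.storage.connector.user=druid
    druid.metadata.storage.connector.password=changeme
    ```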
  • Tapajit Chandra Paul
    10/21/2024, 9:59 AM
    Hello everyone, I am having some issues with Druid's logging. This is what I get when I look at the pod's log in the Druid cluster:
    ```
    $ kubectl logs -n demo druid-with-config-brokers-0
    Defaulted container "druid" out of: druid, init-druid (init)
    2024-10-21T09:18:45+00:00 startup service broker
    Setting druid.host=10.244.0.108 in /tmp/conf/druid/cluster/query/broker/runtime.properties
    DEBUG StatusLogger Using ShutdownCallbackRegistry class org.apache.druid.common.config.Log4jShutdown
    TRACE StatusLogger Log4jLoggerFactory.getContext() found anchor class org.apache.druid.java.util.common.logger.Logger
    DEBUG StatusLogger Took 0.054826 seconds to load 255 plugins from jdk.internal.loader.ClassLoaders$AppClassLoader@659e0bfd
    DEBUG StatusLogger PluginManager 'Lookup' found 16 plugins
    DEBUG StatusLogger AsyncLogger.ThreadNameStrategy=UNCACHED (user specified null, default is UNCACHED)
    TRACE StatusLogger Using default SystemClock for timestamps.
    DEBUG StatusLogger org.apache.logging.log4j.core.util.SystemClock supports precise timestamps.
    DEBUG StatusLogger PluginManager 'Lookup' found 16 plugins
    DEBUG StatusLogger PluginManager 'Converter' found 48 plugins
    DEBUG StatusLogger Starting OutputStreamManager SYSTEM_OUT.false.false-1
    DEBUG StatusLogger Starting LoggerContext[name=659e0bfd, org.apache.logging.log4j.core.LoggerContext@3bd82cf5]...
    DEBUG StatusLogger Reconfiguration started for context[name=659e0bfd] at URI null (org.apache.logging.log4j.core.LoggerContext@3bd82cf5) with optional ClassLoader: null
    DEBUG StatusLogger PluginManager 'Lookup' found 16 plugins
    DEBUG StatusLogger PluginManager 'ConfigurationFactory' found 6 plugins
    DEBUG StatusLogger PluginManager 'Lookup' found 16 plugins
    DEBUG StatusLogger PluginManager 'Lookup' found 16 plugins
    DEBUG StatusLogger Missing dependencies for Yaml support, ConfigurationFactory org.apache.logging.log4j.core.config.yaml.YamlConfigurationFactory is inactive
    DEBUG StatusLogger PluginManager 'Lookup' found 16 plugins
    DEBUG StatusLogger PluginManager 'Lookup' found 16 plugins
    DEBUG StatusLogger PluginManager 'Lookup' found 16 plugins
    DEBUG StatusLogger PluginManager 'Lookup' found 16 plugins
    DEBUG StatusLogger Using configurationFactory org.apache.logging.log4j.core.config.ConfigurationFactory$Factory@609cd4d8
    TRACE StatusLogger Trying to find [log4j2-test659e0bfd.properties] using context class loader jdk.internal.loader.ClassLoaders$AppClassLoader@659e0bfd.
    TRACE StatusLogger Trying to find [log4j2-test659e0bfd.properties] using jdk.internal.loader.ClassLoaders$AppClassLoader@659e0bfd class loader.
    TRACE StatusLogger Trying to find [log4j2-test659e0bfd.properties] using jdk.internal.loader.ClassLoaders$AppClassLoader@659e0bfd class loader.
    ```
    • Here the logs are mainly DEBUG and TRACE logs; if some misconfiguration happens, there is no ERROR log.
    • However, when I exec into the Druid pod and try to print the log from the file, I get different logs, and there I am able to find ERROR logs as well:
    ```
    $ cd log
    $ export sys=druid.node.type=value
    $ cp '${sys:druid.node.type}.log' historical.log
    $ cat historical.log
    2024-10-21T09:18:45,902 INFO [main] org.hibernate.validator.internal.util.Version - HV000001: Hibernate Validator 6.2.5.Final
    2024-10-21T09:18:46,448 INFO [main] org.apache.druid.guice.ExtensionsLoader - Loading extension [druid-avro-extensions], jars: failureaccess-1.0.1.jar, swagger-models-1.6.2.jar, javax.annotation-api-1.3.2.jar, lz4-java-1.8.0.jar, common-utils-5.5.12.jar, avro-1.11.1.jar, snakeyaml-1.33.jar, swagger-core-1.6.2.jar, hadoop-client-api-3.3.6.jar, schema-repo-avro-0.1.3.jar, jersey-client-1.19.4.jar, schema-repo-client-0.1.3.jar, error_prone_annotations-2.20.0.jar, schema-repo-api-0.1.3.jar, j2objc-annotations-1.3.jar, jsr305-2.0.1.jar, swagger-annotations-1.6.2.jar, commons-lang3-3.12.0.jar, snappy-java-1.1.10.4.jar, checker-qual-3.12.0.jar, kafka-clients-5.5.12-ccs.jar, common-config-5.5.12.jar, velocity-engine-core-2.3.jar, avro-ipc-jetty-1.11.1.jar, guava-31.1-jre.jar, zstd-jni-1.5.2-3.jar, schema-repo-common-0.1.3.jar, druid-avro-extensions-28.0.1.jar, avro-mapred-1.11.1.jar, slf4j-api-1.7.36.jar, avro-ipc-1.11.1.jar, gson-2.3.1.jar, jackson-dataformat-yaml-2.12.7.jar, kafka-schema-registry-client-5.5.12.jar, xz-1.9.jar
    2024-10-21T09:18:46,465 INFO [main] org.apache.druid.guice.ExtensionsLoader - Loading extension [druid-kafka-indexing-service], jars: lz4-java-1.8.0.jar, snappy-java-1.1.10.4.jar, zstd-jni-1.5.2-3.jar, kafka-clients-3.5.1.jar, druid-kafka-indexing-service-28.0.1.jar
    2024-10-21T09:18:46,468 INFO [main] org.apache.druid.guice.ExtensionsLoader - Loading extension [druid-datasketches], jars: druid-datasketches-28.0.1.jar, commons-math3-3.6.1.jar
    2024-10-21T09:18:46,469 INFO [main] org.apache.druid.guice.ExtensionsLoader - Loading extension [druid-multi-stage-query], jars: druid-multi-stage-query-28.0.1.jar
    2024-10-21T09:18:46,470 INFO [main] org.apache.druid.guice.ExtensionsLoader - Loading extension [druid-basic-security], jars: druid-basic-security-28.0.1.jar
    2024-10-21T09:18:46,471 INFO [main] org.apache.druid.guice.ExtensionsLoader - Loading extension [mysql-metadata-storage], jars: mysql-connector-java-5.1.49.jar, mysql-metadata-storage-28.0.1.jar
    2024-10-21T09:18:46,472 INFO [main] org.apache.druid.guice.ExtensionsLoader - Loading extension [druid-s3-extensions], jars: jmespath-java-1.12.497.jar, jackson-dataformat-cbor-2.12.7.jar, aws-java-sdk-sts-1.12.497.jar, httpclient-4.5.13.jar, commons-logging-1.1.1.jar, druid-s3-extensions-28.0.1.jar, joda-time-2.12.5.jar, aws-java-sdk-core-1.12.497.jar, commons-codec-1.13.jar, httpcore-4.4.11.jar, ion-java-1.0.2.jar
    2024-10-21T09:18:46,665 INFO [main] org.apache.druid.guice.ExtensionsLoader - Loading extension [druid-avro-extensions], jars: failureaccess-1.0.1.jar, swagger-models-1.6.2.jar, javax.annotation-api-1.3.2.jar, lz4-java-1.8.0.jar, common-utils-5.5.12.jar, avro-1.11.1.jar, snakeyaml-1.33.jar, swagger-core-1.6.2.jar, hadoop-client-api-3.3.6.jar, schema-repo-avro-0.1.3.jar, jersey-client-1.19.4.jar, schema-repo-client-0.1.3.jar, error_prone_annotations-2.20.0.jar, schema-repo-api-0.1.3.jar, j2objc-annotations-1.3.jar, jsr305-2.0.1.jar, swagger-annotations-1.6.2.jar, commons-lang3-3.12.0.jar, snappy-java-1.1.10.4.jar, checker-qual-3.12.0.jar, kafka-clients-5.5.12-ccs.jar, common-config-5.5.12.jar, velocity-engine-core-2.3.jar, avro-ipc-jetty-1.11.1.jar, guava-31.1-jre.jar, zstd-jni-1.5.2-3.jar, schema-repo-common-0.1.3.jar, druid-avro-extensions-28.0.1.jar, avro-mapred-1.11.1.jar, slf4j-api-1.7.36.jar, avro-ipc-1.11.1.jar, gson-2.3.1.jar, jackson-dataformat-yaml-2.12.7.jar, kafka-schema-registry-client-5.5.12.jar, xz-1.9.jar
    2024-10-21T09:18:46,667 INFO [main] org.apache.druid.guice.ExtensionsLoader - Loading extension [druid-kafka-indexing-service], jars: lz4-java-1.8.0.jar, snappy-java-1.1.10.4.jar, zstd-jni-1.5.2-3.jar, kafka-clients-3.5.1.jar, druid-kafka-indexing-service-28.0.1.jar
    2024-10-21T09:18:46,669 INFO [main] org.apache.druid.guice.ExtensionsLoader - Loading extension [druid-datasketches], jars: druid-datasketches-28.0.1.jar, commons-math3-3.6.1.jar
    2024-10-21T09:18:46,676 INFO [main] org.apache.druid.guice.ExtensionsLoader - Loading extension [druid-multi-stage-query], jars: druid-multi-stage-query-28.0.1.jar
    2024-10-21T09:18:46,684 INFO [main] org.apache.druid.guice.ExtensionsLoader - Loading extension [druid-basic-security], jars: druid-basic-security-28.0.1.jar
    2024-10-21T09:18:46,686 INFO [main] org.apache.druid.guice.ExtensionsLoader - Loading extension [mysql-metadata-storage], jars: mysql-connector-java-5.1.49.jar, mysql-metadata-storage-28.0.1.jar
    2024-10-21T09:18:46,687 INFO [main] org.apache.druid.guice.ExtensionsLoader - Loading extension [druid-s3-extensions], jars: jmespath-java-1.12.497.jar, jackson-dataformat-cbor-2.12.7.jar, aws-java-sdk-sts-1.12.497.jar, httpclient-4.5.13.jar, commons-logging-1.1.1.jar, druid-s3-extensions-28.0.1.jar, joda-time-2.12.5.jar, aws-java-sdk-core-1.12.497.jar, commons-codec-1.13.jar, httpcore-4.4.11.jar, ion-java-1.0.2.jar
    2024-10-21T09:18:48,075 INFO [main] org.apache.druid.guice.JsonConfigurator - Skipping druid.emitter.logging.logLevel property: one of it's prefixes is also used as a property key. Prefix: druid
    2024-10-21T09:18:48,117 INFO [main] org.apache.druid.server.emitter.EmitterModule - Using emitter [NoopEmitter{}] for metrics and alerts, with dimensions [{version=28.0.1}].
    2024-10-21T09:18:48,327 INFO [main] org.apache.druid.server.metrics.MetricsModule - Loaded 6 monitors: org.apache.druid.java.util.metrics.JvmMonitor, org.apache.druid.server.metrics.ServiceStatusMonitor, org.apache.druid.query.ExecutorServiceMonitor, org.apache.druid.sql.avatica.AvaticaMonitor, org.apache.druid.curator.DruidConnectionStateListener, org.apache.druid.server.initialization.jetty.JettyServerModule$JettyMonitor
    2024-10-21T09:18:48,370 INFO [main] org.apache.druid.cli.CliBroker - Starting up with processors [16], memory [259,588,096], maxMemory [1,037,959,168], directMemory [1,073,741,824]. Properties follow.
    2024-10-21T09:18:48,371 INFO [main] org.apache.druid.cli.CliBroker - * druid.auth.authenticator.basic.authorizerName: basic
    2024-10-21T09:18:48,372 INFO [main] org.apache.druid.cli.CliBroker - * druid.auth.authenticator.basic.credentialsValidator.type: metadata
    2024-10-21T09:18:48,372 INFO [main] org.apache.druid.cli.CliBroker - * druid.auth.authenticator.basic.initialAdminPassword: <masked>
    2024-10-21T09:18:48,372 INFO [main] org.apache.druid.cli.CliBroker - * druid.auth.authenticator.basic.initialInternalClientPassword: <masked>
    2024-10-21T09:18:48,372 INFO [main] org.apache.druid.cli.CliBroker - * druid.auth.authenticator.basic.skipOnFailure: false
    2024-10-21T09:18:48,372 INFO [main] org.apache.druid.cli.CliBroker - * druid.auth.authenticator.basic.type: basic
    2024-10-21T09:18:48,372 INFO [main] org.apache.druid.cli.CliBroker - * druid.auth.authenticatorChain: ["basic"]
    2024-10-21T09:18:48,372 INFO [main] org.apache.druid.cli.CliBroker - * druid.auth.authorizer.basic.type: basic
    2024-10-21T09:18:48,372 INFO [main] org.apache.druid.cli.CliBroker - * druid.auth.authorizers: ["basic"]
    2024-10-21T09:18:48,372 INFO [main] org.apache.druid.cli.CliBroker - * druid.broker.cache.populateCache: false
    2024-10-21T09:18:48,372 INFO [main] org.apache.druid.cli.CliBroker - * druid.broker.cache.useCache: false
    2024-10-21T09:18:48,373 INFO [main] org.apache.druid.cli.CliBroker - * druid.broker.http.maxQueuedBytes: 10MiB
    2024-10-21T09:18:48,373 INFO [main] org.apache.druid.cli.CliBroker - * druid.broker.http.numConnections: 5
    2024-10-21T09:18:48,373 INFO [main] org.apache.druid.cli.CliBroker - * druid.emitter: noop
    2024-10-21T09:18:48,373 INFO [main] org.apache.druid.cli.CliBroker - * druid.emitter.logging.logLevel: info
    2024-10-21T09:18:48,373 INFO [main] org.apache.druid.cli.CliBroker - * druid.escalator.authorizerName: basic
    2024-10-21T09:18:48,373 INFO [main] org.apache.druid.cli.CliBroker - * druid.escalator.internalClientPassword: <masked>
    2024-10-21T09:18:48,373 INFO [main] org.apache.druid.cli.CliBroker - * druid.escalator.internalClientUsername: druid_system
    2024-10-21T09:18:48,374 INFO [main] org.apache.druid.cli.CliBroker - * druid.escalator.type: basic
    2024-10-21T09:18:48,374 INFO [main] org.apache.druid.cli.CliBroker - * druid.expressions.useStrictBooleans: true
    2024-10-21T09:18:48,374 INFO [main] org.apache.druid.cli.CliBroker - * druid.extensions.loadList: ["druid-avro-extensions", "druid-kafka-indexing-service", "druid-kafka-indexing-service", "druid-datasketches", "druid-multi-stage-query", "druid-basic-security", "mysql-metadata-storage", "druid-s3-extensions"]
    2024-10-21T09:18:48,374 INFO [main] org.apache.druid.cli.CliBroker - * druid.global.http.eagerInitialization: false
    ```
    • Is there any way I can get these logs from the pod's log with the command `kubectl logs -n demo druid-with-config-brokers-0`?
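    What `kubectl logs` shows above is only Log4j's internal StatusLogger chatter, which suggests the application loggers never write to stdout, only to the per-service file. A hedged log4j2.xml sketch that mirrors everything to the console so the pod log matches the file (pattern and level are illustrative; keep your file appender alongside if you still want the files):
    ```
    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration status="WARN">
      <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
          <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
        </Console>
      </Appenders>
      <Loggers>
        <!-- Root at info sends the same records the file gets to stdout as well. -->
        <Root level="info">
          <AppenderRef ref="Console"/>
        </Root>
      </Loggers>
    </Configuration>
    ```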
  • Luke Foskey
    10/24/2024, 8:21 PM
    Hi all, we are facing issues with some of our Kubernetes-deployed nodes (specifically the historicals). They seem to have issues every few days where memory locks up in GC and we then get an OutOfMemoryError due to heap space. The pods are configured with -Xmx24G. It appears we might be hitting this JDK bug: https://bugs.openjdk.org/browse/JDK-8192647. The problem is that this generally ends up occurring across all the historicals and causes a mess, since the whole cluster has to come back online at the same time. Preferably I'd like to see the nodes lock up but clear out when the heavy queries time out, and then have them restart.
    ```
    [185705.991s][warning][gc,alloc] processing_6321770d-3243-4c19-b355-887bd29a33b4: Retried waiting for GCLocker too often allocating 256 words
    [185706.067s][warning][gc,alloc] processing_994b3e7f-b77e-4fae-9c61-c736dd7a10f6: Retried waiting for GCLocker too often allocating 256 words
    [185706.067s][warning][gc,alloc] processing-47: Retried waiting for GCLocker too often allocating 256 words
    [185706.407s][warning][gc,alloc] qtp1954745715-791[groupBy_[{datasource}]_6321770d-3243-4c19-b355-887bd29a33b4]: Retried waiting for GCLocker too often allocating 256 words
    [185706.407s][warning][gc,alloc] processing_7856790f-e7fc-4e32-8c63-202badebd7e0: Retried waiting for GCLocker too often allocating 256 words
    [185706.407s][warning][gc,alloc] processing_6321770d-3243-4c19-b355-887bd29a33b4: Retried waiting for GCLocker too often allocating 256 words
    [185707.105s][warning][gc,alloc] processing_29727115-5133-4508-a059-0d7f401aac34: Retried waiting for GCLocker too often allocating 256 words
    [185707.105s][warning][gc,alloc] qtp1954745715-1905[groupBy_[{datasource}]_eacee5c8-7fdd-4979-8eec-bc59c91b0484]: Retried waiting for GCLocker too often allocating 256 words
    [185707.105s][warning][gc,alloc] processing_29727115-5133-4508-a059-0d7f401aac34: Retried waiting for GCLocker too often allocating 256 words
    [185720.674s][warning][gc,alloc] processing_7856790f-e7fc-4e32-8c63-202badebd7e0: Retried waiting for GCLocker too often allocating 51813 words
    [185720.674s][warning][gc,alloc] processing_994b3e7f-b77e-4fae-9c61-c736dd7a10f6: Retried waiting for GCLocker too often allocating 514 words
    [185720.674s][warning][gc,alloc] processing_29727115-5133-4508-a059-0d7f401aac34: Retried waiting for GCLocker too often allocating 36748 words
    Terminating due to java.lang.OutOfMemoryError: Java heap space
    ```
    It seems other people have gotten around the error by increasing `GCLockerRetryAllocationCount`, but I'm not sure if this will simply mask the errors. https://tech.clevertap.com/demystifying-g1-gc-gclocker-jni-critical-and-fake-oom-exceptions/
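    For what it's worth, `GCLockerRetryAllocationCount` is a diagnostic JVM flag, so it has to be unlocked before it takes effect; a hedged jvm.config sketch (the value 100 is an arbitrary example, not a recommendation, and as the article notes this may only mask the underlying JNI critical-section pressure):
    ```
    -XX:+UnlockDiagnosticVMOptions
    -XX:GCLockerRetryAllocationCount=100
    ```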
  • Julian Reyes
    10/30/2024, 2:15 PM
    Cross-posting here: the deployment is done in EKS, so I wonder if someone else has done something similar. https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1729603680963549
  • Asif A
    11/07/2024, 7:40 AM
    Hi all, we have been deploying mm-less Druid in an AWS EKS cluster. We tried to increase some JVM settings for the peons, but the peon pods are not picking up that configuration, even though it is mounted in the peon pods. We are following this doc: https://druid.apache.org/docs/latest/development/extensions-contrib/k8s-jobs/. We tried changing the `jvm.config` in the `druid-tiny-cluster-peons-config` ConfigMap. The `runtime.properties` changes are reflected in the peon process, but the JVM properties are not picked up by the peons. Please help us resolve this issue.
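    One hedged avenue, based on the same k8s-jobs doc: with the default adapter the peon pod appears to be derived from the Overlord's own spec rather than from the mounted `jvm.config`, so JVM flags are usually steered through a custom pod template instead. A sketch (the path and the template's contents are assumptions you would fill in):
    ```
    # Switch the task adapter to pod templates, then set the peon JVM options
    # (-Xms/-Xmx etc.) inside basePodTemplate.yaml's container spec.
    druid.indexer.runner.k8s.adapter.type=customTemplateAdapter
    druid.indexer.runner.k8s.podTemplate.base=/path/to/basePodTemplate.yaml
    ```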
  • Sharmin Choksey
    11/12/2024, 1:34 AM
    Hello, trying to move to a ZK-less deployment on k8s. Everything runs fine with ingestion and queries until it is time for task rollover, which is when it fails with this error:
    ```
    2024-11-11T23:02:18,051 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Stopping NodeRoleWatcher for [OVERLORD]...
    2024-11-11T23:02:18,052 ERROR [org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcheroverlord] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Error while watching node type [OVERLORD]
    java.lang.RuntimeException: IO Exception during hasNext method.
            at io.kubernetes.client.util.Watch.hasNext(Watch.java:183) ~[?:?]
            at org.apache.druid.k8s.discovery.DefaultK8sApiClient$2.hasNext(DefaultK8sApiClient.java:132) ~[?:?]
            at org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher.keepWatching(K8sDruidNodeDiscoveryProvider.java:269) ~[?:?]
            at org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher.watch(K8sDruidNodeDiscoveryProvider.java:238) ~[?:?]
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
            at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
            at java.lang.Thread.run(Thread.java:840) ~[?:?]
    Caused by: java.io.InterruptedIOException
    ```
    We use the druid-operator, and these are the configs:
    ```
    druid.zk.service.enabled=false
    druid.serverview.type=http
    druid.coordinator.loadqueuepeon.type=http
    druid.indexer.runner.type=httpRemote
    druid.discovery.type=k8s
    druid.discovery.k8s.clusterIdentifier=cc-apache-druid
    ```
    It is NOT an MM-less deployment. Is there any misconfiguration I should look for?
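    Since `druid.discovery.type=k8s` drives the NodeRoleWatcher in the stack trace, one hedged sanity check is the pods' RBAC: the watcher needs get/list/watch on pods to stay alive. A sketch (namespace and service-account names are placeholders):
    ```
    # Verify the watcher's permissions from outside the pod.
    kubectl auth can-i watch pods --as=system:serviceaccount:druid:druid-sa -n druid
    kubectl auth can-i list pods --as=system:serviceaccount:druid:druid-sa -n druid
    ```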
  • Noor
    11/20/2024, 11:00 AM
    Hello everyone, I have included `org.apache.druid.server.metrics.QueryCountStatsMonitor` in the historical runtime properties in our Kubernetes cluster, but I am not able to see any query-related metrics. These are the historical runtime properties:
    ```
    druid.port=8083
    druid.service=druid/historical
    druid.plaintextPort=8083

    # HTTP server threads
    druid.server.http.numThreads=60

    # Auth disabled for startup probes
    druid.auth.allowUnauthenticatedHttpOptions=true
    druid.auth.unsecuredPaths=["/status", "/druid/broker/v1/loadstatus", "/druid/historical/v1/loadstatus"]

    # Processing threads and buffers
    druid.processing.buffer.sizeBytes=100MiB
    druid.processing.numMergeBuffers=4
    druid.processing.numThreads=15
    #druid.processing.tmpDir=var/druid/processing
    druid.processing.tmpDir=/druid/data/processing
    druid.processing.formatString=historical-hot-tier

    # Segment storage
    druid.segmentCache.locations=[{"path":"/druid/historical/hot/segment-cache","maxSize":24000000000}]
    druid.server.maxSize=24000000000

    # Query cache
    druid.historical.cache.useCache=true
    druid.historical.cache.populateCache=true
    druid.cache.type=caffeine
    druid.cache.sizeInBytes=256MiB
    druid.query.groupBy.maxOnDiskStorage=10G

    # Tier information
    druid.server.tier=hot

    # GroupBy v2 config
    druid.query.groupBy.numParallelCombineThreads=2

    # Segment loading
    druid.segmentCache.numLoadingThreads=8
    druid.segmentCache.numThreadsToLoadSegmentsIntoPageCacheOnDownload=2
    druid.segmentCache.numThreadsToLoadSegmentsIntoPageCacheOnBootstrap=8

    # Metrics monitors
    druid.monitoring.monitors=["org.apache.druid.client.cache.CacheMonitor", "org.apache.druid.java.util.metrics.SysMonitor", "org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.JvmCpuMonitor", "org.apache.druid.java.util.metrics.JvmThreadsMonitor", "org.apache.druid.server.metrics.QueryCountStatsMonitor"]
    ```
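    One hedged thing to check: monitors only hand metrics to whatever emitter is configured, and the default emitter is noop, so query/count never surfaces anywhere. A sketch that dumps metrics into the service log just to confirm the monitor fires (swap in your real emitter afterwards):
    ```
    druid.emitter=logging
    druid.emitter.logging.logLevel=info
    ```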
  • Corwin Lester
    11/26/2024, 3:14 PM
    Is anyone running Druid via the https://github.com/datainfrahq/druid-operator on OpenShift?
    ✅ 1
  • Corwin Lester
    11/26/2024, 3:55 PM
    Having issues getting the tmp directory created with the 'druid' user, but it might be on our end.
  • rishikesh
    12/04/2024, 10:46 AM
    Hello, I upgraded to Druid 31, and after logging in to Druid I see an error like "*NO management proxy mode*". I tried changing the configuration to resolve this, but it is still not resolved. Can you please help me fix this? The console says: "The console is running in restricted mode. The management proxy is disabled, the console will operate with limited functionality. For more info refer to the web console documentation. It is possible that the console is experiencing an issue with the capability detection. You can enable the unrestricted console, but certain features might not work if the underlying APIs are not available."
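    For reference, the restricted-mode banner normally clears once the Router's management proxy is enabled; the router runtime property involved is:
    ```
    # Enable the management proxy on the Router so the console can leave
    # restricted mode.
    druid.router.managementProxy.enabled=true
    ```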
  • Krishna
    12/06/2024, 7:12 PM
    Hi, when we try to enable ORC on MM on EKS we are seeing this error. Can someone please help? https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1732688653392079
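    A hedged reminder of what enabling ORC normally takes, since a missing loadList entry is the usual culprit; `druid-orc-extensions` is a core extension shipped with Druid and must be loaded on the services running the ingestion tasks (the rest of this list is illustrative):
    ```
    druid.extensions.loadList=["druid-orc-extensions", "druid-s3-extensions", "druid-kafka-indexing-service"]
    ```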
  • schmichri
    12/16/2024, 2:33 PM
    Hey all, we've published our Druid cluster configuration for our Kubernetes-native Druid installation (k8s jobs, no ZooKeeper, HPA for historical nodes, etc.). It's based on the druid-operator and Flux CD, with secrets managed by SOPS and all endpoints TLS-encrypted: https://github.com/iunera/druid-cluster-config. It includes examples for authorization and authentication and the pac4j setup for Azure AD / Entra ID. You might be interested in it.
    🙌 2
  • schmichri
    12/16/2024, 2:35 PM
    Ah, this was part of an open-source project funded by the German ministry of public transportation (Fahrbar20).
  • Mahesha Subrahamanya
    01/16/2025, 8:45 PM
    Hello team, we are constructing the Druid index-parallel API ingestion below in Java code, with the payload shown. In Druid 28 this API works and the data is loaded into Druid, but when we run the same API and S3 data file against Druid 31, it fails with the error below. One more thing we noticed: "timestampColumn" was causing the failure; when we changed it to "timeStampColumn" it worked. Is there any column-name constraint in Druid v31?
    ```
    ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.stats.TaskRealtimeMetricsMonitor - [499] unparseable events discarded. 
    request failed with 500
    --data '{ "segmentGranularity": "ALL", "inputFormatType": "PARQUET", "inputType": "S3", "inputPaths": [ "s3://druid_ingest_testing/landing-zone1/xyz/12345/10mil/load_1/" ], "timestampColumn": "__time" }'
    ```
    webClient.post()
        .uri("/indexer/v1/task")
        .contentType(MediaType.APPLICATION_JSON)
        .bodyValue(ingestionTask)
        .retrieve()
        .bodyToMono(ingestionResultReference);
    ```
  • Noor
    01/22/2025, 10:51 AM
    Hi team, I am getting this error in our coordinator logs. Can someone please help?
    ```
    2025-01-22T10:43:51,885 INFO [Thread-48] org.apache.druid.security.basic.authorization.db.updater.CoordinatorBasicAuthorizerMetadataStorageUpdater - CoordinatorBasicAuthorizerMetadataStorageUpdater is stopped.
    2025-01-22T10:43:51,885 INFO [Thread-48] org.apache.druid.security.basic.authentication.db.updater.CoordinatorBasicAuthenticatorMetadataStorageUpdater - CoordinatorBasicAuthenticatorMetadataStorageUpdater is stopping.
    2025-01-22T10:43:51,885 INFO [Thread-48] org.apache.druid.security.basic.authentication.db.updater.CoordinatorBasicAuthenticatorMetadataStorageUpdater - CoordinatorBasicAuthenticatorMetadataStorageUpdater is stopped.
    2025-01-22T10:43:51,886 INFO [CoordinatorBasicAuthenticatorCacheNotifier-notifierThread-0] org.apache.druid.security.basic.CommonCacheNotifier - CoordinatorBasicAuthenticatorCacheNotifier: Interrupted while handling updates for cachedUserMaps. (java.lang.InterruptedException)
    2025-01-22T10:43:51,886 INFO [CoordinatorBasicAuthorizerCacheNotifier-notifierThread-0] org.apache.druid.security.basic.CommonCacheNotifier - CoordinatorBasicAuthorizerCacheNotifier: Interrupted while handling updates for cachedUserMaps. (java.lang.InterruptedException)
    2025-01-22T10:43:51,886 INFO [CoordinatorBasicAuthorizerCacheNotifier-notifierThread-0] org.apache.druid.security.basic.CommonCacheNotifier - CoordinatorBasicAuthorizerCacheNotifier: Interrupted while handling updates for cachedUserMaps. (java.lang.InterruptedException)
    2025-01-22T10:43:51,902 ERROR [HttpClient-Netty-Worker-5] com.google.common.util.concurrent.ExecutionList - RuntimeException while executing runnable com.google.common.util.concurrent.Futures$4@27fc53a9 with executor java.util.concurrent.ScheduledThreadPoolExecutor@760f7d57[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 17]
    java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@64ad1398[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@1c9b9c89[Wrapped task = com.google.common.util.concurrent.Futures$4@27fc53a9]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@760f7d57[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 17]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:562) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:705) ~[?:?]
    at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.1.jar:?]
    2025-01-22T10:43:51,907 INFO [HttpClient-Netty-Worker-5] org.apache.druid.java.util.http.client.pool.ResourcePool - giveBack called after being closed. key[<https://240.38.33.85:8283>]
    2025-01-22T10:43:51,913 ERROR [HttpClient-Netty-Worker-6] com.google.common.util.concurrent.ExecutionList - RuntimeException while executing runnable com.google.common.util.concurrent.Futures$4@3fd43c9 with executor java.util.concurrent.ScheduledThreadPoolExecutor@760f7d57[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 17]
    java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@409efbf0[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@78d28b8c[Wrapped task = com.google.common.util.concurrent.Futures$4@3fd43c9]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@760f7d57[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 17]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:562) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:705) ~[?:?]
    at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.1.jar:?]
    2025-01-22T10:43:51,913 INFO [HttpClient-Netty-Worker-6] org.apache.druid.java.util.http.client.pool.ResourcePool - giveBack called after being closed. key[<https://XX:8283>]
    2025-01-22T10:43:51,914 ERROR [HttpClient-Netty-Worker-7] com.google.common.util.concurrent.ExecutionList - RuntimeException while executing runnable com.google.common.util.concurrent.Futures$4@473573be with executor java.util.concurrent.ScheduledThreadPoolExecutor@760f7d57[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 17]
    java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@2e2ef737[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@1f7becba[Wrapped task = com.google.common.util.concurrent.Futures$4@473573be]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@760f7d57[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 17]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:562) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:705) ~[?:?]
    at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.1.jar:?]
    2025-01-22T10:43:51,915 INFO [HttpClient-Netty-Worker-7] org.apache.druid.java.util.http.client.pool.ResourcePool - giveBack called after being closed. key[<https://XXX:8283>]
    2025-01-22T10:43:51,915 ERROR [HttpClient-Netty-Worker-8] com.google.common.util.concurrent.ExecutionList - RuntimeException while executing runnable com.google.common.util.concurrent.Futures$4@34e15537 with executor java.util.concurrent.ScheduledThreadPoolExecutor@760f7d57[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 17]
    java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@5c456b81[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@d1cad3e[Wrapped task = com.google.common.util.concurrent.Futures$4@34e15537]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@760f7d57[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 17]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) ~[?:?]
    ```
  • VP
    01/28/2025, 1:50 PM
    Hi, we recently migrated our dev servers from EC2 to EKS, and started seeing an issue where Datasources shows one segment left to load but Services shows empty load/drop queues. The Historical and Coordinator logs show a segment load exception, but it is unclear what is causing it. No errors on ZooKeeper or S3. Any ideas?
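    A couple of hedged diagnostics against the Coordinator API that can narrow this down (the host is a placeholder):
    ```
    # Per-datasource count of segments still left to load.
    curl -s http://coordinator:8081/druid/coordinator/v1/loadstatus?simple
    # Per-server load/drop queues, to compare with what the console shows.
    curl -s http://coordinator:8081/druid/coordinator/v1/loadqueue
    ```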
  • Noor
    03/21/2025, 6:46 AM
    Hi team, we are getting all the Druid exporter metrics for our GKE cluster, but we are not sure why we are not getting the Kafka-related metrics; for example, we need to monitor ingest/kafka/maxLag often. For the VM cluster we do get them, but not on the GKE side. Can someone guide us here?
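    One hedged check: the Kafka lag metrics (`ingest/kafka/lag`, `ingest/kafka/maxLag`) are emitted by the supervisor on the Overlord, so an exporter that only scrapes data nodes will miss them. The supervisor status API reports the same lag figures, which confirms they exist before chasing the exporter (router URL and supervisor id are placeholders):
    ```
    # Lag figures appear under payload in the supervisor status response.
    curl -s http://router:8888/druid/indexer/v1/supervisor/my-kafka-supervisor/status
    ```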
  • Goran Zaric
    04/07/2025, 1:17 PM
    Hello team 👋 Running Druid v30.0.0, and we'd like to go to v32.0.1. We noticed a couple of incompatible changes, but is it safe to jump over two major versions? The current druid-operator version is v1.2.3. How would you recommend approaching the upgrade?
  • Sergei
    04/28/2025, 6:05 PM
    Hi team, I am trying to deploy Imply to an AWS EKS cluster managed by Karpenter. Is there a good getting-started document?
  • Luke Foskey
    05/01/2025, 3:23 AM
    Hi team, it's been some time, but are there any plans to update https://operatorhub.io/operator/druid-operator to align with the Helm-based operator? We'd prefer to maintain our current deployment workflow via OperatorHub.
  • Jb Graindorge
    05/01/2025, 2:44 PM
    Hi, I am using the K8s MM-less feature. `druid.processing.intermediaryData.storage.type=deepstore` has been set up, but when an ingestion starts I get this error (truncated):
    ```
    2025-05-01T14:41:46,595 INFO [main] org.apache.druid.cli.CliPeon - Task file not found, trying to pull task payload from deep storage
    2025-05-01T14:41:46,827 ERROR [main] org.apache.druid.cli.CliPeon - Failed to start, 1 errors
    2025-05-01T14:41:46,828 ERROR [main] org.apache.druid.cli.CliPeon - java.lang.IllegalStateException: Optional.get() cannot be called on an absent value
    java.lang.IllegalStateException: Optional.get() cannot be called on an absent value
        at com.google.common.base.Absent.get(Absent.java:44) ~[guava-32.0.1-jre.jar:?]
        at org.apache.druid.cli.CliPeon$1.readTask(CliPeon.java:317) ~[druid-services-2025.01.0-iap.jar:2025.01.0-iap]
        at org.apache.druid.cli.CliPeon$1$$FastClassByGuice$$44261020.GUICE$TRAMPOLINE(<generated>) ~[?:2025.01.0-iap]
        at org.apache.druid.cli.CliPeon$1$$FastClassByGuice$$44261020.apply(<generated>) ~[?:2025.01.0-iap]
        at com.google.inject.internal.ProviderMethod$FastClassProviderMethod.doProvision(ProviderMethod.java:260) ~[guice-5.1.0.jar:?]
        at com.google.inject.internal.ProviderMethod.doProvision(ProviderMethod.java:171) ~[guice-5.1.0.jar:?]
        at com.google.inject.internal.InternalProviderInstanceBindingImpl$CyclicFactory.provision(InternalProviderInstanceBindingImpl.java:185) ~[guice-5.1.0.jar:?]
        at com.google.inject.internal.InternalProviderInstanceBindingImpl$CyclicFactory.get(InternalProviderInstanceBindingImpl.java:162) ~[guice-5.1.0.jar:?]
        at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40) ~[guice-5.1.0.jar:?]
        at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:169) ~[guice-5.1.0.jar:?]
        at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45) ~[guice-5.1.0.jar:?]
        at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:40) ~[guice-5.1.0.jar:?]
        at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:60) ~[guice-5.1.0.jar:?]
        at com.google.inject.internal.ProviderMethod.doProvision(ProviderMethod.java:171) ~[guice-5.1.0.jar:?]
        at com.google.inject.internal.InternalProviderInstanceBindingImpl$CyclicFactory.
    ```
    Not sure what is wrong and why the task payload is not written to S3 (everything is OK on the IAM side).
  • Jb Graindorge
    05/06/2025, 9:36 AM
    Hello team, we are using Druid on EKS (AWS) and the peons are giving these errors:
    ```
    2025-05-06T09:16:45,015 ERROR [main] org.apache.druid.java.util.metrics.cgroups.Cpu - Unable to fetch cpu snapshot
    2025-05-06T09:16:45,020 WARN [main] org.apache.druid.java.util.metrics.CgroupUtil - Unable to fetch cpu.shares
    2025-05-06T09:16:45,021 WARN [main] org.apache.druid.java.util.metrics.CgroupUtil - Unable to fetch cpu.cfs_quota_us
    2025-05-06T09:16:45,021 WARN [main] org.apache.druid.java.util.metrics.CgroupUtil - Unable to fetch cpu.cfs_period_us
    2025-05-06T09:16:45,023 ERROR [main] org.apache.druid.java.util.metrics.cgroups.CpuSet - Unable to read cpuset.cpus
    2025-05-06T09:16:45,023 ERROR [main] org.apache.druid.java.util.metrics.cgroups.CpuSet - Unable to read cpuset.effective_cpus
    2025-05-06T09:16:45,024 ERROR [main] org.apache.druid.java.util.metrics.cgroups.CpuSet - Unable to read cpuset.mems
    2025-05-06T09:16:45,024 ERROR [main] org.apache.druid.java.util.metrics.cgroups.CpuSet - Unable to read cpuset.effective_mems
    ```
    I tried to add
    ```
    -Ddruid.metrics.cgroup.cpuPath=/sys/fs/cgroup/cpu.max
    -Ddruid.metrics.cgroup.cpuSharesPath=/sys/fs/cgroup/cpu.weight
    -Ddruid.metrics.cgroup.cpuQuotaPath=/sys/fs/cgroup/cpu.max
    -Ddruid.metrics.cgroup.cpuPeriodPath=/sys/fs/cgroup/cpu.max
    -Ddruid.metrics.cgroup.cpusetPath=/sys/fs/cgroup/cpuset.cpus
    -Ddruid.metrics.cgroup.effectiveCpuSetPath=/sys/fs/cgroup/cpuset.cpus.effective
    -Ddruid.metrics.cgroup.memPath=/sys/fs/cgroup/cpuset.mems
    -Ddruid.metrics.cgroup.effectiveMemPath=/sys/fs/cgroup/cpuset.mems.effective
    ```
    to the peon JAVA_OPTS, but the errors are still there... any idea please? It looks like EKS is running cgroup v2; is that compatible with Druid?
  • Jb Graindorge
    05/06/2025, 11:22 AM
    Actually, all the processes are giving these errors, even with
    ```
    druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.OshiSysMonitor", "org.apache.druid.java.util.metrics.JvmCpuMonitor", "org.apache.druid.client.cache.CacheMonitor", "org.apache.druid.server.metrics.QueryCountStatsMonitor", "org.apache.druid.java.util.metrics.CgroupV2CpuMonitor","org.apache.druid.java.util.metrics.CgroupV2DiskMonitor","org.apache.druid.java.util.metrics.CgroupV2MemoryMonitor"]
    ```
  • Jb Graindorge
    05/07/2025, 7:31 AM
    We were able to hide the errors/warnings in the log4j2 config:
    ```
    <Logger name="org.apache.druid.java.util.metrics" level="off" additivity="false">
        <AppenderRef ref="Console"/>
    </Logger>
    ```
    We still don't know why we got these errors; we will probably raise an issue with Imply support and/or on the GitHub repo.