Mahesha Subrahamanya
11/21/2023, 7:14 PM
java.lang.RuntimeException: org.apache.druid.java.util.common.ISE: Unable to grant lock to inactive Task [query-361ab928-5ed4-4432-9786-733ad19fda20]
at org.apache.druid.indexing.common.actions.LocalTaskActionClient.performAction(LocalTaskActionClient.java:98) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.common.actions.LocalTaskActionClient.submit(LocalTaskActionClient.java:80) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.overlord.http.OverlordResource$4.apply(OverlordResource.java:615) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.overlord.http.OverlordResource$4.apply(OverlordResource.java:604) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.overlord.http.OverlordResource.asLeaderWith(OverlordResource.java:1099) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.overlord.http.OverlordResource.doAction(OverlordResource.java:601) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at jdk.internal.reflect.GeneratedMethodAccessor124.invoke(Unknown Source) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) ~[jersey-server-1.19.4.jar:1.19.4]
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) ~[jersey-server-1.19.4.jar:1.19.4]
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) ~[jersey-server-1.19.4.jar:1.19.4]
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) ~[jersey-server-1.19.4.jar:1.19.4]
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) ~[jersey-server-1.19.4.jar:1.19.4]
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) ~[jersey-server-1.19.4.jar:1.19.4]
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) ~[jersey-server-1.19.4.jar:1.19.4]
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) ~[jersey-server-1.19.4.jar:1.19.4]
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) ~[jersey-server-1.19.4.jar:1.19.4]
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) ~[jersey-server-1.19.4.jar:1.19.4]
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) ~[jersey-server-1.19.4.jar:1.19.4]
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) ~[jersey-server-1.19.4.jar:1.19.4]
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) ~[jersey-servlet-1.19.4.jar:1.19.4]
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) ~[jersey-servlet-1.19.4.jar:1.19.4]
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) ~[jersey-servlet-1.19.4.jar:1.19.4]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0]
at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:286) ~[guice-servlet-4.1.0.jar:?]
at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:276) ~[guice-servlet-4.1.0.jar:?]
at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:181) ~[guice-servlet-4.1.0.jar:?]
at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) ~[guice-servlet-4.1.0.jar:?]
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) ~[guice-servlet-4.1.0.jar:?]
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120) ~[guice-servlet-4.1.0.jar:?]
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:135) ~[guice-servlet-4.1.0.jar:?]
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.apache.druid.server.http.RedirectFilter.doFilter(RedirectFilter.java:73) ~[druid-server-26.0.0.jar:26.0.0]
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.apache.druid.server.security.PreResponseAuthorizationCheckFilter.doFilter(PreResponseAuthorizationCheckFilter.java:84) ~[druid-server-26.0.0.jar:26.0.0]
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622]
at
Rohan Sharma
11/22/2023, 5:43 PM
Sujith Kumar.S
11/24/2023, 10:48 AM
Gururaj K.P
11/29/2023, 7:17 AM
Vikram Jothik Mateti
12/01/2023, 2:12 PM
https://rtfm.co.ua/wp-content/uploads/2022/09/druid_architecture_diagram.png
Mahesha Subrahamanya
12/06/2023, 1:10 AM
middlemanager:
replicas: 5
numMergeBuffers: 2
bufferSizeBytes: 120MiB
numThreadsProcessing: 2
numThreadsHttp: 32
workerCapacity: 3
runnerJavaOpts:
xms: 512m
xmx: 4500m
MaxDirectMemorySize: 1g
cpuRequest: 3000m
memoryRequest: 18Gi
memoryLimit: 18Gi
ephemeralStorageLimit: "96Gi"
We allocated 4.5 GB of peon memory to accommodate the two tables in the join: a fact table (~2.4 million rows, 130.26 MB) and another table (~1.9 million rows, 567.78 MB). We thought 4.5 GB of peon memory would be more than enough, but it is running into the error below.
"errorMsg": "BroadcastTablesTooLarge: Size of broadcast tables in JOIN exceeds reserved memory limit (memory rese..."
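A rough back-of-envelope check of the numbers quoted above (the 30% reservation factor is an assumption for illustration, not a documented Druid constant). Note that even when the raw segment sizes fit, the on-heap representation of broadcast join tables can be several times larger than the compressed on-disk size, which may be why the limit is still exceeded:

```python
# Back-of-envelope check of whether the two joined tables fit in the memory
# reserved for broadcast tables. The 30% reservation factor below is an
# ASSUMPTION for illustration, not a documented Druid constant.

def broadcast_fits(table_sizes_mb, peon_heap_mb, reserved_fraction=0.30):
    """Return (total_mb, budget_mb, fits) for the given table sizes and heap."""
    total_mb = sum(table_sizes_mb)
    budget_mb = peon_heap_mb * reserved_fraction
    return total_mb, budget_mb, total_mb <= budget_mb

# Sizes quoted above: 130.26 MB fact table + 567.78 MB second table,
# against the 4.5 GB (xmx: 4500m) peon heap from the config.
total, budget, fits = broadcast_fits([130.26, 567.78], 4500)
print(f"broadcast total={total:.0f} MB, assumed budget={budget:.0f} MB, fits={fits}")
```

If this is an MSQ `query-*` task, the usual suggestions, as I understand them, are to raise worker memory or to switch the join to the sortMerge algorithm (the `sqlJoinAlgorithm` query context parameter) so the right-hand tables no longer have to be broadcast.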
Any recommendations or insights on debugging this error? Appreciate your help.
Alexandre Dantas
12/11/2023, 9:26 PM
Diganta Mukherjee
12/15/2023, 7:47 AM
Mahesha Subrahamanya
12/16/2023, 7:06 AM
BoazG
01/08/2024, 7:39 PM
Md Noorshid
01/09/2024, 6:31 AM
# PostgreSQL Integration
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=${env:STORAGE_CONNECTOR_URI}
druid.metadata.storage.connector.user=${env:STORAGE_CONNECTOR_USER}
druid.metadata.storage.connector.password=${env:STORAGE_CONNECTOR_PASSWORD}
druid.metadata.postgres.ssl.useSSL=true
druid.metadata.postgres.ssl.sslMode=verify-ca
druid.metadata.postgres.ssl.sslCert=/opt/druid/ssl/postgres-server-cert.pem
druid.metadata.postgres.ssl.sslKey=/opt/druid/ssl/postgres-server-key.pk8
druid.metadata.postgres.ssl.sslRootCert=/opt/druid/ssl/postgres-root-cert.pem
druid.metadata.storage.connector.createTables=false
#
# Storage tables
#
druid.metadata.storage.tables.base=druid
druid.metadata.storage.tables.datasource=dataSource
druid.metadata.storage.tables.pendingSegments=pendingSegments
druid.metadata.storage.tables.segments=segments_v1
druid.metadata.storage.tables.rules=rules
druid.metadata.storage.tables.config=config
druid.metadata.storage.tables.tasks=tasks
druid.metadata.storage.tables.taskLog=taskLog
druid.metadata.storage.tables.taskLock=taskLock
druid.metadata.storage.tables.supervisors=supervisors
druid.metadata.storage.tables.audit=audit
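For what it's worth, the way these properties resolve to table names can be sketched as follows (a rough model of `MetadataStorageTablesConfig` based on my reading; the exact default suffixes and casing may differ): an explicit `druid.metadata.storage.tables.<x>` property wins, otherwise the name is derived from `druid.metadata.storage.tables.base`.

```python
# Rough model (assumption, not the actual Druid source): explicit per-table
# properties are used verbatim; anything unset falls back to "<base>_<x>".

SUFFIXES = ["dataSource", "pendingSegments", "segments", "rules", "config",
            "tasks", "taskLog", "taskLock", "supervisors", "audit"]

def resolve_tables(base, explicit):
    """explicit: dict of per-table overrides, e.g. {"segments": "segments_v1"}."""
    return {s: explicit.get(s, f"{base}_{s}") for s in SUFFIXES}

# E.g. pinning just the segments table keeps the old name verbatim, so Druid
# can reuse the existing table rather than creating a new one:
tables = resolve_tables("druid", {"segments": "segments_v1"})
print(tables["segments"])  # -> segments_v1 (explicit override wins)
print(tables["tasks"])     # -> druid_tasks (derived from the base)
```

The point being: with `createTables=false` and every table name pinned explicitly as above, Druid should read the pre-existing tables instead of creating fresh ones.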
Could someone guide me on how to point Druid at the old metadata tables? We cannot afford to lose our data, as we need to do this for production as well.
Domingo Rodríguez López
01/18/2024, 11:13 AM
Rohan Sharma
01/19/2024, 1:08 PM
Exception in thread "main" java.lang.RuntimeException: com.google.inject.CreationException: Unable to create injector, see the following errors:
1) No implementation for org.apache.druid.server.metrics.TaskCountStatsProvider was bound.
while locating org.apache.druid.server.metrics.TaskCountStatsProvider
for the 1st parameter of org.apache.druid.server.metrics.TaskCountStatsMonitor.<init>(TaskCountStatsMonitor.java:40)
while locating org.apache.druid.server.metrics.TaskCountStatsMonitor
at org.apache.druid.server.metrics.MetricsModule.getMonitorScheduler(MetricsModule.java:113) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> org.apache.druid.server.metrics.MetricsModule)
at org.apache.druid.server.metrics.MetricsModule.getMonitorScheduler(MetricsModule.java:113) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> org.apache.druid.server.metrics.MetricsModule)
while locating org.apache.druid.java.util.metrics.MonitorScheduler
at org.apache.druid.server.metrics.MetricsModule.configure(MetricsModule.java:98) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> org.apache.druid.server.metrics.MetricsModule)
while locating org.apache.druid.java.util.metrics.MonitorScheduler annotated with @com.google.inject.name.Named(value=ForTheEagerness)
1 error
at org.apache.druid.cli.GuiceRunnable.makeInjector(GuiceRunnable.java:88)
at org.apache.druid.cli.ServerRunnable.run(ServerRunnable.java:62)
at org.apache.druid.cli.Main.main(Main.java:112)
Is anyone aware of the above error and what could be the possible reason for it?
This is the config:
druid.monitoring.monitors=["org.apache.druid.client.cache.CacheMonitor", "org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.CpuAcctDeltaMonitor", "org.apache.druid.java.util.metrics.JvmThreadsMonitor", "org.apache.druid.server.metrics.EventReceiverFirehoseMonitor", "org.apache.druid.java.util.metrics.JvmCpuMonitor", "org.apache.druid.server.metrics.TaskCountStatsMonitor", "org.apache.druid.server.metrics.QueryCountStatsMonitor"]
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://druid-prom-exporter-prometheus-druid-exporter:8080/druid
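In case it helps others hitting this: as far as I can tell, TaskCountStatsProvider is only bound on the Overlord, so listing TaskCountStatsMonitor in a monitors list that every service loads makes the non-Overlord services fail at injector creation exactly like this. A hedged split (the file layout below is illustrative):

```properties
# common.runtime.properties — monitors that are safe on every service
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.JvmCpuMonitor"]

# overlord/runtime.properties — the task-count monitor added only here
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.server.metrics.TaskCountStatsMonitor"]
```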
Julian Reyes
01/24/2024, 4:14 PM
customTemplateAdapter.
So far I have this in the overall ConfigMap:
druid_indexer_task_encapsulatedTask: "true"
druid_indexer_runner_namespace: "{{ .Values.namespace }}"
druid_indexer_runner_capacity: "36" # dev is 36 and prod 144?
druid_indexer_runner_type: "k8s"
druid_indexer_runner_k8s_adapter_type: "customTemplateAdapter"
druid_indexer_runner_k8s_podTemplate_base: "/path/to/basePodSpec.yaml"
druid_indexer_runner_k8s_podTemplate_index_kinesis: "/path/to/taskSpecificPodSpec.yaml"
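On the base template question: as I read the k8s-jobs extension docs, the base pod template is the fallback used whenever no task-specific template matches, so it still needs to be set. For reference, a minimal sketch of what basePodSpec.yaml could look like (the shape follows my understanding of the custom template adapter; the image, resources, and labels are placeholder assumptions):

```yaml
# Minimal basePodSpec.yaml sketch (ASSUMPTION: shape per the k8s-jobs
# extension's custom template adapter; image and resources are placeholders).
apiVersion: v1
kind: PodTemplate
template:
  metadata:
    labels:
      app: druid-peon            # placeholder label
  spec:
    containers:
      - name: main               # primary peon container (name is an assumption)
        image: apache/druid:28.0.1   # placeholder image/version
        resources:
          requests:
            cpu: "1"
            memory: 6Gi
          limits:
            memory: 6Gi
    restartPolicy: Never
```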
I do not know if we still have to set druid_indexer_runner_k8s_podTemplate_base, though.
For the Coordinator we have 3 replicas in dev and prod.
For the taskSpecificPodSpec.yaml, do I have to map what we have in the MM StatefulSet? Or do we have to modify the Coordinator to have more replicas instead? I still don't understand that part.
This is our MM StatefulSet
containers:
- args:
- middleManager
env:
- name: DRUID_MAXDIRECTMEMORYSIZE
value: "10g"
- name: DRUID_XMS
value: "5g"
- name: DRUID_XMX
value: "5g"
- name: druid_indexer_fork_property_druid_processing_buffer_sizeBytes
value: "300000000"
- name: druid_indexer_fork_property_druid_processing_numMergeBuffers
value: "2"
- name: druid_indexer_fork_property_druid_processing_numThreads
value: "2"
- name: druid_indexer_fork_property_druid_server_http_numThreads
value: "50"
- name: druid_indexer_runner_javaOpts
value: "-server -Xmx5g -XX:+IgnoreUnrecognizedVMOptions -XX:MaxDirectMemorySize=10g -Duser.timezone=UTC -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/tmp/druid-peon.hprof -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"
- name: druid_indexer_runner_javaOptsArray
value: '["-server", "-Xmx5g", "-XX:+IgnoreUnrecognizedVMOptions", "-XX:MaxDirectMemorySize=10g", "-Duser.timezone=UTC", "-XX:+PrintGC", "-XX:+PrintGCDateStamps", "-XX:+ExitOnOutOfMemoryError", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/mnt/tmp/druid-peon.hprof", "-Dfile.encoding=UTF-8", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]'
- name: druid_indexer_task_baseDir
value: "/opt/druid/var/tmp"
- name: druid_indexer_task_gracefulShutdownTimeout
value: "PT120S"
- name: druid_indexer_task_restoreTasksOnRestart
value: "true"
- name: druid_node_type
value: "middleManager"
- name: druid_service
value: "druid/middleManager"
- name: druid_worker_capacity
value: "12"
- name: druid_emitter_statsd_dimensionMapPath
value: "/opt/druid/conf/druid/metrics-dimensions.json"
- name: druid_emitter_prometheus_dimensionMapPath
value: "/opt/druid/conf/druid/metrics-dimensions.json"
- name: druid_prometheus_emitter
value: "true"
- name: druid_prometheus_emitter_port
value: "9091"
- name: druid_emitter_prometheus_strategy
value: "exporter"
- name: druid_emitter_prometheus_pushGatewayAddress
value: "{{ .Values.prometheusService }}.{{ .Values.prometheusNamespace }}.svc:{{ .Values.prometheusServicePort }}"
- name: druid_indexer_fork_property_druid_emitter_prometheus_strategy
value: "pushgateway"
- name: druid_indexer_fork_property_druid_emitter_prometheus_pushGatewayAddress
value: "{{ .Values.prometheusService }}.{{ .Values.prometheusNamespace }}.svc:{{ .Values.prometheusServicePort }}"
- name: druid_monitoring_monitors
value: '["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.server.metrics.WorkerTaskCountStatsMonitor"]'
- name: druid_emitter_statsd_hostname
valueFrom:
fieldRef:
fieldPath: status.hostIP
envFrom:
- configMapRef:
name: druid
- secretRef:
name: druid-master-connection
image: '{{ .Values.druidImage }}'
imagePullPolicy: IfNotPresent
lifecycle:
postStart:
exec:
command:
- /bin/sh
- -c
- cp /opt/druid/extensions/postgresql-metadata-storage/postgresql-42.6.0.jar /opt/druid/lib/ && cd -
livenessProbe:
Also, that brings up the question of whether this will reduce cost somehow by running fewer EC2 instances for the Druid cluster in EKS.
Luiz Augusto
01/25/2024, 11:50 AM
# druid.server.http.numThreads > druid.broker.http.numConnections on the same Broker.
druid_server_http_numThreads: "20"
druid_broker_http_numConnections: "10"
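One hedged way to honor the rule quoted in the comment while leaving explicit headroom for probe traffic (the headroom number is an arbitrary assumption, not an official formula):

```yaml
# Keep Jetty server threads comfortably above the Broker's own outbound
# connection pool so liveness/readiness probes and /status calls can still
# get a free thread; the +15 headroom here is an assumption for illustration.
druid_broker_http_numConnections: "10"
druid_server_http_numThreads: "25"
```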
What should I add to ensure the broker has resources to process probe calls?
jagr
01/31/2024, 12:37 PM
Luiz Augusto
02/01/2024, 4:05 PM
2024-02-01T14:00:26,814 WARN [MonitorScheduler-0] oshi.hardware.platform.linux.LinuxHWDiskStore - Disk Store information requires libudev, which is not present.
Adheip Singh
02/12/2024, 6:06 PM
Ahemad Ali Shaik
02/14/2024, 4:35 PM
Jb Graindorge
02/19/2024, 11:03 AM
druid.client.https.validateHostnames=false
Rumesh Madhusanka
02/20/2024, 2:25 PM
java.lang.RuntimeException: java.lang.IllegalArgumentException: The argument must not be null. Argument name: containerName.
But I have set the container name according to the extension docs. I see this in my task logs:
2024-02-20T14:03:15,082 INFO [main] org.apache.druid.cli.CliPeon - * druid.azure.account: <masked>
2024-02-20T14:03:15,082 INFO [main] org.apache.druid.cli.CliPeon - * druid.azure.key: <masked>
Also the azure extension is loaded:
2024-02-20T14:03:08,985 INFO [main] org.apache.druid.guice.ExtensionsLoader - Loading extension [druid-azure-extensions], jars: druid-azure-extensions-28.0.1.jar, azure-keyvault-core-1.0.0.jar, azure-storage-8.6.0.jar
I'm attaching the full task logs here too.
Could you please tell me what I have configured incorrectly here?
Adith Reddy
02/26/2024, 7:55 AM
莫山
02/29/2024, 10:02 AM
Mahesha Subrahamanya
03/04/2024, 7:50 PM
Rohan Sharma
03/06/2024, 10:07 AM
Error in custom provider, java.lang.NoClassDefFoundError: io/prometheus/client/Collector
at org.apache.druid.emitter.prometheus.PrometheusEmitterModule.getEmitter(PrometheusEmitterModule.java:60) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.emitter.prometheus.PrometheusEmitterModule)
at org.apache.druid.emitter.prometheus.PrometheusEmitterModule.getEmitter(PrometheusEmitterModule.java:60) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.emitter.prometheus.PrometheusEmitterModule)
while locating org.apache.druid.java.util.emitter.core.Emitter annotated with @com.google.inject.name.Named("prometheus")
at org.apache.druid.server.emitter.EmitterModule$EmitterProvider.inject(EmitterModule.java:133)
Has someone experienced this issue in the past?
Sharmin Choksey
03/08/2024, 7:17 AM
druid_ingest_event_* metrics? Assuming it is set on the MM, Overlord? I tried configuring those with org.apache.druid.server.metrics.EventReceiverFirehoseMonitor but don't see any metric values despite generating traffic on the cluster.
Youngsol Koh
03/11/2024, 1:44 PM
common.runtime.properties: |
#
# Zookeeper-less Druid Cluster
#
druid.zk.service.enabled=false
druid.discovery.type=k8s
druid.discovery.k8s.clusterIdentifier=dev-identifier
druid.serverview.type=http
druid.coordinator.loadqueuepeon.type=http
druid.indexer.runner.type=httpRemote
After indexing from Kafka for about an hour, the middle manager is lost; I could not find the MM among the services in the web console. I found that the middle manager pod does not have the label druidDiscoveryAnnouncement-cluster-identifier, while all the others have it. Is this a bug, or did I miss something?
Soman Ullah
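For reference, this is (roughly) what the announcement label looks like on a healthy pod; the key is taken from the report above, and my assumption is that its value should match druid.discovery.k8s.clusterIdentifier from the config shown earlier:

```yaml
# Expected on every announced Druid pod when druid.discovery.type=k8s
# (illustrative fragment; label key copied from the report above).
metadata:
  labels:
    druidDiscoveryAnnouncement-cluster-identifier: dev-identifier
```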
03/14/2024, 7:25 PM
Noor
03/19/2024, 4:57 PM
Goran Zaric
03/25/2024, 11:25 AM
k8s
I am still wondering about a couple of things. The first is whether it is possible to achieve a "hybrid" mode for running task loads.
MM-less (for the context): https://druid.apache.org/docs/latest/development/extensions-contrib/k8s-jobs/
That mode would eventually allow having, e.g., a fixed number of MMs running in the cluster to accept any possible task, provided there is a slot available.
If no slots are available, then k8s Jobs would kick in to scale out for newly arriving tasks.
So far, what I have managed to do is use task-type overrides to direct a specific type of task to be scheduled either to an MM or to a Job. However, there is no flexibility to let any type of task be scheduled to whatever free slot is available.
druid.indexer.runner.type=k8sAndWorker
druid.indexer.runner.k8sAndWorker.runnerStrategy.type=taskType
druid.indexer.runner.k8sAndWorker.runnerStrategy.taskType.default=k8s
druid.indexer.runner.k8sAndWorker.runnerStrategy.taskType.overrides={"index_parallel": "worker","index_kafka": "k8s","single_phase_sub_task": "worker","compact": "k8s"}
Do you have any hints on how a "hybrid" mode could be achieved?
On the other hand (with no hybrid mode in mind, just MM-less), I am not 100% sure how the migration from MMs to k8s Jobs would be conducted, let's say, in production.
Although all incoming tasks will be scheduled to k8s Jobs, what about long-running tasks already sitting on MMs? Should we gradually scale the MM deployment down one by one and let the Jobs take over, or take a different path?