# troubleshooting
l
Hi Team, I am running on EC2 and trying out Pinot monitoring. I installed Prometheus and the Prometheus node_exporter, and configured my Java opts (with the agent jar and pinot.yml):
```
-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xms2G -Xmx2G -Dlog4j2.configurationFile=conf/log4j2.xml -Dpinot.admin.system.exit=true -Dplugins.dir=/opt/pinot/plugins
```
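For reference, whether any `pinot_*` metrics show up at all depends on the rules in that pinot.yml. A catch-all rule is a quick sanity check, since it exports every registered MBean (illustrative only; the real rule file ships under /opt/pinot/etc/jmx_prometheus_javaagent/configs/):
```
# Debug-only jmx_exporter config: export every MBean unmodified
rules:
- pattern: ".*"
```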
However, when I view the JMX metrics, I can't see any of the metrics stated here. Has anyone encountered this issue before?
t
Hi @Lee Wei Hern Jason, how are you viewing the metrics? I usually check:
1. Check the individual Pinot pods to see whether the JMX metrics are actually being exposed on port 8008, e.g.
```
kubectl port-forward your-broker-pod-name 8008:8008
```
2. Check your Prometheus instance to see whether it is scraping the metrics.
Note that metrics won't be visible until their values are emitted, so you may need to run some queries or ingest data into Pinot before you see them.
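For step 2, the scrape job in prometheus.yml would look something like this (a minimal sketch; the job name and target are placeholders for your EC2 host):
```
scrape_configs:
  - job_name: pinot-controller
    static_configs:
      - targets: ["localhost:8008"]
```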
l
I have already exposed the ports. This is the output I get from localhost:8008/metrics. I am running on EC2, not k8s. I would expect to see all the metrics shown in the document at localhost:8008/metrics, right? But I can't see them.
pinot_controller_validateion_TotalDocumentCount_Value
is the only related Pinot metric (which is not even shown in the docs). Ultimately I am trying to push them to Datadog.
```
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 1.4316156E9
jvm_memory_bytes_used{area="nonheap",} 1.49580312E8
# HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{area="heap",} 2.147483648E9
jvm_memory_bytes_committed{area="nonheap",} 1.58560256E8
# HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{area="heap",} 2.147483648E9
jvm_memory_bytes_max{area="nonheap",} -1.0
# HELP jvm_memory_bytes_init Initial bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_init gauge
jvm_memory_bytes_init{area="heap",} 2.147483648E9
jvm_memory_bytes_init{area="nonheap",} 7667712.0
# HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{pool="CodeHeap 'non-nmethods'",} 1566080.0
jvm_memory_pool_bytes_used{pool="Metaspace",} 8.6832376E7
jvm_memory_pool_bytes_used{pool="CodeHeap 'profiled nmethods'",} 3.6845184E7
jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 9732512.0
jvm_memory_pool_bytes_used{pool="G1 Eden Space",} 1.2845056E9
jvm_memory_pool_bytes_used{pool="G1 Old Gen",} 1.45012848E8
jvm_memory_pool_bytes_used{pool="G1 Survivor Space",} 2097152.0
jvm_memory_pool_bytes_used{pool="CodeHeap 'non-profiled nmethods'",} 1.460416E7
# HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{pool="CodeHeap 'non-nmethods'",} 2555904.0
jvm_memory_pool_bytes_committed{pool="Metaspace",} 9.033728E7
jvm_memory_pool_bytes_committed{pool="CodeHeap 'profiled nmethods'",} 3.9583744E7
jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 1.1010048E7
jvm_memory_pool_bytes_committed{pool="G1 Eden Space",} 1.350565888E9
jvm_memory_pool_bytes_committed{pool="G1 Old Gen",} 7.94820608E8
jvm_memory_pool_bytes_committed{pool="G1 Survivor Space",} 2097152.0
jvm_memory_pool_bytes_committed{pool="CodeHeap 'non-profiled nmethods'",} 1.507328E7
# HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="CodeHeap 'non-nmethods'",} 5828608.0
jvm_memory_pool_bytes_max{pool="Metaspace",} -1.0
jvm_memory_pool_bytes_max{pool="CodeHeap 'profiled nmethods'",} 1.22912768E8
jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 1.073741824E9
jvm_memory_pool_bytes_max{pool="G1 Eden Space",} -1.0
jvm_memory_pool_bytes_max{pool="G1 Old Gen",} 2.147483648E9
jvm_memory_pool_bytes_max{pool="G1 Survivor Space",} -1.0
jvm_memory_pool_bytes_max{pool="CodeHeap 'non-profiled nmethods'",} 1.22916864E8
# HELP jvm_memory_pool_bytes_init Initial bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_init gauge
jvm_memory_pool_bytes_init{pool="CodeHeap 'non-nmethods'",} 2555904.0
jvm_memory_pool_bytes_init{pool="Metaspace",} 0.0
jvm_memory_pool_bytes_init{pool="CodeHeap 'profiled nmethods'",} 2555904.0
jvm_memory_pool_bytes_init{pool="Compressed Class Space",} 0.0
jvm_memory_pool_bytes_init{pool="G1 Eden Space",} 1.13246208E8
jvm_memory_pool_bytes_init{pool="G1 Old Gen",} 2.03423744E9
jvm_memory_pool_bytes_init{pool="G1 Survivor Space",} 0.0
jvm_memory_pool_bytes_init{pool="CodeHeap 'non-profiled nmethods'",} 2555904.0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 2267.68
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.666939948148E9
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 66.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 4096.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 5.05772032E9
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.834196992E9
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="G1 Young Generation",} 548.0
jvm_gc_collection_seconds_sum{gc="G1 Young Generation",} 4.48
jvm_gc_collection_seconds_count{gc="G1 Old Generation",} 0.0
jvm_gc_collection_seconds_sum{gc="G1 Old Generation",} 0.0
# HELP jmx_config_reload_failure_total Number of times configuration have failed to be reloaded.
# TYPE jmx_config_reload_failure_total counter
jmx_config_reload_failure_total 0.0
# HELP pinot_controller_validateion_TotalDocumentCount_Value Attribute exposed for management ("org.apache.pinot.common.metrics"<type="ValidationMetrics", name="pinot.controller.paxFoodSurgeMirrorMetric_REALTIME.TotalDocumentCount"><>Value)
# TYPE pinot_controller_validateion_TotalDocumentCount_Value untyped
pinot_controller_validateion_TotalDocumentCount_Value{table="paxFoodSurgeMirrorMetric_REALTIME",} 8.37076271E8
pinot_controller_validateion_TotalDocumentCount_Value{table="midasClickstreamMetric_REALTIME",} -4.0
# HELP jmx_scrape_duration_seconds Time this JMX scrape took, in seconds.
# TYPE jmx_scrape_duration_seconds gauge
jmx_scrape_duration_seconds 0.3607429
# HELP jmx_scrape_error Non-zero if this scrape failed.
# TYPE jmx_scrape_error gauge
jmx_scrape_error 0.0
# HELP jmx_exporter_build_info A metric with a constant '1' value labeled with the version of the JMX exporter.
# TYPE jmx_exporter_build_info gauge
jmx_exporter_build_info{version="0.12.0",name="jmx_prometheus_javaagent",} 1.0
# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 114.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 91.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 116.0
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 197.0
# HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked 0.0
# HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor 0.0
# HELP jmx_config_reload_success_total Number of times configuration have successfully been reloaded.
# TYPE jmx_config_reload_success_total counter
jmx_config_reload_success_total 0.0
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 15586.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 15586.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 0.0
# HELP jvm_buffer_pool_used_bytes Used bytes of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_bytes gauge
jvm_buffer_pool_used_bytes{pool="mapped",} 0.0
jvm_buffer_pool_used_bytes{pool="direct",} 359834.0
# HELP jvm_buffer_pool_capacity_bytes Bytes capacity of a given JVM buffer pool.
# TYPE jvm_buffer_pool_capacity_bytes gauge
jvm_buffer_pool_capacity_bytes{pool="mapped",} 0.0
jvm_buffer_pool_capacity_bytes{pool="direct",} 359834.0
# HELP jvm_buffer_pool_used_buffers Used buffers of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_buffers gauge
jvm_buffer_pool_used_buffers{pool="mapped",} 0.0
jvm_buffer_pool_used_buffers{pool="direct",} 28.0
# HELP jvm_info JVM version info
# TYPE jvm_info gauge
jvm_info{version="11.0.16+8-post-Ubuntu-0ubuntu118.04",vendor="Ubuntu",runtime="OpenJDK Runtime Environment",} 1.0
```
t
It does sometimes take a bit of time for the metrics to register, since some of them are emitted from periodic tasks. Can you make some queries and then check port 8008 on the Pinot broker?
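Something like this should do it (a sketch, assuming the default broker query port 8099 and substituting one of your own table names):
```
# Fire a query at the broker to make it emit query metrics
curl -X POST -H "Content-Type: application/json" \
  -d '{"sql": "SELECT COUNT(*) FROM yourTable"}' \
  http://localhost:8099/query/sql

# Then see what the broker's exporter reports
curl -s http://localhost:8008/metrics | grep -i pinot_broker
```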
l
Hmm, I currently only enabled Prometheus on my controller. Does it matter whether it is scraping from the controller or the broker? I'm guessing the queries trigger the emission of the metric? I queried port 8008 by hitting localhost:8008/metrics, and the output is shown above.
Maybe your theory is right; the server node shows some of the metrics stated in the doc.
t
Each Pinot component emits its own set of metrics, so you would want Prometheus to scrape all of your Pinot components to get proper metric coverage.
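In practice that means one agent per component JVM, each on its own port. A sketch (the port choices 8008/8009/8010 are arbitrary; the paths match the ones above):
```
AGENT_JAR=/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar
AGENT_CFG=/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml

# One exporter port per component JVM
JAVA_OPTS="-javaagent:${AGENT_JAR}=8008:${AGENT_CFG}" bin/pinot-admin.sh StartController -zkAddress localhost:2181
JAVA_OPTS="-javaagent:${AGENT_JAR}=8009:${AGENT_CFG}" bin/pinot-admin.sh StartBroker -zkAddress localhost:2181
JAVA_OPTS="-javaagent:${AGENT_JAR}=8010:${AGENT_CFG}" bin/pinot-admin.sh StartServer -zkAddress localhost:2181
```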
l
Yep, I will definitely have it on all nodes. Right now I just want to test the flow from Pinot JMX metrics to Prometheus and then to Datadog. By any chance, is there any documentation on this flow?
t
But I don't think we have anything for Datadog yet.
It would be great if you could contribute docs based on your findings.
l
Yep, I saw this and will write it up if I manage to configure it. I'm just facing an issue with bringing the data from Prometheus to Datadog. Thanks, Tim, for your prompt replies 🙏
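For reference, the usual route here is the Datadog Agent's OpenMetrics check pointed at the exporter endpoint. A sketch of conf.d/openmetrics.d/conf.yaml, where the endpoint, namespace, and metric filter are placeholders to adapt:
```
instances:
    # Scrape the JMX exporter endpoint directly
  - openmetrics_endpoint: http://localhost:8008/metrics
    # Prefix applied to the metrics in Datadog
    namespace: pinot
    # Only forward Pinot metrics
    metrics:
      - pinot_.*
```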
👍 1
g
Hi Lee! Were you able to get the metrics mentioned on the monitoring page? I see Pinot-related metrics, but I am unable to locate metrics like SEGMENT_DOWNLOAD_FAILURES when I curl /metrics.
l
Maybe it's because there are no segment download failures. A metric is only sent once it has been triggered.
g
There are a lot of such metrics.
l
> there are a lot of such metrics
What do you mean by this? If you are specifically looking for SEGMENT_DOWNLOAD_FAILURES, it's only emitted when there is a segment download failure.
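A quick way to list whichever Pinot metrics a node is actually exporting at the moment:
```
# Show only the pinot_* series currently present on the endpoint
curl -s http://localhost:8008/metrics | grep -i '^pinot_'
```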