celery-prometheus-exporter gives celery_workers count as 0 - celery

I am trying to monitor active workers on Airflow and have used the following exporter.
But in the exposed metrics, celery_workers is always 0. Here is the metrics data:
python_info{implementation="CPython",major="2",minor="7",patchlevel="15+",version="2.7.15+"} 1.0
# HELP celery_workers Number of alive workers
# TYPE celery_workers gauge
celery_workers 0.0
# HELP celery_tasks Number of tasks per state
# TYPE celery_tasks gauge
celery_tasks{state="STARTED"} 0.0
celery_tasks{state="SUCCESS"} 6.0
celery_tasks{state="RECEIVED"} 0.0
Can anyone suggest what I am missing? Thanks in advance.
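As a hedged pointer: celery_workers is derived from Celery worker events, so the exporter has to be pointed at the same broker that Airflow's Celery executor uses, and the workers have to emit events. A minimal sketch, assuming the exporter accepts a --broker flag and using a placeholder broker URL and app name; check the exporter's --help for the exact options:
# point the exporter at the broker Airflow actually uses (placeholder URL)
celery-prometheus-exporter --broker redis://localhost:6379/0
# make sure the workers send events, e.g. enable them at runtime
celery -A my_app control enable_events   # my_app is a placeholder app name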

Related

Quarkus service endpoint always returns system info only

I can confirm that the endpoints work in the unit tests through io.restassured.RestAssured. However, after I launch the service, every endpoint always returns a page of system info, e.g.:
# HELP kafka_producer_node_request_total The total number of requests sent
# TYPE kafka_producer_node_request_total counter
kafka_producer_node_request_total{client_id="kafka-producer-metric-message-out",kafka_version="2.5.0",node_id="node--1",} 2.0
# HELP kafka_producer_connection_close_total The total number of connections closed
# TYPE kafka_producer_connection_close_total counter
kafka_producer_connection_close_total{client_id="kafka-producer-metric-message-out",kafka_version="2.5.0",} 0.0
# HELP kafka_producer_request_total The total number of requests sent
# TYPE kafka_producer_request_total counter
kafka_producer_request_total{client_id="kafka-producer-metric-message-out",kafka_version="2.5.0",} 2.0
# HELP kafka_producer_node_response_total The total number of responses received
# TYPE kafka_producer_node_response_total counter
kafka_producer_node_response_total{client_id="kafka-producer-metric-message-out",kafka_version="2.5.0",node_id="node--1",} 2.0
# HELP kafka_producer_node_response_rate The number of responses received per second
# TYPE kafka_producer_node_response_rate gauge
From the log I can see that the DBs are connected and the schemas are migrated,
but where does this info come from, and why does it hijack my normal endpoints?
What a coincidence: it turned out that my application endpoint is also /metrics, and quarkus.http.non-application-root-path=/ is set, so it kept getting hijacked by the Quarkus metrics endpoint.
Thanks to @loicmathieu.
The solution is to reconfigure the Quarkus non-application endpoints:
quarkus.http.non-application-root-path=/
quarkus.smallrye-health.root-path=/quarkus-metrics/health
quarkus.smallrye-health.ui.root-path=/quarkus-metrics/health-ui
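Since the framework endpoint answering on /metrics is what shadows the application route, it may need to be relocated as well. A hedged addition, assuming either SmallRye Metrics or Micrometer provides that endpoint; verify the property names against the Quarkus version in use:
# application.properties (sketch, complementing the settings above)
# if SmallRye Metrics serves /metrics, move it out of the way:
quarkus.smallrye-metrics.path=/quarkus-metrics/metrics
# with Micrometer + Prometheus, the equivalent property would be:
# quarkus.micrometer.export.prometheus.path=/quarkus-metrics/metrics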

kafka datadog not sending metrics correctly

I am trying to send Kafka consumer metrics to Datadog, but they are not showing up in monitoring when I select the node. The agent is reporting the following check status:
Instance ID: kafka_consumer:d6........f5 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/kafka_consumer.d/conf.yaml
Total Runs: 567
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 162ms
Last Execution Date : 2021-01-14 10:49:06.000000 UTC
Last Successful Execution Date : 2021-01-14 10:49:06.000000 UTC
metadata:
version.major: 2
version.minor: 5
version.patch: 0
version.raw: 2.5.0
version.scheme: semver
JMXFetch
runtime_version : 11.0.9.1
version : 0.40.3
Initialized checks
kafka
instance_name : kafka-10.128.0.105-9999
message : <no value>
metric_count : 99
service_check_count : 0
status : OK
Failed checks
no checks
The JMX status is as above. Please help me find what could be wrong.
Nothing is wrong there. Datadog does not make it obvious that the Kafka integration uses DogStatsD under the hood. When use_dogstatsd: true is set in /etc/datadog-agent/datadog.yaml, the metrics do appear in the Datadog web UI. If that option is not set, the default broker data is still available via JMXFetch using sudo -u dd-agent datadog-agent status, as well as via sudo -u dd-agent datadog-agent check kafka, but not in the web UI.
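A minimal sketch of that change, assuming a standard Linux Agent install managed by systemd:
# /etc/datadog-agent/datadog.yaml
use_dogstatsd: true
then restart the agent so the setting takes effect:
sudo systemctl restart datadog-agent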

Traefik metrics working for Prometheus but Grafana dashboards are empty

I have configured Traefik (v1.7.15) and the Prometheus Operator with the stable Helm chart (chart version 8.2.4).
However, I can't see any metrics data in the Grafana dashboards; they are empty.
I can see the metrics being served on the pod IP at port 8080 with a curl command. Refer to the following metrics extract and a few important configuration manifests.
I can also see that the Traefik ServiceMonitor is in the UP state in Prometheus. I have used the same approach for Mongo/Postgres/RabbitMQ metrics, and those Grafana dashboards show rich data and work fine.
I would highly appreciate it if someone could guide me on the right track to fix and display the Traefik ingress controller metrics in Grafana, and let me know the cause of this.
I am using the following Grafana dashboards and none of them shows data.
Dashboard IDs: 4475, 8214, 11741, 6293.
THANK YOU
Traefik configurations:
Deployment YAML arguments
ports:
  - name: http
    containerPort: 80
  - name: admin
    containerPort: 8080
  - name: https
    containerPort: 443
args:
  #- --api
  - --web
  - --web.metrics.prometheus
  - --kubernetes
  - --logLevel=INFO
  - --configfile=/config/traefik.toml
volumeMounts:
  - mountPath: /config
    name: config
  - mountPath: /ssl
    name: ssl
Configmap TOML File
traefik.toml: |
  # traefik.toml
  logLevel = "INFO"
  defaultEntryPoints = ["http","https"]
  [entryPoints]
    [entryPoints.http]
    address = ":80"
      [entryPoints.http.redirect]
      entryPoint = "https"
    [entryPoints.https]
    address = ":443"
      [entryPoints.https.tls]
        [[entryPoints.https.tls.certificates]]
        CertFile = "/ssl/tls.crt"
        KeyFile = "/ssl/tls.key"
  [metrics]
    [metrics.prometheus]
    buckets = [0.1,0.3,1.2,5.0]
Prometheus service monitor YAML
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: traefik-sm
  labels:
    release: my-prometheus
spec:
  selector:
    matchLabels:
      k8s-app: traefik-ingress-lb
  namespaceSelector:
    any: true
  endpoints:
    - port: admin-ui
      name: traefik-ingress-service
      targetPort: 8080
      path: /metrics
      interval: 10s
      honorLabels: true
Traefik metrics with curl
ubuntu@k8s-node1:~$ curl http://10.96.1.141:8080/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.3978e-05
go_gc_duration_seconds{quantile="0.25"} 1.86e-05
go_gc_duration_seconds{quantile="0.5"} 2.3194e-05
go_gc_duration_seconds{quantile="0.75"} 5.2525e-05
go_gc_duration_seconds{quantile="1"} 0.090356709
go_gc_duration_seconds_sum 12.978064956
go_gc_duration_seconds_count 3774
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 64
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 8.322768e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 2.7448991752e+10
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.579943e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 2.5932029e+08
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0.00037814152889298634
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 2.4064e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 8.322768e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 5.3641216e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.261568e+07
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 54120
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 4.636672e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6256896e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.5858102844353108e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 2.5937441e+08
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 3472
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 180000
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 245760
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.6043632e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 666961
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 851968
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 851968
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.2024312e+07
# HELP go_threads Number of OS threads created
# TYPE go_threads gauge
go_threads 11
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 553.04
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 11
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 6.9451776e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.58573313806e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.90099456e+08
# HELP traefik_backend_server_up Backend server is up, described by gauge value of 0 or 1.
# TYPE traefik_backend_server_up gauge
traefik_backend_server_up{backend="auth-jooqa.abc.com/",url="http://192.168.22.77:8180"}
# HELP traefik_config_last_reload_failure Last config reload failure
# TYPE traefik_config_last_reload_failure gauge
traefik_config_last_reload_failure 0
# HELP traefik_config_last_reload_success Last config reload success
# TYPE traefik_config_last_reload_success gauge
traefik_config_last_reload_success 1.585741581e+09
# HELP traefik_config_reloads_failure_total Config failure reloads
# TYPE traefik_config_reloads_failure_total counter
traefik_config_reloads_failure_total 0
# HELP traefik_config_reloads_total Config reloads
# TYPE traefik_config_reloads_total counter
traefik_config_reloads_total 4
There are too few metrics exported by traefik
If you check your exported metrics, there are too few:
$ curl -s http://10.96.1.141:8080/metrics | grep -P '^traefik_'
traefik_backend_server_up{backend="auth-jooqa.abc.com/",url="http://192.168.22.77:8180"}
traefik_config_last_reload_failure 0
traefik_config_last_reload_success 1.585741581e+09
traefik_config_reloads_failure_total 0
traefik_config_reloads_total 4
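To confirm the same thing from the Prometheus side, a generic PromQL query like the following (run in the Prometheus expression browser; it is not Traefik-specific) lists which traefik_* series were actually scraped:
count by (__name__) ({__name__=~"traefik_.+"})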
It's hard to find a ready-made Grafana dashboard for your set of metrics
Let's grep the expr tags in the mentioned dashboards (4475, 8214, 11741, 6293):
for dashboard_url in 'https://grafana.com/api/dashboards/4475/revisions/4/download' 'https://grafana.com/api/dashboards/6293/revisions/2/download' 'https://grafana.com/api/dashboards/8214/revisions/1/download' 'https://grafana.com/api/dashboards/11741/revisions/1/download' ; do
  echo -e "\t = Dashboard: $dashboard_url = "
  curl -s $dashboard_url | jq '.panels[].targets[0].expr' | grep -Po 'traefik_[a-z_]+' | sort | uniq
done
The command above returns the list of traefik_* metrics used in the expr of each dashboard:
= Dashboard: https://grafana.com/api/dashboards/4475/revisions/4/download =
traefik_backend_request_duration_seconds_sum
traefik_backend_requests_total
traefik_backend_server_up
traefik_config_reloads_total
traefik_entrypoint_requests_total
= Dashboard: https://grafana.com/api/dashboards/6293/revisions/2/download =
traefik_backend_open_connections
traefik_backend_request_duration_seconds_sum
traefik_backend_requests_total
traefik_entrypoint_open_connections
traefik_entrypoint_request_duration_seconds_sum
traefik_entrypoint_requests_total
= Dashboard: https://grafana.com/api/dashboards/8214/revisions/1/download =
traefik_backend_request_duration_seconds_sum
traefik_backend_requests_total
traefik_entrypoint_request_duration_seconds_sum
traefik_entrypoint_requests_total
= Dashboard: https://grafana.com/api/dashboards/11741/revisions/1/download =
traefik_entrypoint_open_connections
traefik_entrypoint_request_duration_seconds_sum
traefik_entrypoint_requests_total
traefik_service_open_connections
traefik_service_request_duration_seconds_count
traefik_service_request_duration_seconds_sum
traefik_service_requests_total
As you can see, at most two of your 5 exported metrics are used by any of these dashboards.
Let's try to find an appropriate dashboard
Since these 4 dashboards aren't suitable for your metric set, let's try to find a suitable one on GitHub:
traefik_backend_server_up: 8 code results
traefik_backend_server_up or traefik_config_reloads_total: 11 code results
traefik_config_last_reload_failure OR traefik_config_last_reload_success OR traefik_config_reloads_failure_total: 1 code results
Suggestions
So, I'd suggest:
either try to update Traefik so that it exposes a more complete metric set,
or create your own dashboard from the metrics you do have; it's easy (see the sketch below).
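For the second option, here are a couple of panel queries that would work with what is already exported (PromQL sketches built only from the series shown in the curl output above):
# backend availability, one series per backend (1 = up, 0 = down)
sum by (backend) (traefik_backend_server_up)
# configuration reload failures over the last hour
increase(traefik_config_reloads_failure_total[1h])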
P.S. grafana-dashboard-builder for easier creation of Grafana dashboards
There is an open-source tool for easier creation of dashboards:
jakubplichta/grafana-dashboard-builder: Generate Grafana dashboards with YAML
Currently it supports three data-stores:
Graphite
Prometheus
InfluxDB

configuring kafka with JMX-exporter- centos 7

I want to enable Kafka monitoring and I am starting with a single-node deployment as a test. I am following the steps from https://alex.dzyoba.com/blog/jmx-exporter/
I tried the following steps; the last command, which checks for the jmx-exporter HTTP server, returns nothing. I believe this is the reason why I am not seeing metrics from Kafka (more on this below).
wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.6/jmx_prometheus_javaagent-0.6.jar
wget https://raw.githubusercontent.com/prometheus/jmx_exporter/master/example_configs/kafka-0-8-2.yml
export KAFKA_OPTS='-javaagent:/opt/jmx-exporter/jmx_prometheus_javaagent-0.6.jar=7071:/etc/jmx-exporter/kafka-0-8-2.yml'
/opt/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh /opt/kafka_2.11-0.10.1.0/conf/server.properties
netstat -plntu | grep 7071
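As a side note, when the javaagent is attached correctly the exporter's HTTP endpoint also answers a plain HTTP request, so an alternative sanity check (a sketch, assuming the agent really was configured for port 7071) is:
curl -s http://localhost:7071/metrics | head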
The Kafka broker log on the console does not have any ERROR messages.
I have Prometheus running in a container, and http://IP:9090/metrics shows a bunch of metrics.
When I searched for "kafka" it returned the following:
# TYPE net_conntrack_dialer_conn_attempted_total counter
net_conntrack_dialer_conn_attempted_total{dialer_name="kafka"} 79
# TYPE net_conntrack_dialer_conn_closed_total counter
net_conntrack_dialer_conn_closed_total{dialer_name="kafka"} 0
net_conntrack_dialer_conn_established_total{dialer_name="kafka"} 0
# TYPE net_conntrack_dialer_conn_failed_total counter
net_conntrack_dialer_conn_failed_total{dialer_name="kafka",reason="refused"} 79
net_conntrack_dialer_conn_failed_total{dialer_name="kafka",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="kafka",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="kafka",reason="unknown"} 79
# TYPE prometheus_sd_discovered_targets gauge
prometheus_sd_discovered_targets{config="kafka",name="scrape"} 1
# HELP prometheus_target_sync_length_seconds Actual interval to sync the scrape pool.
# TYPE prometheus_target_sync_length_seconds summary
prometheus_target_sync_length_seconds{scrape_job="kafka",quantile="0.01"} NaN
prometheus_target_sync_length_seconds{scrape_job="kafka",quantile="0.05"} NaN
prometheus_target_sync_length_seconds{scrape_job="kafka",quantile="0.5"} NaN
prometheus_target_sync_length_seconds{scrape_job="kafka",quantile="0.9"} NaN
prometheus_target_sync_length_seconds{scrape_job="kafka",quantile="0.99"} NaN
prometheus_target_sync_length_seconds_sum{scrape_job="kafka"} 0.000198245
prometheus_target_sync_length_seconds_count{scrape_job="kafka"} 1
My guess is that Prometheus is not getting any metrics on port 7071, which aligns with the earlier finding that the JMX exporter is not responding on port 7071.
Can you help me enable Kafka monitoring using JMX exporter and Prometheus?
I have Prometheus running in a container
Because you're running Kafka outside of a container, you need to configure Prometheus to scrape your external LAN IP.
You can see on this line that the connection is being refused with your current setup:
net_conntrack_dialer_conn_failed_total{dialer_name="kafka",reason="refused"} 79
You should either run Prometheus on your host and scrape localhost:7071,
or run Kafka in a container if you want kafka:7071 to be discoverable by Prometheus.
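For illustration, a minimal scrape-configuration sketch; HOST_LAN_IP is a placeholder for the address of the machine running the Kafka broker with the JMX agent, reachable from wherever Prometheus runs:
# prometheus.yml (sketch)
scrape_configs:
  - job_name: kafka
    static_configs:
      - targets: ['HOST_LAN_IP:7071']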

What is the use of health-check-type attribute

I have deployed one app to both Bluemix and Pivotal. Below is the manifest file:
---
applications:
- name: test
  memory: 128M
  instances: 1
  no-route: true
  health-check-type: none   # Why do we have to use this?
In Bluemix, my app starts without the health-check-type attribute. But in Pivotal, I continuously get the message below, and the app ends up crashing.
0 of 1 instances starting
0 of 1 instances starting
0 of 1 instances starting
0 of 1 instances starting
0 of 1 instances starting
0 of 1 instances starting
0 of 1 instances starting
FAILED
After adding health-check-type: none to manifest.yml (in Pivotal), the app starts without any issues.
So can someone tell me whether it is mandatory to use the health-check-type attribute?
IBM Bluemix is on the older "DEA" architecture, while Pivotal is on the current "Diego" architecture. You can see how the two differ when it comes to the no-route option here. In short, on Diego the default port-based health check still runs for an app pushed with no-route; a worker-style app that doesn't listen on a port can never pass it, so the instances keep getting restarted. Setting health-check-type: none disables that port check. The attribute is therefore not mandatory in general, but it is needed for no-route worker apps on Diego.
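For reference, a manifest sketch of the same fix on a current Cloud Foundry, where none has been deprecated in favor of process (a hedged equivalent; verify against your CF version):
---
applications:
- name: test
  memory: 128M
  instances: 1
  no-route: true
  health-check-type: process   # modern replacement for 'none'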