Prometheus: cannot export metrics from connected Kubernetes cluster - kubernetes

The issue: I have a Prometheus outside of Kubernetes cluster. So, I want to export metrics from remote cluster.
I took the config sample from Prometheus Github repo and modified this a little bit. So, here is my jobs config.
- job_name: 'kubernetes-apiservers'
scheme: http
kubernetes_sd_configs:
- role: endpoints
api_server: http://cluster-manager.dev.example.net:8080
bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
tls_config:
insecure_skip_verify: true
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;http
- job_name: 'kubernetes-nodes'
scheme: http
kubernetes_sd_configs:
- role: node
api_server: http://cluster-manager.dev.example.net:8080
bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
tls_config:
insecure_skip_verify: true
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- job_name: 'kubernetes-service-endpoints'
scheme: http
kubernetes_sd_configs:
- role: endpoints
api_server: http://cluster-manager.dev.example.net:8080
bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
tls_config:
insecure_skip_verify: true
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (http?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: (.+)(?::\d+);(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
- job_name: 'kubernetes-services'
scheme: http
metrics_path: /probe
params:
module: [http_2xx]
kubernetes_sd_configs:
- role: service
api_server: http://cluster-manager.dev.example.net:8080
bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
tls_config:
insecure_skip_verify: true
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_service_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: kubernetes_name
- job_name: 'kubernetes-pods'
scheme: http
kubernetes_sd_configs:
- role: pod
api_server: http://cluster-manager.dev.example.net:8080
bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
tls_config:
insecure_skip_verify: true
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: (.+):(?:\d+);(\d+)
replacement: ${1}:${2}
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
I don't use a TLS connection to API, so I want to disable it.
When I curl /metrics URL from Prometheus host - it prints them.
Finally I connected to the cluster, but...the jobs are not up and therefore Prometheus doesn't expose relabeled metrics.
What I see in Console.
Targets state:
Also I checked the Prometheus debug. It's thought the system gets any necessary information and requests are done successfully.
time="2017-01-25T06:58:04Z" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="&config.TargetGroup{Targets:[]model.LabelSet{model.LabelSet{\"__meta_kubernetes_pod_container_port_protocol\":\"UDP\", \"__address__\":\"10.32.0.2:10053\", \"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10053\", \"__meta_kubernetes_pod_container_port_name\":\"dns-local\"}, model.LabelSet{\"__address__\":\"10.32.0.2:10053\", \"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10053\", \"__meta_kubernetes_pod_container_port_name\":\"dns-tcp-local\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\"}, model.LabelSet{\"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10055\", \"__meta_kubernetes_pod_container_port_name\":\"metrics\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"10.32.0.2:10055\"}, model.LabelSet{\"__address__\":\"10.32.0.2:53\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq\", \"__meta_kubernetes_pod_container_port_number\":\"53\", \"__meta_kubernetes_pod_container_port_name\":\"dns\", \"__meta_kubernetes_pod_container_port_protocol\":\"UDP\"}, model.LabelSet{\"__address__\":\"10.32.0.2:53\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq\", \"__meta_kubernetes_pod_container_port_number\":\"53\", \"__meta_kubernetes_pod_container_port_name\":\"dns-tcp\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\"}, model.LabelSet{\"__meta_kubernetes_pod_container_port_number\":\"10054\", \"__meta_kubernetes_pod_container_port_name\":\"metrics\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"10.32.0.2:10054\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq-metrics\"}, model.LabelSet{\"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"10.32.0.2:8080\", \"__meta_kubernetes_pod_container_name\":\"healthz\", \"__meta_kubernetes_pod_container_port_number\":\"8080\", \"__meta_kubernetes_pod_container_port_name\":\"\"}}, Labels:model.LabelSet{\"__meta_kubernetes_pod_ready\":\"true\", \"__meta_kubernetes_pod_annotation_kubernetes_io_created_by\":\"{\\\"kind\\\":\\\"SerializedReference\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"reference\\\":{\\\"kind\\\":\\\"ReplicaSet\\\",\\\"namespace\\\":\\\"kube-system\\\",\\\"name\\\":\\\"kube-dns-2924299975\\\",\\\"uid\\\":\\\"fa808d95-d7d9-11e6-9ac9-02dfdae1a1e9\\\",\\\"apiVersion\\\":\\\"extensions\\\",\\\"resourceVersion\\\":\\\"89\\\"}}\\n\", \"__meta_kubernetes_pod_annotation_scheduler_alpha_kubernetes_io_affinity\":\"{\\\"nodeAffinity\\\":{\\\"requiredDuringSchedulingIgnoredDuringExecution\\\":{\\\"nodeSelectorTerms\\\":[{\\\"matchExpressions\\\":[{\\\"key\\\":\\\"beta.kubernetes.io/arch\\\",\\\"operator\\\":\\\"In\\\",\\\"values\\\":[\\\"amd64\\\"]}]}]}}}\", \"__meta_kubernetes_pod_name\":\"kube-dns-2924299975-dksg5\", \"__meta_kubernetes_pod_ip\":\"10.32.0.2\", \"__meta_kubernetes_pod_label_k8s_app\":\"kube-dns\", \"__meta_kubernetes_pod_label_pod_template_hash\":\"2924299975\", \"__meta_kubernetes_pod_label_tier\":\"node\", \"__meta_kubernetes_pod_annotation_scheduler_alpha_kubernetes_io_tolerations\":\"[{\\\"key\\\":\\\"dedicated\\\",\\\"value\\\":\\\"master\\\",\\\"effect\\\":\\\"NoSchedule\\\"}]\", \"__meta_kubernetes_namespace\":\"kube-system\", \"__meta_kubernetes_pod_node_name\":\"cluster-manager.dev.example.net\", \"__meta_kubernetes_pod_label_component\":\"kube-dns\", \"__meta_kubernetes_pod_label_kubernetes_io_cluster_service\":\"true\", \"__meta_kubernetes_pod_host_ip\":\"54.194.166.39\", \"__meta_kubernetes_pod_label_name\":\"kube-dns\"}, Source:\"pod/kube-system/kube-dns-2924299975-dksg5\"}"
time="2017-01-25T06:58:04Z" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="&config.TargetGroup{Targets:[]model.LabelSet{model.LabelSet{\"__address__\":\"10.43.0.0\", \"__meta_kubernetes_pod_container_name\":\"bot\"}}, Labels:model.LabelSet{\"__meta_kubernetes_pod_host_ip\":\"172.17.101.25\", \"__meta_kubernetes_pod_label_app\":\"bot\", \"__meta_kubernetes_namespace\":\"default\", \"__meta_kubernetes_pod_name\":\"bot-272181271-pnzsz\", \"__meta_kubernetes_pod_ip\":\"10.43.0.0\", \"__meta_kubernetes_pod_node_name\":\"ip-172-17-101-25\", \"__meta_kubernetes_pod_annotation_kubernetes_io_created_by\":\"{\\\"kind\\\":\\\"SerializedReference\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"reference\\\":{\\\"kind\\\":\\\"ReplicaSet\\\",\\\"namespace\\\":\\\"default\\\",\\\"name\\\":\\\"bot-272181271\\\",\\\"uid\\\":\\\"c297b3c2-e15d-11e6-a28a-02dfdae1a1e9\\\",\\\"apiVersion\\\":\\\"extensions\\\",\\\"resourceVersion\\\":\\\"1465127\\\"}}\\n\", \"__meta_kubernetes_pod_ready\":\"true\", \"__meta_kubernetes_pod_label_pod_template_hash\":\"272181271\", \"__meta_kubernetes_pod_label_version\":\"v0.1\"}, Source:\"pod/default/bot-272181271-pnzsz\"}"
Prometheus fetches updates, but...doesn't convert them to metrics.
So, I've broken my brain to figure out why is it going this way. So, please, help if you can figure out where might be mistake.

If you want to monitor a Kubernetes cluster from an external Prometheus server, I would suggest to set up a Prometheus federation topology:
Inside the K8s, install node-exporter pods and a Prometheus instance with short-term storage.
Expose the Prometheus service out of the K8s cluster, either via an ingress-controller (LB), or a node port. You can protect this endpoint with HTTPS + basic authentication.
Configure the center Prometheus to scrape metrics from above endpoint with proper authentication and tags.
This is the scalable solution. You can add monitor as many K8s clusters you want, until it reaches the capacities of the center Prometheus. Then you can add another center Prometheus instance to monitor others.

Finally I came to the though it's not trivial to setup Kubernetes cluster monitoring outside of cluster. Cause Kubernetes architecture suggested to keep all infrastructure within one local network. So, every workaround is going to be messy.
Also I came to the problem trying to debug why all configured jobs about Kubernetes roles such as nods, pods, services and endpoints doensn't even show up in targets status page. I may think wrong, but I didn't find out how to debug this issue in Prometheus.
My solution to monitor Kubernetes cluster outside was a kube-api-exporter. Pretty simple Python script which gets all metrics about ds, deployments and pods and finally provides the URL to fetch them. So, I'd recommend to come to this solution everyone who's got stuck with this sort of integration.
Also I started to fetch metrics from etcd. That's cool that etcd provides Prometheus-style metrics out of the box.
P.S.: thanks to FuzzyAmi for help.

Related

Scape metrics from multiple containers in prometheus with Istio

Our application is deployed in the istio service mesh and we are trying to scrape metrics at container level using the prometheus.io annotations.
So we have enabled spring boot metrics in our application and we are able to fetch the metrics on the given path '/manage/prometheus'.
We have enabled Prometheus annotations in the deployment file of our application as follows:
metadata:
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '8080'
prometheus.io/path: '/manage/prometheus'
This works fine when there is a single container in the pod.
But for pods that have multiple containers, we are unable to scrape the metrics with the container port.
Following are the workarounds we tried:
Following the reference https://gist.github.com/bakins/5bf7d4e719f36c1c555d81134d8887eb we tried to add the relabel configs for scraping data at container level:
prometheus-config.yaml
scrape-configs:
- job_name: kubernetes-pods
kubernetes_sd_configs:
- role: pod
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_scrape
- source_labels: [__meta_kubernetes_pod_container_port_name]
action: keep
regex: (.*)
- source_labels: [ __address__, __meta_kubernetes_pod_container_port_number]
action: replace
regex: (.+):(?:\d+);(\d+)
replacement: ${1}:${2}
target_label: __address__
- action: replace
regex: (https?)
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_scheme
target_label: __scheme__
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_path
target_label: __metrics_path__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- action: replace
source_labels:
- __meta_kubernetes_pod_name
target_label: kubernetes_pod_name
- action: drop
regex: Pending|Succeeded|Failed
source_labels:
- __meta_kubernetes_pod_phase
- action: replace
source_labels:
- __meta_kubernetes_pod_container_name
target_label: container
- action: replace
source_labels:
- __meta_kubernetes_pod_container_port_number
target_label: container_port
But after applying the above configuration we are getting the error as:
Get "http://10.x.x.x:8080/stats/prometheus": read tcp 10.y.y.y:45542->10.x.x.x:8080: read: connection reset by peer
So 10.x.x.x is the pod IP and 8080 is the container port, it is not able to scrape using the container port.
We tried the above configuration after removing the istio mesh i.e. by removing the istio sidecar from all the microservices pods and we could see container level metrics being scraped.
Istio’s proxy is somewhere blocking the metrics to be scraped at the container level.
Have anyone faced this similar issue?

Changing Prometheus job label in scraper for cAdvisor breaks Grafana dashboards

I installed Prometheus on my Kubernetes cluster with Helm, using the community chart kube-prometheus-stack - and I get some beautiful dashboards in the bundled Grafana instance. I now wanted the recommender from the Vertical Pod Autoscaler to use Prometheus as a data source for historic metrics, as described here. Meaning, I had to make a change to the Prometheus scraper settings for cAdvisor, and this answer pointed me in the right direction, as after making that change I can now see the correct job tag on metrics from cAdvisor.
Unfortunately, now some of the charts in the Grafana dashboards are broken. It looks like it no longer picks up the CPU metrics - and instead just displays "No data" for the CPU-related charts.
So, I assume I have to tweak the charts to be able to pick up the metrics correctly again, but I don't see any obvious places to do this in Grafana?
Not sure if it is relevant for the question, but I am running my Kubernetes cluster on Azure Kubernetes Service (AKS).
This is the full values.yaml I supply to the Helm chart when installing Prometheus:
kubeControllerManager:
enabled: false
kubeScheduler:
enabled: false
kubeEtcd:
enabled: false
kubeProxy:
enabled: false
kubelet:
serviceMonitor:
# Diables the normal cAdvisor scraping, as we add it with the job name "kubernetes-cadvisor" under additionalScrapeConfigs
# The reason for doing this is to enable the VPA to use the metrics for the recommender
# https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/FAQ.md#how-can-i-use-prometheus-as-a-history-provider-for-the-vpa-recommender
cAdvisor: false
prometheus:
prometheusSpec:
retention: 15d
storageSpec:
volumeClaimTemplate:
spec:
# the azurefile storage class is created automatically on AKS
storageClassName: azurefile
accessModes: ["ReadWriteMany"]
resources:
requests:
storage: 50Gi
additionalScrapeConfigs:
- job_name: 'kubernetes-cadvisor'
scheme: https
metrics_path: /metrics/cadvisor
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
Kubernetes version: 1.21.2
kube-prometheus-stack version: 18.1.1
helm version: version.BuildInfo{Version:"v3.6.3", GitCommit:"d506314abfb5d21419df8c7e7e68012379db2354", GitTreeState:"dirty", GoVersion:"go1.16.5"}
Unfortunately, I don't have access to Azure AKS, so I've reproduced this issue on my GKE cluster. Below I'll provide some explanations that may help to resolve your problem.
First you can try to execute this node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate rule to see if it returns any result:
If it doesn't return any records, please read the following paragraphs.
Creating a scrape configuration for cAdvisor
Rather than creating a completely new scrape configuration for cadvisor, I would suggest using one that is generated by default when kubelet.serviceMonitor.cAdvisor: true, but with a few modifications such as changing the label to job=kubernetes-cadvisor.
In my example, the 'kubernetes-cadvisor' scrape configuration looks like this:
NOTE: I added this config under the additionalScrapeConfigs in the values.yaml file (the rest of the values.yaml file may be like yours).
- job_name: 'kubernetes-cadvisor'
honor_labels: true
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics/cadvisor
scheme: https
authorization:
type: Bearer
credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
follow_redirects: true
relabel_configs:
- source_labels: [job]
separator: ;
regex: (.*)
target_label: __tmp_prometheus_job_name
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
separator: ;
regex: kubelet
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_k8s_app]
separator: ;
regex: kubelet
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_container_name]
separator: ;
regex: (.*)
target_label: container
replacement: $1
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https-metrics
action: replace
- source_labels: [__metrics_path__]
separator: ;
regex: (.*)
target_label: metrics_path
replacement: $1
action: replace
- source_labels: [__address__]
separator: ;
regex: (.*)
modulus: 1
target_label: __tmp_hash
replacement: $1
action: hashmod
- source_labels: [__tmp_hash]
separator: ;
regex: "0"
replacement: $1
action: keep
kubernetes_sd_configs:
- role: endpoints
kubeconfig_file: ""
follow_redirects: true
namespaces:
names:
- kube-system
Modifying Prometheus Rules
By default, Prometheus rules fetching data from cAdvisor use job="kubelet" in their PromQL expressions:
After changing job=kubelet to job=kubernetes-cadvisor, we also need to modify this label in the Prometheus rules:
NOTE: We just need to modify the rules that have metrics_path="/metrics/cadvisor (these are rules that retrieve data from cAdvisor).
$ kubectl get prometheusrules prom-1-kube-prometheus-sta-k8s.rules -o yaml
...
- name: k8s.rules
rules:
- expr: |-
sum by (cluster, namespace, pod, container) (
irate(container_cpu_usage_seconds_total{job="kubernetes-cadvisor", metrics_path="/metrics/cadvisor", image!=""}[5m])
) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (
1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=""})
)
record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
...
here we have a few more rules to modify...
After modifying Prometheus rules and waiting some time, we can see if it works as expected. We can try to execute node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate as in the beginning.
Additionally, let's check out our Grafana to make sure it has started displaying our dashboards correctly:

How to relabel Kubernetes metrics with dynamic pod URLs in Prometheus?

I am trying to discover both pods and nodes on the same job using kubernetes_sd_configs and use their labels together.
I have blackbox-exporter on multiple pods in my cluster on different nodes, my goal is to monitor each of them, but I am having trouble with the metric relabelling.
I am trying to achieve something like this:
http://<pod-ip1>:8082/metrics?instance=<node-ip1>
http://<pod-ip2>:8082/metrics?instance=<node-ip1>
http://<pod-ipN>:8082/metrics?instance=<node-ip1>
.
.
http://<pod-ip1>:8082/metrics?instance=<node-ip2>
http://<pod-ip2>:8082/metrics?instance=<node-ip2>
http://<pod-ipN>:8082/metrics?instance=<node-ip2>
.
.
My current configuration looks like the following, but the pod URL is missing:
- job_name: 'kubernetes-pods'
metrics_path: /probe
params:
module: [ping]
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: <pod_should_be_here>:9115
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __param_target
- source_labels: [__param_target]
target_label: instance

Prometheus job "kubernetes-nodes" endpoints in state "UNKNOWN"

We are facing one issue, that some endpoints are in state "UNKNOWN". Prometheus job "kubernetes-nodes".
Nodes and Prometheus are all up for several days. We tried to curl those "kubernetes-nodes" endpoints, which are in "UNKNOWN" state. Metrics can be correctly curled, but endpoint state is still "UNKNOWN". We don't know the reason (criteria, on which case it will be marked as "UNKNOWN").
I know before Prometheus does its first scrape, endpoints are in "UNKNOWN" state. Then, if scrape successes, endpoint will be "UP", if fails, "DOWN". However, in below screenshot it seems some endpoints are never been scraped...We just don't know why.
Could you please give advice, about the possible reason of such case?
Does this mean this node (name hide in red block...) has something wrong? If so, is it possible to fix, that will let Prometheus treat it as "UP"?
Thanks in advance.
- job_name: kubernetes-nodes
scrape_interval: 1m
scrape_timeout: 10s
metrics_path: /metrics
scheme: https
kubernetes_sd_configs:
- api_server: null
role: node
namespaces:
names: []
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
relabel_configs:
- separator: ;
regex: __meta_kubernetes_node_label_(.+)
replacement: $1
action: labelmap
- separator: ;
regex: (.*)
target_label: __address__
replacement: kubernetes.default.svc:443
action: replace
- source_labels: [__meta_kubernetes_node_name]
separator: ;
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
I think you're missing nodes/proxy resource in your prometheus cluster role. Here's official example github.com/prometheus/documentation/examples/rbac-setup.yml.

Disable scraping of specific endpoints

Using Prometheus we are scraping all our Kubernetes endpoints. Here is our relevant configuration in prometheus.yaml:
- job_name: 'kubernetes-nodes'
# Default to scraping over https. If required, just disable this or change to
# `http`.
scheme: https
# This TLS & bearer token file config is used to connect to the actual scrape
# endpoints for cluster components. This is separate to discovery auth
# configuration because discovery & scraping are two separate concerns in
# Prometheus. The discovery auth config is automatic if Prometheus runs inside
# the cluster. Otherwise, more config options have to be provided within the
# <kubernetes_sd_config>.
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
server_name: kube-worker
# If your node certificates are self-signed or use a different CA to the
# master CA, then disable certificate verification below. Note that
# certificate verification is an integral part of a secure infrastructure
# so this should only be disabled in a controlled environment. You can
# disable certificate verification by uncommenting the line below.
#
# insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
# Scrape config for service endpoints.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/scrape`: Only scrape services that have a value of `true`
# * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
# to set this to `https` & most likely set the `tls_config` of the scrape config.
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: If the metrics are exposed on a different port to the
# service then set this appropriately.
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: (.+)(?::\d+);(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
Somehow all our services are scraped, even if we do not set the prometheus.io/scrape to true in the application's service.yaml.
Now we do not want to scrape two endpoints. Is there a way to configure this?