Prometheus: cannot export metrics from connected Kubernetes cluster
The issue: I have a Prometheus server running outside of my Kubernetes cluster, and I want to collect metrics from that remote cluster.
I took the sample configuration from the Prometheus GitHub repo and modified it a little. Here is my jobs config:
- job_name: 'kubernetes-apiservers'
  scheme: http
  kubernetes_sd_configs:
    - role: endpoints
      api_server: http://cluster-manager.dev.example.net:8080
      bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
      tls_config:
        insecure_skip_verify: true
  relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;http

- job_name: 'kubernetes-nodes'
  scheme: http
  kubernetes_sd_configs:
    - role: node
      api_server: http://cluster-manager.dev.example.net:8080
      bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
      tls_config:
        insecure_skip_verify: true
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)

- job_name: 'kubernetes-service-endpoints'
  scheme: http
  kubernetes_sd_configs:
    - role: endpoints
      api_server: http://cluster-manager.dev.example.net:8080
      bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
      tls_config:
        insecure_skip_verify: true
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (http?)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: (.+)(?::\d+);(\d+)
      replacement: $1:$2
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: kubernetes_name

- job_name: 'kubernetes-services'
  scheme: http
  metrics_path: /probe
  params:
    module: [http_2xx]
  kubernetes_sd_configs:
    - role: service
      api_server: http://cluster-manager.dev.example.net:8080
      bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
      tls_config:
        insecure_skip_verify: true
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
      action: keep
      regex: true
    - source_labels: [__address__]
      target_label: __param_target
    - target_label: __address__
      replacement: blackbox
    - source_labels: [__param_target]
      target_label: instance
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_service_namespace]
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      target_label: kubernetes_name

- job_name: 'kubernetes-pods'
  scheme: http
  kubernetes_sd_configs:
    - role: pod
      api_server: http://cluster-manager.dev.example.net:8080
      bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
      tls_config:
        insecure_skip_verify: true
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: (.+):(?:\d+);(\d+)
      replacement: ${1}:${2}
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
I don't use a TLS connection to the API server, so I want to disable certificate verification.
When I curl the /metrics URL from the Prometheus host, it prints the metrics.
I finally connected to the cluster, but the jobs are not up, and therefore Prometheus doesn't expose any relabeled metrics.
Here is what I see in the console, on the Targets state page (screenshots not reproduced here).
I also checked the Prometheus debug log. It looks like Prometheus receives all the necessary information and the requests to the API server complete successfully.
time="2017-01-25T06:58:04Z" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="&config.TargetGroup{Targets:[]model.LabelSet{model.LabelSet{\"__meta_kubernetes_pod_container_port_protocol\":\"UDP\", \"__address__\":\"10.32.0.2:10053\", \"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10053\", \"__meta_kubernetes_pod_container_port_name\":\"dns-local\"}, model.LabelSet{\"__address__\":\"10.32.0.2:10053\", \"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10053\", \"__meta_kubernetes_pod_container_port_name\":\"dns-tcp-local\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\"}, model.LabelSet{\"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10055\", \"__meta_kubernetes_pod_container_port_name\":\"metrics\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"10.32.0.2:10055\"}, model.LabelSet{\"__address__\":\"10.32.0.2:53\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq\", \"__meta_kubernetes_pod_container_port_number\":\"53\", \"__meta_kubernetes_pod_container_port_name\":\"dns\", \"__meta_kubernetes_pod_container_port_protocol\":\"UDP\"}, model.LabelSet{\"__address__\":\"10.32.0.2:53\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq\", \"__meta_kubernetes_pod_container_port_number\":\"53\", \"__meta_kubernetes_pod_container_port_name\":\"dns-tcp\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\"}, model.LabelSet{\"__meta_kubernetes_pod_container_port_number\":\"10054\", \"__meta_kubernetes_pod_container_port_name\":\"metrics\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"10.32.0.2:10054\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq-metrics\"}, model.LabelSet{\"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"10.32.0.2:8080\", \"__meta_kubernetes_pod_container_name\":\"healthz\", \"__meta_kubernetes_pod_container_port_number\":\"8080\", \"__meta_kubernetes_pod_container_port_name\":\"\"}}, Labels:model.LabelSet{\"__meta_kubernetes_pod_ready\":\"true\", \"__meta_kubernetes_pod_annotation_kubernetes_io_created_by\":\"{\\\"kind\\\":\\\"SerializedReference\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"reference\\\":{\\\"kind\\\":\\\"ReplicaSet\\\",\\\"namespace\\\":\\\"kube-system\\\",\\\"name\\\":\\\"kube-dns-2924299975\\\",\\\"uid\\\":\\\"fa808d95-d7d9-11e6-9ac9-02dfdae1a1e9\\\",\\\"apiVersion\\\":\\\"extensions\\\",\\\"resourceVersion\\\":\\\"89\\\"}}\\n\", \"__meta_kubernetes_pod_annotation_scheduler_alpha_kubernetes_io_affinity\":\"{\\\"nodeAffinity\\\":{\\\"requiredDuringSchedulingIgnoredDuringExecution\\\":{\\\"nodeSelectorTerms\\\":[{\\\"matchExpressions\\\":[{\\\"key\\\":\\\"beta.kubernetes.io/arch\\\",\\\"operator\\\":\\\"In\\\",\\\"values\\\":[\\\"amd64\\\"]}]}]}}}\", \"__meta_kubernetes_pod_name\":\"kube-dns-2924299975-dksg5\", \"__meta_kubernetes_pod_ip\":\"10.32.0.2\", \"__meta_kubernetes_pod_label_k8s_app\":\"kube-dns\", \"__meta_kubernetes_pod_label_pod_template_hash\":\"2924299975\", \"__meta_kubernetes_pod_label_tier\":\"node\", \"__meta_kubernetes_pod_annotation_scheduler_alpha_kubernetes_io_tolerations\":\"[{\\\"key\\\":\\\"dedicated\\\",\\\"value\\\":\\\"master\\\",\\\"effect\\\":\\\"NoSchedule\\\"}]\", \"__meta_kubernetes_namespace\":\"kube-system\", \"__meta_kubernetes_pod_node_name\":\"cluster-manager.dev.example.net\", \"__meta_kubernetes_pod_label_component\":\"kube-dns\", 
\"__meta_kubernetes_pod_label_kubernetes_io_cluster_service\":\"true\", \"__meta_kubernetes_pod_host_ip\":\"54.194.166.39\", \"__meta_kubernetes_pod_label_name\":\"kube-dns\"}, Source:\"pod/kube-system/kube-dns-2924299975-dksg5\"}"
time="2017-01-25T06:58:04Z" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="&config.TargetGroup{Targets:[]model.LabelSet{model.LabelSet{\"__address__\":\"10.43.0.0\", \"__meta_kubernetes_pod_container_name\":\"bot\"}}, Labels:model.LabelSet{\"__meta_kubernetes_pod_host_ip\":\"172.17.101.25\", \"__meta_kubernetes_pod_label_app\":\"bot\", \"__meta_kubernetes_namespace\":\"default\", \"__meta_kubernetes_pod_name\":\"bot-272181271-pnzsz\", \"__meta_kubernetes_pod_ip\":\"10.43.0.0\", \"__meta_kubernetes_pod_node_name\":\"ip-172-17-101-25\", \"__meta_kubernetes_pod_annotation_kubernetes_io_created_by\":\"{\\\"kind\\\":\\\"SerializedReference\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"reference\\\":{\\\"kind\\\":\\\"ReplicaSet\\\",\\\"namespace\\\":\\\"default\\\",\\\"name\\\":\\\"bot-272181271\\\",\\\"uid\\\":\\\"c297b3c2-e15d-11e6-a28a-02dfdae1a1e9\\\",\\\"apiVersion\\\":\\\"extensions\\\",\\\"resourceVersion\\\":\\\"1465127\\\"}}\\n\", \"__meta_kubernetes_pod_ready\":\"true\", \"__meta_kubernetes_pod_label_pod_template_hash\":\"272181271\", \"__meta_kubernetes_pod_label_version\":\"v0.1\"}, Source:\"pod/default/bot-272181271-pnzsz\"}"
Prometheus fetches the updates, but doesn't turn them into targets and metrics.
I've racked my brain trying to figure out why it behaves this way, so please help if you can spot where the mistake might be.
If you want to monitor a Kubernetes cluster from an external Prometheus server, I would suggest setting up a Prometheus federation topology:
Inside the Kubernetes cluster, install node-exporter pods and a Prometheus instance with short-term storage.
Expose the Prometheus service outside the cluster, either via an ingress controller (load balancer) or a NodePort. You can protect this endpoint with HTTPS plus basic authentication.
Configure the central Prometheus to scrape metrics from that endpoint with the proper authentication and labels.
This is the scalable solution: you can monitor as many Kubernetes clusters as you want until you reach the capacity of the central Prometheus, and then add another central Prometheus instance to monitor the rest. A sketch of the federation scrape job on the central server follows.
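For illustration, the scrape job on the central Prometheus could look roughly like the sketch below. The hostname, credentials and match expression are placeholders (nothing here comes from the question); adjust them to however you exposed the in-cluster Prometheus.

- job_name: 'k8s-dev-federation'
  scheme: https
  metrics_path: /federate          # the in-cluster Prometheus serves aggregated series here
  params:
    'match[]':
      - '{job=~"kubernetes-.*"}'   # pull every series collected by the in-cluster jobs
  honor_labels: true               # keep the labels assigned inside the cluster
  basic_auth:
    username: prometheus           # placeholder credentials for the exposed endpoint
    password: CHANGE_ME
  static_configs:
    - targets: ['prometheus.dev.example.net:443']   # placeholder ingress / NodePort address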
In the end I came to the conclusion that it's not trivial to set up monitoring of a Kubernetes cluster from outside the cluster, because the Kubernetes architecture assumes that all infrastructure sits within one local network, so every workaround is going to be messy.
I also ran into the problem of trying to debug why none of the configured jobs for the Kubernetes roles (nodes, pods, services and endpoints) even show up on the targets status page. I may be wrong, but I couldn't find a way to debug this issue in Prometheus.
My solution for monitoring the Kubernetes cluster from outside was kube-api-exporter: a fairly simple Python script that gathers metrics about DaemonSets, Deployments and Pods and exposes a URL to fetch them from. I'd recommend this approach to anyone who gets stuck with this sort of integration.
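Since the exporter just serves a metrics URL, the corresponding scrape configuration is a plain static job, roughly like this sketch (the host and port are placeholders; use whatever address your kube-api-exporter instance actually listens on):

- job_name: 'kube-api-exporter-dev'
  scheme: http
  static_configs:
    - targets: ['kube-api-exporter.dev.example.net:9102']   # placeholder host:port
      labels:
        cluster: dev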
I also started to fetch metrics from etcd. It's nice that etcd provides Prometheus-style metrics out of the box.
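etcd serves its metrics on /metrics over the regular client port, so a plain static job is enough; for example (the host below is a placeholder, and if your etcd client port is TLS-protected you would switch to scheme: https and add a tls_config):

- job_name: 'etcd-dev'
  scheme: http
  static_configs:
    - targets: ['etcd.dev.example.net:2379']   # etcd client port; /metrics is the default metrics_path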
P.S.: thanks to FuzzyAmi for help.
Related
Scrape metrics from multiple containers in Prometheus with Istio
Our application is deployed in the Istio service mesh and we are trying to scrape metrics at the container level using the prometheus.io annotations. We have enabled Spring Boot metrics in our application and we are able to fetch the metrics on the given path '/manage/prometheus'. We have enabled the Prometheus annotations in the deployment file of our application as follows:

metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8080'
    prometheus.io/path: '/manage/prometheus'

This works fine when there is a single container in the pod. But for pods that have multiple containers, we are unable to scrape the metrics with the container port. Following are the workarounds we tried.
Following the reference https://gist.github.com/bakins/5bf7d4e719f36c1c555d81134d8887eb we tried to add the relabel configs for scraping data at the container level:

prometheus-config.yaml

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        regex: true
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: (.*)
      - source_labels: [__address__, __meta_kubernetes_pod_container_port_number]
        action: replace
        regex: (.+):(?:\d+);(\d+)
        replacement: ${1}:${2}
        target_label: __address__
      - action: replace
        regex: (https?)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
      - action: drop
        regex: Pending|Succeeded|Failed
        source_labels:
          - __meta_kubernetes_pod_phase
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_container_name
        target_label: container
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_container_port_number
        target_label: container_port

But after applying the above configuration we are getting this error:

Get "http://10.x.x.x:8080/stats/prometheus": read tcp 10.y.y.y:45542->10.x.x.x:8080: read: connection reset by peer

Here 10.x.x.x is the pod IP and 8080 is the container port; it is not able to scrape using the container port. We tried the above configuration after removing the Istio mesh, i.e. by removing the Istio sidecar from all the microservice pods, and we could see container-level metrics being scraped. Istio's proxy is somewhere blocking the metrics from being scraped at the container level. Has anyone faced a similar issue?
Changing Prometheus job label in scraper for cAdvisor breaks Grafana dashboards
I installed Prometheus on my Kubernetes cluster with Helm, using the community chart kube-prometheus-stack, and I get some beautiful dashboards in the bundled Grafana instance. I now wanted the recommender from the Vertical Pod Autoscaler to use Prometheus as a data source for historic metrics, as described here. Meaning, I had to make a change to the Prometheus scraper settings for cAdvisor, and this answer pointed me in the right direction: after making that change I can now see the correct job tag on metrics from cAdvisor.
Unfortunately, now some of the charts in the Grafana dashboards are broken. It looks like it no longer picks up the CPU metrics and instead just displays "No data" for the CPU-related charts. So I assume I have to tweak the charts to be able to pick up the metrics correctly again, but I don't see any obvious place to do this in Grafana?
Not sure if it is relevant for the question, but I am running my Kubernetes cluster on Azure Kubernetes Service (AKS). This is the full values.yaml I supply to the Helm chart when installing Prometheus:

kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false
kubeEtcd:
  enabled: false
kubeProxy:
  enabled: false
kubelet:
  serviceMonitor:
    # Disables the normal cAdvisor scraping, as we add it with the job name "kubernetes-cadvisor" under additionalScrapeConfigs
    # The reason for doing this is to enable the VPA to use the metrics for the recommender
    # https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/FAQ.md#how-can-i-use-prometheus-as-a-history-provider-for-the-vpa-recommender
    cAdvisor: false
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          # the azurefile storage class is created automatically on AKS
          storageClassName: azurefile
          accessModes: ["ReadWriteMany"]
          resources:
            requests:
              storage: 50Gi
    additionalScrapeConfigs:
      - job_name: 'kubernetes-cadvisor'
        scheme: https
        metrics_path: /metrics/cadvisor
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)

Kubernetes version: 1.21.2
kube-prometheus-stack version: 18.1.1
helm version: version.BuildInfo{Version:"v3.6.3", GitCommit:"d506314abfb5d21419df8c7e7e68012379db2354", GitTreeState:"dirty", GoVersion:"go1.16.5"}
Unfortunately, I don't have access to Azure AKS, so I've reproduced this issue on my GKE cluster. Below I'll provide some explanations that may help to resolve your problem.
First, you can try to execute the node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate rule to see if it returns any result. If it doesn't return any records, please read the following paragraphs.

Creating a scrape configuration for cAdvisor

Rather than creating a completely new scrape configuration for cAdvisor, I would suggest using the one that is generated by default when kubelet.serviceMonitor.cAdvisor: true, but with a few modifications such as changing the label to job=kubernetes-cadvisor.
In my example, the 'kubernetes-cadvisor' scrape configuration looks like this:
NOTE: I added this config under additionalScrapeConfigs in the values.yaml file (the rest of the values.yaml file may be like yours).

- job_name: 'kubernetes-cadvisor'
  honor_labels: true
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  follow_redirects: true
  relabel_configs:
    - source_labels: [job]
      separator: ;
      regex: (.*)
      target_label: __tmp_prometheus_job_name
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
      separator: ;
      regex: kubelet
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_service_label_k8s_app]
      separator: ;
      regex: kubelet
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: https-metrics
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
      separator: ;
      regex: Node;(.*)
      target_label: node
      replacement: ${1}
      action: replace
    - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
      separator: ;
      regex: Pod;(.*)
      target_label: pod
      replacement: ${1}
      action: replace
    - source_labels: [__meta_kubernetes_namespace]
      separator: ;
      regex: (.*)
      target_label: namespace
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: (.*)
      target_label: service
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_pod_name]
      separator: ;
      regex: (.*)
      target_label: pod
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_pod_container_name]
      separator: ;
      regex: (.*)
      target_label: container
      replacement: $1
      action: replace
    - separator: ;
      regex: (.*)
      target_label: endpoint
      replacement: https-metrics
      action: replace
    - source_labels: [__metrics_path__]
      separator: ;
      regex: (.*)
      target_label: metrics_path
      replacement: $1
      action: replace
    - source_labels: [__address__]
      separator: ;
      regex: (.*)
      modulus: 1
      target_label: __tmp_hash
      replacement: $1
      action: hashmod
    - source_labels: [__tmp_hash]
      separator: ;
      regex: "0"
      replacement: $1
      action: keep
  kubernetes_sd_configs:
    - role: endpoints
      kubeconfig_file: ""
      follow_redirects: true
      namespaces:
        names:
          - kube-system

Modifying Prometheus rules

By default, the Prometheus rules fetching data from cAdvisor use job="kubelet" in their PromQL expressions. After changing job=kubelet to job=kubernetes-cadvisor, we also need to modify this label in the Prometheus rules.
NOTE: We only need to modify the rules that have metrics_path="/metrics/cadvisor" (these are the rules that retrieve data from cAdvisor).

$ kubectl get prometheusrules prom-1-kube-prometheus-sta-k8s.rules -o yaml
...
- name: k8s.rules
  rules:
  - expr: |-
      sum by (cluster, namespace, pod, container) (
        irate(container_cpu_usage_seconds_total{job="kubernetes-cadvisor", metrics_path="/metrics/cadvisor", image!=""}[5m])
      ) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (
        1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=""})
      )
    record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
...
here we have a few more rules to modify...

After modifying the Prometheus rules and waiting some time, we can see if it works as expected. We can try to execute node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate as in the beginning. Additionally, let's check our Grafana to make sure it has started displaying our dashboards correctly.
How to relabel Kubernetes metrics with dynamic pod URLs in Prometheus?
I am trying to discover both pods and nodes in the same job using kubernetes_sd_configs and use their labels together. I have blackbox-exporter on multiple pods in my cluster on different nodes; my goal is to monitor each of them, but I am having trouble with the metric relabelling. I am trying to achieve something like this:

http://<pod-ip1>:8082/metrics?instance=<node-ip1>
http://<pod-ip2>:8082/metrics?instance=<node-ip1>
http://<pod-ipN>:8082/metrics?instance=<node-ip1>
...
http://<pod-ip1>:8082/metrics?instance=<node-ip2>
http://<pod-ip2>:8082/metrics?instance=<node-ip2>
http://<pod-ipN>:8082/metrics?instance=<node-ip2>
...

My current configuration looks like the following, but the pod URL is missing:

- job_name: 'kubernetes-pods'
  metrics_path: /probe
  params:
    module: [ping]
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: <pod_should_be_here>:9115
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
Prometheus job "kubernetes-nodes" endpoints in state "UNKNOWN"
We are facing an issue where some endpoints of the Prometheus job "kubernetes-nodes" are in the state "UNKNOWN". The nodes and Prometheus have all been up for several days. We tried to curl those "kubernetes-nodes" endpoints which are in the "UNKNOWN" state: the metrics can be curled correctly, but the endpoint state is still "UNKNOWN". We don't know the reason (i.e. the criteria for when an endpoint is marked "UNKNOWN").
I know that before Prometheus does its first scrape, endpoints are in the "UNKNOWN" state. Then, if the scrape succeeds, the endpoint becomes "UP"; if it fails, "DOWN". However, it seems some endpoints (shown in a screenshot that is not reproduced here) have never been scraped, and we just don't know why.
Could you please give advice about the possible reason for such a case? Does it mean that this node (name hidden in the red block) has something wrong? If so, is it possible to fix it so that Prometheus will treat it as "UP"? Thanks in advance.

- job_name: kubernetes-nodes
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
    - api_server: null
      role: node
      namespaces:
        names: []
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  relabel_configs:
    - separator: ;
      regex: __meta_kubernetes_node_label_(.+)
      replacement: $1
      action: labelmap
    - separator: ;
      regex: (.*)
      target_label: __address__
      replacement: kubernetes.default.svc:443
      action: replace
    - source_labels: [__meta_kubernetes_node_name]
      separator: ;
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics
      action: replace
    - source_labels: [__meta_kubernetes_namespace]
      separator: ;
      regex: (.*)
      target_label: namespace
      replacement: $1
      action: replace
I think you're missing the nodes/proxy resource in your Prometheus ClusterRole. Here's the official example: github.com/prometheus/documentation/examples/rbac-setup.yml.
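For reference, the relevant rules from that example look roughly like the sketch below (reproduced from memory, so treat the linked rbac-setup.yml as the authoritative version):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy        # needed for the /api/v1/nodes/<name>/proxy/metrics path used by this job
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]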
Disable scraping of specific endpoints
Using Prometheus we are scraping all our Kubernetes endpoints. Here is our relevant configuration in prometheus.yaml:

- job_name: 'kubernetes-nodes'
  # Default to scraping over https. If required, just disable this or change to
  # `http`.
  scheme: https
  # This TLS & bearer token file config is used to connect to the actual scrape
  # endpoints for cluster components. This is separate to discovery auth
  # configuration because discovery & scraping are two separate concerns in
  # Prometheus. The discovery auth config is automatic if Prometheus runs inside
  # the cluster. Otherwise, more config options have to be provided within the
  # <kubernetes_sd_config>.
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server_name: kube-worker
    # If your node certificates are self-signed or use a different CA to the
    # master CA, then disable certificate verification below. Note that
    # certificate verification is an integral part of a secure infrastructure
    # so this should only be disabled in a controlled environment. You can
    # disable certificate verification by uncommenting the line below.
    #
    # insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)

# Scrape config for service endpoints.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/scrape`: Only scrape services that have a value of `true`
# * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
#   to set this to `https` & most likely set the `tls_config` of the scrape config.
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: If the metrics are exposed on a different port to the
#   service then set this appropriately.
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
    - role: endpoints
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: (.+)(?::\d+);(\d+)
      replacement: $1:$2
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: kubernetes_name

Somehow all our services are scraped, even if we do not set prometheus.io/scrape to true in the application's service.yaml. Now we do not want to scrape two of the endpoints. Is there a way to configure this?