We are facing an issue: some endpoints of the Prometheus job "kubernetes-nodes" are in the "UNKNOWN" state.
The nodes and Prometheus have all been up for several days. We tried to curl the "kubernetes-nodes" endpoints that are in the "UNKNOWN" state. The metrics can be curled correctly, but the endpoint state is still "UNKNOWN". We don't know the criteria by which an endpoint is marked as "UNKNOWN".
I know that before Prometheus does its first scrape, endpoints are in the "UNKNOWN" state. Then, if the scrape succeeds, the endpoint becomes "UP"; if it fails, "DOWN". However, in the screenshot below it seems some endpoints have never been scraped... We just don't know why.
Could you please advise on the possible reasons for this?
Does this mean something is wrong with this node (name hidden by the red block...)? If so, is there a fix that will let Prometheus treat it as "UP"?
- job_name: kubernetes-nodes
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: node
    namespaces:
      names: []
  bearer_token_file: /var/run/secrets/
  tls_config:
    ca_file: /var/run/secrets/
    insecure_skip_verify: true
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace

I think you're missing the nodes/proxy resource in your Prometheus ClusterRole. Here's the official example.
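The linked example is not reproduced here. As a rough sketch of what such a role could look like, assuming Prometheus runs under a service account named prometheus in a monitoring namespace (both names are hypothetical), the nodes/proxy resource is what permits requests to /api/v1/nodes/<node>/proxy/metrics through the API server:

```yaml
# Sketch only: names and the exact rule list are assumptions, not copied
# from the asker's cluster or from the linked example.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus            # hypothetical name
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy               # allows GET on /api/v1/nodes/<node>/proxy/...
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus            # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus            # hypothetical service account
  namespace: monitoring       # hypothetical namespace
```

Whether this resolves the "UNKNOWN" state depends on the missing permission actually being the cause, which the thread does not confirm.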


Drop or rename scrape_config label

I'd love to rename or drop a label from a /metrics endpoint within my metric. The metric itself is from the kube-state-metrics application, so nothing extraordinary. The metric looks like this:
kube_pod_container_resource_requests{container="alertmanager", instance="", funday_monday="blubb", job="some-kube-state-metrics", name="kube-state-metrics", namespace="monitoring", node="some-host-in-azure-1234", pod="alertmanager-main-1", resource="memory", uid="12345678-1234-1234-1234-123456789012", unit="byte"} 209715200
The label I'd love to replace is instance because it refers to the host which runs the kube-state-metrics application and I don't care about that. I want to have the value of node in instance and I've been trying so for hours now and can't find a way - I wonder if it's not possible at all!?
The way I'm grabbing the /metrics endpoint is through the means of a scrape-config which looks as follows:
- job_name: some-kube-state-metrics
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  kubernetes_sd_configs:
  - api_server: null
    role: pod
  scheme: http
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_labelpresent_kubeStateMetrics]
    regex: true
    action: keep
  - source_labels: [__meta_kubernetes_pod_label_name]
    regex: (.*)
    replacement: $1
    target_label: name
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    separator: ;
    regex: http
    replacement: $1
    action: keep
  - source_labels: [node]
    regex: (.*)
    replacement: blubb
    target_label: funday_monday
    action: replace
  - action: labeldrop
    regex: "unit=(.*)"
  - source_labels: [ __name__ ]
    regex: 'kube\_pod\_container\_resource\_requests'
    action: drop
As you can tell, I've been trying to drop labels as well, namely the unit label (just for testing purposes), and I also tried to drop the metric altogether.
The funday_monday is an example that changed because I wanted to know if static relabels are possible (it works!) - before it looked like this:
- source_labels: [node]
  regex: (.*)
  replacement: $1
  target_label: funday_monday
  action: replace
The problem is that you are doing those operations at the wrong time. relabel_configs happens before metrics are actually gathered, so, at this time, you can only manipulate the labels that you got from service discovery.
That node label comes from the exporter. Therefore, you need to do this relabeling action under metric_relabel_configs:
metric_relabel_configs:
- source_labels: [node]
  target_label: instance
The same goes for dropping metrics. If you'd like a bit more info, I answered a similar question here: prometheus relabel_config drop action not working
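To make the split between the two relabeling phases concrete, here is a hedged sketch of how the job from the question might be rearranged. It is an illustration under assumptions, not a tested config: the keep rule is taken from the question, while the regexes in the metric relabeling rules are my own choices.

```yaml
scrape_configs:
- job_name: some-kube-state-metrics
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Target relabeling: runs before the scrape, so only service discovery
  # labels (__meta_*, __address__, ...) are available here.
  - source_labels: [__meta_kubernetes_pod_labelpresent_kubeStateMetrics]
    regex: "true"
    action: keep
  metric_relabel_configs:
  # Metric relabeling: runs on every scraped sample, so labels exposed by
  # the exporter (node, unit, ...) and __name__ are available here.
  - source_labels: [node]
    regex: (.+)              # only overwrite instance when node is non-empty
    target_label: instance
  # labeldrop matches against label NAMES, not name=value pairs.
  - action: labeldrop
    regex: unit
  # Drop a whole metric by name (shown for completeness; with this rule in
  # place the rules above no longer matter for this particular metric).
  - source_labels: [__name__]
    regex: kube_pod_container_resource_requests
    action: drop
```

Note that a labeldrop regex such as "unit=(.*)" would never match, because the pattern is compared with the label name alone.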

Changing Prometheus job label in scraper for cAdvisor breaks Grafana dashboards

I installed Prometheus on my Kubernetes cluster with Helm, using the community chart kube-prometheus-stack - and I get some beautiful dashboards in the bundled Grafana instance. I now wanted the recommender from the Vertical Pod Autoscaler to use Prometheus as a data source for historic metrics, as described here. Meaning, I had to make a change to the Prometheus scraper settings for cAdvisor, and this answer pointed me in the right direction, as after making that change I can now see the correct job tag on metrics from cAdvisor.
Unfortunately, now some of the charts in the Grafana dashboards are broken. It looks like it no longer picks up the CPU metrics - and instead just displays "No data" for the CPU-related charts.
So, I assume I have to tweak the charts to be able to pick up the metrics correctly again, but I don't see any obvious places to do this in Grafana?
Not sure if it is relevant for the question, but I am running my Kubernetes cluster on Azure Kubernetes Service (AKS).
This is the full values.yaml I supply to the Helm chart when installing Prometheus:
enabled: false
enabled: false
enabled: false
enabled: false
# Disables the normal cAdvisor scraping, as we add it with the job name "kubernetes-cadvisor" under additionalScrapeConfigs
# The reason for doing this is to enable the VPA to use the metrics for the recommender
cAdvisor: false
retention: 15d
# the azurefile storage class is created automatically on AKS
storageClassName: azurefile
accessModes: ["ReadWriteMany"]
storage: 50Gi
- job_name: 'kubernetes-cadvisor'
  scheme: https
  metrics_path: /metrics/cadvisor
  tls_config:
    ca_file: /var/run/secrets/
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
Kubernetes version: 1.21.2
kube-prometheus-stack version: 18.1.1
helm version: version.BuildInfo{Version:"v3.6.3", GitCommit:"d506314abfb5d21419df8c7e7e68012379db2354", GitTreeState:"dirty", GoVersion:"go1.16.5"}
Unfortunately, I don't have access to Azure AKS, so I've reproduced this issue on my GKE cluster. Below I'll provide some explanations that may help to resolve your problem.
First you can try to execute this node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate rule to see if it returns any result:
If it doesn't return any records, please read the following paragraphs.
Creating a scrape configuration for cAdvisor
Rather than creating a completely new scrape configuration for cadvisor, I would suggest using one that is generated by default when kubelet.serviceMonitor.cAdvisor: true, but with a few modifications such as changing the label to job=kubernetes-cadvisor.
In my example, the 'kubernetes-cadvisor' scrape configuration looks like this:
NOTE: I added this config under the additionalScrapeConfigs in the values.yaml file (the rest of the values.yaml file may be like yours).
- job_name: 'kubernetes-cadvisor'
  honor_labels: true
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/
  tls_config:
    ca_file: /var/run/secrets/
    insecure_skip_verify: true
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
    separator: ;
    regex: kubelet
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: kubelet
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: https-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: https-metrics
    action: replace
  - source_labels: [__metrics_path__]
    separator: ;
    regex: (.*)
    target_label: metrics_path
    replacement: $1
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  kubernetes_sd_configs:
  - role: endpoints
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - kube-system
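For orientation, the scrape configuration above would sit in values.yaml roughly as sketched below. This assumes the kube-prometheus-stack value paths kubelet.serviceMonitor.cAdvisor and prometheus.prometheusSpec.additionalScrapeConfigs, and it abbreviates the job body, so treat it as a placement guide and not a complete file:

```yaml
kubelet:
  serviceMonitor:
    # Turn off the chart's default cAdvisor scrape so that the metrics are
    # only collected once, under the custom job name below.
    cAdvisor: false

prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
    - job_name: 'kubernetes-cadvisor'
      metrics_path: /metrics/cadvisor
      scheme: https
      # ... remaining fields as in the scrape configuration above
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - kube-system
```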
Modifying Prometheus Rules
By default, Prometheus rules fetching data from cAdvisor use job="kubelet" in their PromQL expressions:
After changing job=kubelet to job=kubernetes-cadvisor, we also need to modify this label in the Prometheus rules:
NOTE: We just need to modify the rules that have metrics_path="/metrics/cadvisor" (these are the rules that retrieve data from cAdvisor).
$ kubectl get prometheusrules prom-1-kube-prometheus-sta-k8s.rules -o yaml
- name: k8s.rules
  rules:
  - expr: |-
      sum by (cluster, namespace, pod, container) (
        irate(container_cpu_usage_seconds_total{job="kubernetes-cadvisor", metrics_path="/metrics/cadvisor", image!=""}[5m])
      ) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (
        1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=""})
      )
    record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
here we have a few more rules to modify...
After modifying Prometheus rules and waiting some time, we can see if it works as expected. We can try to execute node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate as in the beginning.
Additionally, let's check out our Grafana to make sure it has started displaying our dashboards correctly:

How to relabel Kubernetes metrics with dynamic pod URLs in Prometheus?

I am trying to discover both pods and nodes on the same job using kubernetes_sd_configs and use their labels together.
I have blackbox-exporter on multiple pods in my cluster on different nodes, my goal is to monitor each of them, but I am having trouble with the metric relabelling.
I am trying to achieve something like this:
My current configuration looks like the following, but the pod URL is missing:
- job_name: 'kubernetes-pods'
  metrics_path: /probe
  params:
    module: [ping]
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: <pod_should_be_here>:9115
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
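One possible direction, offered only as an untested sketch: discover the blackbox-exporter pods with role: pod instead of role: node. Pod discovery exposes both the pod IP and the name of the node the pod runs on, so the exporter address and the probe target can come from the same target. The job name and the pod label app=blackbox-exporter are assumptions for illustration; the port 9115 and the ping module are taken from the config above.

```yaml
- job_name: 'blackbox-pods'            # hypothetical job name
  metrics_path: /probe
  params:
    module: [ping]
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only the blackbox-exporter pods (label name/value are assumed).
  - source_labels: [__meta_kubernetes_pod_label_app]
    regex: blackbox-exporter
    action: keep
  # Probe the node that hosts this exporter pod.
  - source_labels: [__meta_kubernetes_pod_node_name]
    regex: (.+)
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  # Send the scrape to the exporter pod itself.
  - source_labels: [__meta_kubernetes_pod_ip]
    regex: (.+)
    replacement: ${1}:9115
    target_label: __address__
```

With this layout the node labels (`__meta_kubernetes_node_label_*`) are not available, since they belong to node discovery; only pod-level metadata can be mapped.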

Prometheus + Kubernetes metrics coming from wrong scrape job

I deployed prometheus server (+ kube state metrics + node exporter + alertmanager) through the prometheus helm chart using the chart's default values, including the chart's default scrape_configs. The problem is that I expect certain metrics to be coming from a particular job but instead are coming from a different one.
For example, node_cpu_seconds_total is being provided by the kubernetes-service-endpoints job but I expect it to come from the kubernetes-nodes job, i.e. node-exporter. The returned metric's values are accurate, but the problem is that I don't have the labels that would normally come from kubernetes-nodes (since the kubernetes-nodes job has role: node vs role: endpoint for kubernetes-service-endpoints). I need these missing labels for advanced querying + dashboards.
Output of node_cpu_seconds_total{mode="idle"}:
node_cpu_seconds_total{app="prometheus",chart="prometheus-7.0.2",component="node-exporter",cpu="0",heritage="Tiller",instance="",job="kubernetes-service-endpoints",kubernetes_name="get-prometheus-node-exporter",kubernetes_namespace="default",mode="idle",release="get-prometheus"} | 423673.44
node_cpu_seconds_total{app="prometheus",chart="prometheus-7.0.2",component="node-exporter",cpu="0",heritage="Tiller",instance="",job="kubernetes-service-endpoints",kubernetes_name="get-prometheus-node-exporter",kubernetes_namespace="default",mode="idle",release="get-prometheus"} | 417097.16
There are no errors in the logs and I do have other kubernetes-nodes metrics such as up and storage_operation_errors_total so node-exporter is getting scraped.
I also verified manually that node-exporter has this particular metric, node_cpu_seconds_total, with curl <node IP>:9100/metrics | grep node_cpu and it has results.
Does the job order definition matter? Would one job override the other's metrics if they have the same name? Should I be dropping metrics for the kubernetes-service-endpoints job? I'm new to prometheus so any detailed help is appreciated.
I was able to figure out how to add the "missing" labels by navigating to the Prometheus service-discovery status UI page. This page shows all the "Discovered Labels" that can be processed and kept through relabel_configs. What is processed/kept shows next to "Discovered Labels" under "Target Labels". So then it was just a matter of modifying the kubernetes-service-endpoints job config in scrape_configs to add more target labels. Below is exactly what I changed in the chart's scrape_configs. With this new config, I get namespace, service, pod, and node added to all metrics if the metric didn't already have them (see honor_labels).
 - job_name: 'kubernetes-service-endpoints'
+  honor_labels: true
   kubernetes_sd_configs:
     - role: endpoints
   relabel_configs:
     - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
       action: keep
       regex: true
     - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
       action: replace
       target_label: __scheme__
       regex: (https?)
     - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
       action: replace
       target_label: __metrics_path__
       regex: (.+)
     - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
       action: replace
       target_label: __address__
       regex: ([^:]+)(?::\d+)?;(\d+)
       replacement: $1:$2
     - action: labelmap
       regex: __meta_kubernetes_service_label_(.+)
     - source_labels: [__meta_kubernetes_namespace]
       action: replace
-      target_label: kubernetes_namespace
+      target_label: namespace
     - source_labels: [__meta_kubernetes_service_name]
       action: replace
-      target_label: kubernetes_name
+      target_label: service
+    - source_labels: [__meta_kubernetes_pod_name]
+      action: replace
+      target_label: pod
+    - source_labels: [__meta_kubernetes_pod_node_name]
+      action: replace
+      target_label: node
According to the scrape configs, the kubernetes-nodes job probes https://kubernetes.default.svc:443/api/v1/nodes/${node_name}/proxy/metrics, while the kubernetes-service-endpoints job probes every endpoint of those services annotated with prometheus.io/scrape: true, which includes node-exporter. So with your configs, the node_cpu_seconds_total metric definitely comes from the kubernetes-service-endpoints job.
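For illustration, a Service that the kubernetes-service-endpoints job would pick up might look like the sketch below. The service name and namespace come from the metric labels in the question and the port from the curl command; the selector, port name and exact annotation set are assumptions, not the chart's actual manifest.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: get-prometheus-node-exporter
  namespace: default
  annotations:
    # Matched by the keep rule on
    # __meta_kubernetes_service_annotation_prometheus_io_scrape
    prometheus.io/scrape: "true"
    # Optional: rewrites __address__ to use this port
    prometheus.io/port: "9100"
spec:
  selector:
    component: node-exporter     # assumed selector
  ports:
  - name: metrics                # assumed port name
    port: 9100
    targetPort: 9100
```

Removing the prometheus.io/scrape annotation from such a Service would stop the endpoints job from scraping it, which is one way to avoid collecting the same exporter through two jobs.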

Prometheus: cannot export metrics from connected Kubernetes cluster

The issue: I have a Prometheus server outside of the Kubernetes cluster, and I want to export metrics from the remote cluster.
I took the config sample from the Prometheus GitHub repo and modified it a little. Here is my jobs config.
- job_name: 'kubernetes-apiservers'
scheme: http
- role: endpoints
bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
insecure_skip_verify: true
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;http
- job_name: 'kubernetes-nodes'
scheme: http
- role: node
bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
insecure_skip_verify: true
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- job_name: 'kubernetes-service-endpoints'
scheme: http
- role: endpoints
bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
insecure_skip_verify: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (http?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: (.+)(?::\d+);(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
- job_name: 'kubernetes-services'
scheme: http
metrics_path: /probe
module: [http_2xx]
- role: service
bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
insecure_skip_verify: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_service_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: kubernetes_name
- job_name: 'kubernetes-pods'
scheme: http
- role: pod
bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
insecure_skip_verify: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: (.+):(?:\d+);(\d+)
replacement: ${1}:${2}
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
I don't use a TLS connection to API, so I want to disable it.
When I curl /metrics URL from Prometheus host - it prints them.
Finally I connected to the cluster, but...the jobs are not up and therefore Prometheus doesn't expose relabeled metrics.
What I see in Console.
Targets state:
I also checked the Prometheus debug log. It looks as though the system gets all the necessary information and the requests succeed.
time="2017-01-25T06:58:04Z" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="&config.TargetGroup{Targets:[]model.LabelSet{model.LabelSet{\"__meta_kubernetes_pod_container_port_protocol\":\"UDP\", \"__address__\":\"\", \"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10053\", \"__meta_kubernetes_pod_container_port_name\":\"dns-local\"}, model.LabelSet{\"__address__\":\"\", \"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10053\", \"__meta_kubernetes_pod_container_port_name\":\"dns-tcp-local\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\"}, model.LabelSet{\"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10055\", \"__meta_kubernetes_pod_container_port_name\":\"metrics\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"\"}, model.LabelSet{\"__address__\":\"\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq\", \"__meta_kubernetes_pod_container_port_number\":\"53\", \"__meta_kubernetes_pod_container_port_name\":\"dns\", \"__meta_kubernetes_pod_container_port_protocol\":\"UDP\"}, model.LabelSet{\"__address__\":\"\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq\", \"__meta_kubernetes_pod_container_port_number\":\"53\", \"__meta_kubernetes_pod_container_port_name\":\"dns-tcp\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\"}, model.LabelSet{\"__meta_kubernetes_pod_container_port_number\":\"10054\", \"__meta_kubernetes_pod_container_port_name\":\"metrics\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq-metrics\"}, model.LabelSet{\"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"\", \"__meta_kubernetes_pod_container_name\":\"healthz\", \"__meta_kubernetes_pod_container_port_number\":\"8080\", 
\"__meta_kubernetes_pod_container_port_name\":\"\"}}, Labels:model.LabelSet{\"__meta_kubernetes_pod_ready\":\"true\", \"__meta_kubernetes_pod_annotation_kubernetes_io_created_by\":\"{\\\"kind\\\":\\\"SerializedReference\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"reference\\\":{\\\"kind\\\":\\\"ReplicaSet\\\",\\\"namespace\\\":\\\"kube-system\\\",\\\"name\\\":\\\"kube-dns-2924299975\\\",\\\"uid\\\":\\\"fa808d95-d7d9-11e6-9ac9-02dfdae1a1e9\\\",\\\"apiVersion\\\":\\\"extensions\\\",\\\"resourceVersion\\\":\\\"89\\\"}}\\n\", \"__meta_kubernetes_pod_annotation_scheduler_alpha_kubernetes_io_affinity\":\"{\\\"nodeAffinity\\\":{\\\"requiredDuringSchedulingIgnoredDuringExecution\\\":{\\\"nodeSelectorTerms\\\":[{\\\"matchExpressions\\\":[{\\\"key\\\":\\\"\\\",\\\"operator\\\":\\\"In\\\",\\\"values\\\":[\\\"amd64\\\"]}]}]}}}\", \"__meta_kubernetes_pod_name\":\"kube-dns-2924299975-dksg5\", \"__meta_kubernetes_pod_ip\":\"\", \"__meta_kubernetes_pod_label_k8s_app\":\"kube-dns\", \"__meta_kubernetes_pod_label_pod_template_hash\":\"2924299975\", \"__meta_kubernetes_pod_label_tier\":\"node\", \"__meta_kubernetes_pod_annotation_scheduler_alpha_kubernetes_io_tolerations\":\"[{\\\"key\\\":\\\"dedicated\\\",\\\"value\\\":\\\"master\\\",\\\"effect\\\":\\\"NoSchedule\\\"}]\", \"__meta_kubernetes_namespace\":\"kube-system\", \"__meta_kubernetes_pod_node_name\":\"\", \"__meta_kubernetes_pod_label_component\":\"kube-dns\", \"__meta_kubernetes_pod_label_kubernetes_io_cluster_service\":\"true\", \"__meta_kubernetes_pod_host_ip\":\"\", \"__meta_kubernetes_pod_label_name\":\"kube-dns\"}, Source:\"pod/kube-system/kube-dns-2924299975-dksg5\"}"
time="2017-01-25T06:58:04Z" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="&config.TargetGroup{Targets:[]model.LabelSet{model.LabelSet{\"__address__\":\"\", \"__meta_kubernetes_pod_container_name\":\"bot\"}}, Labels:model.LabelSet{\"__meta_kubernetes_pod_host_ip\":\"\", \"__meta_kubernetes_pod_label_app\":\"bot\", \"__meta_kubernetes_namespace\":\"default\", \"__meta_kubernetes_pod_name\":\"bot-272181271-pnzsz\", \"__meta_kubernetes_pod_ip\":\"\", \"__meta_kubernetes_pod_node_name\":\"ip-172-17-101-25\", \"__meta_kubernetes_pod_annotation_kubernetes_io_created_by\":\"{\\\"kind\\\":\\\"SerializedReference\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"reference\\\":{\\\"kind\\\":\\\"ReplicaSet\\\",\\\"namespace\\\":\\\"default\\\",\\\"name\\\":\\\"bot-272181271\\\",\\\"uid\\\":\\\"c297b3c2-e15d-11e6-a28a-02dfdae1a1e9\\\",\\\"apiVersion\\\":\\\"extensions\\\",\\\"resourceVersion\\\":\\\"1465127\\\"}}\\n\", \"__meta_kubernetes_pod_ready\":\"true\", \"__meta_kubernetes_pod_label_pod_template_hash\":\"272181271\", \"__meta_kubernetes_pod_label_version\":\"v0.1\"}, Source:\"pod/default/bot-272181271-pnzsz\"}"
Prometheus fetches updates, but... doesn't convert them to metrics.
I've racked my brain trying to figure out why it behaves this way. So please help if you can figure out where the mistake might be.
If you want to monitor a Kubernetes cluster from an external Prometheus server, I would suggest setting up a Prometheus federation topology:
Inside the K8s cluster, install node-exporter pods and a Prometheus instance with short-term storage.
Expose the Prometheus service outside the K8s cluster, either via an ingress controller (LB) or a node port. You can protect this endpoint with HTTPS + basic authentication.
Configure the central Prometheus to scrape metrics from the above endpoint with proper authentication and tags.
This is the scalable solution. You can monitor as many K8s clusters as you want, until you reach the capacity of the central Prometheus. Then you can add another central Prometheus instance to monitor the others.
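As a hedged sketch of the last step, the central Prometheus could scrape the in-cluster instance through its /federate endpoint. The host name, credentials path, job name and cluster label below are placeholders I made up for illustration; the match[] selector should be narrowed to the series you actually need.

```yaml
scrape_configs:
- job_name: 'federate-k8s-dev'          # hypothetical job name
  scrape_interval: 30s
  honor_labels: true                    # keep job/instance labels from the source
  metrics_path: /federate
  scheme: https
  params:
    'match[]':
    - '{job=~".+"}'                     # pulls every series; narrow as needed
  basic_auth:
    username: federation                # placeholder credentials
    password_file: /etc/prometheus/secrets/k8s-dev-password
  static_configs:
  - targets:
    - 'prometheus.k8s-dev.example.com:443'   # placeholder for the exposed endpoint
    labels:
      cluster: k8s-dev                  # tag to tell clusters apart
```

Adding another cluster then amounts to one more job (or one more target with its own cluster label) on the central instance.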
Finally I came to the conclusion that it's not trivial to set up Kubernetes cluster monitoring from outside the cluster, because the Kubernetes architecture expects all infrastructure to be kept within one local network. So every workaround is going to be messy.
I also ran into trouble trying to debug why none of the configured jobs for Kubernetes roles such as nodes, pods, services and endpoints even show up on the targets status page. I may be wrong, but I didn't find out how to debug this issue in Prometheus.
My solution for monitoring the Kubernetes cluster from outside was kube-api-exporter, a pretty simple Python script which gets all metrics about ds, deployments and pods and provides a URL to fetch them. So I'd recommend this solution to everyone who gets stuck with this sort of integration.
Also I started to fetch metrics from etcd. That's cool that etcd provides Prometheus-style metrics out of the box.
P.S.: thanks to FuzzyAmi for help.