Kubernetes: change __meta_kubernetes_namespace

I am trying to monitor my Kubernetes cluster and I am using Prometheus to collect all the information.
It is working perfectly, but I need to monitor some specific workers and label them using __meta_kubernetes_namespace. I could not find any reference explaining how to change it in the Kubernetes environment.
Please help me resolve this.
Thank you.

You would need to write your own scrape config for Prometheus.
This is a partial snippet for Istio Envoy proxy stats; you should write your own based on which resources to monitor and where to look for them:
- job_name: 'envoy-stats'
  metrics_path: /stats/prometheus
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    action: keep
    regex: '.*-envoy-prom'
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod_name
  metric_relabel_configs:
  # Exclude some of the envoy metrics that have massive cardinality
  # This list may need to be pruned further moving forward, as informed
  # by performance and scalability testing.
  - source_labels: [ __name__ ]
    regex: 'envoy_http_(stats|admin).*'
    action: drop
  - source_labels: [ __name__ ]
    regex: 'envoy_cluster_(lb|retry|bind|internal|max|original).*'
    action: drop
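For the namespace question specifically: __meta_kubernetes_namespace is filled in by service discovery, so you do not change it; you either copy it into a target label or filter on it. A minimal sketch, assuming the workers live in two hypothetical namespaces team-a and team-b:

- job_name: 'worker-pods'   # hypothetical job name
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only targets from the namespaces of interest (names assumed).
  - source_labels: [__meta_kubernetes_namespace]
    action: keep
    regex: team-a|team-b
  # Copy the discovery label onto every scraped series as "namespace".
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace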
Also, please go over First steps with Prometheus, as it will be helpful when working with Prometheus.

Related

How to scrape metrics from postgresql helm deployment with kube-prometheus-stack

I have been using kube-prometheus-stack for monitoring without issues.
After realizing there is a metrics: section in the values.yaml file, I wanted to enable metrics for both existing deployments:
postgresql (https://github.com/bitnami/charts/tree/master/bitnami/postgresql)
cassandra (https://github.com/bitnami/charts/tree/master/bitnami/cassandra)
For the cassandra deployment,
altering the value of the enabled: key from false to true was sufficient. After upgrading the Helm release with the new values, a sidecar container was created. I confirmed that the metrics are displayed at /metrics, and the targets are listed and being scraped in Prometheus.
metrics:
  enabled: true
But for the postgresql deployment,
doing the same did not work; setting
metrics:
  enabled: true
resulted in the creation of a sidecar container, and the metrics are displayed at /metrics, but the target is neither listed nor scraped in Prometheus.
So my question is: why does the same setting give the desired result for the cassandra deployment but not for postgresql? Am I missing something, and what else do I need to check?
Also, I don't need to enable serviceMonitor for these deployments, because Prometheus can scrape pods based on annotations, right?
Any help is appreciated.
annotation config
additionalScrapeConfigs in the values.yaml file of kube-prometheus-stack is edited to work with prometheus.io/* annotations (ref: Monitor custom kubernetes pod metrics using Prometheus):
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [ __meta_kubernetes_pod_annotation_prometheus_io_scrape ]
        action: keep
        regex: true
      - source_labels: [ __meta_kubernetes_pod_annotation_prometheus_io_path ]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [ __address__, __meta_kubernetes_pod_annotation_prometheus_io_port ]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [ __meta_kubernetes_namespace ]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [ __meta_kubernetes_pod_name ]
        action: replace
        target_label: kubernetes_pod_name
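(Note: with role: pod, the job above only sees annotations set on the pod itself; annotations placed on a Service object are not visible to it. For a pod to be kept by this config, its metadata would need annotations like the sketch below, where 9187, the postgres-exporter default port, is an assumption.)

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9187"   # assumed: postgres-exporter default port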
versions
kubectl version: v1.22.6
chart versions:
kube-prometheus-stack-35.0.3
postgresql-11.1.28
cassandra-9.1.9

Scrape metrics from multiple containers in Prometheus with Istio

Our application is deployed in the Istio service mesh and we are trying to scrape metrics at the container level using the prometheus.io annotations.
We have enabled Spring Boot metrics in our application and we are able to fetch the metrics on the given path /manage/prometheus.
We have enabled the Prometheus annotations in the deployment file of our application as follows:
metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8080'
    prometheus.io/path: '/manage/prometheus'
This works fine when there is a single container in the pod,
but for pods that have multiple containers, we are unable to scrape the metrics with the container port.
The following are the workarounds we tried:
Following the reference https://gist.github.com/bakins/5bf7d4e719f36c1c555d81134d8887eb, we tried to add relabel configs for scraping data at the container level:
prometheus-config.yaml
scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scrape
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    action: keep
    regex: (.*)
  - source_labels: [ __address__, __meta_kubernetes_pod_container_port_number]
    action: replace
    regex: (.+):(?:\d+);(\d+)
    replacement: ${1}:${2}
    target_label: __address__
  - action: replace
    regex: (https?)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scheme
    target_label: __scheme__
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: kubernetes_namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: kubernetes_pod_name
  - action: drop
    regex: Pending|Succeeded|Failed
    source_labels:
    - __meta_kubernetes_pod_phase
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_container_name
    target_label: container
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_container_port_number
    target_label: container_port
But after applying the above configuration, we get the following error:
Get "http://10.x.x.x:8080/stats/prometheus": read tcp 10.y.y.y:45542->10.x.x.x:8080: read: connection reset by peer
Here 10.x.x.x is the pod IP and 8080 is the container port; Prometheus is not able to scrape using the container port.
We tried the above configuration after removing the Istio mesh (i.e., removing the Istio sidecar from all the microservice pods) and we could see container-level metrics being scraped.
Istio's proxy seems to be blocking the metrics from being scraped at the container level.
Has anyone faced a similar issue?
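A hedged sketch of one commonly suggested workaround: when the Istio sidecar enforces mTLS, a plaintext scrape to the application port through the sidecar is reset, so the metrics port can be excluded from inbound traffic redirection (the annotation below is Istio's standard sidecar annotation; the port is taken from the question):

metadata:
  annotations:
    # Bypass the Envoy sidecar for inbound traffic on the metrics port,
    # so Prometheus can reach the application container directly.
    traffic.sidecar.istio.io/excludeInboundPorts: "8080"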

Drop or rename scrape_config label

I'd love to rename or drop a label from a /metrics endpoint within my metric. The metric itself comes from the kube-state-metrics application, so nothing extraordinary. The metric looks like this:
kube_pod_container_resource_requests{container="alertmanager", instance="10.10.10.10:8080", funday_monday="blubb", job="some-kube-state-metrics", name="kube-state-metrics", namespace="monitoring", node="some-host-in-azure-1234", pod="alertmanager-main-1", resource="memory", uid="12345678-1234-1234-1234-123456789012", unit="byte"} 209715200
The label I'd love to replace is instance, because it refers to the host that runs the kube-state-metrics application and I don't care about that. I want to have the value of node in instance. I've been trying for hours now and can't find a way; I wonder if it's not possible at all!?
I'm grabbing the /metrics endpoint through a scrape config which looks as follows:
- job_name: some-kube-state-metrics
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_labelpresent_kubeStateMetrics]
    regex: true
    action: keep
  - source_labels: [__meta_kubernetes_pod_label_name]
    regex: (.*)
    replacement: $1
    target_label: name
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    separator: ;
    regex: http
    replacement: $1
    action: keep
  - source_labels: [node]
    regex: (.*)
    replacement: blubb
    target_label: funday_monday
    action: replace
  - action: labeldrop
    regex: "unit=(.*)"
  - source_labels: [ __name__ ]
    regex: 'kube\_pod\_container\_resource\_requests'
    action: drop
As you can tell, I've been trying to drop labels as well, namely the unit label (just for testing purposes), and I also tried to drop the metric altogether.
The funday_monday label is an example that I changed because I wanted to know whether static relabels are possible (they are!); before, it looked like this:
- source_labels: [node]
  regex: (.*)
  replacement: $1
  target_label: funday_monday
  action: replace
Help is appreciated.
The problem is that you are doing those operations at the wrong time. relabel_configs happens before the metrics are actually gathered, so at that point you can only manipulate the labels that came from service discovery.
The node label comes from the exporter, so you need to do this relabeling under metric_relabel_configs instead:
metric_relabel_configs:
- source_labels: [node]
  target_label: instance
The same goes for dropping metrics. If you want a bit more detail, I answered a similar question here: prometheus relabel_config drop action not working
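Putting it together, a sketch of the metric_relabel_configs that would cover everything the question attempted; note that labeldrop matches the label name only, so the regex is just unit:

metric_relabel_configs:
# Overwrite instance with the exporter-provided node label.
- source_labels: [node]
  target_label: instance
# labeldrop matches label names, not name=value pairs.
- action: labeldrop
  regex: unit
# Dropping a whole metric means matching on __name__.
- source_labels: [__name__]
  regex: kube_pod_container_resource_requests
  action: drop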

How to relabel Kubernetes metrics with dynamic pod URLs in Prometheus?

I am trying to discover both pods and nodes in the same job using kubernetes_sd_configs and use their labels together.
I have blackbox-exporter on multiple pods in my cluster on different nodes; my goal is to monitor each of them, but I am having trouble with the metric relabelling.
I am trying to achieve something like this:
http://<pod-ip1>:8082/metrics?instance=<node-ip1>
http://<pod-ip2>:8082/metrics?instance=<node-ip1>
http://<pod-ipN>:8082/metrics?instance=<node-ip1>
...
http://<pod-ip1>:8082/metrics?instance=<node-ip2>
http://<pod-ip2>:8082/metrics?instance=<node-ip2>
http://<pod-ipN>:8082/metrics?instance=<node-ip2>
...
My current configuration looks like the following, but the pod URL is missing:
- job_name: 'kubernetes-pods'
  metrics_path: /probe
  params:
    module: [ping]
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: <pod_should_be_here>:9115
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
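There is no built-in way to cross-join two kubernetes_sd_configs roles inside one job, so the full pod-by-node matrix above cannot be produced by relabeling alone. A hedged sketch of a narrower alternative, using role: pod so each blackbox-exporter pod's IP becomes __address__ and its own node becomes the probe target (each exporter is paired only with the node it runs on; the app label value is an assumption):

- job_name: 'blackbox-ping'
  metrics_path: /probe
  params:
    module: [ping]
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only the blackbox-exporter pods (label value assumed).
  - source_labels: [__meta_kubernetes_pod_label_app]
    action: keep
    regex: blackbox-exporter
  # Scrape each exporter pod on its own IP.
  - source_labels: [__meta_kubernetes_pod_ip]
    regex: (.+)
    target_label: __address__
    replacement: $1:9115
  # Probe the node that the exporter pod runs on.
  - source_labels: [__meta_kubernetes_pod_node_name]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance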

Prometheus + Kubernetes metrics coming from wrong scrape job

I deployed the Prometheus server (plus kube-state-metrics, node exporter, and Alertmanager) through the Prometheus Helm chart using the chart's default values, including the chart's default scrape_configs. The problem is that certain metrics I expect to come from a particular job are instead coming from a different one.
For example, node_cpu_seconds_total is being provided by the kubernetes-service-endpoints job, but I expect it to come from the kubernetes-nodes job, i.e. node-exporter. The returned metric's values are accurate, but the problem is that I don't have the labels that would normally come from kubernetes-nodes (since the kubernetes-nodes job has role: node vs. role: endpoints for kubernetes-service-endpoints). I need these missing labels for advanced querying and dashboards.
Output of node_cpu_seconds_total{mode="idle"}:
node_cpu_seconds_total{app="prometheus",chart="prometheus-7.0.2",component="node-exporter",cpu="0",heritage="Tiller",instance="10.80.20.46:9100",job="kubernetes-service-endpoints",kubernetes_name="get-prometheus-node-exporter",kubernetes_namespace="default",mode="idle",release="get-prometheus"} | 423673.44
node_cpu_seconds_total{app="prometheus",chart="prometheus-7.0.2",component="node-exporter",cpu="0",heritage="Tiller",instance="10.80.20.52:9100",job="kubernetes-service-endpoints",kubernetes_name="get-prometheus-node-exporter",kubernetes_namespace="default",mode="idle",release="get-prometheus"} | 417097.16
There are no errors in the logs, and I do have other kubernetes-nodes metrics such as up and storage_operation_errors_total, so node-exporter is getting scraped.
I also verified manually that node-exporter exposes this particular metric, node_cpu_seconds_total, with curl <node IP>:9100/metrics | grep node_cpu, and it has results.
Does the job definition order matter? Would one job override the other's metrics if they have the same name? Should I be dropping metrics for the kubernetes-service-endpoints job? I'm new to Prometheus, so any detailed help is appreciated.
I was able to figure out how to add the "missing" labels by navigating to the Prometheus service-discovery status UI page. This page shows all the "Discovered Labels" that can be processed and kept through relabel_configs; what is processed/kept shows next to "Discovered Labels" under "Target Labels". So then it was just a matter of modifying the kubernetes-service-endpoints job config in scrape_configs so that I add more target labels. Below is exactly what I changed in the chart's scrape_configs. With this new config, I get namespace, service, pod, and node added to all metrics if the metric didn't already have them (see honor_labels).
  - job_name: 'kubernetes-service-endpoints'
+   honor_labels: true
    kubernetes_sd_configs:
    - role: endpoints
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
-     target_label: kubernetes_namespace
+     target_label: namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
-     target_label: kubernetes_name
+     target_label: service
+   - source_labels: [__meta_kubernetes_pod_name]
+     action: replace
+     target_label: pod
+   - source_labels: [__meta_kubernetes_pod_node_name]
+     action: replace
+     target_label: node
From the scrape configs, the kubernetes-nodes job probes https://kubernetes.default.svc:443/api/v1/nodes/${node_name}/proxy/metrics, while the kubernetes-service-endpoints job scrapes every endpoint of the services that have prometheus.io/scrape: true defined, which includes node-exporter. So in your config, the node_cpu_seconds_total metric definitely comes from the kubernetes-service-endpoints job.
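On the honor_labels note in the diff above: kube-state-metrics, for example, exposes series that already carry namespace and pod labels, which would clash with the target labels added by the relabel rules. A minimal sketch of the difference:

# honor_labels: false (the default): a scraped label that clashes with a
# target label is kept but renamed, e.g. namespace -> exported_namespace.
# honor_labels: true: the scraped label wins and the server-attached target
# label is not applied, which is why the diff above enables it.
- job_name: 'kubernetes-service-endpoints'
  honor_labels: true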