I'm using Prometheus to scrape metrics from my pods. The application I'm interested in is replicated a couple of times with one service providing access. Prometheus uses this service to scrape the metrics. In my app the metrics are setup as follows:
import * as Prometheus from 'prom-client';
const httpRequestDurationMicroseconds = new Prometheus.Histogram({
name: 'transaction_amounts',
help: 'Amount',
labelNames: ['amount'],
buckets: [0, 5, 15, 50, 100, 200, 300, 400, 500, 10000],
});
const totalPayments = new Prometheus.Counter('transaction_totals', 'Total payments');
I'm using helm to install Prometheus and the scrape config looks like this:
prometheus.yml:
rule_files:
- /etc/config/rules
- /etc/config/alerts
scrape_configs:
- job_name: prometheus
static_configs:
- targets:
- localhost:9090
- job_name: transactions
scrape_interval: 1s
static_configs:
- targets:
- transaction-metrics-service:3001
I can see the metrics inside prometheus, but it seems to be from just one pod. For example, in Prometheus, when I query for transaction_totals it gives:
I don't think that the instance label can uniquely identify my pods. What should I do to be able to query all pods?
Instead of using a static_config that scrapes just one host, try using kubernetes_sd_configs Kubernetes Service Discovery as provided by Prometheus.
Your config file would look something like this:
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
# only scrape when annotation prometheus.io/scrape: 'true' is set
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: (.+):(?:\d+);(\d+)
replacement: ${1}:${2}
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
and then add the annotation to your Kubernetes Deployment yaml config like this:
kind: Deployment
...
spec:
template:
metadata:
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "<< PORT OF YOUR CONTAINER >>"
You can see a full working example here.
add prometheus annotations to your service, since prom will only scrape a service that:
Exposes the exporter port
Has a prometheus.io/scrape: "true" annotation
Has a prometheus.io/port: "<exporter_port_here>" annotation
here is an official example
the scraped pod is probably prometheus itself
Related
we recently decided to install loki and promtail via the loki-stack helm chart. Loki and promtail kind of work. We get some logs from Promtail and we can visualize them in grafana but our development namespace is nowhere to be found in loki. Promtail shows the development pod as an active target and promtail already collected the logs from the pod but we cant seem to get them into loki somehow... Any ideas?
tl;dr
set loki.monitoring.selfMonitoring.grafanaAgent.installOperator to false
This problem is caused by grafana-agent which is installed by default as a sub-chart of grafana/loki chart...
agent creates secret 'loki-logs-config' (loki in this case is Helm release name) which contains following configuration:
agent.yml: |+
logs:
configs:
- clients:
- external_labels:
cluster: loki
url: http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push
name: monitoring/loki
scrape_configs:
- job_name: podLogs/monitoring/loki
kubernetes_sd_configs:
- namespaces:
names:
- monitoring
role: pod
pipeline_stages:
- cri: {}
relabel_configs:
- source_labels:
- job
target_label: __tmp_prometheus_job_name
- action: keep
regex: loki
source_labels:
- __meta_kubernetes_pod_label_app_kubernetes_io_instance
- action: keep
regex: loki
source_labels:
- __meta_kubernetes_pod_label_app_kubernetes_io_name
- source_labels:
- __meta_kubernetes_namespace
target_label: namespace
- source_labels:
- __meta_kubernetes_service_name
target_label: service
- source_labels:
- __meta_kubernetes_pod_name
target_label: pod
- source_labels:
- __meta_kubernetes_pod_container_name
target_label: container
- replacement: monitoring/loki
target_label: job
- replacement: /var/log/pods/*$1/*.log
separator: /
source_labels:
- __meta_kubernetes_pod_uid
- __meta_kubernetes_pod_container_name
target_label: __path__
- action: replace
source_labels:
- __meta_kubernetes_pod_node_name
target_label: __host__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- action: replace
replacement: monitoring/$1
source_labels:
- __meta_kubernetes_pod_controller_name
target_label: job
- action: replace
source_labels:
- __meta_kubernetes_pod_container_name
target_label: container
- action: replace
replacement: loki
target_label: cluster
positions_directory: /var/lib/grafana-agent/data
server: {}
As you can see under kubernetes_sd_configs there is namespaces list with value of monitoring - I have no idea why is it there, but that's the namespace I've installed this chart into.
You won't see this secret after executing helm template - it seems that Grafana Agent creates it somehow after startup.
It has label app.kubernetes.io/managed-by=grafana-agent-operator
Pretty magical if you ask me...
The solution for me was disabling disabling installation of Grafana Agent:
loki:
loki:
commonConfig:
replication_factor: 1
storage:
type: 'filesystem'
auth_enabled: false
monitoring:
dashboards:
enabled: false
selfMonitoring:
enabled: true
grafanaAgent:
installOperator: false
lokiCanary:
enabled: false
Note: top loki element in the code block above is needed only if you add grafana/loki chart as subchart to your chart
IMO enabling beta feature (Grafana Agent is v0.30.0 today) in a Chart used as a reference in Loki's doc is insane :)
I have seamlessly been using kube-prometheus-stack for monitoring.
after realizing there is metrics: section in values.yaml file, I wanted to enable metrics for both existing deployments;
postgresql (https://github.com/bitnami/charts/tree/master/bitnami/postgresql)
cassandra (https://github.com/bitnami/charts/tree/master/bitnami/cassandra)
for cassandra deployment
altering the value of key enabled: from false to true was sufficient. After upgrading helm release with new values, a sidecar container is created. I confirmed that I do see the metrics displayed at /metrics. I see targets are listed and being scraped in Prometheus.
metrics:
enabled: true
but for postgresql deployment
doing the same did not work; setting
metrics:
enabled: true
resulted in creation of a sidecar container, metrics are displayable at /metrics. But it is not listed nor being scraped in Prometheus.
so my question is: why is this same setting giving the desired result for cassandra deployment but not for postgresql? am I missing something and what else do I need to check?
Also, I don't need to enable serviceMonitor for these deployments(?) because prometheus can scrape pods based on annotations, right?
Any help is appreciated.
annotation config
additionalScrapeConfigs in values.yaml file of kube-prometheus-stack is edited to work with prometheus.io/* annotations. (ref: Monitor custom kubernetes pod metrics using Prometheus)
prometheus:
prometheusSpec:
additionalScrapeConfigs:
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [ __meta_kubernetes_pod_annotation_prometheus_io_scrape ]
action: keep
regex: true
- source_labels: [ __meta_kubernetes_pod_annotation_prometheus_io_path ]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [ __address__, __meta_kubernetes_pod_annotation_prometheus_io_port ]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [ __meta_kubernetes_namespace ]
action: replace
target_label: kubernetes_namespace
- source_labels: [ __meta_kubernetes_pod_name ]
action: replace
target_label: kubernetes_pod_name
versions
kubectl version:
v1.22.6
chart versions:
kube-prometheus-stack-35.0.3
postgresql-11.1.28
cassandra-9.1.9
Our application is deployed in the istio service mesh and we are trying to scrape metrics at container level using the prometheus.io annotations.
So we have enabled spring boot metrics in our application and we are able to fetch the metrics on the given path '/manage/prometheus'.
We have enabled Prometheus annotations in the deployment file of our application as follows:
metadata:
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '8080'
prometheus.io/path: '/manage/prometheus'
This works fine when there is a single container in the pod.
But for pods that have multiple containers, we are unable to scrape the metrics with the container port.
Following are the workarounds we tried:
Following the reference https://gist.github.com/bakins/5bf7d4e719f36c1c555d81134d8887eb we tried to add the relabel configs for scraping data at container level:
prometheus-config.yaml
scrape-configs:
- job_name: kubernetes-pods
kubernetes_sd_configs:
- role: pod
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_scrape
- source_labels: [__meta_kubernetes_pod_container_port_name]
action: keep
regex: (.*)
- source_labels: [ __address__, __meta_kubernetes_pod_container_port_number]
action: replace
regex: (.+):(?:\d+);(\d+)
replacement: ${1}:${2}
target_label: __address__
- action: replace
regex: (https?)
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_scheme
target_label: __scheme__
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_path
target_label: __metrics_path__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- action: replace
source_labels:
- __meta_kubernetes_pod_name
target_label: kubernetes_pod_name
- action: drop
regex: Pending|Succeeded|Failed
source_labels:
- __meta_kubernetes_pod_phase
- action: replace
source_labels:
- __meta_kubernetes_pod_container_name
target_label: container
- action: replace
source_labels:
- __meta_kubernetes_pod_container_port_number
target_label: container_port
But after applying the above configuration we are getting the error as:
Get "http://10.x.x.x:8080/stats/prometheus": read tcp 10.y.y.y:45542->10.x.x.x:8080: read: connection reset by peer
So 10.x.x.x is the pod IP and 8080 is the container port, it is not able to scrape using the container port.
We tried the above configuration after removing the istio mesh i.e. by removing the istio sidecar from all the microservices pods and we could see container level metrics being scraped.
Istio’s proxy is somewhere blocking the metrics to be scraped at the container level.
Have anyone faced this similar issue?
I installed Prometheus on my Kubernetes cluster with Helm, using the community chart kube-prometheus-stack - and I get some beautiful dashboards in the bundled Grafana instance. I now wanted the recommender from the Vertical Pod Autoscaler to use Prometheus as a data source for historic metrics, as described here. Meaning, I had to make a change to the Prometheus scraper settings for cAdvisor, and this answer pointed me in the right direction, as after making that change I can now see the correct job tag on metrics from cAdvisor.
Unfortunately, now some of the charts in the Grafana dashboards are broken. It looks like it no longer picks up the CPU metrics - and instead just displays "No data" for the CPU-related charts.
So, I assume I have to tweak the charts to be able to pick up the metrics correctly again, but I don't see any obvious places to do this in Grafana?
Not sure if it is relevant for the question, but I am running my Kubernetes cluster on Azure Kubernetes Service (AKS).
This is the full values.yaml I supply to the Helm chart when installing Prometheus:
kubeControllerManager:
enabled: false
kubeScheduler:
enabled: false
kubeEtcd:
enabled: false
kubeProxy:
enabled: false
kubelet:
serviceMonitor:
# Diables the normal cAdvisor scraping, as we add it with the job name "kubernetes-cadvisor" under additionalScrapeConfigs
# The reason for doing this is to enable the VPA to use the metrics for the recommender
# https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/FAQ.md#how-can-i-use-prometheus-as-a-history-provider-for-the-vpa-recommender
cAdvisor: false
prometheus:
prometheusSpec:
retention: 15d
storageSpec:
volumeClaimTemplate:
spec:
# the azurefile storage class is created automatically on AKS
storageClassName: azurefile
accessModes: ["ReadWriteMany"]
resources:
requests:
storage: 50Gi
additionalScrapeConfigs:
- job_name: 'kubernetes-cadvisor'
scheme: https
metrics_path: /metrics/cadvisor
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
Kubernetes version: 1.21.2
kube-prometheus-stack version: 18.1.1
helm version: version.BuildInfo{Version:"v3.6.3", GitCommit:"d506314abfb5d21419df8c7e7e68012379db2354", GitTreeState:"dirty", GoVersion:"go1.16.5"}
Unfortunately, I don't have access to Azure AKS, so I've reproduced this issue on my GKE cluster. Below I'll provide some explanations that may help to resolve your problem.
First you can try to execute this node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate rule to see if it returns any result:
If it doesn't return any records, please read the following paragraphs.
Creating a scrape configuration for cAdvisor
Rather than creating a completely new scrape configuration for cadvisor, I would suggest using one that is generated by default when kubelet.serviceMonitor.cAdvisor: true, but with a few modifications such as changing the label to job=kubernetes-cadvisor.
In my example, the 'kubernetes-cadvisor' scrape configuration looks like this:
NOTE: I added this config under the additionalScrapeConfigs in the values.yaml file (the rest of the values.yaml file may be like yours).
- job_name: 'kubernetes-cadvisor'
honor_labels: true
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics/cadvisor
scheme: https
authorization:
type: Bearer
credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
follow_redirects: true
relabel_configs:
- source_labels: [job]
separator: ;
regex: (.*)
target_label: __tmp_prometheus_job_name
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
separator: ;
regex: kubelet
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_k8s_app]
separator: ;
regex: kubelet
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_container_name]
separator: ;
regex: (.*)
target_label: container
replacement: $1
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https-metrics
action: replace
- source_labels: [__metrics_path__]
separator: ;
regex: (.*)
target_label: metrics_path
replacement: $1
action: replace
- source_labels: [__address__]
separator: ;
regex: (.*)
modulus: 1
target_label: __tmp_hash
replacement: $1
action: hashmod
- source_labels: [__tmp_hash]
separator: ;
regex: "0"
replacement: $1
action: keep
kubernetes_sd_configs:
- role: endpoints
kubeconfig_file: ""
follow_redirects: true
namespaces:
names:
- kube-system
Modifying Prometheus Rules
By default, Prometheus rules fetching data from cAdvisor use job="kubelet" in their PromQL expressions:
After changing job=kubelet to job=kubernetes-cadvisor, we also need to modify this label in the Prometheus rules:
NOTE: We just need to modify the rules that have metrics_path="/metrics/cadvisor (these are rules that retrieve data from cAdvisor).
$ kubectl get prometheusrules prom-1-kube-prometheus-sta-k8s.rules -o yaml
...
- name: k8s.rules
rules:
- expr: |-
sum by (cluster, namespace, pod, container) (
irate(container_cpu_usage_seconds_total{job="kubernetes-cadvisor", metrics_path="/metrics/cadvisor", image!=""}[5m])
) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (
1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=""})
)
record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
...
here we have a few more rules to modify...
After modifying Prometheus rules and waiting some time, we can see if it works as expected. We can try to execute node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate as in the beginning.
Additionally, let's check out our Grafana to make sure it has started displaying our dashboards correctly:
I am running prometheus on a kubernetes cluster and trying to scrape pods, nodes, services. I am getting the following error when i reload the config by sending POST request-
failed to reload config: couldn't load configuration (-config.file=/etc/prometheus/conf/prometheus.yml): unknown fields in kubernetes_sd_config: api_server
While trying to follow official docs for writing config file, I am not able to understand the relabel_configs, source_labels, target_labels, action, keep, regex part. Can somebody explain these parts and also the use of labels in prometheus. Thanks in advance.
Following is the prometheus.yml file-
scrape_configs:
- job_name: 'kubernetes-nodes'
# Default to scraping over https. If required, just disable this or change to
# `http`.
scheme: https
# This TLS & bearer token file config is used to connect to the actual scrape
# endpoints for cluster components. This is separate to discovery auth
# configuration because discovery & scraping are two separate concerns in
# Prometheus. The discovery auth config is automatic if Prometheus runs inside
# the cluster. Otherwise, more config options have to be provided within the
# <kubernetes_sd_config>.
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
# If your node certificates are self-signed or use a different CA to the
# master CA, then disable certificate verification below. Note that
# certificate verification is an integral part of a secure infrastructure
# so this should only be disabled in a controlled environment. You can
# disable certificate verification by uncommenting the line below.
#
# insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- api_server: "https://kubernetes.default.svc"
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
# Scrape config for service endpoints.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/scrape`: Only scrape services that have a value of `true`
# * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
# to set this to `https` & most likely set the `tls_config` of the scrape config.
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: If the metrics are exposed on a different port to the
# service then set this appropriately.
# Example scrape config for probing services via the Blackbox Exporter.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/probe`: Only probe services that have a value of `true`
- job_name: 'kubernetes-services'
metrics_path: /probe
params:
module: [http_2xx]
kubernetes_sd_configs:
- api_server: "https://kubernetes.default.svc"
- role: service
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: kubernetes_name
# Example scrape config for pods
#
# The relabeling allows the actual pod scrape endpoint to be configured via the
# following annotations:
#
# * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
# pod's declared ports (default is a port-free target if none are declared).
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- api_server: "https://kubernetes.default.svc"
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
Your yaml file is off, try this:
- job_name: 'kubernetes-services'
...
kubernetes_sd_configs:
- api_server: "https://kubernetes.default.svc"
role: service
...
This is the working Prometheus Configmap example file, fwiw.
https://github.com/kayrus/prometheus-kubernetes/blob/master/prometheus-configmap.yaml#L214-L241
I found that to reduce the noise of what kubectl thinks it is doing to use yamllint. If you get the config map with options yaml; when reading that file back in the kubectl command puts all the sections that are meant for data inside the data: section and it should know to ignore the other 3 sections (apiVersion, kind, and metadata)
So make sure to have only the data: section when/if you load it as a new config map.
apiVersion: v1
data:
kind: ConfigMap
metadata:
Command to get the config map
kubectl get configmap prometheus-config --namespace prometheus -o yaml > prometheus.yml
Take out all the excess comments and extra blank lines in both files (yours and the example) to save it as prometheus[#].yml then get yamllint and run it on the file(s)
yamllint -d relaxed prometheus[#].yml
Most of the time yamllint will complain lines are > 80 characters long.
If it is a JSON syntax issue then it will show up quickly.
HTH