I am currently running a metrics server inside my pod. The data is exposed inside the pod at localhost:9090, and I can retrieve it via curl from within the pod. The deployment.yaml is annotated so the data gets scraped, but I don't see any of the new metrics in Prometheus. What am I doing wrong?
Metrics I see inside the pod:
cpu_usage{process="COMMAND", pid="PID"} %CPU
cpu_usage{process="/bin/sh", pid="1"} 0.0
cpu_usage{process="sh", pid="8"} 0.0
cpu_usage{process="/usr/share/filebeat/bin/filebeat-god", pid="49"} 0.0
cpu_usage{process="/usr/share/filebeat/bin/filebeat", pid="52"} 0.0
cpu_usage{process="php-fpm:", pid="66"} 0.0
cpu_usage{process="php-fpm:", pid="67"} 0.0
cpu_usage{process="php-fpm:", pid="68"} 0.0
cpu_usage{process="nginx:", pid="69"} 0.0
cpu_usage{process="nginx:", pid="70"} 0.0
cpu_usage{process="nginx:", pid="71"} 0.0
cpu_usage{process="/bin/sh", pid="541"} 0.0
cpu_usage{process="bash", pid="556"} 0.0
cpu_usage{process="/bin/sh", pid="1992"} 0.0
cpu_usage{process="ps", pid="1993"} 0.0
cpu_usage{process="/bin/sh", pid="1994"} 0.0
deployment.yaml
template:
metadata:
labels:
app: supplier-service
annotations:
prometheus.io/path: /
prometheus.io/scrape: 'true'
prometheus.io/port: '9090'
ports:
- containerPort: 80
- containerPort: 443
- containerPort: 9090
prometheus.yml
global:
scrape_interval: 15s # By default, scrape targets every 15 seconds.
# Attach these labels to any time series or alerts when communicating
# with external systems (federation, remote storage, Alertmanager).
external_labels:
monitor: 'codelab-monitor'
# Scraping Prometheus itself
scrape_configs:
- job_name: 'prometheus'
scrape_interval: 5s
static_configs:
- targets: ['localhost:9090']
- job_name: 'kubernetes-service-endpoints'
scrape_interval: 5s
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__address__]
action: replace
regex: ([^:]+)(?::\d+)?
replacement: $1:9090
target_label: __address__
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
Port numbers are correct. What am I doing wrong?
Your kubernetes_sd_configs is configured to look for endpoints, which are created by Services. Do you have an Endpoints object created for your service? You could check with kubectl get endpoints in your namespace. If you don't want to create a Service, I suppose you could configure Prometheus to scrape pod targets instead; check the docs for more info.
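If you do go the pod-target route, a minimal pod-role job might look something like the sketch below; it simply reuses the prometheus.io annotations already on your Deployment, so treat it as a starting point rather than a drop-in config:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # keep only pods annotated with prometheus.io/scrape: 'true'
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    # honour prometheus.io/path if set (Prometheus defaults to /metrics otherwise)
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    # point the target address at the port from the prometheus.io/port annotation
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__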
Also, the documentation for metric and label names says the metric name must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*, so a dash (-) in your metric name would be an issue too.
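For example, if the real metric name were spelled with a dash, the Prometheus parser would reject it, while the underscore form shown above is fine:
# invalid: '-' is not allowed in metric names
cpu-usage{process="nginx:", pid="69"} 0.0
# valid
cpu_usage{process="nginx:", pid="69"} 0.0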
Related
We recently decided to install Loki and Promtail via the loki-stack Helm chart. Loki and Promtail kind of work: we get some logs from Promtail and we can visualize them in Grafana, but our development namespace is nowhere to be found in Loki. Promtail shows the development pod as an active target and has already collected the logs from the pod, but we can't seem to get them into Loki somehow... Any ideas?
tl;dr
set loki.monitoring.selfMonitoring.grafanaAgent.installOperator to false
This problem is caused by Grafana Agent, which is installed by default as a sub-chart of the grafana/loki chart...
The agent creates a secret named 'loki-logs-config' (loki in this case is the Helm release name) which contains the following configuration:
agent.yml: |+
logs:
configs:
- clients:
- external_labels:
cluster: loki
url: http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push
name: monitoring/loki
scrape_configs:
- job_name: podLogs/monitoring/loki
kubernetes_sd_configs:
- namespaces:
names:
- monitoring
role: pod
pipeline_stages:
- cri: {}
relabel_configs:
- source_labels:
- job
target_label: __tmp_prometheus_job_name
- action: keep
regex: loki
source_labels:
- __meta_kubernetes_pod_label_app_kubernetes_io_instance
- action: keep
regex: loki
source_labels:
- __meta_kubernetes_pod_label_app_kubernetes_io_name
- source_labels:
- __meta_kubernetes_namespace
target_label: namespace
- source_labels:
- __meta_kubernetes_service_name
target_label: service
- source_labels:
- __meta_kubernetes_pod_name
target_label: pod
- source_labels:
- __meta_kubernetes_pod_container_name
target_label: container
- replacement: monitoring/loki
target_label: job
- replacement: /var/log/pods/*$1/*.log
separator: /
source_labels:
- __meta_kubernetes_pod_uid
- __meta_kubernetes_pod_container_name
target_label: __path__
- action: replace
source_labels:
- __meta_kubernetes_pod_node_name
target_label: __host__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- action: replace
replacement: monitoring/$1
source_labels:
- __meta_kubernetes_pod_controller_name
target_label: job
- action: replace
source_labels:
- __meta_kubernetes_pod_container_name
target_label: container
- action: replace
replacement: loki
target_label: cluster
positions_directory: /var/lib/grafana-agent/data
server: {}
As you can see, under kubernetes_sd_configs there is a namespaces list with the value monitoring. I have no idea why it is there, but that is the namespace I installed this chart into.
You won't see this secret in the output of helm template; it seems that the Grafana Agent operator creates it somehow after startup.
It has the label app.kubernetes.io/managed-by=grafana-agent-operator.
Pretty magical if you ask me...
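If you want to check this yourself, one way to dump the generated config (assuming, as above, the release is named loki and was installed into the monitoring namespace) is:
kubectl -n monitoring get secret loki-logs-config -o jsonpath='{.data.agent\.yml}' | base64 -d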
The solution for me was disabling the installation of Grafana Agent:
loki:
loki:
commonConfig:
replication_factor: 1
storage:
type: 'filesystem'
auth_enabled: false
monitoring:
dashboards:
enabled: false
selfMonitoring:
enabled: true
grafanaAgent:
installOperator: false
lokiCanary:
enabled: false
Note: the top-level loki element in the code block above is needed only if you add the grafana/loki chart as a subchart of your own chart.
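For completeness, a sketch of applying such a values file (the release name loki, the monitoring namespace, and the loki-values.yaml filename are all assumptions; drop the outer loki: element when installing the chart directly, as noted above):
helm upgrade --install loki grafana/loki -n monitoring -f loki-values.yaml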
IMO, enabling a beta feature (Grafana Agent is v0.30.0 today) in a chart used as a reference in Loki's docs is insane :)
Hey, I'm currently trying to determine the uptime of a pod with kube-state-metrics, specifically when a pod has started or stopped. I am using a Prometheus Deployment together with kube-state-metrics for this.
Specifically I want to get the following metrics:
kube_pod_completion_time
kube_pod_created
As a test I've configured Prometheus to gather metrics with the following config.yml file:
global:
scrape_interval: 10m
scrape_timeout: 10s
evaluation_interval: 10m
scrape_configs:
- job_name: kubernetes-nodes-cadvisor
honor_timestamps: true
scrape_interval: 10m
scrape_timeout: 10s
metrics_path: /metrics
scheme: https
authorization:
type: Bearer
credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
follow_redirects: true
enable_http2: true
relabel_configs:
- separator: ;
regex: __meta_kubernetes_node_label_(.+)
replacement: $1
action: labelmap
- separator: ;
regex: (.*)
target_label: __address__
replacement: kubernetes.default.svc:443
action: replace
- source_labels: [__meta_kubernetes_node_name]
separator: ;
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
action: replace
metric_relabel_configs:
- source_labels: [__name__]
regex: '(container_cpu_usage_seconds_total|container_fs_reads_bytes_total|container_fs_writes_bytes_total|container_memory_max_usage_bytes|container_network_receive_bytes_total|container_network_transmit_bytes_total)'
action: keep
kubernetes_sd_configs:
- role: node
kubeconfig_file: ''
follow_redirects: true
enable_http2: true
- job_name: 'kube-state-metrics'
scrape_interval: 10m
static_configs:
- targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
metric_relabel_configs:
- source_labels: [__name__]
regex: '(kube_pod_labels|kube_pod_created|kube_pod_completion_time|kube_pod_container_resource_limits)'
action: keep
remote_write:
- url: http://example.com
remote_timeout: 30s
follow_redirects: true
enable_http2: true
oauth2:
token_url: https://example.com
client_id: myCoolID
client_secret: myCoolPassword
queue_config:
capacity: 2500
max_shards: 200
min_shards: 1
max_samples_per_send: 10
batch_send_deadline: 5s
min_backoff: 30ms
max_backoff: 5s
metadata_config:
send: false
Additionally, I have the following test deployment running:
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: busy-box-test
spec:
replicas: 1
selector:
matchLabels:
app: busy-box-test
template:
metadata:
labels:
app: busy-box-test
spec:
containers:
- command:
- sleep
- '300'
image: busybox
name: test-box
However, when I search for kube_pod_completion_time in my remote-write destination, I cannot find any samples, while I do have all the other metrics specified in the regex (kube_pod_labels, kube_pod_created, kube_pod_container_resource_limits).
I've also tried the following commands to see if the metrics are present in the cluster:
kubectl get --raw '/metrics' | grep kube_ and kubectl get --raw 'kube-state-metrics.kube-system.svc.cluster.local:8080', but I don't find anything definitive. I suspect these commands are looking in the wrong location.
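For reference, I believe the kube-state-metrics endpoint itself can be reached either through the API server's service proxy or via a port-forward, something like the commands below (using the service name and namespace from my config above), though I'm not sure this is the right place to look:
# via the API server's service proxy
kubectl get --raw '/api/v1/namespaces/kube-system/services/kube-state-metrics:8080/proxy/metrics' | grep kube_pod_completion_time
# or via a port-forward
kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080
curl -s localhost:8080/metrics | grep kube_pod_completion_time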
So, beyond anything obvious I may have missed, I have the following open questions:
Is there an endpoint I should hit inside the cluster that returns the completion time? Is there an issue with the polling interval being once every 10 minutes for a pod that comes up and down every 5? (If anyone knows how long the history of a terminated pod sticks around in kube-state-metrics, that would be great to know as well.)
I've included the configuration for kube-state-metrics here: https://gist.github.com/twosdai/12607c8459bdb73fc98edbbcb17b5eb5 in order to keep the post a bit more concise. The cluster is running on AWS EKS, version 1.22.
I am trying to get logs from a single namespace through Promtail and scrape_configs, but I am not getting the expected results. I am installing in Kubernetes with:
helm install loki grafana/loki-stack -n loki-test -f ~/loki-stack-values.yml
and the contents of my values file are:
loki:
enabled: true
promtail:
enabled: true
pipelineStages:
- cri: {}
- json:
expressions:
is_even: is_even
level: level
version: version
scrape_configs:
- job_name: kubernetes-pods
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_namespace]
action: keep
regex: mongodb-test
# [...]
- job_name: kubernetes-pods-app
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_namespace]
action: keep
regex: mongodb-test
grafana:
enabled: true
sidecar:
datasources:
enabled: true
image:
tag: 8.3.5
My expectation was that I would only get logs from the mongodb-test namespace, but I can see logs from every namespace present.
I also tried with drop, but it did not do anything.
What should I do here?
Thank you so much
Using a match stage to drop other namespaces under pipelineStages worked for me. In your case:
config:
snippets:
pipelineStages:
- cri: {}
- match:
selector: '{namespace!~"mongodb-test"}'
action: drop
common:
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: namespace
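The chart can then be re-applied with the updated values file, reusing the install command from the question:
helm upgrade loki grafana/loki-stack -n loki-test -f ~/loki-stack-values.yml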
I'm using Prometheus to scrape metrics from my pods. The application I'm interested in is replicated a couple of times, with one Service providing access. Prometheus uses this Service to scrape the metrics. In my app the metrics are set up as follows:
import * as Prometheus from 'prom-client';
const httpRequestDurationMicroseconds = new Prometheus.Histogram({
name: 'transaction_amounts',
help: 'Amount',
labelNames: ['amount'],
buckets: [0, 5, 15, 50, 100, 200, 300, 400, 500, 10000],
});
const totalPayments = new Prometheus.Counter('transaction_totals', 'Total payments');
I'm using helm to install Prometheus and the scrape config looks like this:
prometheus.yml:
rule_files:
- /etc/config/rules
- /etc/config/alerts
scrape_configs:
- job_name: prometheus
static_configs:
- targets:
- localhost:9090
- job_name: transactions
scrape_interval: 1s
static_configs:
- targets:
- transaction-metrics-service:3001
I can see the metrics inside Prometheus, but they seem to come from just one pod. For example, when I query for transaction_totals in Prometheus, the result only shows a single series.
I don't think that the instance label can uniquely identify my pods. What should I do to be able to query all pods?
Instead of using a static_config that scrapes just one host, try Kubernetes service discovery (kubernetes_sd_configs) as provided by Prometheus.
Your config file would look something like this:
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
# only scrape when annotation prometheus.io/scrape: 'true' is set
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
and then add the annotation to your Kubernetes Deployment yaml config like this:
kind: Deployment
...
spec:
template:
metadata:
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "<< PORT OF YOUR CONTAINER >>"
You can see a full working example here.
Add Prometheus annotations to your Service (see the sketch after this list), since Prometheus will only scrape a service that:
Exposes the exporter port
Has a prometheus.io/scrape: "true" annotation
Has a prometheus.io/port: "<exporter_port_here>" annotation
here is an official example
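For illustration, a minimal Service sketch with those annotations; the name and port are taken from the prometheus.yml in the question, and the selector label is a placeholder:
apiVersion: v1
kind: Service
metadata:
  name: transaction-metrics-service
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "3001"
spec:
  selector:
    app: transactions   # placeholder: must match your pod labels
  ports:
    - port: 3001
      targetPort: 3001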
the scraped pod is probably prometheus itself
I am running Prometheus on a Kubernetes cluster and trying to scrape pods, nodes, and services. I am getting the following error when I reload the config by sending a POST request:
failed to reload config: couldn't load configuration (-config.file=/etc/prometheus/conf/prometheus.yml): unknown fields in kubernetes_sd_config: api_server
While trying to follow the official docs for writing the config file, I am not able to understand the relabel_configs, source_labels, target_label, action, keep, and regex parts. Can somebody explain these parts, and also the use of labels in Prometheus? Thanks in advance.
The following is the prometheus.yml file:
scrape_configs:
- job_name: 'kubernetes-nodes'
# Default to scraping over https. If required, just disable this or change to
# `http`.
scheme: https
# This TLS & bearer token file config is used to connect to the actual scrape
# endpoints for cluster components. This is separate to discovery auth
# configuration because discovery & scraping are two separate concerns in
# Prometheus. The discovery auth config is automatic if Prometheus runs inside
# the cluster. Otherwise, more config options have to be provided within the
# <kubernetes_sd_config>.
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
# If your node certificates are self-signed or use a different CA to the
# master CA, then disable certificate verification below. Note that
# certificate verification is an integral part of a secure infrastructure
# so this should only be disabled in a controlled environment. You can
# disable certificate verification by uncommenting the line below.
#
# insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- api_server: "https://kubernetes.default.svc"
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
# Scrape config for service endpoints.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/scrape`: Only scrape services that have a value of `true`
# * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
# to set this to `https` & most likely set the `tls_config` of the scrape config.
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: If the metrics are exposed on a different port to the
# service then set this appropriately.
# Example scrape config for probing services via the Blackbox Exporter.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/probe`: Only probe services that have a value of `true`
- job_name: 'kubernetes-services'
metrics_path: /probe
params:
module: [http_2xx]
kubernetes_sd_configs:
- api_server: "https://kubernetes.default.svc"
- role: service
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: kubernetes_name
# Example scrape config for pods
#
# The relabeling allows the actual pod scrape endpoint to be configured via the
# following annotations:
#
# * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
# pod's declared ports (default is a port-free target if none are declared).
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- api_server: "https://kubernetes.default.svc"
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
Your yaml file is off, try this:
- job_name: 'kubernetes-services'
...
kubernetes_sd_configs:
- api_server: "https://kubernetes.default.svc"
role: service
...
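The same change (api_server and role merged into a single list element) applies to the node and pod jobs as well; for example, the node job's discovery section becomes:
kubernetes_sd_configs:
- api_server: "https://kubernetes.default.svc"
  role: node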
This is the working Prometheus Configmap example file, fwiw.
https://github.com/kayrus/prometheus-kubernetes/blob/master/prometheus-configmap.yaml#L214-L241
I found that using yamllint reduces the noise around what kubectl thinks it is doing. If you get the ConfigMap with -o yaml, then when you read that file back in, kubectl puts all of the sections that are meant for data inside the data: section, and it should know to ignore the other three top-level sections (apiVersion, kind, and metadata).
So make sure to have only the data: section when/if you load it as a new ConfigMap.
apiVersion: v1
data:
kind: ConfigMap
metadata:
Command to get the config map
kubectl get configmap prometheus-config --namespace prometheus -o yaml > prometheus.yml
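And once the file contains just the Prometheus configuration itself (the contents of the data key), one way to load it back as a new ConfigMap is a dry-run create piped into apply, reusing the name and namespace from the command above:
kubectl create configmap prometheus-config --namespace prometheus --from-file=prometheus.yml=prometheus.yml --dry-run=client -o yaml | kubectl apply -f -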
Take out all the excess comments and extra blank lines in both files (yours and the example), save them as prometheus[#].yml, then get yamllint and run it on the file(s):
yamllint -d relaxed prometheus[#].yml
Most of the time, yamllint will just complain that lines are > 80 characters long.
If there is a syntax issue, it will show up quickly.
HTH