failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:monitoring" cannot list resource "pods" in API group "" at the cluster scope - kubernetes

Not sure what I am missing. Please, find below all the config scripts I have used
2022-07-21T07:26:56.903Z info service/collector.go:220 Starting otelcol... {"service": "my-prom-instance", "Version": "0.54.0", "NumCPU": 4}
2022-07-21T07:26:56.903Z info service/collector.go:128 Everything is ready. Begin running and processing data. {"service": "my-prom-instance"}
2022-07-21T07:26:56.902Z debug discovery/manager.go:309 Discoverer channel closed {"service": "my-prom-instance", "kind": "receiver", "name": "prometheus", "pipeline": "metrics", "provider": "static/0"}
W0721 07:26:56.964183 1 reflector.go:324] k8s.io/client-go#v0.24.2/tools/cache/reflector.go:167: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:monitoring:otel-collector-collector" cannot list resource "pods" in API group "" at the cluster scope
E0721 07:26:56.964871 1 reflector.go:138] k8s.io/client-go#v0.24.2/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:monitoring:otel-collector-collector" cannot list resource "pods" in API group "" at the cluster scope
W0721 07:26:58.435237 1 reflector.go:324] k8s.io/client-go#v0.24.2/tools/cache/reflector.go:167: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:monitoring:otel-collector-collector" cannot list resource "pods" in API group "" at the cluster scope
E0721 07:26:58.435924 1 reflector.go:138]
clusterRole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus
namespace: monitoring
rules:
- apiGroups: [""]
resources:
- nodes
- nodes/proxy
- services
- endpoints
- pods
verbs: ["get", "list", "watch"]
- apiGroups:
- extensions
resources:
- ingresses
verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: default
namespace: monitoring
config-map.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-server-conf
labels:
name: prometheus-server-conf
namespace: monitoring
data:
prometheus.rules: |-
groups:
- name: devopscube demo alert
rules:
- alert: High Pod Memory
expr: sum(container_memory_usage_bytes) > 1
for: 1m
labels:
severity: slack
annotations:
summary: High Memory Usage
prometheus.yml: |-
global:
scrape_interval: 5s
evaluation_interval: 5s
rule_files:
- /etc/prometheus/prometheus.rules
alerting:
alertmanagers:
- scheme: http
static_configs:
- targets:
- "alertmanager.monitoring.svc:9093"
scrape_configs:
- job_name: 'node-exporter'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_endpoints_name]
regex: 'node-exporter'
action: keep
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
- job_name: 'kubernetes-nodes'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
- job_name: 'kube-state-metrics'
static_configs:
- targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
- job_name: 'kubernetes-cadvisor'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
prometheus-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus-deployment
namespace: monitoring
labels:
app: prometheus-server
spec:
replicas: 1
selector:
matchLabels:
app: prometheus-server
template:
metadata:
labels:
app: prometheus-server
spec:
containers:
- name: prometheus
image: prom/prometheus
args:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/prometheus/"
ports:
- containerPort: 9090
volumeMounts:
- name: prometheus-config-volume
mountPath: /etc/prometheus/
- name: prometheus-storage-volume
mountPath: /prometheus/
volumes:
- name: prometheus-config-volume
configMap:
defaultMode: 420
name: prometheus-server-conf
- name: prometheus-storage-volume
emptyDir: {}
otel-deployment.yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
name: otel-collector
namespace: monitoring
spec:
config: |
receivers:
prometheus:
config:
scrape_configs:
- job_name: 'kube-state-metrics'
scrape_interval: 5s
scrape_timeout: 1s
static_configs:
- targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
- job_name: k8s
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
regex: "true"
action: keep
metric_relabel_configs:
- source_labels: [__name__]
regex: "(request_duration_seconds.*|response_duration_seconds.*)"
action: keep
processors:
batch:
exporters:
logging:
service:
pipelines:
metrics:
receivers: [prometheus]
exporters: [logging]
telemetry:
logs:
level: debug
initial_fields:
service: my-prom-instance
otel-service.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: otel-collector-sa
namespace: monitoring

the service account is defined with name otel-collector-sa, and your ClusterRoleBinding link to service account default

Related

How to add custom labels in Promtail Config

I need to Extract logs data and append as a new label, below is the sample log example:
Sample Log Message:
2022-12-21T11:48:00,001 [schedulerFactor_Worker-4, , ] INFO [,,] [userAgent=] [system=,component=,object=] [,] [] c.s.f.s.scheduler.SchedulerTask - sync process started on 2022-12-21T06:48:00.000780 for sync pair :17743b1b-a067-4478-a6d8-7b1cff04175a for JobId :dc8dc0dd-fdb9-4873-af55-1c70ba2047a5
New Labels needed in logs:
sync_pair =17743b1b-a067-4478-a6d8-7b1cff04175a
JobId =dc8dc0dd-fdb9-4873-af55-1c70ba2047a5
My promtail-config.yml
server:
http_listen_port: 9080
grpc_listen_port: 0
http_listen_address: 0.0.0.0
positions:
filename: /tmp/positions.yaml
clients:
- url: http://loki:3100/loki/api/v1/push
scrape_configs:
- job_name: system
static_configs:
- targets:
- localhost
labels:
job: varlogs
__path__: /var/log/*log
- job_name: containers
static_configs:
- targets:
- localhost
labels:
job: containers
__path__: /var/lib/docker/containers/*/*log
- job_name: dockerlogs
file_sd_configs:
- files:
- /etc/promtail/promtail-targets.yml
relabel_configs:
- source_labels: [job]
target_label: job
- source_labels: [__address__]
target_label: container_id
- source_labels: [container_id]
target_label: __path__
replacement: /var/lib/docker/containers/*/*log
pipeline_stages:
- match:
selector: '{job="varlogs"}'
stages:
- regex:
expression: '(?P<sync_pair>sync_pair)' '(?P<job_id>job_id)'
- labels:
sync_pair:
job_id:
I have added pipeline stages, but it's not showing any labels. i.e. sync_pair and JobId, this labels should be shown in logs after query.
2022-12-21T11:48:00,001 [schedulerFactor_Worker-4, , ] INFO [,,] [userAgent=] [system=,component=,object=] [,] [] c.s.f.s.scheduler.SchedulerTask - sync process started on 2022-12-21T06:48:00.000780 for sync pair :17743b1b-a067-4478-a6d8-7b1cff04175a for JobId :dc8dc0dd-fdb9-4873-af55-1c70ba2047a5
This two must be shown in Log Labels:
**sync_pair =17743b1b-a067-4478-a6d8-7b1cff04175a
JobId =dc8dc0dd-fdb9-4873-af55-1c70ba2047a5
**
Check Image ---> https://i.stack.imgur.com/1h27v.png
I want sync_pair and JobId in my Labels.
promtail-config.yml
server:
http_listen_port: 9080
grpc_listen_port: 0
http_listen_address: 0.0.0.0
positions:
filename: /tmp/positions.yaml
clients:
- url: http://loki:3100/loki/api/v1/push
scrape_configs:
- job_name: system
static_configs:
- targets:
- localhost
labels:
job: varlogs
__path__: /var/log/*log
pipeline_stages:
- regex:
# extracts only sync pair and JobId from the log line
expression: '.*sync pair :(?P<syncpair>[a-zA-Z0-9_-]{30,36}).*JobId :(?P<jobid>[a-zA-Z0-9_-]{30,36})'
- labels:
# sources extracted syncpair as label 'sync pair' value and jobid as label 'JobId'
syncpair:
jobid:
- job_name: containers
static_configs:
- targets:
- localhost
labels:
job: containers
__path__: /var/lib/docker/containers/*/*log
- job_name: dockerlogs
file_sd_configs:
- files:
- /etc/promtail/promtail-targets.yml
relabel_configs:
- source_labels: [job]
target_label: job
- source_labels: [__address__]
target_label: container_id
- source_labels: [container_id]
target_label: __path__
replacement: /var/lib/docker/containers/*/*log
Try this tool -> https://regex101.com/r/rgp49r/1
Reference -> Using Promtail to sum log line values - Pipeline Stages - Metrics
Check output image for Labels

How to configure Rabbitmq Metric in Prometheus/Grafana Kubernetes

My question i want to add rabbitmq monitoring in prometheus. I already have rabbitmq running in Kubernetes but i dont know how to add rabbitmq metric in prometheus
I have install promethues and grafana through yaml file along with pv,pvc,storage,svc,config,deploy and cluster-role
Here is the screenshot of rabbitmq showing empty in promethues
I have install Kubernetes in one vm with local storage i.e.., control-plane and node both install in one vm and everything is working fine
Here is my prometheus-config yaml file
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: monitoring
data:
prometheus.yml: |
global:
scrape_interval: 5s
evaluation_interval: 5s
rule_files:
- /etc/prometheus/prometheus.rules
alerting:
alertmanagers:
- scheme: http
static_configs:
- targets:
- "alertmanager.monitoring.svc:9093"
scrape_configs:
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
- job_name: 'kubernetes-nodes'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
- job_name: 'kube-state-metrics'
static_configs:
- targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
- job_name: 'kubernetes-cadvisor'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
- job_name: 'rabbitmq'
metrics_path: /metrics
scrape_interval: 5s
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- default
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_app]
separator: ;
regex: rabbitmq
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: prometheus
Even after adding rabbitmq metric it is not showing in prometheus url (target)
here is my rabbitmq-svc yaml file
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: xxx-rabbitmq
spec:
selector:
matchLabels:
app: xxx-rabbitmq
serviceName: xxx-rabbitmq
replicas: 1
template:
metadata:
labels:
app: xxx-rabbitmq
spec:
terminationGracePeriodSeconds: 10
containers:
- name: xxx-rabbitmq
image: rabbitmq:3.7.3-management
ports:
- containerPort: xxx
- containerPort: xxx
- containerPort: xxx
- containerPort: xxx
volumeMounts:
- name: xxx-rabbitmq-pvc
mountPath: /var/lib/rabbitmq
subPath: rabbitmq
envFrom:
- configMapRef:
name: rabbitmq-config
volumeClaimTemplates:
- metadata:
name: xxx-rabbitmq-pvc
spec:
accessModes: ["ReadWriteOnce"]
volumeMode: Filesystem
storageClassName: xxxx-storage
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
name: xxx-rabbitmq
labels:
app: xxx-rabbitmq
spec:
type: ClusterIP
ports:
- port: xxx
name: main
- port: xxx
name: rabbitmqssl
- port: xxx
name: rabvitmqmgmt
selector:
app: xxx-rabbitmq
Please help me out how to get all information of rabbitmq in promethues/grafana
If you use rabbitmq of version 3.8 and above you have to use rabbitmq_prometheus plugin.
As the docs state:
The plugin exposes all RabbitMQ metrics on a dedicated TCP port, in
Prometheus text format.
To enable it run:
rabbitmq-plugins enable rabbitmq_prometheus
To validate it's working run inside rabbitmq container:
curl -s localhost:15692/metrics | head -n 3
and see if you get metrics in response.
For versions prior to 3.8, you'd have to use prometheus_rabbitmq_exporter

Connection refused on Prometheus metrics target deployed on Minikube with a custom SSL certificate

I am currently having issues trying to get Prometheus to scrape the metrics for my Minikube cluster. Prometheus is installed via the kube-prometheus-stack
kubectl create namespace monitoring && \
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && \
helm repo update && \
helm install -n monitoring prometheus-stack prometheus-community/kube-prometheus-stack
I am currently accessing Prometheus from an Ingress with a locally signed TLS certificate and it appears it's leading to conflicts as connection keeps getting refused by the cluster.
TLS is set up via Minikube ingress add-on:
kubectl create secret -n kube-system tls mkcert-tls-secret --cert=cert.pem --key=key.pem
minikube addons configure ingress <<< "kube-system/mkcert-tls-secret" && \
minikube addons disable ingress && \
minikube addons enable ingress
It seems Prometheus can't get access to http-metrics as a target. I installed Prometheus via a helm chart:
kubectl create namespace monitoring && \
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && \
helm repo update && \
helm install -n monitoring prometheus-stack prometheus-community/kube-prometheus-stack
Here is my Prometheus configuration:
global:
scrape_interval: 30s
scrape_timeout: 10s
evaluation_interval: 30s
external_labels:
prometheus: monitoring/prometheus-stack-kube-prom-prometheus
prometheus_replica: prometheus-prometheus-stack-kube-prom-prometheus-0
alerting:
alert_relabel_configs:
- separator: ;
regex: prometheus_replica
replacement: $1
action: labeldrop
alertmanagers:
- follow_redirects: true
enable_http2: true
scheme: http
path_prefix: /
timeout: 10s
api_version: v2
relabel_configs:
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: prometheus-stack-kube-prom-alertmanager
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: http-web
replacement: $1
action: keep
kubernetes_sd_configs:
- role: endpoints
kubeconfig_file: ""
follow_redirects: true
enable_http2: true
namespaces:
own_namespace: false
names:
- monitoring
rule_files:
- /etc/prometheus/rules/prometheus-prometheus-stack-kube-prom-prometheus-rulefiles-0/*.yaml
scrape_configs:
- job_name: serviceMonitor/monitoring/prometheus-stack-kube-prom-kube-controller-manager/0
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: https
authorization:
type: Bearer
credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
follow_redirects: true
enable_http2: true
relabel_configs:
- source_labels: [job]
separator: ;
regex: (.*)
target_label: __tmp_prometheus_job_name
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_label_app, __meta_kubernetes_service_labelpresent_app]
separator: ;
regex: (kube-prometheus-stack-kube-controller-manager);true
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_release, __meta_kubernetes_service_labelpresent_release]
separator: ;
regex: (prometheus-stack);true
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: http-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_container_name]
separator: ;
regex: (.*)
target_label: container
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_jobLabel]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: http-metrics
action: replace
- source_labels: [__address__]
separator: ;
regex: (.*)
modulus: 1
target_label: __tmp_hash
replacement: $1
action: hashmod
- source_labels: [__tmp_hash]
separator: ;
regex: "0"
replacement: $1
action: keep
kubernetes_sd_configs:
- role: endpoints
kubeconfig_file: ""
follow_redirects: true
enable_http2: true
namespaces:
own_namespace: false
names:
- kube-system
- job_name: serviceMonitor/monitoring/prometheus-stack-kube-prom-kube-etcd/0
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
authorization:
type: Bearer
credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
follow_redirects: true
enable_http2: true
relabel_configs:
- source_labels: [job]
separator: ;
regex: (.*)
target_label: __tmp_prometheus_job_name
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_label_app, __meta_kubernetes_service_labelpresent_app]
separator: ;
regex: (kube-prometheus-stack-kube-etcd);true
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_release, __meta_kubernetes_service_labelpresent_release]
separator: ;
regex: (prometheus-stack);true
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: http-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_container_name]
separator: ;
regex: (.*)
target_label: container
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_jobLabel]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: http-metrics
action: replace
- source_labels: [__address__]
separator: ;
regex: (.*)
modulus: 1
target_label: __tmp_hash
replacement: $1
action: hashmod
- source_labels: [__tmp_hash]
separator: ;
regex: "0"
replacement: $1
action: keep
kubernetes_sd_configs:
- role: endpoints
kubeconfig_file: ""
follow_redirects: true
enable_http2: true
namespaces:
own_namespace: false
names:
- kube-system
I am also currently accessing (works just fine) the Prometheus instance outside of the cluster with an Ingress using the TLS certificate:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: prometheusdashboard-ingress
namespace: monitoring
labels:
name: prometheusdashboard-ingress
spec:
tls:
- hosts:
- prometheus.demo
secretName: mkcert-tls-secret
rules:
- host: prometheus.demo
http:
paths:
- pathType: Prefix
path: "/"
backend:
service:
name: prometheus-stack-kube-prom-prometheus
port:
number: 9090
Here's the output in the target page of Prometheus:
What do I get the stack access to this TLS certificate which I assume is the main issue here?
Solution to the problem
After some thorough analysis of my various kubernetes clusters, I have found out the following errors exist:
Docker Desktop Environment throws this:
Warning Failed 6s (x3 over 30s) kubelet Error: failed to start container "node-exporter": Error response from daemon: path / is mounted on / but it is not a shared or slave mount
The solution tries solving it but it didn't work out for me.
Minikube Environment
The same error was replicated there too, I opened the web UI for minikube and found that these services were related to these ports
Through this image, we can understand that these services are relevant to these ports. You can try port-forwarding them but even that doesn't help much.
The only way I can see this working is to apply the configuration through prometheus chart.
After some more analysis
You can use the kubernetes-dashboard namespace using the following commmand to access the metrics.
After some debugging, these are the following images used by MiniKube:
kubernetesui/metrics-scraper
kubernetesui/dashboard:v2.3.1
K8S Provisioner
Okay now, what do I do with the Namespace?
You can access the namespace and check out the services.
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dashboard-metrics-scraper ClusterIP 10.98.229.151 <none> 8000/TCP 3m55s
kubernetes-dashboard ClusterIP 10.107.41.221 <none> 80/TCP 3m55s
Using the kubernetes-dashboard service, you can access metrics from the services. It goes to localhost:9090/metrics where you can get some more metrics like this:
EDIT
You can configure prometheus to access these metrics.
EDIT II
I am not a prometheus expert but I am of a certain opinion that the above mentioned config file isn't best suited for mining metrics. You should consider using a custom config file.

Monitor only one namespace metrics - Prometheus with Kubernetes

I am implementing Prometheus to monitor my Kubernetes system health, where I have multiple clusters and namespaces.
My goal is to monitor only a specefic namespace which called default and just my own pods excluding prometheus Pods and monitoring details.
I tried to specify the namespace in the kubernetes_sd_configs like this:
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- 'default'
Buit I still getting metrics that I don't need.
Here is my configMap.yml:
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-server-conf
labels:
name: prometheus-server-conf
namespace: default
data:
prometheus.rules: |-
groups:
- name: devopscube demo alert
rules:
- alert: High Pod Memory
expr: sum(container_memory_usage_bytes) > 1
for: 1m
labels:
severity: slack
annotations:
summary: High Memory Usage
prometheus.yml: |-
global:
scrape_interval: 5s
evaluation_interval: 5s
rule_files:
- /etc/prometheus/prometheus.rules
alerting:
alertmanagers:
- scheme: http
static_configs:
- targets:
- "alertmanager.monitoring.svc:9093"
scrape_configs:
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- 'default'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
- job_name: 'kubernetes-nodes'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
namespaces:
names:
- 'default'
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
namespaces:
names:
- 'default'
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
- job_name: 'kube-state-metrics'
static_configs:
- targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
- job_name: 'kubernetes-cadvisor'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
namespaces:
names:
- 'default'
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- 'default'
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
I don't want to this details below for example to be monitored:
✔container_memory_rss{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",id="/system.slice/kubelet.service",instance="minikube",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="minikube",kubernetes_io_os="linux"}
✔container_memory_rss{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",id="/system.slice/docker.service",instance="minikube",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="minikube",kubernetes_io_os="linux"}
✔container_memory_rss{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",id="/kubepods/podda7b74d8-b611-4dff-885c-70ea40091b7d",instance="minikube",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="minikube",kubernetes_io_os="linux",namespace="kube-system",pod="default-http-backend-59f7ff8999-ktqnl",pod_name="default-http-backend-59f7ff8999-ktqnl"}
If you just want to prevent certain metrics from being ingested (i.e. prevent from being saved in the Prometheus database), you can use metric relabelling to drop them:
- job_name: kubernetes-cadvisor
metric_relabel_configs:
- source_labels: [__name__]
regex: container_memory_rss
action: drop
Note that in the kubernetes-cadvisor job you use the node service discovery role. This discovers Kubernetes nodes, which are non-namespaced resources, so your namespace restriction to default might not have any effect in this case.
Hey just found this in the docs
# Optional namespace discovery. If omitted, all namespaces are used.
namespaces:
names:
[ - <string> ]
right under https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ingress
this worked for me
- job_name: "kubernetes-cadvisor"
scheme: https
metrics_path: /metrics/cadvisor
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
# disable certificate verification by uncommenting the line below.
#
# insecure_skip_verify: true
authorization:
credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
metric_relabel_configs:
- action: keep
source_labels: [namespace]
regex: tsb ##namespace name you want
If you want to scrape metrics from a specific application or service, then apply the prometheus scrape annotations to only those application services that you are interested in.
sample
apiVersion: apps/v1beta2 # for versions before 1.8.0 use extensions/v1beta1
kind: DaemonSet
metadata:
name: fluentd-elasticsearch
namespace: weave
labels:
app: fluentd-logging
spec:
selector:
matchLabels:
name: fluentd-elasticsearch
template:
metadata:
labels:
name: fluentd-elasticsearch
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '9102'
spec:
containers:
- name: fluentd-elasticsearch
image: gcr.io/google-containers/fluentd-elasticsearch:1.20
Annotations on pods allow you to control if the metrics need to be scraped or not
prometheus.io/scrape: The default configuration will scrape all pods and, if set to false, this annotation will exclude the pod from the scraping process.
prometheus.io/path: If the metrics path is not /metrics, define it with this annotation.
prometheus.io/port: Scrape the pod on the indicated port instead of the pod’s declared ports (default is a port-free target if none are declared).
Your configuration works only with Prometheus Operator.

Opening storage failed" err="invalid block sequence"

What did you do?
I ran prometheus2.0.0 on kubernetesv1.8.5
What did you expect to see?
Everything went well.
What did you see instead? Under which circumstances?
Everything went well at beginning. But several hours later, pods' statuses turned to "CrashLoopBackOff", all prometheus turned unavaliable. After prometheus pods created, nothing has been done.
[root#k8s-1 prometheus]# kubectl get all -n monitoring
NAME DESIRED CURRENT AGE
statefulsets/prometheus-k8s 0 2 16h
NAME READY STATUS RESTARTS AGE
po/prometheus-k8s-0 0/1 CrashLoopBackOff 81 16h
po/prometheus-k8s-1 0/1 CrashLoopBackOff 22 16h
Environment
[root#k8s-1 prometheus]# kubectl version --short
Client Version: v1.8.5
Server Version: v1.8.5
[root#k8s-1 prometheus]# docker images | grep -i prometheus
quay.io/prometheus/alertmanager v0.12.0 f87cbd5f1360 5 weeks ago 31.2 MB
quay.io/prometheus/node_exporter v0.15.2 ff5ecdcfc4a2 6 weeks ago 22.8 MB
quay.io/prometheus/prometheus v2.0.0 67141fa03496 2 months ago 80.2 MB
System information:
[root#k8s-1 prometheus]# uname -srm
Linux 3.10.0-229.el7.x86_64 x86_64
Prometheus version:
v2.0.0
Prometheus configuration file:
[root#k8s-1 prometheus]# cat prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-k8s-config
namespace: monitoring
data:
prometheus.yaml: |
global:
scrape_interval: 10s
scrape_timeout: 10s
evaluation_interval: 10s
rule_files:
- "/etc/prometheus-rules/*.rules"
scrape_configs:
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
- job_name: 'kubernetes-nodes'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
- job_name: 'kubernetes-cadvisor'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
- job_name: 'kubernetes-services'
metrics_path: /probe
params:
module: [http_2xx]
kubernetes_sd_configs:
- role: service
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox-exporter.example.com:9115
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: kubernetes_name
- job_name: 'kubernetes-ingresses'
metrics_path: /probe
params:
module: [http_2xx]
kubernetes_sd_configs:
- role: ingress
relabel_configs:
- source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
regex: (.+);(.+);(.+)
replacement: ${1}://${2}${3}
target_label: __param_target
- target_label: __address__
replacement: blackbox-exporter.example.com:9115
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_ingress_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_ingress_name]
target_label: kubernetes_name
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
prometheus.yaml:
[root#k8s-1 prometheus]# cat prometheus-all-together.yaml
apiVersion: v1
kind: Service
metadata:
labels:
prometheus: k8s
name: prometheus-k8s
namespace: monitoring
annotations:
prometheus.io/scrape: "true"
spec:
ports:
- name: web
nodePort: 30900
port: 9090
protocol: TCP
targetPort: web
selector:
prometheus: k8s
sessionAffinity: None
type: NodePort
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
labels:
prometheus: k8s
name: prometheus-k8s
namespace: monitoring
spec:
selector:
matchLabels:
app: prometheus
prometheus: k8s
serviceName: prometheus-k8s
replicas: 2
template:
metadata:
labels:
app: prometheus
prometheus: k8s
spec:
securityContext:
runAsUser: 65534
fsGroup: 65534
runAsNonRoot: true
containers:
- args:
- --config.file=/etc/prometheus/config/prometheus.yaml
- --storage.tsdb.path=/cephfs/prometheus/data
- --storage.tsdb.retention=180d
- --web.route-prefix=/
- --web.enable-lifecycle
- --web.enable-admin-api
image: quay.io/prometheus/prometheus:v2.0.0
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 10
httpGet:
path: /status
port: web
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 3
name: prometheus
ports:
- containerPort: 9090
name: web
protocol: TCP
readinessProbe:
failureThreshold: 6
httpGet:
path: /status
port: web
scheme: HTTP
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 3
resources:
requests:
cpu: 100m
memory: 200Mi
limits:
cpu: 500m
memory: 500Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/prometheus/config
name: config
readOnly: false
- mountPath: /etc/prometheus/rules
name: rules
readOnly: false
- mountPath: /cephfs/prometheus/data
name: data
subPath: prometheus-data
readOnly: false
serviceAccount: prometheus-k8s
serviceAccountName: prometheus-k8s
terminationGracePeriodSeconds: 60
volumes:
- configMap:
defaultMode: 511
name: prometheus-k8s-config
name: config
- configMap:
defaultMode: 511
name: prometheus-k8s-rules
name: rules
- name: data
persistentVolumeClaim:
claimName: cephfs-pvc
updateStrategy:
type: RollingUpdate
Logs:
[root#k8s-1 prometheus]# kubectl logs prometheus-k8s-0 -n monitoring
level=info ts=2018-01-20T03:16:32.966070249Z caller=main.go:215 msg="Starting Prometheus" version="(version=2.0.0, branch=HEAD, revision=0a74f98628a0463dddc90528220c94de5032d1a0)"
level=info ts=2018-01-20T03:16:32.966225361Z caller=main.go:216 build_context="(go=go1.9.2, user=root#615b82cb36b6, date=20171108-07:11:59)"
level=info ts=2018-01-20T03:16:32.966252185Z caller=main.go:217 host_details="(Linux 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 prometheus-k8s-0 (none))"
level=info ts=2018-01-20T03:16:32.969789371Z caller=web.go:380 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-01-20T03:16:32.971388907Z caller=main.go:314 msg="Starting TSDB"
level=info ts=2018-01-20T03:16:32.971596811Z caller=targetmanager.go:71 component="target manager" msg="Starting target manager..."
level=error ts=2018-01-20T03:16:59.781338012Z caller=main.go:323 msg="Opening storage failed" err="invalid block sequence: block time ranges overlap (1516348800000, 1516356000000)"
[root#k8s-1 prometheus]#
[root#k8s-1 prometheus]# kubectl logs prometheus-k8s-1 -n monitoring
level=info ts=2018-01-20T03:15:22.701351679Z caller=main.go:215 msg="Starting Prometheus" version="(version=2.0.0, branch=HEAD, revision=0a74f98628a0463dddc90528220c94de5032d1a0)"
level=info ts=2018-01-20T03:15:22.70148418Z caller=main.go:216 build_context="(go=go1.9.2, user=root#615b82cb36b6, date=20171108-07:11:59)"
level=info ts=2018-01-20T03:15:22.701512333Z caller=main.go:217 host_details="(Linux 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 prometheus-k8s-1 (none))"
level=info ts=2018-01-20T03:15:22.705824203Z caller=web.go:380 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-01-20T03:15:22.707629775Z caller=main.go:314 msg="Starting TSDB"
level=info ts=2018-01-20T03:15:22.707837323Z caller=targetmanager.go:71 component="target manager" msg="Starting target manager..."
level=error ts=2018-01-20T03:15:54.775639791Z caller=main.go:323 msg="Opening storage failed" err="invalid block sequence: block time ranges overlap (1516348800000, 1516356000000)"
[root#k8s-1 prometheus]# kubectl describe po/prometheus-k8s-0 -n monitoring
Name: prometheus-k8s-0
Namespace: monitoring
Node: k8s-3/172.16.1.8
Start Time: Fri, 19 Jan 2018 17:59:38 +0800
Labels: app=prometheus
controller-revision-hash=prometheus-k8s-7d86dfbd86
prometheus=k8s
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"StatefulSet","namespace":"monitoring","name":"prometheus-k8s","uid":"7593d8ac-fcff-11e7-9333-fa163e48f857"...
Status: Running
IP: 10.244.2.54
Created By: StatefulSet/prometheus-k8s
Controlled By: StatefulSet/prometheus-k8s
Containers:
prometheus:
Container ID: docker://98faabe55fb71050aacd776d349a6567c25c339117159356eedc10cbc19ef02a
Image: quay.io/prometheus/prometheus:v2.0.0
Image ID: docker-pullable://quay.io/prometheus/prometheus#sha256:53afe934a8d497bb703dbbf7db273681a56677775c462833da8d85015471f7a3
Port: 9090/TCP
Args:
--config.file=/etc/prometheus/config/prometheus.yaml
--storage.tsdb.path=/cephfs/prometheus/data
--storage.tsdb.retention=180d
--web.route-prefix=/
--web.enable-lifecycle
--web.enable-admin-api
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sat, 20 Jan 2018 11:11:00 +0800
Finished: Sat, 20 Jan 2018 11:11:29 +0800
Ready: False
Restart Count: 84
Limits:
cpu: 500m
memory: 500Mi
Requests:
cpu: 100m
memory: 200Mi
Liveness: http-get http://:web/status delay=30s timeout=3s period=5s #success=1 #failure=10
Readiness: http-get http://:web/status delay=0s timeout=3s period=5s #success=1 #failure=6
Environment: <none>
Mounts:
/cephfs/prometheus/data from data (rw)
/etc/prometheus/config from config (rw)
/etc/prometheus/rules from rules (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-x8xzh (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus-k8s-config
Optional: false
rules:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus-k8s-rules
Optional: false
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: cephfs-pvc
ReadOnly: false
prometheus-k8s-token-x8xzh:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-k8s-token-x8xzh
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 15m (x83 over 17h) kubelet, k8s-3 Container image "quay.io/prometheus/prometheus:v2.0.0" already present on machine
Warning FailedSync 23s (x1801 over 7h) kubelet, k8s-3 Error syncing pod
logs on kubernetes nodes:
[root#k8s-3 01C48JAGH1QCGKGCG72E0B2Y8R]# journalctl -xeu kubelet --no-pager
1月 20 11:21:54 k8s-3 kubelet[14306]: I0120 11:21:54.619924 14306 kuberuntime_manager.go:749] Back-off 5m0s restarting failed container=prometheus pod=prometheus-k8s-0_monitoring(7598959a-fcff-11e7-9333-fa163e48f857)
1月 20 11:21:54 k8s-3 kubelet[14306]: E0120 11:21:54.620042 14306 pod_workers.go:182] Error syncing pod 7598959a-fcff-11e7-9333-fa163e48f857 ("prometheus-k8s-0_monitoring(7598959a-fcff-11e7-9333-fa163e48f857)"), skipping: failed to "StartContainer" for "prometheus" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=prometheus pod=prometheus-k8s-0_monitoring(7598959a-fcff-11e7-9333-fa163e48f857)"
1月 20 11:22:08 k8s-3 kubelet[14306]: I0120 11:22:08.615438 14306 kuberuntime_manager.go:500] Container {Name:prometheus Image:quay.io/prometheus/prometheus:v2.0.0 Command:[] Args:[--config.file=/etc/prometheus/config/prometheus.yaml --storage.tsdb.path=/cephfs/prometheus/data --storage.tsdb.retention=180d --web.route-prefix=/ --web.enable-lifecycle --web.enable-admin-api] WorkingDir: Ports:[{Name:web HostPort:0 ContainerPort:9090 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[cpu:{i:{value:500 scale:-3} d:{Dec:<nil>} s:500m Format:DecimalSI} memory:{i:{value:524288000 scale:0} d:{Dec:<nil>} s:500Mi Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:209715200 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]} VolumeMounts:[{Name:config ReadOnly:false MountPath:/etc/prometheus/config SubPath: MountPropagation:<nil>} {Name:rules ReadOnly:false MountPath:/etc/prometheus/rules SubPath: MountPropagation:<nil>} {Name:data ReadOnly:false MountPath:/cephfs/prometheus/data SubPath:prometheus-data MountPropagation:<nil>} {Name:prometheus-k8s-token-x8xzh ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:web,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:3,PeriodSeconds:5,SuccessThreshold:1,FailureThreshold:10,} ReadinessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:web,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:0,TimeoutSeconds:3,PeriodSeconds:5,SuccessThreshold:1,FailureThreshold:6,} Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
1月 20 11:22:08 k8s-3 kubelet[14306]: I0120 11:22:08.615662 14306 kuberuntime_manager.go:739] checking backoff for container "prometheus" in pod "prometheus-k8s-0_monitoring(7598959a-fcff-11e7-9333-fa163e48f857)"
Any suggestions? Thanks.
Two Prometheus servers cannot share the same storage directory, you should have gotten a locking error about this.