Getting duplicate metrics when querying the Prometheus server - Kubernetes

I am getting metrics exposed by kube-state-metrics by querying the Prometheus server, but the issue is that I am getting duplicate metrics that differ only in the job field. I am running a query such as:
curl 'http://10.101.202.25:80/api/v1/query?query=kube_pod_status_phase' | jq
The only difference is in the job field. Metrics returned when querying the Prometheus server:
All pods running in the cluster: https://imgur.com/PKIc3ug
Any help is appreciated.
Thank You
prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  # - "first.rules"
  # - "second.rules"
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

You are running (or at least ingesting) two copies of kube-state-metrics. Probably one you installed and configured yourself and another from something like kube-prometheus-stack?
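One way to confirm that from the API (a sketch reusing the server address from the question) is to count the kube_pod_status_phase series per job; two scraped copies of kube-state-metrics show up as two distinct job values:
curl -G 'http://10.101.202.25:80/api/v1/query' \
  --data-urlencode 'query=count by (job) (kube_pod_status_phase)' | jq
From there you can decide which of the two scrape configs (or installations) to drop.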

I was able to get what I wanted eventually. What I did was remove the scrape config for prometheus-kube-state-metrics from the values.yml and define it in the config file, i.e. prometheus.yml, instead. For now it's working fine. Thank you @SYN and @coderanger for the help.

Related

How to access metrics located in another namespace in Prometheus in Kubernetes

Suppose there is an application located in a namespace called "API" and a Prometheus server located in the namespace "prometheus". How can I access my application from the Prometheus server if the two are in different namespaces?
I've tried specifying the construction <application-service-name>.API.svc.cluster.local:<application-service-port> as a reference to the application, but it does not seem to work, and Prometheus responds in the UI with Connection Refused.
scrape_configs:
  - job_name: 'some-job'
    kubernetes_sd_configs:
      - namespaces:
          names:
            - 'API'
        role: pod
    scrape_interval: 10s
    scrape_timeout: 5s
    static_configs:
      - targets: ['<application-service-name>.API.svc.cluster.local:<application-service-port>']
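For what it's worth, the namespaces block only means something inside kubernetes_sd_configs, and a plain DNS name can be scraped with static_configs alone; a minimal sketch of the static approach, keeping the placeholders from the question:
scrape_configs:
  - job_name: 'some-job'
    scrape_interval: 10s
    scrape_timeout: 5s
    static_configs:
      - targets: ['<application-service-name>.API.svc.cluster.local:<application-service-port>']
Connection Refused generally means the name resolved but nothing was listening on that port, so it is worth checking that the Service actually exposes the metrics port being targeted.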

Error with external_labels config in alertmanager.yml section of helm Prometheus values.yaml

I've installed Prometheus using Helm into my Kubernetes cluster as follows:
helm list
NAME        NAMESPACE   REVISION  UPDATED                               STATUS    CHART              APP VERSION
prometheus  prometheus  9         2021-09-07 08:54:54.262013 +0100 +01  deployed  prometheus-14.6.0  2.26.0
I am trying to apply external_labels in the values.yaml to identify the time series sent to Alertmanager. I've used the Prometheus docs to get what I believe to be the correct config, as below:
alertmanagerFiles:
  alertmanager.yml:
    global:
      external_labels:
        environment: 'perf'
My installation goes OK:
helm upgrade --install prometheus .
However my prometheus-server pod is crashing with the following error:
level=error ts=2021-09-06T18:49:25.059Z caller=coordinator.go:124 component=configuration msg="Loading configuration file failed" file=/etc/config/alertmanager.yml err="yaml: unmarshal errors:\n line 2: field external_labels not found in type config.plain"
Many of the answers here point to indentation issues, however I can't see what I am doing wrong. From the Prometheus docs:
global:
  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ <labelname>: <labelvalue> ... ]
I have been scratching my head on this for a week or two - would appreciate a second pair of more experienced eyes, thank you! 🙏
I have managed to get this working. Firstly, I was putting the configuration in totally the wrong place. I figured this out when looking at the GitHub page for Prometheus Alertmanager: I could not see the field defined in the 'good config test', so it must be configured elsewhere.
Indeed, the Prometheus config page says so, so I added a section under ## Prometheus server ConfigMap entries:
serverFiles:
  prometheus.yml:
    global:
      external_labels:
        environment: perf
This did not work either; the pod kept crashing. It turns out this should be configured in the part of the values.yaml which configures the prometheus-server container itself - where the top-level field is server, and we can see the default global values are also configured there. So I added external_labels into this section:
server:
  global:
    scrape_interval: 1m
    scrape_timeout: 10s
    evaluation_interval: 1m
    external_labels:
      environment: perf
After upgrading with helm upgrade --install prometheus . I can now see the correct config in kubectl get cm prometheus-server -o yaml, plus my PagerDuty alerts are now showing the environment name in the Summary.
A little side tip on how to test alerts without having to kill pods/create OOMs etc. is to create an alert expr: which constantly fires (e.g. kube_pod_container_status_restarts_total > 3), which I did by accident but which proved to be quite useful.
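As a sketch of that side tip (the group and alert names here are made up, and where the rule file lives in values.yaml depends on the chart version), an always-firing rule can be as simple as:
groups:
  - name: test-alerts
    rules:
      - alert: AlwaysFiring
        # vector(1) always returns a sample, so this alert fires continuously
        expr: vector(1)
        labels:
          severity: test
        annotations:
          summary: "Test alert for exercising the Alertmanager / PagerDuty path"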

Limit Prometheus to discover pods in specific namespaces ONLY

I am trying to run Prometheus to ONLY monitor pods in specific namespaces (in an OpenShift cluster).
I am getting "cannot list pods at the cluster scope" - but I have tried to set it to not use the cluster scope (and only look in specific namespaces instead).
I've set:
prometheus.yml: |
  scrape_configs:
    - job_name: prometheus
      static_configs:
        - targets:
            - localhost:9090
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
        - namespaces:
            names:
              - api-mytestns1
              - api-mytestns2
          role: pod
      relabel_configs:
      [cut]
I get this error even if I remove the - job_name: kubernetes-pods job entirely, so maybe it's something else in Prometheus that needs disabling?
I found that one had to overwrite server.alertmanagers with a complete copy of the settings in charts/prometheus/templates/server-configmap.yaml, to override the hardcoded defaults in there that try to scrape cluster-wide.
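Chart defaults aside, namespace-scoped discovery still needs get/list/watch on pods in each listed namespace, which a Role/RoleBinding per namespace (instead of a ClusterRole) can grant. A minimal sketch for one of the namespaces from the question, assuming a hypothetical ServiceAccount prometheus-server in a prometheus namespace (repeat per target namespace, and adjust names to your deployment):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-pod-reader
  namespace: api-mytestns1
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-pod-reader
  namespace: api-mytestns1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-pod-reader
subjects:
  - kind: ServiceAccount
    name: prometheus-server   # hypothetical; match your actual service account
    namespace: prometheus     # hypothetical; match the namespace Prometheus runs in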

Prometheus Alert Manager for Federation

We have several clusters where our applications are running. We would like to set up a central monitoring cluster which can scrape metrics from the rest of the clusters using Prometheus federation.
To do that, I need to install a Prometheus server in each cluster, and a Prometheus server in the central cluster that federates from them. I will also install Grafana in the central cluster to visualise the metrics gathered from the other Prometheus servers.
So the questions are:
Where should I set up the Alertmanager? Only for the central cluster, or does each cluster also need an Alertmanager?
What is the best practice for alerting while using federation?
I thought I could use an ingress controller to expose each Prometheus server? What is the best practice for communication between the Prometheus servers and federation in k8s?
Based on this blog
Where should I set up the Alertmanager? Only for the central cluster, or does each cluster also need an Alertmanager?
What is the best practice for alerting while using federation?
The answer here would be to set it up on each cluster.
If the data you need to do alerting is moved from one Prometheus to another then you've added an additional point of failure. This is particularly risky when WAN links such as the internet are involved. As far as is possible, you should try to push alerting as deep down the federation hierarchy as possible. For example, an alert about a target being down should be set up on the Prometheus scraping that target, not on a global Prometheus which could be several steps removed.
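As a rough sketch of the per-cluster wiring that implies (the Alertmanager target and rule-file path below are placeholders), each workload cluster's prometheus.yml would point at its own local Alertmanager:
rule_files:
  - /etc/config/alerts          # placeholder path for the cluster-local alert rules
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'alertmanager:9093'   # placeholder service name; 9093 is Alertmanager's default port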
I thought I could use an ingress controller to expose each Prometheus server? What is the best practice for communication between the Prometheus servers and federation in k8s?
I think that depends on the use case; in each doc I checked they just use targets under scrape_configs.static_configs in the prometheus.yml,
like here
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - 'source-prometheus-1:9090'
          - 'source-prometheus-2:9090'
          - 'source-prometheus-3:9090'
OR
like here
prometheus.yml:
  rule_files:
    - /etc/config/rules
    - /etc/config/alerts
  scrape_configs:
    - job_name: 'federate'
      scrape_interval: 15s
      honor_labels: true
      metrics_path: '/federate'
      params:
        'match[]':
          - '{job="prometheus"}'
          - '{__name__=~"job:.*"}'
      static_configs:
        - targets:
            - 'prometheus-server:80'
Additionally, it's worth checking how they did this in this tutorial, where they used Helm to build a central monitoring cluster with two Prometheus servers on two clusters.

Can Prometheus scrape targets together?

I need Prometheus to scrape several mongodb exporters one after another in order to compute a valid replication lag. However, the targets are scraped with a difference of several dozen seconds between them, which makes replication lag impossible to compute.
The job yaml is below:
- job_name: mongo-storage
  honor_timestamps: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
    - targets:
        - mongo-1a-exporter.monitor:9216
        - mongo-2a-exporter.monitor:9216
        - mongo-3a-exporter.monitor:9216
        - mongos-exporter.monitor:9216
        - mongo-1b-exporter.monitor:9216
        - mongo-2b-exporter.monitor:9216
        - mongo-3b-exporter.monitor:9216
      labels:
        cluster: mongo-storage
This isn't possible, Prometheus makes no guarantees about the phase of scrapes or rule evaluations. Nor is this something you should depend upon, as it'd be very fragile.
I'd aim for knowing the lag within a scrape interval, rather than trying to get it perfect. You generally care if replication is completely broken, rather than if it's slightly delayed. A heartbeat job could also help.
This isn't possible with Prometheus... normally.
However, it might be possible to exploit the Prometheus Pushgateway to achieve what you want. My thinking is that you write a script/tool to scrape the mongo exporters in a synchronised way (threads/forks/whatever) and then push those metrics into a Pushgateway instance.
Then configure Prometheus to scrape the Pushgateway instead of the mongo exporters; since all the metrics are on the one endpoint, they will hopefully always be in sync and avoid any lag with regard to being up to date.
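A minimal sketch of that last step, assuming the Pushgateway is exposed as a hypothetical pushgateway.monitor service on its default port 9091:
scrape_configs:
  - job_name: pushgateway
    honor_labels: true   # keep the job/instance labels set by the pushing script
    static_configs:
      - targets:
          - 'pushgateway.monitor:9091'   # hypothetical service name; 9091 is the Pushgateway default port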
Hope this helps.