Monitor a Kubernetes cluster from another Kubernetes cluster with Prometheus

I run several Kubernetes clusters on Azure AKS, and I intend to create a centralized monitoring cluster, running on a separate Kubernetes cluster with Prometheus and Grafana.
My idea is:
Isolate and centralize the monitoring cluster.
If any cluster goes down, the monitoring cluster is still alive to inspect the downed cluster.
Run it across cloud providers (if available).
I'm confused about connecting the clusters: networking, ingress, and how Prometheus discovers and pulls metrics from outside its own cluster.
Is there any best practice or instruction for my use case? Thank you!

Yes, Prometheus is a very flexible monitoring solution wherein each Prometheus server is able to act as a target for another Prometheus server. Using prometheus federation, Prometheus servers can scrape selected time series data from other Prometheus servers.
A typical Prometheus federation example configuration looks like this:
- job_name: 'federate'
  scrape_interval: 15s
  honor_labels: true
  metrics_path: '/federate'
  params:
    'match[]':
      - '{job="prometheus"}'
      - '{__name__=~"job:.*"}'
  static_configs:
    - targets:
      - 'source-prometheus-1:9090'
      - 'source-prometheus-2:9090'
      - 'source-prometheus-3:9090'
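To sanity-check a source server before wiring up the federate job, you can query its /federate endpoint directly; a minimal sketch, assuming source-prometheus-1 is reachable from wherever you run it:
# Returns the selected series in the Prometheus exposition format
curl -G 'http://source-prometheus-1:9090/federate' \
  --data-urlencode 'match[]={job="prometheus"}' \
  --data-urlencode 'match[]={__name__=~"job:.*"}'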

Related

Prometheus Federated server metrics not appearing

I have multiple Kubernetes clusters and I wish to use Prometheus federation.
Each cluster has its own Prometheus installed using Helm.
Only one cluster's Prometheus is connected to Grafana.
This Prometheus server is going to get metrics from all the other clusters' Prometheus servers using federation.
Since I am using Helm, I added this in the values.yaml of the main central Prometheus server:
extraScrapeConfigs: |
  - job_name: 'federate'
    scrape_interval: 10s
    scheme: https
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__=~"container_.*|kube_.*"}'
    static_configs:
      - targets:
        - 'prometheus.prod-1.xx.xx'
        - 'prometheus.prod-2.xx.xx'
        - 'prometheus.prod-3.xx.xx'
Then I apply the change using
helm upgrade prometheus prometheus-community/kube-prometheus-stack -f values.yaml
It should work, but it is not working.
The metrics of the other servers are not appearing in the main central server; I verified this by querying for metrics from the other clusters.
I also verified whether federation itself is working by using the following link:
https://prometheus.prod-1.xx.xx/federate?match[]={__name__=~"container_.*|kube.*|"}
and this link returned all the metrics.
I access my main central Prometheus server using kubectl port-forward and then visit http://localhost:9090/config, but in the config I don't see the federate job which I added.
Also, nothing wrong appears in the logs, no error or warning messages.
What am I doing wrong? Is there something more that is required?

Add Kubernetes scrape target to Prometheus instance that is NOT in Kubernetes

I run Prometheus locally (reachable at http://localhost:9090/targets) with
docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus
and want to connect it to the several Kubernetes clusters we have,
to see that scraping works, try Grafana dashboards, etc.
Then I'll do the same on a dedicated server that will be used specifically for monitoring.
However, all my googling turns up different ways to configure a Prometheus that is already inside a Kubernetes cluster, and no way to read metrics from an external Kubernetes cluster.
How to add Kubernetes scrape target to Prometheus instance that is NOT in Kubernetes?
I have read Where Kubernetes metrics come from and checked that my (first) Kubernetes cluster has the Metrics Server.
kubectl get pods --all-namespaces | grep metrics-server
There is definitely no sense in adding a Prometheus instance to every Kubernetes cluster. One Prometheus must be able to read metrics from many Kubernetes clusters and every node within them.
P.S. Some old questions answer this by installing Prometheus in every cluster and then using federation, which is just the opposite of what I am looking for.
P.P.S. It also seems strange to me that Kubernetes and Prometheus, the #1 and #2 projects of the Cloud Native Computing Foundation, don't have a simple "add Kubernetes target to Prometheus" button or a simple step for this.
If I understand your question correctly, you want to monitor Kubernetes clusters from a Prometheus that is not installed on those remote clusters.
I monitor many different Kubernetes clusters from one Prometheus which is installed on a standalone server.
You can do this by generating a token on the Kubernetes cluster using a service account which has the proper permissions to access the Kubernetes API.
Kubernetes API:
The following details are required to configure the Prometheus scrape job.
Create a service account which has permissions to read and watch the pods (a minimal sketch follows the scrape job below).
Generate a token from the service account.
Create a scrape job as follows.
- job_name: kubernetes
  kubernetes_sd_configs:
    - role: node
      api_server: https://kubernetes-cluster-api.com
      # credentials used for service discovery against the API server
      tls_config:
        insecure_skip_verify: true
      bearer_token: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  # the same credentials, used for the scrape requests themselves
  bearer_token: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  scheme: https
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
    # turn the discovered node labels into target labels
    - separator: ;
      regex: __meta_kubernetes_node_label_(.+)
      replacement: $1
      action: labelmap
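For steps 1 and 2, here is a minimal sketch of the service account, RBAC, and token generation; the names (prometheus-remote, namespace monitoring) and the exact resource list are assumptions you should adapt to what you actually scrape:
# rbac.yaml -- service account with read/watch access to nodes and pods
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-remote
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-remote
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/metrics", "nodes/proxy", "pods", "services", "endpoints"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-remote
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-remote
subjects:
  - kind: ServiceAccount
    name: prometheus-remote
    namespace: monitoring
On Kubernetes 1.24+ you can then issue a token for it with, for example, kubectl -n monitoring create token prometheus-remote --duration=8760h and use that value as the bearer token above.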
I have explained the same in detail in the article "Monitor remote kubernetes cluster using prometheus":
https://amjadhussain3751.medium.com/monitor-remote-kubernetes-cluster-using-prometheus-a3781b041745
In my opinion, deploying a Prometheus instance in each cluster is a simpler and cleaner way than organizing external access. The main problem is that the targets discovered with kubernetes_sd_configs are cluster-internal DNS names and IP addresses (or at least it is so in my AWS EKS cluster). To resolve and reach these, you have to be inside the cluster.
This problem can be solved by using a proxy, so the configuration below uses the API server's proxy endpoint to reach targets. I'm not sure about its performance in large clusters, but in that case it is well worth deploying an internal Prometheus instance.
External access through API-server proxy
Things you need (for each cluster):
API-server CA certificate for HTTPS to work (see below how to get it).
Service account token with appropriate permissions (depends on your needs).
Assuming you already have these, here is an example Prometheus configuration:
- job_name: 'kubelet-cadvisor'
  scheme: https
  kubernetes_sd_configs:
    - role: node
      api_server: https://api-server.example.com
      # TLS and auth settings to perform service discovery
      authorization:
        credentials_file: /kube/token  # the file with your service account token
      tls_config:
        ca_file: /kube/CA.crt          # the file with the CA certificate
  # The same as above but for the actual scrape request.
  # We're going to send scrape requests back to the API-server
  # so the credentials are the same.
  bearer_token_file: /kube/token
  tls_config:
    ca_file: /kube/CA.crt
  relabel_configs:
    # This is just to drop this long __meta_kubernetes_node_label_ prefix
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    # By default Prometheus goes to the /metrics endpoint.
    # This relabeling changes it to /api/v1/nodes/[kubernetes_io_hostname]/proxy/metrics/cadvisor
    - source_labels: [kubernetes_io_hostname]
      replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
      target_label: __metrics_path__
    # This relabeling defines that Prometheus should connect to the
    # API-server instead of the actual instance. Together with the relabeling
    # from above this will make the scrape request proxied to the node kubelet.
    - replacement: api-server.example.com
      target_label: __address__
The above is tailored for scraping role: node. To make it work with other roles, you've got to change the __metrics_path__ label. The "Manually constructing apiserver proxy URLs" article can help with constructing the path.
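For example, for role: pod the proxy path has the form /api/v1/namespaces/<namespace>/pods/<pod>:<port>/proxy/<path>. A hedged sketch of the corresponding relabeling; the port 8080 and the /metrics path are assumptions for a typical application:
relabel_configs:
  # Build the API-server proxy path from the discovered namespace and pod name
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name]
    regex: (.+);(.+)
    replacement: /api/v1/namespaces/$1/pods/$2:8080/proxy/metrics
    target_label: __metrics_path__
  # Send the scrape to the API-server, which proxies it to the pod
  - replacement: api-server.example.com
    target_label: __address__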
How to get API-server CA certificate
There are several ways to get it, but getting it from the kubeconfig appears to me to be the simplest:
❯ kubectl config view --raw
apiVersion: v1
clusters:
- cluster:                     # you need this ⤋ long value
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJ...
    server: https://api-server.example.com
  name: default
...
The certificate in kubeconfig is base64-encoded so you have to decode it before it can be used:
echo LS0tLS1CRUdJTiBDRVJUSUZJ... | base64 -d > CA.crt
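If, as in the question, Prometheus runs in Docker outside the cluster, the token and CA file referenced by the config above can simply be bind-mounted; a sketch, assuming the files sit next to your prometheus.yml:
docker run --name prometheus -d -p 127.0.0.1:9090:9090 \
  -v "$PWD/prometheus.yml:/etc/prometheus/prometheus.yml" \
  -v "$PWD/CA.crt:/kube/CA.crt" \
  -v "$PWD/token:/kube/token" \
  prom/prometheus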
There are also many agents capable of shipping metrics collected in k8s to a remote Prometheus server outside the cluster: for example, Prometheus itself now supports agent mode, the OpenTelemetry Collector can export metrics, or you can use a managed Prometheus offering.
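As an illustration of the agent-mode option, a small in-cluster Prometheus started with --enable-feature=agent can forward everything it scrapes to the central server via remote_write. A minimal sketch; the receiving URL is an assumption, and the central server must accept remote write (e.g. via --web.enable-remote-write-receiver):
# prometheus.yml for the in-cluster agent
scrape_configs:
  - job_name: 'kubernetes-pods'    # whatever in-cluster discovery you need
    kubernetes_sd_configs:
      - role: pod
remote_write:
  # forward all scraped samples to the central Prometheus
  - url: https://central-prometheus.example.com/api/v1/write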

Does Prometheus scrape only 1 pod on Kubernetes?

I installed my Spring Boot application with 2 or 3 pods on Kubernetes on a Linux server. To monitor it, I installed Prometheus too. Currently, the metrics flow from the application to Prometheus very well.
But I suspect that Prometheus takes metrics from only one pod. With a job like the one below in the Prometheus config file, does Prometheus take metrics from only one pod? How can I make Prometheus scrape all the pods at the same time?
- job_name: 'SpringBootPrometheusDemoProject'
  metrics_path: '/SpringBootPrometheusDemoProject/actuator/prometheus'
  scrape_interval: 5s
  static_configs:
    - targets: ['127.0.0.1:8080']
Yes. In this case, you have to add a few annotations to your pods (if they do not exist already) and use kubernetes_sd_configs instead of static_configs.
You will find an example here: https://github.com/appscode/third-party-tools/blob/master/monitoring/prometheus/builtin/README.md#kubernetes-pod
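For instance, using the commonly used prometheus.io/* annotation convention, the annotations would go on the pod template of your Deployment. A sketch; the path and port below are assumptions taken from the question's config:
# under spec.template.metadata of the Spring Boot Deployment
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/SpringBootPrometheusDemoProject/actuator/prometheus"
    prometheus.io/port: "8080"
Note that Prometheus itself does not read these annotations; the relabel rules in the linked example translate them into scrape targets, so every pod carrying them is scraped individually.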

Prometheus does not show metrics of all pods

When we use Kubernetes in production and we have a scaled application with many pods published as a Service, every metrics-fetching request from Prometheus is routed to a randomly selected pod.
In this situation, the results are not accurate for monitoring.
At any given moment we need the metrics of all the pods (for example 10 pods), and that is not possible by calling a single Kubernetes Service endpoint!
Is there any solution for this problem?
You can configure your kubernetes_sd_configs so it scrapes the pods individually and not just the service.
To do that, set the role to pod, like this:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
See this blog post for a full config example.
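A slightly fuller sketch of such a job, assuming the pods carry the conventional prometheus.io/* annotations (these relabel rules are the common community pattern, not something built into Prometheus):
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Only keep pods that opted in via the annotation
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: "true"
    # Honour a custom metrics path if one is annotated
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      regex: (.+)
      target_label: __metrics_path__
    # Scrape the annotated port instead of the discovered one
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__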

How to update Prometheus config in k8s cluster

I have Prometheus running in k8s. Could you please advise how I can change the running prometheus.yaml config in the cluster? I simply want to change:
scrape_configs:
  - job_name: my-exporter
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
How can I do this?
Thanks.
The recommended way is to provide prometheus.yml via a ConfigMap. That way, changes to the ConfigMap will be propagated into the pod that consumes it. However, that alone is not enough for Prometheus to pick up the new config.
Prometheus supports runtime reloading of its config, so you don't need to restart Prometheus in order to pick up the new config. You can either do that manually by sending a POST request to the /-/reload endpoint, or automate the process by having a sidecar container inside the same Prometheus pod that watches for updates to the config file and performs the reload POST request.
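For the manual route, a minimal sketch, assuming Prometheus was started with --web.enable-lifecycle and is reachable via a port-forward (the service name and namespace are assumptions):
kubectl -n monitoring port-forward svc/prometheus-server 9090:9090 &
# Prometheus re-reads prometheus.yml from the mounted ConfigMap
curl -X POST http://localhost:9090/-/reload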
The following is an example of the second approach: prometheus-configmaps-continuous-deployment