Prometheus - not getting node metrics from all nodes - kubernetes

When I query Prometheus for node related metrics, I seem to be missing metrics from some nodes. All the metrics endpoints for the nodes are up. Could someone help me with this, I am new to Kubernetes and Prometheus
node_disk_info
Not getting metrics from all nodes

Related

Custom Metrics API service install for kubernetes cluster

We are planning to Kubernetes horizontal pod scheduler and for that need to install Custom Metrics API.
Can someone please tell different ways to install Custom Metrics API on kubernetes cluster?
As you are using EKS with Prometheus, the best source of knowledge is AWS documentation.
Do i need prometheus adaptor for registering custom metrics API?
Yes, you need at least Prometheus and Prometheus Adapter.
Prometheus: scrapes pods and stores metrics
Prometheus metrics adapter: queries Prometheus and exposes metrics for the Kubernetes custom metrics API
Metrics server: collects pods CPU and memory usage and exposes metrics for the Kubernetes resource metrics API
Without Custom Metrics or External Metrics, you can only use metrics based on CPU or Memory.
In Autoscaling Amazon EKS services based on custom Prometheus metrics using CloudWatch Container Insights article, it's stated:
The custom metrics gathered by Prometheus can be exposed to the autoscaler using a Prometheus Adapter as outlined in the blog post titled Autoscaling EKS on Fargate with custom metrics.
In Autoscaling EKS on Fargate with custom metrics blog you also find some examples of autoscaling based on CPU usage, autoscaling based on App Mesh traffic or autoscaling based on HTTP traffic
Additional documentation
Control plane metrics with Prometheus
Why can't I collect metrics from containers, pods, or nodes using Metrics Server in Amazon EKS?
Install the CloudWatch agent with Prometheus metrics collection on Amazon EKS and Kubernetes clusters

HPA using Kafka Exporter in on premise Kubernetes cluster

I had been trying to implement Kubernetes HPA using Metrics from Kafka-exporter. Hpa supports Prometheus, so we tried writing the metrics to prometheus instance. From there, we are unclear on the steps to do. Is there an article where it will explain in details ?
I followed https://medium.com/google-cloud/kubernetes-hpa-autoscaling-with-kafka-metrics-88a671497f07
for same in GCP and we used stack driver, and the implementation worked like a charm. But, we are struggling in on-premise setup, as stack driver needs to be replaced by Prometheus
In order to scale based on custom metrics, Kubernetes needs to query an API for metrics to check for those metrics. That API needs to implement the custom metrics interface.
So for Prometheus, you need to setup an API that exposes Prometheus metrics through the custom metrics API. Luckily, there already is an adapter.
When I implemented Kubernetes HPA using Metrics from Kafka-exporter I had a few setbacks which I solved doing the following:
I deployed the kafka-exporter container as a sidecar to the pods I
wanted to scale. I found that the HPA scales the pod it gets the
metrics from.
I used annotations to make Prometheus scrape the metrics from the pods with exporter.
Then I verified that the kafka-exporter metrics are getting to Prometheus. If it's not there you can't advance further.
I deployed prometheus adapter using its helm chart. The adapter will "translate" Prometheus's metrics into custom Metrics
Api, which will make it visible to HPA.
I made sure that the metrics are visible in k8s by executing kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 from one of
the master nodes.
I created an hpa with the matching metric name.
Here is a complete guide explaining how to implement Kubernetes HPA using Metrics from Kafka-exporter
Please comment if you have more questions

Prometheus is not collecting pod metrics

I deployed Prometheus and Grafana into my cluster.
When I open the dashboards I don't get data for pod CPU usage.
When I check Prometheus UI, it shows pods 0/0 up, however I have many pods running in my cluster.
What could be the reason? I have node exporter running in all of nodes.
Am getting this for kube-state-metrics,
I0218 14:52:42.595711 1 builder.go:112] Active collectors: configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,jobs,limitranges,namespaces,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets
I0218 14:52:42.595735 1 main.go:208] Starting metrics server: 0.0.0.0:8080
Here is my Prometheus config file:
https://gist.github.com/karthikeayan/41ab3dc4ed0c344bbab89ebcb1d33d16
I'm able to hit and get data for:
http://localhost:8080/api/v1/nodes/<my_worker_node>/proxy/metrics/cadvisor
As it was mentioned by karthikeayan in comments:
ok, i found something interesting in the values.yaml comments, prometheus.io/scrape: Only scrape pods that have a value of true, when i remove this relabel_config in k8s configmap, i got the data in prometheus ui.. unfortunately k8s configmap doesn't have comments, i believe helm will remove the comments before deploying it.
And just for clarification:
kube-state-metrics vs. metrics-server
The metrics-server is a project that has been inspired by Heapster and is implemented to serve the goals of the Kubernetes Monitoring Pipeline. It is a cluster level component which periodically scrapes metrics from all Kubernetes nodes served by Kubelet through Summary API. The metrics are aggregated, stored in memory and served in Metrics API format. The metric-server stores the latest values only and is not responsible for forwarding metrics to third-party destinations.
kube-state-metrics is focused on generating completely new metrics from Kubernetes' object state (e.g. metrics based on deployments, replica sets, etc.). It holds an entire snapshot of Kubernetes state in memory and continuously generates new metrics based off of it. And just like the metric-server it too is not responsibile for exporting its metrics anywhere.
Having kube-state-metrics as a separate project also enables access to these metrics from monitoring systems such as Prometheus.

Prometheus Metrics - Use for autoscaling

I've setup prometheus to collect metrics from my pods and nodes.
I've also setup the prometheus custom metrics adapter.
How can I use those metrics provided by prometheus to autoscale my pods ? I tried to google it but I only find custom pods that provides their metrics on their /metrics url. I would like to be able to autoscale any of my pods that already have a prometheus metric based on the cpu or memory usage.
I can visualize all the metrics in grafana for all my pods and nodes but can't find a way to use it with autoscale
You need to create an HPA (Horizontal Pod Autoscaler)
More info here
This is a good tool showing you how to use an HPA with custom metrics either using a the K8s metrics server or Prometheus.

Kubernetes pod metrics

There are three levels of metrics collection to consider in Kubernetes - Node, Pod and the Application that runs in the pod.
For Node and Application metrics I have solutions that work wonderfully, but I am stuck on pod metrics.
I have tried cAdvisor and Kube state metrics but none of them give me what I want. Kube state metrics only gives information that is already known like pod CPU limits and requests. cAdvisor doesn't insert pod labels to container names so I have no means of knowing which pod is misbehaving.
Given a pod, I'd like to know it's CPU, memory and storage usage both with respect to the pod itself and also with respect to the node it is scheduled on.
I am using prometheus to collect metrics via the prometheus operator CRD.
Can anyone help suggest an open source metrics exporter that would do the job I mentioned above?
The standard metric collector is Heapster. It comes preinstalled in many vendors like GKE also. With Heapster installed, you can just do kubectl top pods to see cpu/mem metrics on the client side. You can plug it with some sink to store the results for archival.
https://github.com/kubernetes/heapster