Prometheus/Grafana - how to generate data points for all pods?

I've deployed Prometheus and Grafana via helm by issuing the following command.
helm install --name prom --namespace monitoring stable/prometheus-operator
I've managed to use kubectl to port-forward to Grafana, and I can view the interface just fine.
However, most of my pods are not generating any data points. I've tried re-deploying the workloads, figuring that might be necessary for Prometheus to start picking up the metrics. I can select the pods in the Grafana drop-down menus, so they are being detected, but they are not generating any data points or populating the graphs.
What do I need to do to make this happen?
Thanks in advance.

Related

Issues setting up Prometheus on EKS - pods in Pending state (seems to depend on PVCs waiting for a volume to be created)

I have an EKS cluster for my university project and I want to set up Prometheus on the cluster. To do this I am using Helm with the following commands (see this tutorial: https://archive.eksworkshop.com/intermediate/240_monitoring/deploy-prometheus/):
kubectl create namespace prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus \
--namespace prometheus \
--set alertmanager.persistentVolume.storageClass="gp2" \
--set server.persistentVolume.storageClass="gp2"
When I check the status of the Prometheus pods, the alertmanager and server pods are stuck in a Pending state.
When I describe the prometheus-alertmanager-0 pod, I see a VolumeBinding error.
When I describe the prometheus-server-5d858bd4bd-6xmws pod, I see the same VolumeBinding error.
I can also see there are 2 PVCs in a Pending state.
When I describe the prometheus-server PVC, I can see it is waiting for a volume to be created.
I'm familiar with Kubernetes basics, but PVCs are not something I have used before. Is the solution here to create a "volume", and if so, how do I do that? Would that solve the issue, or am I way off the mark? (A StorageClass sketch follows this question.)
Should I try to install Prometheus in a different way?
Any help on this is greatly appreciated.
Note: although similar, this is not a duplicate of "Prometheus server in pending state after installation using Helm". For one, the errors highlighted there are different; also, that question involved other manual steps such as creating volumes, which I have not done. Finally, I am following the specific tutorial referenced above, and I am also asking whether I should set up Prometheus a different way if there is a simpler one.
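For what it's worth, Pending PVCs usually mean that no StorageClass is able to provision the requested volume (you can list the available classes with kubectl get storageclass). A minimal sketch of a gp2 class, assuming the in-tree EBS provisioner (newer EKS versions instead need the EBS CSI driver, whose provisioner is ebs.csi.aws.com):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2
provisioner: kubernetes.io/aws-ebs      # in-tree EBS provisioner; use ebs.csi.aws.com with the EBS CSI driver
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer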

Installed prometheus-community/helm-charts but I can't get metrics on the "default" namespace

I recently learned about helm and how easy it is to deploy the whole prometheus stack for monitoring a Kubernetes cluster, so I decided to try it out on a staging cluster at my work.
I started by creating a dedicated namespace on the cluster for monitoring with:
kubectl create namespace monitoring
Then, with helm, I added the prometheus-community repo with:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
Next, I installed the chart with a prometheus release name:
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring
At this time I didn't pass any custom configuration because I'm still trying it out.
After the install is finished, it all looks good. I can access the prometheus dashboard with:
kubectl port-forward prometheus-prometheus-kube-prometheus-prometheus-0 9090 -n monitoring
There I see a bunch of pre-defined alerts and rules, but the problem is that I don't quite understand how to create new rules to check the pods in the default namespace, where I actually have my services deployed.
I am looking at http://localhost:9090/graph to play around with the queries, but I can't seem to write any that give me metrics on my pods in the default namespace.
I am a bit overwhelmed by the amount of information, so I would like to know what I missed or what I am doing wrong here.
The Prometheus Operator includes several Custom Resource Definitions (CRDs), including ServiceMonitor (and PodMonitor). ServiceMonitors are used to tell the Operator which services to monitor.
I'm familiar with the Operator, though not with the Helm deployment, but I suspect you'll want to create ServiceMonitors to generate metrics for your apps in any namespace (including default).
See: https://github.com/prometheus-operator/prometheus-operator#customresourcedefinitions
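As an illustration, a minimal ServiceMonitor sketch targeting a Service in the default namespace might look like the following. The names, labels, and port are hypothetical, and kube-prometheus-stack typically only picks up ServiceMonitors whose labels match its serviceMonitorSelector (often the Helm release name), so check your chart values:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                     # hypothetical name
  namespace: monitoring
  labels:
    release: prometheus            # must match the Operator's serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app: my-app                  # hypothetical label on your Service
  endpoints:
    - port: metrics                # hypothetical named port serving /metrics
      interval: 30s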
ServiceMonitors and PodMonitors are CRDs for the Prometheus Operator. When working directly with the Prometheus Helm chart (without the Operator), you have to configure your targets directly in values.yaml by editing the scrape_configs section.
It is more complex to do it that way, so take a deep breath and start by reading this: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
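As a rough sketch for that route (the exact values key depends on the chart version; the prometheus-community/prometheus chart exposes an extraScrapeConfigs value, and the job name and target below are hypothetical):
extraScrapeConfigs: |
  - job_name: my-app                      # hypothetical job
    static_configs:
      - targets:
          - my-app.default.svc:8080       # hypothetical service endpoint exposing /metrics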

Is prometheus-to-sd required for GKE? Can I delete it?

A while back a GKE cluster got created which came with a daemonset of:
kubectl get daemonsets --all-namespaces
...
kube-system prometheus-to-sd 6 6 6 3 6 beta.kubernetes.io/os=linux 355d
Can I delete this daemonset without issue?
What is it being used for?
What functionality would I be losing without it?
TL;DR
Even if you delete it, it will be back.
A little bit more explanation
Citing the explanation by user Yasen of what prometheus-to-sd is:
prometheus-to-sd is a simple component that can scrape metrics stored in prometheus text format from one or multiple components and push them to the Stackdriver. Main requirement: k8s cluster should run on GCE or GKE.
Github.com: Prometheus-to-sd
Assuming that the command to delete this daemonset is:
$ kubectl delete daemonset prometheus-to-sd --namespace=kube-system
Executing this command will indeed delete the daemonset, but it will come back after a while.
The prometheus-to-sd daemonset is managed by the Addon Manager, which recreates a deleted daemonset back to its original state.
Below is the part of the prometheus-to-sd daemonset YAML definition which states that this daemonset is managed by addonmanager:
labels:
  addonmanager.kubernetes.io/mode: Reconcile
You can read more about it by following: Github.com: Kubernetes: addon-manager
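You can confirm that the label is present on your own cluster with, for example:
kubectl get daemonset prometheus-to-sd -n kube-system --show-labels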
Deleting this daemonset is strictly connected to the monitoring/logging solution you are using with your GKE cluster. There are 2 options:
Stackdriver logging/monitoring
Legacy logging/monitoring
Stackdriver logging/monitoring
You need to completely disable logging and monitoring of your GKE cluster to delete this daemonset.
You can do it by following a path:
GCP -> Kubernetes Engine -> Cluster -> Edit -> Kubernetes Engine Monitoring -> Set to disabled.
Legacy logging/monitoring
If you are using the legacy solution, which is available up to GKE version 1.14, you need to disable the Legacy Stackdriver Monitoring option by following the same path as above.
Let me know if you have any questions about that.
TL;DR - it's OK
Given your context, I suppose it's OK to shut down the prometheus-to-sd component of your cluster, except in cases where reports, alerts, and monitoring are critical parts of your system.
Let's dive into the GCP sources.
As per source code at GoogleCloudPlatform:
prometheus-to-sd is a simple component that can scrape metrics stored in prometheus text format from one or multiple components and push them to the Stackdriver. Main requirement: k8s cluster should run on GCE or GKE.
Prometheus
From their Prometheus Github Page:
The Prometheus monitoring system and time series database.
To get a picture of what it is for, you can read this guide on Prometheus: Prometheus Monitoring: The Definitive Guide in 2019 – devconnected
There are also hundreds of videos on their YouTube channel, Prometheus Monitoring.
Your questions
So, answering your questions:
Can I delete this daemonset without issue?
It depends. As I said, you can, except in cases where reports, alerts, and monitoring are critical parts of your system.
What is it being used for?
It pushes your cluster's metrics, in Prometheus text format, to Stackdriver for monitoring.
What functionality would I be losing without it?
metrics
→ therefore dashboards
→ therefore alerting

Prometheus is not compatible with Kubernetes v1.16

I installed the stable/prometheus helm chart with some minor changes proposed at helm/charts#17268 to make it compatible with Kubernetes v1.16
After installation, none of the Kubernetes Grafana dashboards show correct values. I am using the 8769 dashboard (https://grafana.com/grafana/dashboards/8769), which provides a lot of information on CPU, memory, network, etc. This dashboard works properly on older k8s versions, but on v1.16 it shows no results. I also randomly tried some other dashboards (8588, 6879, 10551), but they either show only the requested resources for each pod, not the live usage, or show nothing at all.
What these dashboards do is send a PromQL query to Prometheus and display the results. For example, this is the PromQL query for CPU usage from the 8769 dashboard:
sum (rate (container_cpu_usage_seconds_total{id!="/",namespace=~"$Namespace",pod_name=~"^$Deployment.*$"}[1m])) by (pod_name)
I don't know whether I have to change the PromQL or whether the problem is somewhere else.
Kubernetes 1.16 removes the labels pod_name and container_name from cAdvisor metrics; they were duplicates of pod and container.
You need to change pod_name -> pod and container_name -> container in the Grafana dashboards' JSON models.
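For example, the CPU usage query above becomes:
sum (rate (container_cpu_usage_seconds_total{id!="/",namespace=~"$Namespace",pod=~"^$Deployment.*$"}[1m])) by (pod)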
Try the installation this way; the new CRDs had some issues, so I used the old CRDs:
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.32/example/prometheus-operator-crd/podmonitor.crd.yaml
helm install --name prometheus --namespace monitoring stable/prometheus-operator --set prometheusOperator.createCustomResource=false
Make sure that the CRDs don't already exist; you can delete them via the command below (note that this removes every CRD in the cluster):
kubectl delete crd --all

HPA not fetching an existing custom metric?

I'm using mongodb-exporter to store/query the metrics via Prometheus. I have set up a custom metrics server and it is storing values.
Here is evidence that the Prometheus exporter and the custom metrics server are working together.
Query:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitoring/pods/*/mongodb_mongod_wiredtiger_cache_bytes"
Result:
{"kind":"MetricValueList","apiVersion":"custom.metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitoring/pods/%2A/mongodb_mongod_wiredtiger_cache_bytes"},"items":[{"describedObject":{"kind":"Pod","namespace":"monitoring","name":"mongo-exporter-2-prometheus-mongodb-exporter-68f95fd65d-dvptr","apiVersion":"/v1"},"metricName":"mongodb_mongod_wiredtiger_cache_bytes","timestamp":"TTTTT","value":"0"}]}
In my case, when I create an HPA for this custom metric from the mongo exporter, the HPA returns this error:
failed to get mongodb_mongod_wiredtiger_cache_bytes utilization: unable to get metrics for resource mongodb_mongod_wiredtiger_cache_bytes: no metrics returned from resource metrics API
What is the main issue in my case? I have checked all the configs and the flow looks fine, but where is my mistake?
Thanks :)
In the comments you wrote that you have enabled external.metrics; however, in the original question you had issues with custom.metrics.
In short:
metrics supports only basic metrics like CPU or memory.
custom.metrics allows you to extend the basic metrics to all Kubernetes objects (http_requests, number of pods, etc.).
external.metrics allows you to gather metrics that are not tied to Kubernetes objects:
External metrics allow you to autoscale your cluster based on any
metric available in your monitoring system. Just provide a metric
block with a name and selector, as above, and use the External metric
type instead of Object
For more detailed description, please check this doc.
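As a quick way to see the difference: each type is served by its own aggregated API group, and a call only succeeds if a component serving that group (metrics-server, or an adapter for custom/external metrics) is registered:
kubectl get --raw /apis/metrics.k8s.io/v1beta1            # resource metrics (CPU/memory)
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1     # custom metrics
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1   # external metrics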
Minikube
To check whether the metrics-server addon is enabled, execute the command below and look for a metrics-server... pod.
$ kubectl get pods -n kube-system
...
metrics-server-587f876775-9qrtc 1/1 Running 4 5d1h
Another way is to check whether minikube has the metrics-server addon enabled:
$ minikube addons list
...
- metrics-server: enabled
If it is disabled just execute
$ sudo minikube addons enable metrics-server
✅ metrics-server was successfully enabled
GKE
Currently on GKE, Heapster and metrics-server are turned on by default, but custom.metrics are not supported out of the box.
You have to install the Prometheus adapter or use Stackdriver.
Kubeadm
Kubeadm does not include Heapster or metrics-server out of the box. For easy installation, you can use this YAML.
Later you have to install the Prometheus adapter.
Apply custom.metrics
The process is the same for Minikube, Kubeadm, and GKE.
The easiest way to enable custom.metrics is to install the Prometheus adapter via Helm.
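For example, assuming the prometheus-community repo added earlier and a release name of my-release (you will likely also need to point the chart's prometheus.url and prometheus.port values at your own Prometheus service via --set or a values file):
helm install my-release prometheus-community/prometheus-adapter --namespace monitoring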
After the Helm installation you will see a note like this:
NOTES:
my-release-prometheus-adapter has been deployed.
In a few minutes you should be able to list metrics using the following command(s):
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
Additionally, you can use jq to get more user-friendly output:
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .
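Coming back to the original question: once the metric is visible through the custom metrics API, an HPA can reference it. A minimal sketch, assuming the autoscaling/v2beta2 API and hypothetical names and threshold:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: mongo-exporter-hpa            # hypothetical name
  namespace: monitoring
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mongo-exporter              # hypothetical Deployment to scale
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Pods
      pods:
        metric:
          name: mongodb_mongod_wiredtiger_cache_bytes
        target:
          type: AverageValue
          averageValue: "100Mi"       # hypothetical threshold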