Best practices when trying to implement custom Kubernetes monitoring system - kubernetes

I have two Kubernetes clusters representing dev and staging environments.
Separately, I am also deploying a custom DevOps dashboard which will be used to monitor these two clusters. On this dashboard I will need to show information such as:
RAM/HD Space/CPU usage of each deployed Pod in each environment
Pod health (as in if it has too many container restarts etc)
Pod uptime
All these stats have to be available at the cluster level and, preferably, per namespace. That is, if I query for a particular namespace, I should get all the resource usage of that namespace.
So the webservice layer of my dashboard will send a service request to the master node of my respective cluster in order to fetch this information.
Another thing I need is to implement real time notifications in my DevOps dashboard. Every time a container fails, I need to catch that event and notify relevant personnel.
I have been reading around and two things that pop up a lot are Prometheus and Metric Server. Do I need both or will one do? I set up Prometheus on a local cluster but I can't find any endpoints it exposes which could be called by my dashboard service. I'm also trying to set up Prometheus AlertManager but so far it hasn't worked as expected. Trying to fix it now. Just wanted to check if these technologies have the capabilities to meet my requirements.
Thanks!

I don't know why you are considering your own custom monitoring system. Prometheus operator provides all the functionality that you mentioned.
You will end up only with your own grafana dashboard with all required information.
If you need custom notifications, you can set them up in Alertmanager by creating the appropriate prometheusrules.monitoring.coreos.com resources. You can find a lot of preconfigured PrometheusRules in kubernetes-mixin.
Using labels and namespaces in Alertmanager, you can set up a route that notifies the person responsible for a given deployment.
Do I need both or will one do? Yes, you need both: Prometheus collects and aggregates metrics, while Metrics Server exposes metrics from your cluster nodes for your Prometheus to scrape.
If you have problems with Prometheus, Alertmanager and so on, consider using the Helm chart as a starting point.

Prometheus + Grafana are a pretty standard setup.
Installing kube-prometheus or prometheus-operator via Helm will give you Grafana, Alertmanager, node-exporter and kube-state-metrics by default, all set up for Kubernetes metrics.
Configure Alertmanager to do something with the alerts. SMTP is usually the first thing people set up, but I would recommend some sort of event manager if this is a service people need to rely on.
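For the real-time notification requirement, one option is to point an Alertmanager webhook receiver at the dashboard's web service. The sketch below only shows the parsing side, assuming the documented webhook JSON shape (an "alerts" list whose entries carry "status", "labels" and "annotations"). The function name and the chosen output fields are illustrative, not part of any library.

```python
import json


def firing_alerts(webhook_body: bytes) -> list:
    """Extract the firing alerts from an Alertmanager webhook POST body."""
    payload = json.loads(webhook_body)
    messages = []
    for alert in payload.get("alerts", []):
        # Resolved alerts arrive in the same payload; skip them here.
        if alert.get("status") != "firing":
            continue
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        messages.append(
            {
                "alert": labels.get("alertname", "unknown"),
                "namespace": labels.get("namespace", ""),
                "pod": labels.get("pod", ""),
                "summary": annotations.get("summary", ""),
            }
        )
    return messages
```

The returned dictionaries can then be handed to whatever channel notifies the relevant personnel (email, chat, paging).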
Although a Grafana dashboard isn't part of your requirements, setting one up will show you how to connect to Prometheus as a data source. There is documentation on adding a Prometheus data source for Grafana.
There are a number of prebuilt charts available to add to Grafana. There are some charts to visualise alertmanager too.
Your external service won't be querying the metrics directly with Prometheus; it will be querying the collected data stored in Prometheus inside your cluster. To access the API externally you will need to set up an external path to the Prometheus service. This can be configured via an ingress controller in the Helm deployment:
prometheus.ingress.enabled: true
You can do the same for the alertmanager API and grafana if needed.
alertmanager.ingress.enabled: true
grafana.ingress.enabled: true
You could use Grafana outside the cluster as your dashboard via the same prometheus ingress if it proves useful.
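Once the ingress is in place, the dashboard's web service can call the Prometheus HTTP API. The following is a minimal sketch of an instant query against /api/v1/query. The host name is a placeholder, and it assumes the cAdvisor metric container_memory_working_set_bytes is being scraped, which is the case in a typical kube-prometheus install but should be checked in your cluster.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder host for the Prometheus ingress.
PROMETHEUS_URL = "http://prometheus.example.com"


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL of the Prometheus HTTP API for a PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def parse_vector(payload: dict) -> dict:
    """Map each namespace label to its sample value in an instant-vector response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return {
        series["metric"].get("namespace", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def memory_by_namespace(base_url: str = PROMETHEUS_URL) -> dict:
    """Fetch working-set memory in bytes, summed per namespace."""
    promql = "sum by (namespace) (container_memory_working_set_bytes)"
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return parse_vector(json.load(resp))
```

Adding a label matcher such as {namespace="dev"} to the expression narrows the result to a single namespace.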

Related

Monitoring inside Pods with Prometheus

I want to know if it's possible to get metrics for the services inside the pods using Prometheus.
I don't mean monitoring the pods but the processes inside those pods. For example, containers which have Apache or nginx running inside them alongside other main services, so I can retrieve metrics for the web server and the other main service (for example a WordPress image which also comes with Apache configured).
The cluster already has running kube-state-metrics, node-exporter and blackbox exporter.
Is it possible? If so, how can I manage to do it?
Thanks in advance
Prometheus works by scraping an HTTP endpoint that provides the actual metrics. That's where you get the term "exporter". So if you want to get metrics from the processes running inside of pods you have three primary steps:
You must modify those processes to export the metrics you care about. This is inherently custom for each kind of application. The good news is that there are lots of pre-built exporters, including ones for nginx and Apache, which you mention. Most application frameworks can also export Prometheus metrics, for example MicroProfile, Quarkus, and many more.
You must then modify your pod definition to expose the HTTP endpoint that those processes now provide. This is very straightforward, but depends on the configuration you specify for your exporters.
You must then configure your Prometheus to scrape those targets. This will depend on your monitoring stack. For OpenShift you will find the docs here for enabling user workload monitoring, and here for providing exporter details.
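To make the first step concrete, this is roughly what an exporter returns when Prometheus scrapes it: plain text in the Prometheus exposition format. The helper below is a simplified sketch (it does not escape special characters in label values), and the function name is illustrative. In practice a pre-built exporter or an official client library would produce this output for you.

```python
def render_metric(name: str, help_text: str, metric_type: str, samples: list) -> str:
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            # Sort labels so the output is stable between scrapes.
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```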

How to monitor a container running db2 image using Prometheus and also react app using Prometheus?

I have to build a monitoring solution using Prometheus and Grafana for a service which is built using React (front end) + Node.js + Db2 (containerised). I have no idea where to start. Can someone suggest resources where I can learn? Thank you.
First of all, you need to install Prometheus and Grafana in your Kubernetes cluster following the instructions given for each:
Prometheus: https://prometheus.io/docs/prometheus/latest/installation/
Grafana: https://grafana.com/docs/grafana/latest/installation/
Next, you need to understand that Prometheus is a pull-based metrics collection system. It retrieves metrics from configured targets (endpoints) at given intervals and displays the results.
You can set up a working monitoring system by following the steps below:
1. Instrument your application code so that Prometheus has metrics to scrape. For this, you need to add instrumentation to the code via one of the supported Prometheus client libraries.
2. Configure Prometheus to scrape the metrics exposed by the service. Prometheus supports a K8s custom resource named ServiceMonitor, introduced by the Prometheus Operator, that can be used to configure Prometheus to scrape the metrics defined in step 1.
3. Observe the scraped metrics. You can view the defined metrics in either the Prometheus UI or the Grafana UI by configuring Grafana support for Prometheus.
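As a rough illustration of the ServiceMonitor step, the sketch below builds a manifest as a Python dictionary and prints it as JSON, which kubectl apply -f accepts alongside YAML. The resource name, namespace, app label and port name are all hypothetical, and your Prometheus resource may additionally require specific labels on the ServiceMonitor before it selects it.

```python
import json


def service_monitor(name: str, namespace: str, app_label: str, port_name: str,
                    path: str = "/metrics", interval: str = "30s") -> dict:
    """Build a ServiceMonitor that selects Services labelled app=<app_label>."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Matches labels on the Service object, not on the pods.
            "selector": {"matchLabels": {"app": app_label}},
            # "port" refers to a named port on the Service.
            "endpoints": [{"port": port_name, "path": path, "interval": interval}],
        },
    }


if __name__ == "__main__":
    # Hypothetical names for a Node.js backend exposing metrics on a port named "http".
    print(json.dumps(service_monitor("node-backend", "default", "node-backend", "http"), indent=2))
```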

How to supply external metrics into HPA?

Problem setting. Suppose I have 2 pods, A and B. I want to be able to dynamically scale pod A based on some arbitrary number from some arbitrary source. Suppose that pod B is such a source: for example, it can have an HTTP server with an endpoint which responds with the number of desired replicas of pod A at the moment of request. Or maybe it is an ES server or a SQL DB (does not matter).
Question. What Kubernetes objects do I need to define to achieve this (apart from the HPA)? What configuration should the HPA have to know that it needs to look up B for the current metric? What should the API of B look like (or are there any constraints)?
Research I have made. Unfortunately, the official documentation does not say much about it, apart from declaring that there is such a possibility. There are also two repositories, one with some go boilerplate code that I have trouble building and another one that has no usage instructions whatsoever (though allegedly does fulfil the "external metrics over HTTP" requirement).
By having a look at the .yaml configs in those repositories, I have reached a conclusion that apart from Deployment and Service one needs to define an APIService object that registers the external or custom metric in the kubernetes API and links it with a normal service (where you would have your pod) and a handful of ClusterRole and ClusterRoleBinding objects. But there is no explanation about it. Also I could not even list existing APIServices with kubectl in my local cluster (of 1.15 version) like other objects.
The easiest way will be to feed metrics into Prometheus (which is a commonly solved problem), and then set up a Prometheus-based HPA (also a commonly solved problem).
1. Feed own metrics to Prometheus
Start with Prometheus-Operator to get the cluster itself monitored, and get access to ServiceMonitor objects. ServiceMonitors are pointers to services in the cluster. They let your pod's /metrics endpoint be discovered and scraped by a prometheus server.
Write a pod that reads metrics from your 3rd party API and exposes them on its own /metrics endpoint. This will be the adapter between your API and the Prometheus format. There are client libraries for this, of course: https://github.com/prometheus/client_python#exporting
Write a Service of type ClusterIP that represents your pod.
Write a ServiceMonitor that points to a service.
Query your custom metrics through the Prometheus dashboard to ensure this stage is done.
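A minimal sketch of such an adapter pod, using only the Python standard library, could look like the following. The fetch_desired_replicas function is a hypothetical stand-in for the call to your real source (pod B, an ES server, a SQL DB), and the metric name desired_replicas and port 8000 are assumptions. The client library linked above is the more usual choice for real deployments.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_desired_replicas() -> int:
    """Hypothetical stand-in: replace with a request to the real metric source."""
    return 3


def make_handler(fetch):
    """Create a request handler that serves the value of fetch() on /metrics."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = (
                "# HELP desired_replicas Replica count requested by the upstream source.\n"
                "# TYPE desired_replicas gauge\n"
                f"desired_replicas {fetch()}\n"
            ).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            # Keep scrape requests out of the pod logs.
            pass

    return MetricsHandler


def serve(port: int = 8000, fetch=fetch_desired_replicas) -> HTTPServer:
    """Bind the metrics server; call serve_forever() on the result to start it."""
    return HTTPServer(("", port), make_handler(fetch))


if __name__ == "__main__":
    serve().serve_forever()
```

The value is fetched on every scrape, so the scrape interval of the ServiceMonitor determines how fresh the metric is.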
2. Setup Prometheus-based HPA
Set up Prometheus-Adapter and follow the HPA walkthrough.
Or follow the guide https://github.com/stefanprodan/k8s-prom-hpa
This looks like a huge pile of work to get the HPA. However, only the adapter pod is a custom part here. Everything else is a standard stack setup in most of the clusters, and you will get many other use cases for it anyways.
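It helps to know how the HPA turns a metric into a replica count, because that determines what number your adapter pod should expose and what target to set. The sketch below follows the formula given in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), including the default 10% tolerance. It ignores the other details of the real controller such as min/max bounds, stabilization windows and unready pods.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Approximate the HPA scaling rule for a single metric."""
    ratio = current_value / target_value
    # The controller skips scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

For example, with 2 replicas, a current value of 200 and a target of 100, the rule asks for 4 replicas.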

Prometheus Adapter configuration for kubernetes metrics

I installed prometheus-adapter with helm.
Now I don't know how to configure prometheus-adapter so that my Kubernetes cluster can communicate with an external server where Prometheus is installed.
Where and how can I connect prometheus-adapter to Prometheus?
I want to use data from Prometheus for my external metrics in Kubernetes.
First, you'll need to deploy the Prometheus Operator.
This walkthrough assumes that Prometheus is deployed in the prom namespace. Most of the sample commands and files are namespace-agnostic, but there are a few commands or pieces of configuration that rely on that namespace. If you're using a different namespace, simply substitute that in for prom when it appears.
Note that if you are deploying on a non-x86_64 (amd64) platform, you'll need to change the image field in the Deployment to be the appropriate image for your platform.
Make sure that you have the default adapter configuration, which should work with the standard Prometheus Operator configuration. If you've got custom relabelling rules, or your labels above weren't exactly namespace and pod, you may need to edit the configuration in the ConfigMap. The configuration walkthrough provides an overview of how configuration works.
Make sure that you have registered the API with the API aggregator (part of the main Kubernetes API server).
Try fetching the discovery information for it:
$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
Since you've set up Prometheus to collect your app's metrics, you should see a pods/http_requests resource show up. This represents the http_requests_total metric, converted into a rate, aggregated to have one datapoint per pod. Notice that this translates to the same API that our HorizontalPodAutoscaler was trying to use above.
The API is registered as custom.metrics.k8s.io/v1beta1, and you can find more information about aggregation at Concepts: Aggregation.
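If you want to check the discovery output from a script rather than by eye, a small helper can list the metric resources the adapter exposes. This sketch assumes kubectl is on the PATH and configured for the cluster, and that the response is a standard APIResourceList with a "resources" array. Both function names are illustrative.

```python
import json
import subprocess


def metric_names(discovery_json: str) -> list:
    """Return the sorted resource names from a custom metrics discovery document."""
    doc = json.loads(discovery_json)
    return sorted(resource["name"] for resource in doc.get("resources", []))


def fetch_discovery() -> str:
    """Run the same kubectl command as above and return its raw JSON output."""
    return subprocess.run(
        ["kubectl", "get", "--raw", "/apis/custom.metrics.k8s.io/v1beta1"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout


if __name__ == "__main__":
    for name in metric_names(fetch_discovery()):
        print(name)
```

An empty list usually means the adapter is registered but its rules do not match any series in Prometheus.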
You can find more information in these instructions.
Let me know if it helps.
If you just want prometheus-adapter to communicate with Prometheus, you need to provide the Prometheus service URL to prometheus-adapter, so that it knows where to fetch the metrics from.
The default Prometheus service URL is http://prometheus.svc:9090. You need to figure out what your Prometheus service URL is.

Service Level metrics Prometheus in k8

I would like to see k8s Service-level metrics in Grafana from the underlying Prometheus server.
For instance:
1) If I have 3 application pods exposed through a Service, I would like to see service-level metrics for CPU, memory and network I/O pressure, the total number of requests, and the number of failed requests.
2) Also, if I have a group of pods (replicas) belonging to an application which doesn't have a Service on top of them, I would like to see the aggregated metrics of those pods in a single view in Grafana.
What would be the Prometheus queries to achieve this?
Service level metrics for CPU, memory & network I/O pressure
If you have Prometheus installed on your Kubernetes cluster, all those statistics are already being collected by Prometheus. There are many good articles about how to install and use Prometheus with Kubernetes; check this one as an example.
Here is an example of a request to fetch container memory usage:
container_memory_usage_bytes{image="CONTAINER:VERSION"}
Total # of requests,# of requests failed
Those are service-level metrics, and to collect them you need to use a Prometheus exporter created specifically for your service. Check the list of exporters, find the one you need for your service and follow its instructions.
If you cannot find an exporter for your application, you can write one yourself; here is the official documentation about it.
application which doesn't have Service on top of them would like to see the aggregated metrics of the pods related to that application in a single view on grafana
It is possible to combine any graphs in a single view in Grafana using Dashboards and Panels. Check the official documentation; all of those topics are covered in detail and are easy to understand.
Aggregation can be done by Prometheus itself by aggregation operations.
All metrics from Kubernetes have labels, so you can group by them:
sum(http_requests_total) by (application, group), where application and group are labels.
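To illustrate what that aggregation does, here is a small Python sketch of the same grouping logic: every series is reduced to the labels named in the by clause, and series that end up with identical label values are summed. The sample data and label values are made up, and this mirrors the semantics only, not how Prometheus implements it.

```python
from collections import defaultdict


def sum_by(samples: list, by: tuple) -> dict:
    """Sum sample values grouped by the given label names.

    samples is a list of (labels_dict, value) pairs, one per time series.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        # Labels not listed in "by" (for example pod) are dropped from the key.
        key = tuple(labels.get(name, "") for name in by)
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    series = [
        ({"application": "shop", "group": "web", "pod": "web-1"}, 10.0),
        ({"application": "shop", "group": "web", "pod": "web-2"}, 20.0),
        ({"application": "shop", "group": "api", "pod": "api-1"}, 5.0),
    ]
    print(sum_by(series, ("application", "group")))
```

This is why pods without a Service on top of them can still be viewed as one unit: any label shared by the replicas works as a grouping key.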
Also, here are the official Prometheus instructions on how to add Prometheus to Grafana as a data source.