I would like to see Kubernetes Service-level metrics in Grafana from the underlying Prometheus server.
For instance:
1) If I have 3 application pods exposed through a Service, I would like to see service-level metrics for CPU, memory & network I/O pressure, total # of requests and # of requests failed.
2) Also, if I have a group of pods (replicas) related to an application which doesn't have a Service on top of them, I would like to see the aggregated metrics of the pods related to that application in a single view in Grafana.
What would be the Prometheus queries to achieve this?
Service level metrics for CPU, memory & network I/O pressure
If you have Prometheus installed on your Kubernetes cluster, all those statistics are already being collected by Prometheus. There are many good articles about how to install and use Kubernetes + Prometheus; check that one as an example.
Here is an example of a request to fetch container memory usage:
container_memory_usage_bytes{image="CONTAINER:VERSION"}
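To get service-level numbers, aggregate those per-container series over all pods behind the Service. A minimal sketch, assuming the pods are named my-app-* in the default namespace (the label names, e.g. pod vs. pod_name, vary between Kubernetes versions, so adjust the selectors to your cluster):

sum(rate(container_cpu_usage_seconds_total{namespace="default", pod=~"my-app-.*"}[5m]))
sum(container_memory_usage_bytes{namespace="default", pod=~"my-app-.*"})
sum(rate(container_network_receive_bytes_total{namespace="default", pod=~"my-app-.*"}[5m]))
sum(rate(container_network_transmit_bytes_total{namespace="default", pod=~"my-app-.*"}[5m]))

These give CPU usage in cores, memory in bytes, and network receive/transmit throughput in bytes per second for the whole group of pods.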
Total # of requests, # of requests failed
Those are service-level metrics; to collect them you need a Prometheus exporter created especially for your service. Check the list of exporters, find the one you need for your service and follow its instructions.
If you cannot find an exporter for your application, you can write one yourself; here is the official documentation about it.
application which doesn't have a Service on top of them would like to see the aggregated metrics of the pods related to that application in a single view on Grafana
It is possible to combine any graphs in a single view in Grafana using Dashboards and Panels. Check the official documentation; all those topics are covered in detail and are easy to understand.
Aggregation can be done by Prometheus itself using its aggregation operators.
All metrics from Kubernetes have labels, so you can group by them:
sum(http_requests_total) by (application, group), where application and group are labels.
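For the request counts, take a rate over a time window before aggregating. A hedged sketch, assuming your exporter exposes a counter http_requests_total with application and status labels (both label names are assumptions; use whatever your exporter actually provides):

sum by (application) (rate(http_requests_total[5m]))
sum by (application) (rate(http_requests_total{status=~"5.."}[5m]))

The first query is the total request rate per application; the second is the failed-request rate, here treating HTTP 5xx responses as failures.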
Also, here is the official Prometheus documentation about how to add Prometheus to Grafana as a data source.
I want to know if it's possible to get metrics for the services inside the pods using Prometheus.
I don't mean monitoring the pods but the processes inside those pods. For example, containers which have Apache or nginx running inside them alongside the main services, so I can retrieve metrics for the web server and the other main service (for example a WordPress image which also comes with Apache configured).
The cluster already has running kube-state-metrics, node-exporter and blackbox exporter.
Is it possible? If so, how can I manage to do it?
Thanks in advance
Prometheus works by scraping an HTTP endpoint that provides the actual metrics. That's where you get the term "exporter". So if you want to get metrics from the processes running inside of pods you have three primary steps:
You must modify those processes to export the metrics you care about. This is inherently something that must be custom for each kind of application. The good news is that there are lots of pre-built exporters, including ones for nginx and Apache that you mention. Most application frameworks also have the capability to export Prometheus metrics, e.g. MicroProfile, Quarkus, and many more.
You must then modify your pod definition to expose the HTTP endpoint that those processes are now providing. This is very straightforward but will depend on the configuration you specify for your exporters (a hedged sketch follows this list).
You must then modify your Prometheus to scrape those targets. This will depend on your monitoring stack. For OpenShift you will find the docs here for enabling user workload monitoring, and here for providing exporter details.
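As a rough illustration of the second step, here is a hedged pod sketch that runs an nginx exporter as a sidecar and advertises its endpoint via the common prometheus.io annotations. Whether those annotations are honoured depends entirely on your scrape configuration (the third step), and the names and ports below are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: my-app                        # hypothetical pod name
  annotations:
    prometheus.io/scrape: "true"      # conventional opt-in annotation
    prometheus.io/port: "9113"        # port the exporter listens on
    prometheus.io/path: "/metrics"
spec:
  containers:
  - name: nginx
    image: nginx
  - name: nginx-exporter              # sidecar exposing nginx stats in Prometheus format
    image: nginx/nginx-prometheus-exporter
    args: ["-nginx.scrape-uri=http://localhost:8080/stub_status"]   # assumes stub_status is enabled on :8080
    ports:
    - containerPort: 9113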
I am fairly new to Kubernetes and had a question concerning kube-state-metrics. When I simply monitor Kubernetes using Prometheus I obtain a set of metrics from cAdvisor, the nodes (node exporter), the pods, etc. When I include kube-state-metrics, I seem to obtain more "relevant" metrics. Does kube-state-metrics allow me to scrape "new" information from Kubernetes, or does it rather provide "formatted" metrics derived from the initial Kubernetes metrics (from the nodes, etc.) I mentioned earlier?
The two are basically unrelated. cAdvisor is giving you low-level stats about the containers, like how much RAM and CPU they are using. kube-state-metrics (KSM) gives you info from the Kubernetes API, like the Pod object status. Both are useful for different things and you probably want both.
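To make the difference concrete, a couple of example queries (my-app-0 is a placeholder pod name, and the metric/label names assume reasonably recent kubelet and kube-state-metrics versions):

rate(container_cpu_usage_seconds_total{pod="my-app-0"}[5m])
container_memory_working_set_bytes{pod="my-app-0"}
kube_pod_status_phase{pod="my-app-0", phase="Running"}
kube_pod_container_status_restarts_total{pod="my-app-0"}

The first two come from cAdvisor and describe resource usage; the last two come from kube-state-metrics and describe the Pod object's state as seen by the Kubernetes API.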
I have two Kubernetes clusters representing dev and staging environments.
Separately, I am also deploying a custom DevOps dashboard which will be used to monitor these two clusters. On this dashboard I will need to show information such as:
RAM/HD Space/CPU usage of each deployed Pod in each environment
Pod health (as in if it has too many container restarts etc)
Pod uptime
All these stats have to be at a cluster level and also per namespace, preferably. As in, if I query for a particular namespace, I should get all the resource usage of that namespace.
So the webservice layer of my dashboard will send a service request to the master node of my respective cluster in order to fetch this information.
Another thing I need is to implement real time notifications in my DevOps dashboard. Every time a container fails, I need to catch that event and notify relevant personnel.
I have been reading around and two things that pop up a lot are Prometheus and Metrics Server. Do I need both or will one do? I set up Prometheus on a local cluster but I can't find any endpoints it exposes which could be called by my dashboard service. I'm also trying to set up Prometheus Alertmanager, but so far it hasn't worked as expected. Trying to fix it now. Just wanted to check if these technologies have the capabilities to meet my requirements.
Thanks!
I don't know why you are considering building your own custom monitoring system. The Prometheus Operator provides all the functionality that you mentioned.
The only custom piece you will end up with is your own Grafana dashboard with all the required information.
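For reference, the queries behind such a dashboard could look like the following sketch. It assumes the usual kube-prometheus setup where cAdvisor and kube-state-metrics are already being scraped (adjust metric and label names to your stack):

sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))
sum by (namespace) (container_memory_working_set_bytes)
sum by (namespace, pod) (kube_pod_container_status_restarts_total)
time() - kube_pod_start_time

These give per-namespace CPU and memory usage, restart counts per pod (for pod health), and pod uptime in seconds.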
If you need custom notifications you can set them up in Alertmanager by creating the correct prometheusrules.monitoring.coreos.com resources; you can find a lot of preconfigured prometheusrules in kubernetes-mixin.
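For illustration, a minimal PrometheusRule sketch; the alert name, expression and threshold are made up, and the labels must match your Prometheus ruleSelector (the preconfigured rules in kubernetes-mixin are much more complete):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
  labels:
    role: alert-rules
spec:
  groups:
  - name: example.rules
    rules:
    - alert: PodRestartingTooOften       # hypothetical alert
      expr: increase(kube_pod_container_status_restarts_total[1h]) > 3
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} is restarting too often"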
Using labels and namespaces in Alertmanager you can set up the correct route to notify the person responsible for a given deployment.
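A hedged routing sketch for the Alertmanager configuration, assuming your alerts carry a namespace label (the receiver names and namespace value are placeholders):

route:
  receiver: default
  routes:
  - match:
      namespace: team-a               # alerts from this namespace go to the owning team
    receiver: team-a-alerts
receivers:
- name: default
- name: team-a-alerts                 # e-mail/Slack/PagerDuty configuration goes here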
Do I need both or will one do? Yes, you need both: Prometheus collects and aggregates metrics, while the Metrics Server exposes metrics from your cluster nodes for your Prometheus to scrape.
If you have problems with Prometheus, Alertmanager and so on, consider using the Helm chart as an entry point.
Prometheus + Grafana are a pretty standard setup.
Installing kube-prometheus or prometheus-operator via Helm will give you Grafana, Alertmanager, node-exporter and kube-state-metrics by default, all set up for Kubernetes metrics.
Configure Alertmanager to do something with the alerts. SMTP is usually the first thing set up, but I would recommend some sort of event manager if this is a service people need to rely on.
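For example, a minimal SMTP setup in the Alertmanager configuration might look like this sketch (hosts and addresses are placeholders):

global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'
route:
  receiver: ops-email
receivers:
- name: ops-email
  email_configs:
  - to: 'ops@example.com'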
Although a dashboard isn't part of your requirements, this will inform how you can connect to Prometheus as a data source. There is documentation on adding a Prometheus data source to Grafana.
There are a number of pre-built dashboards available to add to Grafana. There are some dashboards to visualise Alertmanager too.
Your external service won't be querying the metric sources directly; it will be querying the data collected by Prometheus and stored inside your cluster. To access the API externally you will need to set up an external path to the Prometheus service. This can be configured via an ingress controller in the Helm deployment:
prometheus.ingress.enabled: true
You can do the same for the alertmanager API and grafana if needed.
alertmanager.ingress.enabled: true
grafana.ingress.enabled: true
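With the ingress in place, your dashboard's web service layer can call the Prometheus HTTP API directly; for example, an HTTP GET against a URL like the following (the hostname is whatever your ingress exposes) returns the query result as JSON:

https://prometheus.example.com/api/v1/query?query=sum(up)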
You could use Grafana outside the cluster as your dashboard via the same prometheus ingress if it proves useful.
I was looking into Kubernetes Heapster and Metrics Server for getting metrics from running pods. The issue is, I need some custom metrics which might vary from pod to pod, and apparently Heapster only provides CPU- and memory-related metrics. Is there any tool already out there which would provide the functionality I want, or do I need to build one from scratch?
What you're looking for is application- and infrastructure-specific metrics. For this, the TICK stack could be helpful! Specifically, Telegraf can be set up to gather detailed infrastructure metrics like memory and CPU pressure, or even the resources used by individual Docker containers, network and I/O metrics, etc. It can also scrape Prometheus metrics from pods. These metrics are then shipped to InfluxDB and visualized using either Chronograf or Grafana.
Not sure if this is still open.
I would classify metrics into 3 types.
Events or logs - system and application events which are sent to logs. These are non-deterministic.
Metrics - CPU and memory utilization on the node where the app is hosted. These are deterministic and are collected periodically.
APM - Application Performance Monitoring metrics - these are application-level metrics like requests received vs. failed vs. responded, etc.
Not all platforms do everything. ELK, for instance, does both metrics and log monitoring but does not do APM. Some of these tools have plugins for collection daemons which collect performance metrics of the node.
APM is a completely different area, as it requires developer tooling to provide the metrics, e.g. Spring Boot has Actuator, Node.js has appmetrics, etc. This carries the request-level data. StatsD is an open-source library which an application can use to send APM metrics to StatsD agents installed on the node.
AWS offers CloudWatch agents for log shipping and sink, and X-Ray for distributed tracing, which can be used for APM.
According to the Kubernetes Custom Metrics Proposal, containers can expose their app-level metrics in Prometheus format to be collected by Heapster.
Could anyone elaborate: if metrics are pulled by Heapster, does that mean that after the container terminates the metrics for the last interval are lost? Can an app push metrics to Heapster instead?
Or, is there a recommended approach to collect metrics from moderately short-lived containers running in Kubernetes?
Not to speak for the original author's intent, but I believe that proposal is primarily focused on custom metrics that you want to use for things like scheduling and autoscaling within the cluster, not for general purpose monitoring (for which as you mention, pushing metrics is sometimes critical).
There isn't a single recommended pattern for what to do with custom metrics in general. If your environment has a preferred monitoring stack or vendor, a common approach is to run a second container in each pod (a "sidecar" container) to push relevant metrics about the main container to your monitoring backend.
You may want to look at handling this by sending your metrics directly from your job to a Prometheus pushgateway. This is the precise use case it was created for:
The Prometheus Pushgateway exists to allow ephemeral and batch jobs to expose their metrics to Prometheus. Since these kinds of jobs may not exist long enough to be scraped, they can instead push their metrics to a Pushgateway. The Pushgateway then exposes these metrics to Prometheus.
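As a concrete illustration (the hostname, job and metric names are placeholders; this is essentially the example from the Pushgateway documentation):

echo "some_metric 3.14" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job

Prometheus then scrapes the Pushgateway like any other target, so the metric outlives the job that produced it.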
Prometheus developer here. If you want to monitor the metrics of applications running on Kubernetes, the approach is to have Prometheus scrape the application directly. Prometheus can auto-discover Kubernetes apps, see http://prometheus.io/docs/operating/configuration/#<kubernetes_sd_config>
There is no point in involving Heapster if you're using Prometheus, as Prometheus can do everything it does more directly.
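For reference, a minimal scrape configuration sketch using Kubernetes service discovery; the annotation-based opt-in shown here is a common convention, not a requirement:

scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"

Prometheus then discovers pods via the Kubernetes API and scrapes only those carrying a prometheus.io/scrape: "true" annotation.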