I deployed a web application on a Kubernetes cluster. The application spans multiple nodes, and each node hosts multiple pods. How can I measure the performance of the whole application? I can see some metrics in Prometheus/Grafana, but those results are per node/pod, not for the whole application. I am just trying to understand the bigger picture. For example, I see that application data is stored in Redis (a pod), but is it enough to look at only that pod to measure latency?
Every kubelet has cAdvisor (link) integrated into the binary. cAdvisor gives container users an understanding of the resource usage and performance characteristics of their running containers. Combine cAdvisor with an additional exporter like the JMX exporter (for Java, link) or the Blackbox exporter for probing URLs (response time, monitoring HTTP codes, etc., link). There are also frameworks that expose metrics themselves, such as Java Spring Boot on the path /actuator/prometheus, and you can scrape those metrics. There are many different exporters (link), each covering a different component. When you gather all these metrics, you get a much broader overview of the state of your application. Couple this with handmade Grafana dashboards and alerting (e.g., Alertmanager) and you can monitor almost everything about your application.
As for the Prometheus/Grafana stack, I assume you are referring to kube-prometheus-stack, which already ships with default dashboards.
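For the "bigger picture" question specifically, Prometheus can aggregate the per-pod cAdvisor metrics across the whole application. A minimal sketch, assuming all of the application's pods share a common name prefix such as myapp- (the namespace and prefix here are placeholders; on older Kubernetes versions the cAdvisor label is pod_name rather than pod):

# total CPU used by every pod of the application, in cores
sum(rate(container_cpu_usage_seconds_total{namespace="default", pod=~"myapp-.*"}[5m]))

# total memory working set across all of the application's pods
sum(container_memory_working_set_bytes{namespace="default", pod=~"myapp-.*"})

Queries like these roll the node/pod-level series up into one application-level view, which you can then put on a single Grafana panel.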
Related
I want to know if it's possible to get metrics for the services inside the pods using Prometheus.
I don't mean monitoring the pods but the processes inside those pods. For example, containers that have Apache or nginx running inside them alongside other main services, so I can retrieve metrics for the web server as well as the other main service (for example, a WordPress image which also ships with Apache configured).
The cluster already runs kube-state-metrics, node-exporter, and blackbox-exporter.
Is it possible? If so, how can I manage to do it?
Thanks in advance
Prometheus works by scraping an HTTP endpoint that provides the actual metrics; that's where the term "exporter" comes from. So if you want to get metrics from the processes running inside pods, you have three primary steps:
You must modify those processes to export the metrics you care about. This is inherently custom for each kind of application. The good news is that there are lots of pre-built exporters, including ones for nginx and Apache as you mention. Most application frameworks also have the capability to export Prometheus metrics, e.g., MicroProfile, Quarkus, and many more.
You must then modify your pod definition to expose the HTTP endpoint that those processes are now providing. This is very straightforward but will depend on the configuration you specify for your exporters.
You must then modify your Prometheus to scrape those targets. This will depend on your monitoring stack. For OpenShift, you will find the docs here for enabling user workload monitoring, and here for providing exporter details. Once scraping is set up, the new metrics can be queried like any others, as sketched below.
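As an illustration of what step 1 buys you: once a web-server exporter sidecar (for example, the pre-built nginx-prometheus-exporter) is scraped, its metrics become ordinary Prometheus series. The metric names below are the ones that particular exporter publishes; treat the exact setup as an assumption about your stack:

# requests/s served by nginx inside the pod, via nginx-prometheus-exporter
rate(nginx_http_requests_total[5m])

# whether the exporter could reach nginx's stub_status endpoint (1 = up)
nginx_up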
I am thinking of using the stack GeoServer, PostGIS, OpenLayers, and ReactJS for my GIS project. I also plan to deploy this solution on Kubernetes on AWS.
Questions:
Assume the traffic will range from 100 requests/s up to 1,000 requests/s.
What are the minimum resources (vCPU, RAM) for:
- Each Kubernetes node
- Each GeoServer pod
- PostGIS
Is there any formula I can apply to arrive at those numbers?
Thank you in advance
Lp Ccmu
Not really. It all depends on the footprint of all the different components of your specific application. I suggest you start small, gather a lot of metrics, and adjust.
Either grow or shrink depending on what you see in your metrics, and make use of Kubernetes autoscaling tools like HPAs and the Cluster Autoscaler.
You can gather metrics using the AWS tools or something like Prometheus. There are many resources on the web about using Prometheus to gather Kubernetes metrics.
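If you want something formula-like all the same, you can derive a per-request cost from measurements and extrapolate. A rough PromQL sketch, assuming GeoServer pods named geoserver-* and a request counter exposed by your app or ingress (the metric http_requests_total with an app label is hypothetical here):

# CPU seconds consumed per request under the current load
sum(rate(container_cpu_usage_seconds_total{pod=~"geoserver-.*"}[5m]))
  / sum(rate(http_requests_total{app="geoserver"}[5m]))

Multiplying the observed per-request CPU cost by your target of 1,000 requests/s gives a first sizing estimate, which you then validate under a realistic load test.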
Not really. For GeoServer it depends on the type of data, the size and complexity of the data sets, and the styling you are applying.
You can integrate Elastic APM and Kibana so that the React app sends metrics about API endpoint requests and page hits, letting you monitor traffic. Based on that data you can adjust your deployment's resources.
You can see this post for a GIS stack on Kubernetes:
https://link.medium.com/r645NGwpejb
I have a simple Java-based application deployed in Kubernetes. I want to get the average latency of requests sent to the application (GET and POST).
The Stackdriver Monitoring API has the latency details of the load balancer, but those can only be collected after 210 seconds, which is not sufficient in my case. How can I configure Kubernetes to get the latency details every 30 seconds (or 1 minute) immediately?
I would like the solution to be independent of Java so that I can use it for any application I deploy.
On GKE, you can use Stackdriver Trace, which is GCP-specific. I am currently fighting with the Python client library; hopefully the Java one is more mature.
Or you can use Jaeger, which is a CNCF project.
Use a Service Mesh
A service mesh will let you observe things like latency between your services without extra code for this in each application. Istio is such an implementation, and it is available on Google Kubernetes Engine.
Get uniform metrics and traces from any running applications without requiring developers to manually instrument their applications.
Istio’s monitoring capabilities let you understand how service performance impacts things upstream and downstream.
See Istio on GCP
Use a service mesh: software that helps you orchestrate, secure, and collect telemetry across distributed applications. A service mesh transparently oversees and monitors all traffic for your application, typically through a set of network proxies that sit alongside each microservice.
See "Welcome to the service mesh era".
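Since the Istio sidecars report request telemetry to Prometheus themselves, latency can be queried at whatever scrape interval you configure (30s is typical) with no changes to the Java code. A sketch, assuming Istio's standard telemetry; the service name is a placeholder, and on Istio versions before 1.5 the metric is named istio_request_duration_seconds_bucket instead:

# 95th-percentile request latency to the service over the last minute
histogram_quantile(0.95,
  sum(rate(istio_request_duration_milliseconds_bucket{destination_service="myapp.default.svc.cluster.local"}[1m])) by (le))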
I was looking into Kubernetes Heapster and metrics-server for getting metrics from the running pods. The issue is that I need some custom metrics which might vary from pod to pod, and apparently Heapster only provides CPU- and memory-related metrics. Is there any tool already out there that provides the functionality I want, or do I need to build one from scratch?
What you're looking for is application- and infrastructure-specific metrics. For this, the TICK stack could be helpful! Specifically, Telegraf can be set up to gather detailed infrastructure metrics like memory and CPU pressure, or even the resources used by individual Docker containers, network and I/O metrics, etc. It can also scrape Prometheus metrics from pods. These metrics are then shipped to InfluxDB and visualized using either Chronograf or Grafana.
Not sure if this is still open.
I would classify metrics into 3 types.
Events or logs - system and application events that are sent to logs. These are non-deterministic.
Metrics - CPU and memory utilization on the node where the app is hosted. These are deterministic and collected periodically.
APM - Application Performance Monitoring metrics. These are application-level metrics like requests received vs. failed vs. responded.
Not all platforms do everything. ELK, for instance, does both metrics and log monitoring but does not do APM. Some of these tools have plugins for collector daemons that gather performance metrics from the node.
APM is a completely different area, as it requires developer tooling to provide the metrics: Spring Boot has Actuator, Node.js has appmetrics, etc. This carries request-level data. StatsD is an open-source library that applications can use to send APM metrics to StatsD agents installed on the node.
AWS offers the CloudWatch agent for log shipping and aggregation, and X-Ray for distributed tracing, which can be used for APM.
I would like to see Kubernetes Service-level metrics in Grafana from the underlying Prometheus server.
For instance:
1) If I have 3 application pods exposed through a Service, I would like to see service-level metrics for CPU, memory & network I/O pressure, the total # of requests, and the # of requests failed.
2) Also, if I have a group of pods (replicas) related to an application that doesn't have a Service on top of them, I would like to see the aggregated metrics of those pods in a single view on Grafana.
What would be the Prometheus queries to achieve this?
Service-level metrics for CPU, memory & network I/O pressure
If you have Prometheus installed on your Kubernetes cluster, all those statistics are already being collected by Prometheus. There are many good articles about how to install and use Kubernetes with Prometheus; check that one as an example.
Here is an example of a request to fetch container memory usage:
container_memory_usage_bytes{image="CONTAINER:VERSION"}
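To roll those up to the Service level rather than a single container, aggregate over all of the Service's pods. A sketch, assuming the pods share a name prefix myapp- (a placeholder; on older Kubernetes versions the cAdvisor label is pod_name rather than pod):

# service-level CPU usage (cores)
sum(rate(container_cpu_usage_seconds_total{pod=~"myapp-.*"}[5m]))

# service-level memory working set
sum(container_memory_working_set_bytes{pod=~"myapp-.*"})

# service-level network receive throughput (bytes/s)
sum(rate(container_network_receive_bytes_total{pod=~"myapp-.*"}[5m]))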
Total # of requests, # of requests failed
Those are service-level metrics, and to collect them you need to use a Prometheus exporter created specifically for your service. Check the list of exporters, find the one you need for your service, and follow its instructions.
If you cannot find an exporter for your application, you can write one yourself; here is the official documentation about it.
...an application that doesn't have a Service on top of them, I would like to see the aggregated metrics of those pods in a single view on Grafana
It is possible to combine any graphs in a single view in Grafana using dashboards and panels. Check the official documentation; all those topics are covered in detail and are easy to understand.
Aggregation can be done by Prometheus itself using its aggregation operators.
All metrics from Kubernetes have labels, so you can group by them:
sum(http_requests_total) by (application, group), where application and group are labels.
Also, here is the official Prometheus instruction on how to add Prometheus to Grafana as a data source.