I was looking into Kubernetes Heapster and Metrics-server for getting metrics from the running pods. But the issue is, I need some custom metrics which might vary from pod to pod, and apparently Heapster only provides cpu and memory related metrics. Is there any tool already out there, which would provide me the functionality I want, or do I need to build one from scratch?
What you're looking for is application & infrastructure specific metrics. For this, the TICK stack could be helpful! Specifically Telegraf can be set up to gather detailed infrastructure metrics like Memory- and CPU pressure or even the resources used by individual docker containers, network and IO metrics etc... But it can also scrape Prometheus metrics from pods. These metrics are then shipped to influxdb and visualized using either chronograph or grafana.
Not sure if this is still open.
I would classify metrics into 3 types.
Events or Logs - System and Applications events which are sent to logs. These are non-deterministic.
Metrics - CPU and Memory utilization on the node the app is hosted. This is deterministic and are collected periodically.
APM - Applicaton Performance Monitoring metrics - these are application level metrics like requests received vs failed vs responded etc.
Not all the platforms do everything. ELK for instance does both the Metrics and Log Monitoring and does not do APM. Some of these tools have plugins into collect daemons which collect perfmon metrics of the node.
APM is a completely different area as it requires developer tool to provider metrics as Springboot does Actuator, Nodejs does AppMetrics etc. This carries the request level data. Statsd is an open source library which application can consume to provide APM metrics too Statsd agents installed in the node.
AWS offers CloudWatch agents for log shipping and sink and Xray for distributed tracing which can be used for APM.
Related
I deployed a web application on kubernetes cluster. This application consists of multiple nodes, and each node has multiple pods. How can I do the performance measurement of whole application? I can see some metrics results on prometheus/ grafana but these results for each node/pod, not for whole application. I am just trying to understand the bigger picture. For example, I see that application data is stored in redis(pod), but is it enough to look into only that pod to measure latency?
Every kubelet has cAdvisor (link) integrated into the binary. cAdvisor provides container users an understanding of the resource usage and performance characteristics of their running containers. Combine cAdvisor with an additional exporter like JMX exporter (for java, link), BlackBox exporter for probe-ing urls (response time, monitor http codes, etc. link). There are also frameworks, that provide metrics such as Java Springboot on path /actuator/prometheus and you can scrape these metrics. There are many different exporters (link), with each doing something else. When you gather all these metrics, you can have a bigger overview about the state of your application. Couple this with handmade Grafana dashboards and Alerting (AlertManager e.g.) and you can monitor almost everything about your application.
As per the prometheus/grafana stack, I guess what you are talking about is the kube-prom-stack with default dashboards already implemented into them.
I know how to get information(limit, request, usage) of cpu and memory from metrics-server.
But I don't know how to get network information(Memory Distribution, Network Traffic(KBps), Network Utilization) from metrics-server.
Essentially, metrics-server don't provide network information?
Is there another way?
I should develop with python.
Metrics server does not export these kind of metrics but Prometheus Node Exporter does. Node exporter run as a systemd service that will periodically (every 1 second) gather all the metrics of your system.
The Prometheus Node
Exporter exposes a
wide variety of hardware- and kernel-related metrics.
You can check all the list of the collectors here. You may also want to check this article about monitoring network with prometheus.
I have a cluster with 7 nodes and a lot of services, nodes, etc in the Google Cloud Platform. I'm trying to get some metrics with StackDriver Legacy, so in the Google Cloud Console -> StackDriver -> Metrics Explorer I have all the set of anthos metrics listed but when I try to create a chart based on that metrics it doesn't show the data, actually the only response that I get in the panel is no data is available for the selected time frame even changing the time frame and stuffs.
Is right to think that with anthos metrics I can retrieve information about my cronjobs, pods, services like failed initializations, jobs failures ? And if so, I can do it with StackDriver Legacy or I need to Update to StackDriver kubernetes Engine Monitoring ?
Anthos solution, includes what’s called GKE-on prem. I’d take a look at the instructions to use logging and monitoring on GKE-on prem. Stackdriver monitors GKE On-Prem clusters in a similar way as cloud-based GKE clusters.
However, there’s a note where they say that currently, Stackdriver only collects cluster logs and system component metrics. The full Kubernetes Monitoring experience will be available in a future release.
You can also check that you’ve met all the configuration requirements.
would like to see k8 Service level metrics in Grafana from underlying prometheus server.
For instance:
1) If i have 3 application pods exposed through a service i would like to see service level metrics for CPU,memory & network I/O pressure ,Total # of requests,# of requests failed
2)Also if i have group of pods(replicas) related to an application which doesn"t have Service on top of them would like to see the aggregated metrics of the pods related to that application in a single view on grafana
What would be the prometheus queries to achieve the same
Service level metrics for CPU, memory & network I/O pressure
If you have Prometheus installed on your Kubernetes cluster, all those statistics are being already collected by Prometheus. There are many good articles about how to install and how to use Kubernetes+Prometheus, try to check that one, as an example.
Here is an example of a request to fetch container memory usage:
container_memory_usage_bytes{image="CONTAINER:VERSION"}
Total # of requests,# of requests failed
Those are service-level metrics, and for collecting them, you need to use Prometheus Exporter created especially for your service. Check the list with exporters, find one which you need for your service and follow its instruction.
If you cannot find an Exporter for your application, you can write it yourself, here is an official documentation about it.
application which doesn"t have Service on top of them would like to see the aggregated metrics of the pods related to that application in a single view on grafana
It is possible to combine any graphics in a single view in Grafana using Dashboards and Panels. Check an official documentation, all that topics pretty detailed and easy to understand.
Aggregation can be done by Prometheus itself by aggregation operations.
All metrics from Kubernetes has labels, so you can group by them:
sum(http_requests_total) by (application, group), where application and group is labels.
Also, here is an official Prometheus instruction about how to add Prometheus to Grafana as a Datasourse.
According to Kubernetes Custom Metrics Proposal containers can expose its app-level metrics in Prometheus format to be collected by Heapster.
Could anyone elaborate, if metrics are pulled by Heapster that means after the container terminates metrics for the last interval are lost? Can app push metrics to Heapster instead?
Or, is there a recommended approach to collect metrics from moderately short-lived containers running in Kubernetes?
Not to speak for the original author's intent, but I believe that proposal is primarily focused on custom metrics that you want to use for things like scheduling and autoscaling within the cluster, not for general purpose monitoring (for which as you mention, pushing metrics is sometimes critical).
There isn't a single recommended pattern for what to do with custom metrics in general. If your environment has a preferred monitoring stack or vendor, a common approach is to run a second container in each pod (a "sidecar" container) to push relevant metrics about the main container to your monitoring backend.
You may want to look at handling this by sending your metrics directly from your job to a Prometheus pushgateway. This is the precise use case it was created for:
The Prometheus Pushgateway exists to allow ephemeral and batch jobs to expose their metrics to Prometheus. Since these kinds of jobs may not exist long enough to be scraped, they can instead push their metrics to a Pushgateway. The Pushgateway then exposes these metrics to Prometheus.
Prometheus developer here. If you want to monitor the metrics of applications running on Kubernetes, the approach is to have Prometheus scrape the application directly. Prometheus can auto-discover Kubernetes apps, see http://prometheus.io/docs/operating/configuration/#<kubernetes_sd_config>
There is no point in involving Heapster if you're using Prometheus, as Prometheus can do everything it does more directly.