How to monitor a Windows machine in Grafana using Prometheus? - grafana

I am monitoring a Windows machine and have installed the WMI exporter on it. I am using Prometheus and Grafana as monitoring tools. Which query should I use to monitor the CPU status of my Windows machine?

This gets you the percentage of CPU usage:
100 - (avg by (instance) (irate(wmi_cpu_time_total{mode="idle", instance=~"$server.*"}[1m])) * 100)
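If you want to see the load per core rather than the per-instance average, an untested variant of the same idea (assuming wmi_cpu_time_total carries a core label, and reusing the same $server dashboard variable) would be:
# untested: per-core CPU usage, grouped by instance and core
100 - (avg by (instance, core) (irate(wmi_cpu_time_total{mode="idle", instance=~"$server.*"}[1m])) * 100)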

I don't have a WMI exporter running, but according to its documentation something like this should work with a stacked graph:
sum by(mode) (rate(wmi_cpu_time_total[5m]))
You can add labels to the metric to filter by instance / job / whatever and you can tweak the range that you compute the rate over (e.g. 1m for less smoothing; 1h over longer ranges of time; or Grafana's $__interval for dashboard range + screen resolution dependent graphing).
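For example (untested, reusing the hypothetical $server dashboard variable from the answer above):
# untested: filter to one server and let Grafana's $__interval pick the range
sum by(mode) (rate(wmi_cpu_time_total{instance=~"$server.*"}[$__interval]))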
Edit: the query above would give you CPU usage in absolute terms, i.e. if your machine had 4 cores, the stacked graph would add up to (approximately) 4 or 400%. If you want it to instead add up to exactly 100% you should use something like this (not tested):
sum by(mode) (rate(wmi_cpu_time_total[5m]))
/
scalar(sum(rate(wmi_cpu_time_total[5m])))
All it does is divide each per-CPU-mode value by their sum, so the results will always add up to 1. All you need to do in Grafana is set the unit of measurement to "percentage (0-1)".

Related

Dashboard visualizing CPU usage of Kafka container chopped up

I want to monitor the CPU usage of the Kafka container, but the graph is chopped up into different pieces. There seem to be gaps in the graph, and after each gap a different colored line follows. The time range is the last 30 days. For the exporter we use danielqsj/kafka-exporter:v1.4.2
The PromQL query used to create this graph is:
rate(container_cpu_usage_seconds_total{container="cp-kafka-broker"}[1m])
Can I merge these lines into one continuous line? If so, with what PromQL expression/dashboard configuration?
This happens when at least one of the labels attached to the metric changes. The rate function keeps all the original labels from the underlying time series. In Prometheus, each time series is uniquely identified by the metric name container_cpu_usage_seconds_total and any labels (key-value pairs) attached to the metric (container, for instance). This is why Grafana uses different colors: they are different time series.
If you want to get a single series in Grafana you can aggregate using the sum operator:
sum(rate(container_cpu_usage_seconds_total{container="cp-kafka-broker"}[1m]))
which by default will not keep any of the original labels.
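If you would rather keep one line per pod instead of a single combined series, you can aggregate by that label instead (untested; the exact label name, e.g. pod or pod_name, depends on your Kubernetes/cAdvisor version):
# untested: one series per pod, still summing away the other changing labels
sum by (pod) (rate(container_cpu_usage_seconds_total{container="cp-kafka-broker"}[1m]))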

Average number of cores and memory used per namespace in K8s?

I need to calculate the average number of CPU cores and the amount of memory used over a period of time (say, a month) for a particular namespace in K8s. How can we go about doing this?
We want to calculate the cost for each namespace. We did try the Kubecost tool in AKS, but it didn't match the cost shown on the Azure Cost dashboard; in fact, it was way more than the actual cost.
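No answer was recorded here, but as an untested sketch: assuming the cluster exposes the usual cAdvisor metrics (container_cpu_usage_seconds_total, container_memory_working_set_bytes), your Prometheus supports subqueries (2.7+), and retention covers the period, something along these lines could give a rough per-namespace average:
# untested: average CPU cores used per namespace over the last 30 days
avg_over_time(sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))[30d:5m])
# untested: average memory (working set, bytes) used per namespace over the last 30 days
avg_over_time(sum by (namespace) (container_memory_working_set_bytes)[30d:5m])
Subqueries over 30 days can be expensive; recording rules for the inner expressions would be cheaper for a dashboard.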

Horizontal Pod Autoscaler: scale only if CPU load remains constant for a given (5 min) duration

I have a k8s cluster deployed in AWS's EKS, running Kubernetes version 1.14.
I want the Horizontal Pod Autoscaler to scale up only if the CPU load remains high for a given (4-5 minute) duration, i.e. the decision should be taken only after the load has stayed high for that long.
If the load drops again after 3-4 minutes, it should not scale up, but currently we are not able to find any way to do that.
horizontal-pod-autoscaler-upscale-delay is deprecated.
So we are looking for a parameter with which we can set the CPU usage duration for the HPA.
horizontal-pod-autoscaler-upscale-delay has been removed. It might still work, though; you can add it to the kube-controller-manager arguments and check.

Calculate value in Group By statement

Use case:
I have 10 Kubernetes nodes (consider them as VMs) which have between 7 and 14 allocatable CPU cores that can be requested by Kubernetes pods. Therefore I'd like to show a table, grouped by node, with:
The allocatable CPU cores
The requested CPU cores
The ratio of requested / allocatable CPU cores
The problem
Creating the table for the first 2 requirements was easy. I simply created a table in Grafana and added these two metrics:
sum(kube_pod_container_resource_requests_cpu_cores) by (node)
sum(kube_node_status_allocatable_cpu_cores) by (node)
However, I was struggling with the third one. I tried this query, but apparently it didn't return any data:
sum(kube_pod_container_resource_requests_cpu_cores / kube_node_status_allocatable_cpu_cores) by (node)
Question
How can I achieve a calculation of two different metrics in a group by statement in my given example?
The issue here is that the two have different labels, so you need to aggregate away the extras:
sum by (node)(kube_pod_container_resource_requests_cpu_cores)
/
sum by (node)(kube_node_status_allocatable_cpu_cores)
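If you prefer the third column as a percentage rather than a 0-1 ratio, the same (untested) expression can simply be scaled:
# untested: requested vs. allocatable CPU per node, as a percentage
100 *
sum by (node)(kube_pod_container_resource_requests_cpu_cores)
/
sum by (node)(kube_node_status_allocatable_cpu_cores)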

Grafana aggregation issue when changing time range (%CPU and more)

I have a % CPU usage Grafana graph.
The problem is that the source data is collected by collectd as jiffies.
I am using the following formula:
collectd|<ServerName>|cpu-*|cpu-idle|value|nonNegativeDerivative()|asPercent(-6000)|offset(100)
The problem is that when I increase the time range (to 30 days, for example), Grafana aggregates the data, and since these are cumulative numbers (and not percentages or something it could simply average), the data in the graph becomes invalid.
Any idea how to create a better formula?
Have you looked at the aggregation plugin (read type) to compute averages?
https://collectd.org/wiki/index.php/Plugin:Aggregation/Config
It is very strange that you have to use the nonNegativeDerivative function for a CPU metric. nonNegativeDerivative should only be used for ever-increasing counters, not a gauge-like metric such as CPU.