High CPU core usage in Kubernetes cluster (84%)

I've set up Prometheus and Grafana for tracking and monitoring of my Kubernetes cluster.
I've set up 3 Nodes for my cluster.
I have 26 pods running (mostly in the monitoring namespace).
I have one major Node app (deployment) running and right now there isn't any load.
I'm trying to understand these graph metrics, but I can't work out why CPU core usage is so high when there is no load on the app.
Here's a Grafana screenshot:
The 24% memory usage I can understand, as there are Kubernetes processes (kube-system etc.) running as well.
It's also telling me my cluster can support 330 pods (currently 26). I'm only worried about the high CPU core usage. Can anybody explain it?

82% is not the CPU usage of the processes but the ratio of requested to allocatable resources (2.31 / 2.82 = 0.819 --> ~82%).
This means that out of your 2.82 available (allocatable) CPUs, you have requested (allocated) about 82% for pods, mostly in the monitoring namespace, but that does not mean they actually use that much CPU.
To see actual CPU usage, look at metrics like container_cpu_usage_seconds_total (per container CPU usage) or maybe even process_cpu_seconds_total (per process CPU usage).
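For example, a minimal sketch of the two views in PromQL, assuming the usual kube-state-metrics and cAdvisor metric names are available in your Prometheus:
# Ratio of requested to allocatable CPU (what the ~82% panel is showing)
sum(kube_pod_container_resource_requests_cpu_cores) / sum(kube_node_status_allocatable_cpu_cores)
# Actual CPU usage across all containers, in cores
sum(rate(container_cpu_usage_seconds_total{container_name!="POD"}[5m]))
Comparing the two makes it obvious when a cluster is heavily requested but only lightly used.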

Related

Grafana for Kubernetes shows CPU usage higher than 100%

I have 10 Kubernetes nodes (consider them as VMs) which have between 7 and 14 allocatable CPU cores which can be requested by Kubernetes pods. Therefore I'd like to show cluster CPU usage.
This is my current query
sum(kube_pod_container_resource_requests_cpu_cores{node=~"$node"}) / sum(kube_node_status_allocatable_cpu_cores{node=~"$node"})
This query shows strange results, for example over 400%.
I would like to add a filter to only calculate this for nodes that have running pods, since there might be some old node definitions which are no longer in use. I have inherited this setup, so it is not that easy for me to wrap my head around it.
Any suggestions for a query that I can try?
Your current query is summing up the requested CPU across all nodes, so it might show invalid data.
You can check CPU utilization of all pods in the cluster by running:
sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
If you want to check the CPU usage of each running pod, you can use:
sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m])) by (pod_name)
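If the goal is cluster-wide CPU usage relative to allocatable capacity, a sketch along these lines may work (it assumes the same cAdvisor and kube-state-metrics metric and label names as above; newer versions expose container/pod instead of container_name/pod_name):
# Actual cluster CPU usage as a fraction of allocatable CPU
sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
  /
sum(kube_node_status_allocatable_cpu_cores{node=~"$node"})
Because this reflects actual usage rather than requests, stale requests on old node definitions no longer inflate the result.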

Why the CPU usage of a GKE Workload is not equal to the sum of the CPU usage of its pods?

I'm trying to figure out why a GKE "Workload" CPU usage is not equivalent to the sum of the CPU usage of its pods.
The following image shows the Workload CPU usage:
[Service Workload CPU Usage]
The following images show the CPU usage of the pods belonging to that Workload:
[Pod #1 CPU Usage]
[Pod #2 CPU Usage]
For example, at 9:45 the Workload CPU usage was around 3.7 cores, but at the same time Pod #1 CPU usage was around 0.9 cores and Pod #2 CPU usage was around 0.9 cores too. That means the Workload CPU usage should have been around 1.8 cores, but it wasn't.
Does anyone have an idea of this behavior?
Thanks.
On your VM (the node managed by Kubernetes) you have the pods that you deploy and manage, but also several system services that run there for supervision, management, log ingestion, and so on. A basic description here.
You can see all of these base services by running kubectl get all --namespace kube-system.
If you have installed additional components, like Istio or Knative, you have additional services and namespaces. All of these take a share of the node's resources.
Danny,
The CPU chart on the Workloads page is an aggregate of CPU usage for managed pods. The values are taken from the Stackdriver Monitoring metric container/cpu/usage_time, check this link. That metric represents "Cumulative CPU usage on all cores in seconds. This number divided by the elapsed time represents usage as a number of cores, regardless of any core limit that might be set."
Please let me know if you have further questions in regard to this.
I suspect this is a bug in the UI. There is no actual metric for deployment CPU usage. Stackdriver Monitoring only collects data on container, pod, and node level metrics thus the only really reliable metrics in this case are the ones for pod CPU usage.
The graph for the total deployment CPU usage is likely meant to be a sum of all the pods metrics calculated and then presented to you. It is not as reliable as the pod or container metrics since it is not a direct metric.
If you are seeing this discrepancy consistently, I recommend opening a UI bug report through the Google Public Issue Tracker to report this to the GCP Engineers.

Kubernetes: CPU Resource allocation for POD

I am trying to assign CPU resources for services running in Kubernetes pods. The services are mostly Node.js-based REST endpoints with some DB operations.
During load tests, I tried different values between 100m and 1000m for the pods.
For the expected number of requests per second, when the value is below 500m the HPA spawns more pods than when the value is above 500m. I am using a CPU-based trigger for the HPA.
I couldn't figure out what I should base the choice of a particular CPU resource value on. Can someone help me in this regard?
Two points:
If you configured the HPA to autoscale based on CPU utilisation, it makes sense that there are more replicas if the CPU request is 500m than if it's 1000m. This is because the target utilisation that you define for the HPA is relative to the CPU request of the Pods.
For example, if your target utilisation is 80% and the CPU request of the Pods is 500m, then the HPA scales your app up if the actual CPU usage exceeds 400m. On the other hand, if the CPU requests are 1000m, then the HPA only scales up if the CPU usage exceeds 800m.
Selecting resource requests (e.g. CPU) for a container is very important, but it's also an art in itself. The CPU request is the minimum amount of CPU that your container needs to run reliably. What you could do to find out this value is run your app locally and evaluate how much CPU it actually uses, for example with ps or top.
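To make the first point concrete, here is a minimal sketch of an HPA wired to CPU utilisation; the names and numbers are illustrative only, but they show that the 80% target is relative to the 500m request (roughly 400m of actual usage per pod):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # Deployment whose pods request cpu: 500m
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # 80% of the 500m request ~ 400m per pod
Doubling the request to 1000m, without touching the HPA, moves the effective scale-up threshold to about 800m per pod.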

Converting the Datadog "M%" CPU unit to the Kubernetes CPU unit "m"

I have to set resource limits for my Kubernetes apps, and they use the millicore unit "m".
When analyzing my apps in Datadog, I see a unit called M% for CPU usage.
How do I convert 1.5M% to m?
Kubernetes resources: http://kubernetes.io/docs/user-guide/compute-resources/
This is not the correct graph for determining a resource limit. Your graph shows the CPU usage of your app across the whole cluster, but a resource limit is per pod (container). We (and you) don't know from the graph how many containers were up and running. You can determine the right CPU limit from the per-container CPU usage graph(s). You will need the Datadog-Docker integration:
Please be aware that Kubernetes relies on Heapster to report metrics, rather than the cgroup file directly. The collection interval for Heapster is unknown, which can lead to inaccurate time-related data, such as CPU usage. If you require more precise metrics, we recommend using the Datadog-Docker Integration.
Then it depends on how Datadog measures CPU utilization per container. If container CPU utilization maxes out at 100%, then 100% container CPU utilization corresponds to ~1000m, i.e. 1 core.
I recommend you read how and when cgroups limit CPU: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
You will need deep knowledge to set proper CPU limits. If you don't need to prioritize any container, then IMHO the best practice is to set 1 (resources.requests.cpu) for all your containers; they will always get equal CPU time. A minimal sketch follows below.
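A sketch of that suggestion, with a hypothetical container name (equal CPU requests translate to equal cpu.shares, so containers get equal CPU time under contention):
containers:
- name: my-service            # hypothetical name
  resources:
    requests:
      cpu: 1                  # equal requests on every container -> equal CPU shares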

How to handle CPU contention for burstable k8s pods?

The use case I'm trying to get my head around takes place when you have various burstable pods scheduled on the same node. How can you ensure that the workload in a specific pod takes priority over another pod when the node's kernel is scheduling CPU and the CPU is fully burdened? On a typical Linux host my thoughts on contention between processes immediately go to the 'niceness' of the processes; however, I don't see any equivalent k8s mechanism allowing for specification of CPU scheduling priority between the processes within pods on a node.
I've read about the newest capabilities provided by k8s, which (if I interpret the documentation correctly) just provide a mechanism for pinning CPUs to pods, and that doesn't really scratch my itch. I'd still like to maximize CPU utilization by the "second class" pods when the higher-priority pods don't have an active workload, while allowing the higher-priority workload to have CPU scheduling priority should the need arise.
So far, having not found a satisfactory answer I'm thinking that the community will opt for an architectural solution, like auto-scaling or segregating the workloads between nodes. I don't consider these to be truly addressing the issue, but really just throwing more CPUs at it which is what I'd like to avoid. Why spin up more nodes when you've got idle CPU?
Let me first explain how CPU allocation and utilization happen in k8s (memory is a bit different).
You define the CPU requirement as below, where CPU is expressed in thousandths of a core (millicores):
resources:
  requests:
    cpu: 50m
  limits:
    cpu: 100m
In the above example, we ask for a minimum of 5% and a maximum of 10% of a CPU core.
Requests are used by Kubernetes to schedule the pod: the pod is scheduled on a node only if that node has more than 5% of a CPU still unrequested.
The requests are translated into cpu.shares in cgroups (relative CPU weight), while the limits are passed to Docker (or any other runtime) as a CFS quota that caps usage.
So if you request 5% of a CPU and use only 1%, the remainder is not locked to this pod; other pods can use that free CPU. This ensures that every pod gets the CPU it needs while keeping node CPU utilization high.
If you set a limit of 10% and then try to use more than that, Linux will throttle the CPU usage, but it won't kill the pod.
So, coming to your question: you can set higher limits for your burstable pods, and unless all pods burst at the same time you are OK. If they do burst at the same time, they will share whatever CPU is available.
You can also use pod affinity and anti-affinity to schedule all burstable pods on different nodes; see the sketch below.
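A sketch of keeping burstable pods apart with pod anti-affinity; the label key and value are made-up placeholders you would replace with your own:
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            workload-class: burstable   # hypothetical label carried by the burstable pods
        topologyKey: kubernetes.io/hostname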
The CPU request correlates to cgroup CPU priority. Basically if Pod A has a request of 100m CPU and Pod B has 200m, even in a starvation situation B will get twice as many run seconds as A.
As already mentioned, resource management in Pods is declared with requests and limits.
There are 3 QoS Classes in Kubernetes based on requests and limits configuration:
Guaranteed (limits == requests)
Burstable (limits > requests)
Best Effort (limits and requests are unspecified)
Both the Burstable and Best Effort classes might be considered "burstable" in the sense that a pod may consume more resources than it requested.
The closest fit for your case might be using the Burstable class for higher-priority Pods and Best Effort for all the others; a sketch of the three configurations follows below.
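A minimal side-by-side sketch of the three QoS configurations (the container resource values are illustrative only):
# Guaranteed: limits == requests
resources:
  requests:
    cpu: 500m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 256Mi

# Burstable: limits > requests (a fit for the higher-priority pods)
resources:
  requests:
    cpu: 200m
  limits:
    cpu: "1"

# Best Effort: no requests and no limits specified at all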