Kubernetes - Sum of Pods CPU Utilization Limit

We have pods running with the same label in multiple namespaces. The combined CPU utilization of all of these pods must not exceed the licensed vCPU count, say X vCPU.
We do not want to enforce this through SUM(pod CPU limit) < X vCPU, because we want to give the apps the flexibility to burst; the license constraint is on the sum of actual pod utilization. Capping through CPU limits reduces the number of pods that can be deployed, and in practice not all pods burst at the same time.
How can we achieve this? Any insights are helpful, thanks in advance.
Note: there are other applications running with different labels; these do not count toward core usage.
Limiting through CPU limit: when nodes start up, these apps burst at the same time and take more vCPU; we would like to keep the combined burst utilization of these pods below a specified limit.
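For context, the approach ruled out above (capping SUM(pod CPU limit)) is roughly what a per-namespace ResourceQuota expresses; a minimal sketch, with an illustrative quota name and value, just to show that it constrains declared limits rather than actual utilization:

# Hypothetical per-namespace quota: caps the sum of declared container CPU limits,
# not actual utilization, which is why it does not fit the licensing case above.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: licensed-cpu-quota   # illustrative name
  namespace: team-a          # would have to be repeated in every namespace
spec:
  hard:
    limits.cpu: "8"          # sum of all container CPU limits in this namespace <= 8 vCPU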

Related

How is CPU usage calculated in Grafana?

Here's an image taken from Grafana which shows the CPU usage, requests and limits, as well as throttling, for a pod running in a K8s cluster.
As you can see, the CPU usage of the pod is very low compared to the requests and limits. However, there is around 25% CPU throttling.
Here are my questions:
How is the CPU usage (yellow) calculated?
How is the ~25% CPU throttling (green) calculated?
How is it possible that the CPU throttles when the allocated resources for the pod are so much higher than the usage?
Extra info:
The yellow is showing container_cpu_usage_seconds_total.
The green is container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total

What does Kubernetes use to calculate the CPU ratio, the request or the limit?

When you specify a Horizontal Pod Autoscaler in Kubernetes, for example with a targetCPUUtilizationPercentage of 50, what does Kubernetes use to calculate the CPU ratio, the request or the limit of the container?
So for example, with request=250 and limit=500, if you want to scale up when the container is at half its limit:
If it uses the request, I would set the target to at least 100%, since utilization can rise to 200%.
If it uses the limit, I would use target = 50%, as 100% would mean the limit is reached.
A targetCPUUtilizationPercentage of 50 means that if the average CPU utilization across all Pods rises above 50%, the HPA scales the deployment up, and if it falls below 50%, the HPA scales the deployment down (provided there is more than one replica).
I just checked the code and found that the target utilization percentage calculation uses the resource request.
See the relevant line:
currentUtilization = int32((metricsTotal * 100) / requestsTotal)
Here is the link:
https://github.com/kubernetes/kubernetes/blob/v1.9.0/pkg/controller/podautoscaler/metrics/utilization.go#L49
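To make the request-based calculation concrete, here is a minimal autoscaling/v1 manifest for the request=250m / limit=500m scenario from the question (the resource names are illustrative); with targetCPUUtilizationPercentage: 50, scale-up is triggered around an average usage of 125m per pod, i.e. 50% of the request, not of the limit:

# Illustrative HPA: utilization is measured against the pod's CPU request.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                            # hypothetical Deployment requesting 250m, limited to 500m
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50     # 50% of the 250m request = ~125m average usage per pod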

How to make Horizontal Pod Autoscaler scale down pod replicas on a percentage decrease threshold?

I am looking for a syntax/condition for a percentage-decrease threshold that can be put in the HPA.yaml file, which would make the Horizontal Pod Autoscaler start decreasing the pod replicas when the CPU utilization falls below that threshold.
Consider this scenario:
I set the option targetCPUUtilizationPercentage to 50, minReplicas to 1 and maxReplicas to 5.
Now let's assume the CPU utilization goes above 50% and climbs to 100%, causing the HPA to create 2 replicas. Even if the utilization then drops back to 51%, the HPA will not terminate a pod replica.
Is there any way to make the scale-down conditional on a percentage decrease in CPU utilization?
Just like targetCPUUtilizationPercentage, I would like to be able to specify something like targetCPUUtilizationPercentageDecrease with a value of 30, so that when the CPU utilization falls from 100% to 70%, the HPA terminates one pod replica, and after a further 30% decrease, when it reaches 40%, the other remaining replica is terminated.
According to online resources, this topic is still a work in progress in the community ("Configurable HorizontalPodAutoscaler options").
I didn't try it, but as a workaround you can create custom metrics, e.g. using the Prometheus Adapter ("Horizontal pod autoscaling by using custom metrics"), in order to have more control over the scaling thresholds.
At the moment you can use the --horizontal-pod-autoscaler-downscale-stabilization option to control this:
The value for this option is a duration that specifies how long the autoscaler has to wait before another downscale operation can be performed after the current one has completed. The default value is 5 minutes (5m0s).
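If the controller manager runs as a kubeadm-style static pod, the flag can be set in its manifest; a minimal sketch, assuming the usual static pod layout and an illustrative 10-minute window (the image version and surrounding details are assumptions):

# Fragment of a kube-controller-manager static pod manifest showing a longer
# downscale stabilization window; everything except the flag itself is illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - name: kube-controller-manager
    image: k8s.gcr.io/kube-controller-manager:v1.13.0            # illustrative version
    command:
    - kube-controller-manager
    - --horizontal-pod-autoscaler-downscale-stabilization=10m0s  # default is 5m0s
    # ...remaining flags unchanged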
From another point of view, this behaviour is expected given what the HPA is designed for:
Applications that process very important data events. These should scale up as fast as possible (to reduce the data processing time), and scale down as soon as possible (to reduce cost).
Hope this helps.

Query on kubernetes metrics-server metrics values

I am using metrics-server(https://github.com/kubernetes-incubator/metrics-server/) to collect the core metrics from containers in a kubernetes cluster.
I could fetch 2 resource usage metrics per container.
cpu usage
memory usage
However, it's not clear to me whether these metrics are accumulated over time or already sampled over a particular time window (1 minute / 30 seconds, ...).
What are the units for the above metric values? For CPU usage, is it the number of cores or milliseconds? For memory usage I assume it's bytes.
When computing the CPU usage metric value, does metrics-server already take care of dividing the container usage by the host system usage?
Also, if I have to compare these metrics with the Docker API metrics, how do I compute the CPU usage % for a given container?
Thanks!
Metrics are scraped periodically from kubelets. The default resolution is 60s, which can be overridden with the --metric-resolution=<duration> flag.
The values and units (cpu: cores in decimal SI, memory: bytes in binary SI) are arrived at by using the Quantity serializer in the k8s apimachinery package. You can read about it in the comments in the source code.
No, the CPU metric is not relative to the host system usage, as you can see from the fact that it is not a percentage value. It represents the rate of change of the total CPU seconds consumed by the container, expressed in cores. If this value increases by 1 over one second, the pod is consuming 1 CPU core (1000 millicores) during that second.
To arrive at a relative value, depending on your use case, you can divide the CPU metric for a pod by that for the node, since metrics-server exposes both /pods and /nodes endpoints.
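For reference, this is roughly the shape of a PodMetrics object as served by metrics-server under /apis/metrics.k8s.io/v1beta1 (names, timestamps and values are illustrative):

apiVersion: metrics.k8s.io/v1beta1
kind: PodMetrics
metadata:
  name: web-7d4f8b9c6-abcde    # illustrative pod
  namespace: default
timestamp: "2019-03-01T10:00:00Z"
window: 1m0s                   # sampling window, tied to --metric-resolution
containers:
- name: web
  usage:
    cpu: 250m                  # decimal SI: 0.25 cores averaged over the window
    memory: 131072Ki           # binary SI: working set memory (128Mi here)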

Pods and nodes CPU usage in Kubernetes

Is the total CPU usage of a node the sum of the usage of all pods running on that node? What is the relation between millicores of CPU and the % CPU usage of a node? Do request and limit control the CPU usage of pods, and if so, when a pod's CPU usage reaches its limit, is it killed and moved to another node, or does it continue executing on the same node at the maximum limit?
Millicores are an absolute measure (one core divided by 1000). A given node typically has multiple cores, so the relation between the number of millicores and the node percentage varies: for example, 1000 millicores (one core) would be 25% on a four-core node, but 50% on a node with two cores.
Request determines how much CPU a pod is guaranteed. It will not be scheduled onto a node unless the node can deliver that much.
Limit determines the most CPU a pod can get. The pod is not killed or moved when it hits the limit; it is simply throttled so that it cannot exceed it.
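A minimal pod spec tying this together (name and image are illustrative): on a four-core node the 250m request reserves about 6.25% of the node for scheduling purposes, and the 500m limit throttles the container at about 12.5%:

apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo               # illustrative name
spec:
  containers:
  - name: app
    image: nginx               # illustrative image
    resources:
      requests:
        cpu: 250m              # guaranteed share; the scheduler only places the pod where this fits
      limits:
        cpu: 500m              # hard cap; the container is throttled at this level, not killed or moved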