Query on kubernetes metrics-server metrics values - kubernetes

I am using metrics-server (https://github.com/kubernetes-incubator/metrics-server/) to collect the core metrics from containers in a Kubernetes cluster.
I could fetch 2 resource usage metrics per container.
cpu usage
memory usage
However, it's not clear to me:
Are these metrics accumulated over time, or are they already sampled for a particular time window (1 minute, 30 seconds, ...)?
What are the units for the above metric values? For CPU usage, is it the number of cores or milliseconds? For memory usage, I assume it's bytes.
While computing the CPU usage metric value, does metrics-server already take care of dividing the container usage by the host system usage?
Also, if I have to compare these metrics with the docker-api metrics, how do I compute CPU usage % for a given container?
Thanks!

Metrics are scraped periodically from kubelets. The default resolution duration is 60s, which can be overridden with the --metric-resolution=<duration> flag.
The value and unit (cpu - cores in decimal SI, memory - bytes in binary SI) are arrived at by using the Quantity serializer in the k8s apimachinery package. You can read about it in the comments in the source code.
No, the CPU metric is not relative to the host system usage as you can see that it's not a percentage value. It represents the rate of change of the total amount of CPU seconds consumed by the container by core. If this value increases by 1 within one second, the pod consumes 1 CPU core (or 1000 milli-cores) in that second.
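To make the "rate of change" idea concrete, here is a minimal sketch that turns two samples of a cumulative CPU-seconds counter into an average number of cores. The function name and the sample values are made up for illustration; this is not metrics-server code, only the arithmetic described above.

```python
def cpu_cores_from_counter(prev_cpu_seconds, prev_ts, curr_cpu_seconds, curr_ts):
    """Average cores used between two samples of a cumulative CPU-seconds counter.

    Timestamps are in seconds. All names here are hypothetical.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # CPU-seconds consumed per wall-clock second == cores in use
    return (curr_cpu_seconds - prev_cpu_seconds) / elapsed


# Illustrative samples taken 60 seconds apart: 30 CPU-seconds were consumed.
cores = cpu_cores_from_counter(120.0, 1000.0, 150.0, 1060.0)
print(f"{cores:.3f} cores = {cores * 1000:.0f} millicores")
```

With these sample values the container averaged half a core (500 millicores) over the window.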
To arrive at a relative value, depending on your use case, you can divide the CPU metric for a pod by that for the node, since metrics-server exposes both /pods and /nodes endpoints.
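A rough sketch of that division, assuming you have already read the CPU usage strings for the pod and the node from the metrics API (for example "250m" or "125000000n"). The helper names are hypothetical, and the parser only handles the n, u and m suffixes plus plain core counts, not the full Quantity grammar.

```python
from decimal import Decimal

# Decimal SI suffixes that commonly appear in CPU quantities (assumed subset).
_CPU_SUFFIXES = {"n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3")}


def parse_cpu(quantity: str) -> Decimal:
    """Convert a CPU quantity string such as '250m' into cores."""
    quantity = quantity.strip()
    if quantity and quantity[-1] in _CPU_SUFFIXES:
        return Decimal(quantity[:-1]) * _CPU_SUFFIXES[quantity[-1]]
    return Decimal(quantity)  # plain number of cores, e.g. '2' or '0.5'


def pod_share_of_node_usage(pod_cpu: str, node_cpu: str) -> float:
    """Pod CPU usage as a percentage of the node's CPU usage."""
    node = parse_cpu(node_cpu)
    if node == 0:
        raise ValueError("node CPU usage is zero")
    return float(parse_cpu(pod_cpu) / node * 100)


print(pod_share_of_node_usage("250m", "2"))  # pod uses 0.25 of the node's 2 cores
```

If what you want is a percentage of the node's capacity rather than of its current usage, divide by the node's core count instead of the node usage value.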

Related

Kubernetes - Sum of Pods CPU Utilization Limit

We have pods running with the same label in multiple namespaces. The combined CPU utilization of all these pods should not exceed the licensed vCPU count, say X-vCPU.
We do not want to enforce this through SUM(pod CPU limit) < X-vCPU, because we want to give apps the flexibility to burst; the license constraint is on the sum of all pod utilization. Enforcing it through CPU limits reduces the number of pods that can be deployed, and not all pods will burst at the same time.
How can we achieve this? Any insights would be helpful. Thanks in advance.
Note: There are other applications running with different labels, which do not count toward core usage.
Limiting through CPU limit
When nodes start up, these apps burst at the same time and take more vCPU. We would like to keep the combined burst utilization of these pods below a specified limit.

Kubernetes pod resource limiting/quotas as a percentage of host resources (relative) rather than using absolute values?

Resource limiting of containers in pods is typically achieved using something like below -
resources:
  limits:
    cpu: "600m"
  requests:
    cpu: "400m"
As you see, absolute values are used above.
Now,
If the server/host has, say, 1 core then the total CPU computing power of the server is 1,000m. And the container is limited to 600m of computing power, which makes sense.
But, if the server/host has say 16 cores then the total CPU computing power of server is 16,000m. But the container is still restricted to 600m of computing power, which might not make complete sense in every case.
What I instead want is to define limits/requests as a percentage of host resources. Something like below.
resources:
  limits:
    cpu: "60%"
  requests:
    cpu: "40%"
Is this possible in k8s, either out of the box or using any CRDs?

Kubernetes HPA Auto Scaling Velocity

We have defined HPA for an application to have min 1 and max 4 replicas with 80% cpu as the threshold.
What we wanted was, if the pod cpu goes beyond 80%, the app needs to be scaled up 1 at a time.
Instead what is happening is the application is getting scaled up to max number of replicas.
How can we define the scale velocity so that the app scales up 1 pod at a time? And if one of the pods again consumes more than 80% CPU, it should scale up by one more pod rather than to the maximum number of replicas.
Let me know how we can achieve this.
First of all, the 80% CPU utilisation is not a threshold but a target value.
The HPA algorithm for calculating the desired number of replicas is based on the following formula:
X = ceil(N * (C/T))
Where:
X: desired number of replicas
N: current number of replicas
C: current value of the metric
T: target value for the metric
In other words, the algorithm aims at calculating a replica count that keeps the observed metric value as close as possible to the target value.
In your case, this means if the average CPU utilisation across the pods of your app is below 80%, the HPA tends to decrease the number of replicas (to make the CPU utilisation of the remaining pods go up). On the other hand, if the average CPU utilisation across the pods is above 80%, the HPA tends to increase the number of replicas, so that the CPU utilisation of the individual pods decreases.
The number of replicas that are added or removed in a single step depends on how far apart the current metric value is from the target value and on the current number of replicas. This decision is internal to the HPA algorithm and you can't directly influence it. The only contract that the HPA has with its users is to keep the metric value as close as possible to the target value.
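Here is a small sketch of that calculation, to show why a single step can jump straight to the maximum. The function name is made up; the rounding up and the default tolerance of 0.1 follow the algorithm details in the Kubernetes documentation, and everything else (readiness handling, missing metrics, stabilization) is left out.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA calculation: X = ceil(N * C / T), clamped to [min, max]."""
    ratio = current_value / target_value
    # Within the tolerance band around the target, the HPA does not scale.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# One replica, target 80%, min 1, max 4 (the setup from the question).
for observed in (85, 100, 300):
    print(observed, "->", desired_replicas(1, observed, 80, 1, 4))
```

With one replica at 100% the result is 2, but at 300% (utilization is measured relative to the CPU request, so it can exceed 100%) the result is already 4 in a single step.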
If you need a very specific autoscaling behaviour, you can write a custom controller (or operator) to autoscale your application instead of using the HPA.
This page (https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details) explains the algorithm the HPA uses, including the formula to calculate the number of "desired replicas".
If I recall correctly, there were some (positive) changes to the HPA algorithm in v1.12.
As of today, the HPA has total control over scale-up. You can only fine-tune the scale-down operation, with the following parameter:
--horizontal-pod-autoscaler-downscale-stabilization
The good news is that there is a proposal for configurable scale up/down velocity for the HPA.

How to make Horizontal Pod Autoscaler scale down pod replicas on a percentage decrease threshold?

I am looking for a syntax/condition for a percentage decrease threshold that can be added to the HPA.yaml file, so that the Horizontal Pod Autoscaler starts decreasing the pod replicas when the CPU utilization falls below that particular percentage threshold.
Consider this scenario:
I set the option targetCPUUtilizationPercentage to 50, with minReplicas 1 and maxReplicas 5.
Now let's assume the CPU utilization went above 50 and reached 100, making the HPA create 2 replicas. Even if the utilization then decreases to 51%, the HPA will not terminate a pod replica.
Is there any way to condition the scale-down on the % decrease in CPU utilization?
Just like targetCPUUtilizationPercentage, I would like to be able to specify a targetCPUUtilizationPercentageDecrease and set it to 30, so that when the CPU utilization falls from 100% to 70%, the HPA terminates a pod replica, and after a further 30% decrease, when it reaches 40%, the other remaining pod replica gets terminated.
According to online resources, this topic is still a work in progress in the community: "Configurable HorizontalPodAutoscaler options".
I haven't tried it, but as a workaround you can try to create custom metrics, for example using Prometheus Adapter (see "Horizontal pod auto scaling by using custom metrics"), in order to have more control over the provided limits.
At the moment you can use the --horizontal-pod-autoscaler-downscale-stabilization option:
The value for this option is a duration that specifies how long the autoscaler has to wait before another downscale operation can be performed after the current one has completed. The default value is 5 minutes (5m0s).
On the other hand, this behaviour is expected given the basis of the HPA:
Applications that process very important data events. These should scale up as fast as possible (to reduce the data processing time), and scale down as soon as possible (to reduce cost).
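For the scenario in the question, the documented formula (desired replicas = ceil(current replicas * current value / target value)) also tells you roughly when a replica is dropped: going from N to N-1 needs the average utilization to fall to about target * (N-1)/N. A minimal sketch, with a made-up function name, assuming a small replica count so that the default 10% tolerance is not the binding constraint:

```python
def scale_down_threshold(target_percent, replicas):
    """Highest average utilization at which ceil(N * C / T) drops to N - 1.

    Hypothetical helper; ignores the HPA tolerance band and the downscale
    stabilization window.
    """
    if replicas < 2:
        raise ValueError("need at least 2 replicas to scale down")
    return target_percent * (replicas - 1) / replicas


print(scale_down_threshold(50, 2))  # target 50%, currently 2 replicas
print(scale_down_threshold(50, 5))  # target 50%, currently 5 replicas
```

So with a target of 50% and 2 replicas, the average utilization has to drop to about 25% before the HPA goes back to 1 replica, which is why nothing happens at 51%.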
Hope this helps.

Pod and node CPU usage in Kubernetes

Is the total CPU usage of a node the sum of the usage of all pods running on that node? What is the relation between millicpu (in CPU cores) and the % CPU usage of a node? Do requests and limits control the CPU usage of pods? If so, when a pod's CPU usage reaches its limit, will it be killed and moved to another node, or will it continue executing on the same node at its maximum limit?
Millicores is an absolute number (one core divided by 1000). A given node typically has multiple cores, so the relation between the number of millicores and the total percentage varies. For example 1000 millicores (one core) would be 25% on a four core node, but 50% on a node with two cores.
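The conversion is plain arithmetic; a tiny sketch (the function name is just for illustration):

```python
def millicores_to_node_percent(millicores, node_cores):
    """Express an absolute millicore value as a percentage of a node's total CPU."""
    if node_cores <= 0:
        raise ValueError("node_cores must be positive")
    return millicores / (node_cores * 1000) * 100


print(millicores_to_node_percent(1000, 4))  # one core on a four-core node
print(millicores_to_node_percent(1000, 2))  # one core on a two-core node
```

This reproduces the 25% and 50% figures from the example above.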
Request determines how much cpu a pod is guaranteed. It will not be scheduled on a node unless the node can deliver that much.
Limit determines how much a pod can get. It will not be killed or moved if it exceeds the limit; it is simply not allowed to exceed the limit.