Why does Kubernetes HPA convert my custom metric?

Kubernetes Horizontal Pod Autoscaling (HPA) modifies my custom metric: Stackdriver displays the correct metric, but HPA shows a different number.
For example, the Stackdriver value is 118K, but HPA displays 1656144.
I understand that HPA uses some conversion for floating-point numbers, but my metric is an integer (Unit: number, Kind: Gauge, Value type: Int64).
Running in GKE 1.11.7.
Any ideas?

If you specify targetValue, the metric is used as one whole (total) number, so it is not divided between the pods.
If you use targetAverageValue, the value is calculated based on the number of pods created, i.e. it is averaged across them.
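To make the difference concrete, here is a minimal sketch assuming the reported metric is the total for the whole workload. It follows the replica formula from the Kubernetes HPA documentation (desiredReplicas = ceil(currentReplicas * currentValue / targetValue)); the function names and the target numbers are hypothetical, and the controller's tolerance and min/max clamping are left out.

```python
import math

# Hypothetical helpers: a simplified model of the documented formula
# desiredReplicas = ceil(currentReplicas * currentValue / targetValue).
# The controller's tolerance and min/max clamping are omitted.


def replicas_for_value(current_replicas: int, total: float, target_value: float) -> int:
    """targetValue: the whole (total) metric is compared with the target."""
    return math.ceil(current_replicas * total / target_value)


def replicas_for_average_value(total: float, target_average_value: float) -> int:
    """targetAverageValue: the metric is divided by the number of pods first.

    With N pods the per-pod average is total / N, so
    ceil(N * (total / N) / target) reduces to ceil(total / target).
    """
    return math.ceil(total / target_average_value)


# Example with the 118K value from the question (targets are made up):
print(replicas_for_value(2, 118_000, 100_000))      # ceil(2 * 1.18) = 3
print(replicas_for_average_value(118_000, 30_000))  # ceil(3.93...) = 4
```

With targetAverageValue the pod count cancels out, so the result depends only on the total and the per-pod target.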

You did not specify a value for the --horizontal-pod-autoscaler-sync-period flag (a kube-controller-manager flag, not a field of the HPA manifest). By default it is set to 15 seconds.
In your case this means that the HPA value is the amount of the whole deployment queue in the last 15 seconds. More information can be found in the HPA documentation.
As you mentioned, in Stackdriver you used a GAUGE metric, which measures a value at a particular point in time (see the Stackdriver documentation).
In short, Stackdriver shows the current value at an exact point in time, while the HPA value is the amount over the last 15 seconds.

Related

GKE Autoscaling metrics in absolute value

I am trying to set a horizontal pod autoscaling metric on my GKE deployment based on an absolute value, but I still don't understand the difference between the absolute value and the percentage.
Let's say I'm requesting 500m CPU per pod, with a starting number of 3 pods.
If I want to replace the autoscaling metric "50% of CPU usage" with an absolute value, will it be "250m CPU"?
Is it based on the average usage per pod, or is it the total usage for all the pods?
Thank you in advance.

If you use targetAverageValue (or even targetAverageUtilization), the metric value used by the scaling algorithm is based on the average across all matching pods.
From the Horizontal Pod Autoscaling docs:
When a targetAverageValue or targetAverageUtilization is specified,
the currentMetricValue is computed by taking the average of the given
metric across all Pods in the HorizontalPodAutoscaler's scale target.
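Applied to the numbers in the question, a minimal sketch: 50% of a 500m request is 250m per pod, and that target is compared with the average usage, not the total. The per-pod usage figures and helper names below are hypothetical, and the sketch assumes every pod has the same 500m request (utilization is measured relative to each pod's request). The controller's tolerance is omitted.

```python
import math

# Hypothetical helpers modelling the documented averaging behaviour.


def average_millicores(usage_millicores):
    """HPA averages the metric across all pods of the scale target."""
    return sum(usage_millicores) / len(usage_millicores)


def desired_replicas(usage_millicores, target_average_millicores):
    """ceil(currentReplicas * currentAverage / targetAverage)."""
    current = len(usage_millicores)
    ratio = average_millicores(usage_millicores) / target_average_millicores
    return math.ceil(current * ratio)


request_m = 500                    # CPU request per pod, in millicores
target_utilization = 0.50          # "50% of CPU usage"
target_average_m = request_m * target_utilization  # 250.0m per pod

usage = [400, 300, 200]            # made-up per-pod usage, in millicores
print(average_millicores(usage))                   # 300.0
print(desired_replicas(usage, target_average_m))   # ceil(3 * 300 / 250) = 4
```

So a targetAverageValue of 250m behaves like 50% utilization only while all pods request 500m; if the requests change, the percentage target moves with them and the absolute one does not.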

Kubernetes HPA (with custom metrics) scaling policies

Starting from Kubernetes v1.18, the v2beta2 API allows scaling behavior to be configured through the Horizontal Pod Autoscaler (HPA) behavior field. I'm planning to apply HPA with custom metrics to a StatefulSet.
The use case I'm looking at is scaling out using a custom metric (e.g. number of user sessions on my application), but the HPA will not scale down at all. This use case is also described by K8s SIG-Autoscaling enhancements - "Configurable scale velocity for HPA >> Story 4: Scale Up As Usual, Do Not Scale Down".
    behavior:
      scaleDown:
        policies:
        - type: pods
          value: 0
The user sessions could stay active for minutes to hours. Starting with 1 replica of the StatefulSet, as the number of user sessions hits an upper limit (exposed using a Prometheus collector and then configured through the HPA custom metric option), the application pods will scale out. The new pods will start serving new users.
Since this is a StatefulSet and cannot just abruptly scale down, I'm looking for ways to scale down when the user sessions on the new replicas go down to 0. The link above says that scale-down can be controlled by a separate process, but I'm not sure how to do this. I'm looking for some pointers.
Thanks.

You can use periodSeconds and stabilizationWindowSeconds values to manage how much time will pass between termination of pods, for example:
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 10
        policies:
        - type: Pods
          value: 1
          periodSeconds: 20
This way it will scale down 1 pod roughly every 30 seconds (or at whatever interval results from the periodSeconds and stabilizationWindowSeconds values you use). The exact timing can vary, because the stabilization window depends on the recommendations computed over time.
periodSeconds is the time window over which the policy's limit applies; with value: 1 it is effectively the minimum time between the termination of individual pods. The maximum value is 1800 seconds (30 minutes).
stabilizationWindowSeconds: when metrics indicate that the target should be scaled down, the algorithm looks at previously calculated desired states and uses the highest value from the specified interval. For scale-down the default value is 300 and the maximum value is 3600 (one hour).
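A rough way to see how the two settings in the YAML above interact is the simplified simulation below. It is a sketch under stated assumptions, not the controller's code: the function name is hypothetical, the controller is assumed to re-evaluate every 15 seconds (the default --horizontal-pod-autoscaler-sync-period), the raw recommendation from the metrics is given as input, and only a single Pods policy is modelled.

```python
from typing import List, Tuple


def simulate_scale_down(start_replicas: int, recommendations: List[int],
                        sync_period: int = 15, stabilization_window: int = 10,
                        pods_per_period: int = 1,
                        period_seconds: int = 20) -> List[Tuple[int, int]]:
    """Return (time, replicas) after each sync of a simplified HPA loop.

    Hypothetical model: recommendations[i] is the raw desired replica count
    computed from the metrics at time i * sync_period (assumed to be <= the
    current replica count, since only scale-down is modelled).
    """
    replicas = start_replicas
    history: List[Tuple[int, int]] = []   # (time, raw recommendation)
    removals: List[Tuple[int, int]] = []  # (time, pods removed)
    timeline: List[Tuple[int, int]] = []
    for i, rec in enumerate(recommendations):
        now = i * sync_period
        history.append((now, rec))
        # stabilizationWindowSeconds: use the highest recommendation seen
        # inside the window (always including the current one).
        stabilized = max([rec] + [r for t, r in history
                                  if now - t < stabilization_window])
        # Pods policy: at most pods_per_period removals per period_seconds.
        removed_recently = sum(n for t, n in removals
                               if now - t < period_seconds)
        allowed = max(pods_per_period - removed_recently, 0)
        new_replicas = max(stabilized, replicas - allowed)
        if new_replicas < replicas:
            removals.append((now, replicas - new_replicas))
            replicas = new_replicas
        timeline.append((now, replicas))
    return timeline


# Metrics suddenly say 1 replica is enough while 5 are running:
print(simulate_scale_down(5, [1] * 7))
# [(0, 4), (15, 4), (30, 3), (45, 3), (60, 2), (75, 2), (90, 1)]
```

Under these assumptions a 20-second period combined with a 15-second sync loop removes one pod on every second sync, i.e. about every 30 seconds.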

Kubernetes HPA Auto Scaling Velocity

We have defined an HPA for an application with min 1 and max 4 replicas and 80% CPU as the threshold.
What we want is that, if the pod CPU goes beyond 80%, the app is scaled up 1 pod at a time.
Instead, the application is being scaled up to the maximum number of replicas.
How can we define the scale velocity so that it scales 1 pod at a time? And again, if one of the pods consumes more than 80% CPU, it should scale up one more pod, but not go straight to the maximum number of replicas.
Let me know how we can achieve this.

First of all, the 80% CPU utilisation is not a threshold but a target value.
The HPA algorithm for calculating the desired number of replicas is based on the following formula:
X = ceil[N * (C/T)]
Where:
X: desired number of replicas
N: current number of replicas
C: current value of the metric
T: target value for the metric
In other words, the algorithm aims at calculating a replica count that keeps the observed metric value as close as possible to the target value.
In your case, this means if the average CPU utilisation across the pods of your app is below 80%, the HPA tends to decrease the number of replicas (to make the CPU utilisation of the remaining pods go up). On the other hand, if the average CPU utilisation across the pods is above 80%, the HPA tends to increase the number of replicas, so that the CPU utilisation of the individual pods decreases.
The number of replicas that are added or removed in a single step depends on how far apart the current metric value is from the target value and on the current number of replicas. This decision is internal to the HPA algorithm and you can't directly influence it. The only contract that the HPA has with its users is to keep the metric value as close as possible to the target value.
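To see why a single scaling step can jump straight to the maximum, here is a minimal sketch of the formula above with the HPA's clamping to minReplicas/maxReplicas. It assumes the default tolerance of 0.1 (the --horizontal-pod-autoscaler-tolerance flag) and ignores unready pods and missing metrics; the function name is hypothetical. Note that CPU utilization is measured relative to the pod's CPU request, so it can exceed 100%.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA step: X = ceil(N * C / T), clamped to [min, max]."""
    ratio = current_value / target_value
    # Close enough to the target: the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# min 1, max 4, target 80% CPU (as in the question), starting from 1 pod:
print(desired_replicas(1, 85, 80, 1, 4))   # 1 (85/80 is within tolerance)
print(desired_replicas(1, 100, 80, 1, 4))  # 2 (ceil(1.25))
print(desired_replicas(1, 250, 80, 1, 4))  # 4 (ceil(3.125), the maximum)
```

The size of the step is driven entirely by C/T: a single pod at 250% of its request is already far enough above the 80% target that the formula asks for the maximum of 4 in one step.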
If you need a very specific autoscaling behaviour, you can write a custom controller (or operator) to autoscale your application instead of using the HPA.

This page, https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details, explains the algorithm HPA uses, including the formula to calculate the number of "desired replicas".
If I recall correctly, there were some (positive) changes to the HPA algorithm in v1.12.

HPA has total control over scale-up as of today. You can only fine-tune the scale-down operation, with the following kube-controller-manager parameter:
--horizontal-pod-autoscaler-downscale-stabilization
The good news is that there is a proposal for configurable scale up/down velocity for HPA.

How to make Horizontal Pod Autoscaler scale down pod replicas on a percentage decrease threshold?

I am looking for a syntax/condition for a percentage-decrease threshold to put in the HPA.yaml file, which would allow the Horizontal Pod Autoscaler to start decreasing the pod replicas when the CPU utilization falls by a particular percentage.
Consider this scenario:
I set targetCPUUtilizationPercentage to 50, minReplicas to 1 and maxReplicas to 5.
Now let's assume the CPU utilization went above 50 and up to 100, making the HPA create 2 replicas. Even if the utilization then decreases to 51%, HPA will not terminate 1 pod replica.
Is there any way to make the scale-down conditional on a percentage decrease in CPU utilization?
Just like targetCPUUtilizationPercentage, I would like to be able to specify something like targetCPUUtilizationPercentageDecrease and assign it the value 30, so that when the CPU utilization falls from 100% to 70%, HPA terminates a pod replica, and after a further 30% decrease, when it reaches 40%, the other remaining pod replica gets terminated.
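There is no percentage-decrease field in the standard algorithm; the point at which the second replica goes away follows from the same desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization) formula. The sketch below applies it to the scenario above, assuming the default 0.1 tolerance and ignoring min/max clamping and the downscale stabilization delay; the function name is hypothetical.

```python
import math


def desired_replicas(current: int, utilization: float, target: float,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA formula (no min/max clamping, no stabilization)."""
    ratio = utilization / target
    # Within 10% of the target: no change.
    if abs(ratio - 1.0) <= tolerance:
        return current
    return math.ceil(current * ratio)


# Scenario from the question: 2 replicas, targetCPUUtilizationPercentage: 50
for utilization in (51, 40, 30, 26, 25, 20):
    print(utilization, desired_replicas(2, utilization, 50))
# The second replica is only removed once the average drops to 25% or below.
```

In other words, with a 50% target the scale-down "threshold" for 2 replicas is effectively 25% average utilization, because 1 pod must be able to absorb the load while staying at or below the target.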

According to online resources, this topic is still in progress in the community: "Configurable HorizontalPodAutoscaler options".
I haven't tried it, but as a workaround you could create custom metrics, e.g. using the Prometheus Adapter (see "Horizontal pod auto scaling by using custom metrics"),
in order to have more control over the limits.
At the moment you can use the --horizontal-pod-autoscaler-downscale-stabilization option:
The value for this option is a duration that specifies how long the autoscaler has to wait before another downscale operation can be performed after the current one has completed. The default value is 5 minutes (5m0s).
On the other hand, this behavior is expected given the basic design of HPA:
Applications that process very important data events. These should scale up as fast as possible (to reduce the data processing time), and scale down as soon as possible (to reduce cost).
Hope this helps.

How does Kubernetes HPA behave with 2 or more metrics, especially when calculating the number of replicas?

We have configured HPA to use 2 metrics:
CPU utilization
App-specific custom metrics
When testing, we observed scaling happening, but how the number of replicas is calculated is not very clear. I am not able to locate any documentation on this.
Questions:
Can someone point to documentation or code on the calculation part?
Is it a good practice to use multiple metrics for scaling?
Thanks in Advance!

From https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-the-horizontal-pod-autoscaler-work
If multiple metrics are specified in a HorizontalPodAutoscaler, this calculation is done for each metric, and then the largest of the desired replica counts is chosen. If any of those metrics cannot be converted into a desired replica count (e.g. due to an error fetching the metrics from the metrics APIs), scaling is skipped.
Finally, just before HPA scales the target, the scale recommendation is recorded. The controller considers all recommendations within a configurable window, choosing the highest recommendation from within that window. This value can be configured using the --horizontal-pod-autoscaler-downscale-stabilization flag, which defaults to 5 minutes. This means that scale-downs will occur gradually, smoothing out the impact of rapidly fluctuating metric values.
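The two steps quoted above (per-metric calculation with the largest result winning, then the highest recommendation within the window) can be sketched as follows. This is a simplified model under stated assumptions: the function names are hypothetical, the tolerance and min/max clamping are omitted, the CPU and custom-metric numbers are made up, and the "skip on error" rule follows the quoted documentation text.

```python
import math
from typing import List, Optional, Sequence, Tuple


def replicas_for_metric(current_replicas: int, current: float, target: float) -> int:
    """desiredReplicas = ceil(currentReplicas * current / target)."""
    return math.ceil(current_replicas * current / target)


def combine_metrics(current_replicas: int,
                    metrics: Sequence[Tuple[Optional[float], float]]) -> Optional[int]:
    """Each metric is (current value, or None if it could not be fetched, target)."""
    proposals: List[int] = []
    for current, target in metrics:
        if current is None:
            return None  # per the quoted docs: scaling is skipped
        proposals.append(replicas_for_metric(current_replicas, current, target))
    return max(proposals)  # the largest desired replica count wins


def stabilized_recommendation(history: Sequence[Tuple[int, int]], now: int,
                              window: int = 300) -> int:
    """Highest recommendation recorded within the last `window` seconds."""
    return max(r for t, r in history if now - t < window)


# 3 replicas; CPU at 90% vs target 60%, custom metric at 120 vs target 100:
print(combine_metrics(3, [(90, 60), (120, 100)]))    # max(5, 4) = 5
print(combine_metrics(3, [(None, 60), (120, 100)]))  # None
# (time in seconds, recommendation); the 5 at t=0 has aged out of the window:
print(stabilized_recommendation([(0, 5), (100, 4), (250, 3), (320, 3)], now=320))  # 4
```

Taking the maximum means the metric that needs the most pods decides, so adding a second metric can only increase the replica count compared with using either metric alone.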