GKE Autoscaling metrics in absolute value - kubernetes

I am trying to set a horizontal pod autoscaling metric on my GKE deployment based on an absolute value, but I still don't understand the difference between the absolute value and the percentage.
Let's say I'm requesting 500m CPU per pod, with a starting count of 3 pods.
If I want to replace the autoscaling metric of "50% of CPU usage" with an absolute value, will it be "250m CPU"?
Is it based on the average usage per pod, or on the total usage across all pods?
Thank you in advance.

If you use targetAverageValue (or even targetAverageUtilization), the metric value used by the scaling algorithm is the average across all matching pods.
From the Horizontal Pod Autoscaling docs:
When a targetAverageValue or targetAverageUtilization is specified,
the currentMetricValue is computed by taking the average of the given
metric across all Pods in the HorizontalPodAutoscaler's scale target.
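For the numbers in the question, the two forms would look roughly like this. This is a sketch using the autoscaling/v2beta1 API; the Deployment name my-app and the maxReplicas value are assumptions:

  apiVersion: autoscaling/v2beta1
  kind: HorizontalPodAutoscaler
  metadata:
    name: my-app-hpa
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: my-app                  # hypothetical Deployment requesting 500m CPU per pod
    minReplicas: 3
    maxReplicas: 10                 # example upper bound
    metrics:
    - type: Resource
      resource:
        name: cpu
        # Percentage form: 50% of the 500m request, averaged across pods
        targetAverageUtilization: 50
        # Equivalent absolute form, averaged per pod (use one or the other):
        # targetAverageValue: 250m

So with a 500m request, targetAverageUtilization: 50 and targetAverageValue: 250m express the same target, and both are evaluated as an average per pod, not a total across all pods.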

Related

How to implement Horizontal Pod Autoscaling on external metrics?

I want to autoscale pods based on an external metric, which is predicted CPU usage. I have an AI module that can predict what the CPU usage for a pod will look like for the next 1 minute, based on the last 5 minutes. I want the HPA to autoscale based on these predictions rather than on actual CPU usage. What is the best way to achieve this?
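A minimal sketch of what the HPA side of such a setup could look like, assuming the prediction is exposed to the cluster through an external metrics adapter (for example the Prometheus adapter or the Stackdriver adapter) under a metric named predicted_cpu_usage; the metric name, workload name, and target value are placeholders:

  apiVersion: autoscaling/v2beta2
  kind: HorizontalPodAutoscaler
  metadata:
    name: predicted-cpu-hpa
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: my-app                    # placeholder workload
    minReplicas: 1
    maxReplicas: 10
    metrics:
    - type: External
      external:
        metric:
          name: predicted_cpu_usage   # placeholder: metric served by your metrics adapter
        target:
          type: AverageValue
          averageValue: 400m          # placeholder target per pod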

Kubernetes HPA (with custom metrics) scaling policies

Starting from Kubernetes v1.18, the v2beta2 API allows scaling behavior to be configured through the Horizontal Pod Autoscaler (HPA) behavior field. I'm planning to apply HPA with custom metrics to a StatefulSet.
The use case I'm looking at is scaling out using a custom metric (e.g. number of user sessions on my application), but the HPA will not scale down at all. This use case is also described by K8s SIG-Autoscaling enhancements - "Configurable scale velocity for HPA >> Story 4: Scale Up As Usual, Do Not Scale Down".
behavior:
  scaleDown:
    policies:
    - type: pods
      value: 0
The user sessions could stay active for minutes to hours. Starting with 1 replica of the StatefulSet, as the number of user sessions hit an upper limit (exposed using Prometheus collector and later configured using HPA custom metric option), the application pods will scale-out. The new pods will start serving new users.
Since this is a StatefulSet and cannot just abruptly scale down, I'm seeking help on ways to scale down when the user sessions on the new replicas go down to 0. The above link says that the scale down can be controlled by a separate process. Not sure how to do this? Looking for some pointers.
Thanks.
You can use the periodSeconds and stabilizationWindowSeconds values to control how much time passes between pod terminations, for example:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 10
    policies:
    - type: Pods
      value: 1
      periodSeconds: 20
With these values it will scale down 1 pod roughly every 30 seconds (or whatever values you use for periodSeconds and stabilizationWindowSeconds); the exact timing may vary depending on how the stabilization window plays out over time.
periodSeconds describes how much time must pass between the termination of each pod; the maximum value is 1800 seconds (30 minutes).
stabilizationWindowSeconds: when the metrics indicate that the target should be scaled down, the algorithm looks at previously computed desired states and uses the highest value from the specified interval. For scale down the default value is 300 seconds and the maximum is 3600 (one hour).
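Putting that together with the scenario above, a sketch of what a full HPA for the StatefulSet could look like, combining a Pods-type custom metric with the scaleDown behaviour; the metric name user_sessions_active, the StatefulSet name, and all numbers are assumptions:

  apiVersion: autoscaling/v2beta2
  kind: HorizontalPodAutoscaler
  metadata:
    name: sessions-hpa
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: StatefulSet
      name: my-statefulset              # placeholder StatefulSet name
    minReplicas: 1
    maxReplicas: 5
    metrics:
    - type: Pods
      pods:
        metric:
          name: user_sessions_active    # placeholder custom metric exposed via Prometheus
        target:
          type: AverageValue
          averageValue: "100"           # placeholder: target sessions per pod
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 300
        policies:
        - type: Pods
          value: 1
          periodSeconds: 60             # remove at most 1 pod per minute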

How to configure Kubernetes cluster autoscaler to scale down only?

I'd like to run the kubernetes cluster autoscaler so that unneeded nodes will be removed automatically, but I don't want the autoscaler to add nodes automatically. I prefer to handle scaling up myself. Is this possible?
I found maxNodesTotal, but I worry the semantics of setting this to 0 might mean all my nodes will go away. I also found scaleDownEnabled, but no corresponding option for scaling up.
The Kubernetes Cluster Autoscaler (CA) will attempt to scale up whenever it identifies pending pods that are waiting to be scheduled but request more resources (CPU/RAM) than any available node can serve.
You can use the maxNodesTotal parameter to limit the maximum number of nodes the CA is allowed to spin up.
For example, if you don't want your cluster to consist of more than 3 nodes during peak utilization, you would set maxNodesTotal to 3.
There are different considerations that you should be aware of in terms of cost savings, performance and availability.
I would try to list some related to cost savings and efficient utilization as I suspect you might be more interested in that aspect.
Make sure you size your pods consistently with their actual utilization, because scale up is triggered by the pods' resource requests, not by their actual resource utilization.
Also, bigger pods are less likely to fit together on the same node, and the CA won't be able to scale down semi-utilized nodes, resulting in wasted resources.
Since you tagged this question with EKS, I will assume you are on AWS. On AWS the ASG (Auto Scaling Group) for each NodeGroup has a Max setting that is honoured by the cluster autoscaler. You can set this to prevent scaling above the set number of nodes. If the Min and Max on the ASG are the same value, then the autoscaler will never scale up or down. If the Min and Max are different, then the autoscaler can scale both up and down between those number of nodes. This is not exactly "never scale up", but it limits the upper end.
If you have multiple NodeGroups (ASGs), then each one can have different Min and Max nodes values.
You can also configure the cluster autoscaler itself in different ways. For example, you can set the utilization threshold: if a node's utilization falls below this threshold, the cluster autoscaler considers the node for scale down. See the FAQ.
The FAQ entry above that one may also apply. You can add an annotation to any node you do not want considered for scale down by the cluster autoscaler. Set: kubectl annotate node <nodename> cluster-autoscaler.kubernetes.io/scale-down-disabled=true or annotate the nodes as they are created. You can do this with entries in your AWS node group setup.
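For reference, the knobs mentioned in these answers map to cluster-autoscaler command-line flags. A sketch of the relevant arguments on a self-managed cluster-autoscaler deployment; the values are examples only, not recommendations:

  # Fragment of the cluster-autoscaler container spec
  command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --max-nodes-total=3                      # hard cap on cluster size
  - --scale-down-enabled=true                # set to false to disable scale down entirely
  - --scale-down-utilization-threshold=0.5   # nodes below 50% utilization become scale-down candidates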

Kubernetes HPA Auto Scaling Velocity

We have defined HPA for an application to have min 1 and max 4 replicas with 80% cpu as the threshold.
What we wanted was, if the pod cpu goes beyond 80%, the app needs to be scaled up 1 at a time.
Instead what is happening is the application is getting scaled up to max number of replicas.
How can we define the scale velocity so that it scales 1 pod at a time? And again, if one of the pods consumes more than 80% CPU, scale up one more pod, but not straight to the maximum number of replicas.
Let me know how do we achieve this.
First of all, the 80% CPU utilisation is not a threshold but a target value.
The HPA algorithm for calculating the desired number of replicas is based on the following formula:
X = N * (C/T)
Where:
X: desired number of replicas
N: current number of replicas
C: current value of the metric
T: target value for the metric
In other words, the algorithm aims at calculating a replica count that keeps the observed metric value as close as possible to the target value.
In your case, this means if the average CPU utilisation across the pods of your app is below 80%, the HPA tends to decrease the number of replicas (to make the CPU utilisation of the remaining pods go up). On the other hand, if the average CPU utilisation across the pods is above 80%, the HPA tends to increase the number of replicas, so that the CPU utilisation of the individual pods decreases.
The number of replicas that are added or removed in a single step depends on how far apart the current metric value is from the target value and on the current number of replicas. This decision is internal to the HPA algorithm and you can't directly influence it. The only contract that the HPA has with its users is to keep the metric value as close as possible to the target value.
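As a concrete, hypothetical example: if a single replica's average CPU utilisation spiked to 200% of its request against a target of 80%, one reconciliation already jumps past "one pod at a time", because the algorithm rounds the result up:

  X = N * (C / T)
    = 1 * (200% / 80%)
    = 2.5  ->  rounded up, desired replicas = 3

This is why a single overloaded pod can make the HPA add several replicas in one step rather than one at a time.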
If you need a very specific autoscaling behaviour, you can write a custom controller (or operator) to autoscale your application instead of using the HPA.
This - https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details - explains the algorithm HPA uses, including the formula to calculate the number of "desired replicas".
If I recall, there were some (positive) changes to the HPA algo with v1.12.
As of today the HPA has full control over scale up; you can only fine-tune the scale-down operation with the following parameter:
--horizontal-pod-autoscaler-downscale-stabilization
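That flag lives on the kube-controller-manager, so it is only adjustable on clusters where you manage the control plane (not on GKE or other managed offerings). A sketch of how it might appear in the controller-manager manifest, with an example value:

  # kube-controller-manager static pod fragment (self-managed control plane only)
  command:
  - kube-controller-manager
  - --horizontal-pod-autoscaler-downscale-stabilization=10m   # example value; the default is 5m0s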
The good news is that there is a proposal for Configurable scale up/down velocity for HPA

Why Kubernetes HPA convert custom metric?

Kubernetes Horizontal Pod Autoscaling (HPA) modifies my custom metric: Stackdriver displays the correct metric, but the HPA shows a different number.
For example, the Stackdriver value is 118K, but the HPA displays 1656144.
I understand that the HPA uses some conversion for floating-point numbers, but my metric is an integer: Unit: number, Kind: Gauge, Value type: Int64.
Running in GKE 1.11.7.
Any ideas?
If you specify targetValue it will be a whole number, so there won't be scaling down of pods.
If you use targetAverageValue, it will be calculated based on the number of pods created.
In your HPA setup you did not specify a value for the --horizontal-pod-autoscaler-sync-period flag; by default it is set to 15 seconds.
In your case this means the HPA value is the amount across the whole deployment over the last 15 seconds. More information can be found in the HPA documentation.
As you mentioned, in Stackdriver you used a GAUGE metric, which measures a value at a particular point in time (see the Stackdriver documentation).
In short, Stackdriver shows the current value at an exact point in time, while the HPA value is the amount over the last 15 seconds.