I want to autoscale pods based on an external metric: predicted CPU usage. I have an AI module that predicts what a pod's CPU usage will look like for the next minute, based on the last 5 minutes. I want the HPA to autoscale on these predictions rather than on actual CPU usage. What is the best way to achieve this?
We have pods running with the same label in multiple namespaces. The combined CPU utilization of all these pods should not exceed the licensed vCPU count, say X-vCPU.
We do not want to enforce this through SUM(pod CPU limit) < X-vCPU, because we want to give apps the flexibility to burst and the license constraint is on the sum of all pod utilization. Limiting through CPU limits reduces the number of pods that can be deployed, and not all pods will burst at the same time.
How can we achieve this? Any insights would be helpful. Thanks in advance.
Note: There are other applications running with different labels, which do not count toward core usage.
Limiting through CPU limit
When nodes start up, these apps burst at the same time and take more vCPU; we would like to keep the combined burst utilization of these pods below a specified limit.
I have a scaled deployment whose load changes predictably with the time of day. How can I prepare my deployment for the load (for example, I want to double the number of pods every evening from 16:00 to 23:00)? Does Kubernetes provide such a tool?
I know Kubernetes pods are scaled by the Horizontal Pod Autoscaler, which scales the number of pods based on CPU utilisation or a custom metric. But that is a reactive approach; I'm looking for a proactive one.
A quick google search would direct you here: https://github.com/kubernetes/kubernetes/issues/49931
In essence, the best solution as of now is either to run a sidecar container next to your pod's main container, which could use the Kubernetes API to scale the deployment based on the time of day with a simple bash script, or to write a CRD (with a controller) yourself that reacts to time-based events (e.g. it is 6pm), something like this one:
https://github.com/amelbakry/kube-schedule-scaler
which watches annotations with cron-like specs on deployments and reacts accordingly.
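The core of such a time-based scaler is just a mapping from the current time to a replica count. Below is a minimal sketch of that decision, using the 16:00 to 23:00 window from the question. The function name `replicas_for`, the constants, and the doubling factor are assumptions for illustration; the call that actually patches the deployment (via the Kubernetes API or `kubectl scale`) is deliberately left out.

```python
from datetime import datetime, time

# Hypothetical schedule taken from the question: peak hours are 16:00-23:00.
PEAK_START = time(16, 0)
PEAK_END = time(23, 0)


def replicas_for(now: datetime, base_replicas: int, peak_factor: int = 2) -> int:
    """Return the replica count the schedule asks for at `now`."""
    in_peak = PEAK_START <= now.time() < PEAK_END
    return base_replicas * peak_factor if in_peak else base_replicas


if __name__ == "__main__":
    print(replicas_for(datetime(2024, 1, 1, 15, 59), base_replicas=3))  # 3
    print(replicas_for(datetime(2024, 1, 1, 18, 30), base_replicas=3))  # 6
```

A sidecar or cron job could evaluate this once a minute and only issue a scale request when the result differs from the current replica count.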
If you are looking for a more advanced autoscaler, you can give KEDA (keda.sh) a try. It supports cron-based scaling up and down.
It also supports other kinds of event-driven autoscaling, such as what I have done based on a consumer group's lag for a particular topic in Apache Kafka.
There are multiple event sources supported; check them out here
The Horizontal Pod Autoscaler of Kubernetes is not a reactive approach; it is in fact a proactive scaling approach. Let me explain its algorithm using its default settings:
The cooldown time is 5 minutes
Resource utilization is sampled every 15 seconds
This means that the system samples resource utilization (depending on which metrics the end users set, e.g. CPU, storage, etc.) every 15 seconds.
After every 5 minutes of cooling down (no scaling actions), the controller calculates the resource utilization over the past 5 minutes (using the historical data sampled every 15 seconds above). It then estimates the amount of resources (i.e. the number of replicas) required for the next 5-minute time window with the equation:
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
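To make the equation concrete, here is a minimal sketch of it as a function. It covers only the formula itself; the real controller also applies a tolerance, min/max bounds and stabilization, which are left out here.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]"""
    return math.ceil(current_replicas * (current_value / target_value))


if __name__ == "__main__":
    # 2 replicas at 100% average utilization with a 50% target -> 4 replicas
    print(desired_replicas(2, 100, 50))  # 4
    # 4 replicas at 30% average utilization with a 60% target -> 2 replicas
    print(desired_replicas(4, 30, 60))  # 2
```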
Other proactive autoscalers work in a similar manner. The difference is that they may apply different techniques (queueing theory, machine learning, or time series models) to estimate desiredReplicas instead of the equation above.
We have defined an HPA for an application with min 1 and max 4 replicas and 80% CPU as the threshold.
What we wanted was for the app to be scaled up 1 pod at a time when the pod CPU goes beyond 80%.
Instead, the application is being scaled up straight to the max number of replicas.
How can we define the scale velocity so that it scales 1 pod at a time? And if one of the pods again consumes more than 80% CPU, it should add one more pod, not jump to the maximum number of replicas.
Let me know how we can achieve this.
First of all, the 80% CPU utilisation is not a threshold but a target value.
The HPA algorithm for calculating the desired number of replicas is based on the following formula:
X = N * (C/T)
Where:
X: desired number of replicas
N: current number of replicas
C: current value of the metric
T: target value for the metric
In other words, the algorithm aims at calculating a replica count that keeps the observed metric value as close as possible to the target value.
In your case, this means if the average CPU utilisation across the pods of your app is below 80%, the HPA tends to decrease the number of replicas (to make the CPU utilisation of the remaining pods go up). On the other hand, if the average CPU utilisation across the pods is above 80%, the HPA tends to increase the number of replicas, so that the CPU utilisation of the individual pods decreases.
The number of replicas that are added or removed in a single step depends on how far apart the current metric value is from the target value and on the current number of replicas. This decision is internal to the HPA algorithm and you can't directly influence it. The only contract that the HPA has with its users is to keep the metric value as close as possible to the target value.
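As a rough illustration of why the step size varies, the sketch below applies the formula to the setup from the question (min 1, max 4, target 80%). The function name `next_replicas` is made up for this example. The 10% tolerance is the documented default of the controller; the clamping to min/max is simplified. Utilization is measured relative to the pods' CPU requests, so it can exceed 100%.

```python
import math


def next_replicas(current: int, utilization: float, target: float,
                  min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified single HPA step: formula, tolerance band, min/max clamp."""
    ratio = utilization / target
    # Within the tolerance band around the target, no scaling happens.
    if abs(ratio - 1.0) <= tolerance:
        return current
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(next_replicas(1, 85, 80, 1, 4))   # 1 (ratio 1.0625 is inside the tolerance)
    print(next_replicas(1, 120, 80, 1, 4))  # 2 (ceil of 1.5)
    print(next_replicas(1, 300, 80, 1, 4))  # 4 (ceil of 3.75, jumps straight to max)
```

So a single pod that spikes far above its request can take the deployment from 1 to 4 replicas in one step, which matches the behaviour described in the question.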
If you need a very specific autoscaling behaviour, you can write a custom controller (or operator) to autoscale your application instead of using the HPA.
This - https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details - explains the algorithm the HPA uses, including the formula to calculate the number of "desired replicas".
If I recall, there were some (positive) changes to the HPA algorithm with v1.12.
The HPA has total control over scale-up as of today. You can only fine-tune the scale-down operation with the following parameter:
--horizontal-pod-autoscaler-downscale-stabilization
The good news is that there is a proposal for Configurable scale up/down velocity for HPA
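For readers on newer clusters: as far as I know, that proposal has since shipped as the `behavior` field of the HorizontalPodAutoscaler spec (in `autoscaling/v2beta2` from Kubernetes 1.18, and in `autoscaling/v2`). The sketch below builds such a manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts. The name `my-app` is a placeholder, and the policy shown (at most 1 pod added per 60 seconds) is one possible reading of "scale 1 pod at a time"; check it against the API version your cluster serves.

```python
import json

# Assumption: the cluster serves autoscaling/v2; "my-app" is a placeholder name.
hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "my-app"},
    "spec": {
        "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "my-app"},
        "minReplicas": 1,
        "maxReplicas": 4,
        "metrics": [{
            "type": "Resource",
            "resource": {
                "name": "cpu",
                "target": {"type": "Utilization", "averageUtilization": 80},
            },
        }],
        "behavior": {
            # Allow at most 1 additional pod per 60-second period when scaling up.
            "scaleUp": {
                "policies": [{"type": "Pods", "value": 1, "periodSeconds": 60}],
            },
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(hpa, indent=2))
```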
I am looking for the syntax of a percentage-decrease threshold to put in the HPA.yaml file, which would make the Horizontal Pod Autoscaler start decreasing the pod replicas when CPU utilization falls by that percentage.
Consider this scenario:
I set the option targetCPUUtilizationPercentage to 50, minReplicas to 1 and maxReplicas to 5.
Now let's assume the CPU utilization went above 50 and climbed to 100, making the HPA create 2 replicas. Even if the utilization then decreases to 51%, the HPA will not terminate a pod replica.
Is there any way to make the scale-down conditional on the percentage decrease in CPU utilization?
Just like targetCPUUtilizationPercentage, I would like to be able to specify a targetCPUUtilizationPercentageDecrease and assign it the value 30, so that when the CPU utilization falls from 100% to 70%, the HPA terminates a pod replica, and after a further 30% decrease, when it reaches 40%, the other remaining pod replica gets terminated.
According to online resources, this topic is still in progress in the community: "Configurable HorizontalPodAutoscaler options".
I didn't try it, but as a workaround you can try to create custom metrics, e.g. using the Prometheus Adapter (see "Horizontal pod auto scaling by using custom metrics"),
in order to have more control over the provided limits.
At the moment you can use the --horizontal-pod-autoscaler-downscale-stabilization option to control this.
The value for this option is a duration that specifies how long the autoscaler has to wait before another downscale operation can be performed after the current one has completed. The default value is 5 minutes (5m0s).
From another point of view, this behaviour is expected given the use cases the HPA is designed around:
Applications that process very important data events. These should scale up as fast as possible (to reduce the data processing time), and scale down as soon as possible (to reduce cost).
Hope this helps.
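For the scenario in the question, the point at which the HPA would drop a replica can be derived from the standard formula ceil(N * C / T): going from N to N - 1 replicas requires C <= T * (N - 1) / N. The sketch below computes that value. The function name is made up for illustration, and it ignores the controller's tolerance band and the downscale stabilization delay, so treat it as an approximation.

```python
import math


def scale_down_threshold(current_replicas: int, target: float) -> float:
    """Highest average utilization C for which ceil(N * C / T) <= N - 1."""
    return target * (current_replicas - 1) / current_replicas


if __name__ == "__main__":
    # 2 replicas, target 50%: one replica is removed only at or below 25%.
    print(scale_down_threshold(2, 50))  # 25.0
    print(math.ceil(2 * (25 / 50)))     # 1 -> scales down
    print(math.ceil(2 * (26 / 50)))     # 2 -> stays at 2 replicas
```

This is why a drop to 51%, or even to 30%, leaves both replicas running when the target is 50%.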
We have configured the HPA to use 2 metrics:
CPU Utilization
App specific custom metrics
When testing, we observed scaling happening, but the calculation of the number of replicas is not very clear. I am not able to locate any documentation on this.
Questions:
Can someone point to documentation or code on the calculation part?
Is it good practice to use multiple metrics for scaling?
Thanks in advance!
From https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-the-horizontal-pod-autoscaler-work
If multiple metrics are specified in a HorizontalPodAutoscaler, this calculation is done for each metric, and then the largest of the desired replica counts is chosen. If any of those metrics cannot be converted into a desired replica count (e.g. due to an error fetching the metrics from the metrics APIs), scaling is skipped.
Finally, just before HPA scales the target, the scale recommendation is recorded. The controller considers all recommendations within a configurable window choosing the highest recommendation from within that window. This value can be configured using the --horizontal-pod-autoscaler-downscale-stabilization flag, which defaults to 5 minutes. This means that scaledowns will occur gradually, smoothing out the impact of rapidly fluctuating metric values.
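The two rules quoted above can be sketched in a few lines: compute a desired replica count per metric and take the largest, then take the highest recommendation seen within the stabilization window. The function names and sample numbers below are illustrative assumptions, not taken from the controller source, and the tolerance and min/max clamping are omitted.

```python
import math


def desired_for_metric(current_replicas: int, current_value: float, target_value: float) -> int:
    """Per-metric proposal: ceil(currentReplicas * current / target)."""
    return math.ceil(current_replicas * (current_value / target_value))


def combine_metrics(current_replicas: int, metrics: list[tuple[float, float]]) -> int:
    """metrics is a list of (current_value, target_value); the largest proposal wins."""
    return max(desired_for_metric(current_replicas, c, t) for c, t in metrics)


def stabilized(recommendations_in_window: list[int]) -> int:
    """Pick the highest recommendation recorded within the stabilization window."""
    return max(recommendations_in_window)


if __name__ == "__main__":
    # 4 replicas; CPU at 40% vs. target 80%, custom metric at 150 vs. target 100.
    print(combine_metrics(4, [(40, 80), (150, 100)]))  # 6 (max of 2 and 6)
    # Recommendations recorded over the last 5 minutes, newest last.
    print(stabilized([6, 5, 3, 2]))  # 6 (scale-down is held back)
```

In this example the CPU metric alone would scale down to 2, but the custom metric asks for 6, so the HPA scales up; this is usually why replica counts look surprising when two metrics are configured.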