How to implement horizontal auto scaling in GKE autopilot based on a custom metric - kubernetes

I'm running a Kubernetes cluster on GKE Autopilot.
I have pods that do the following: wait for a job, run the job (this can take minutes or hours), then go to the Succeeded state, which causes Kubernetes to restart the pod.
The number of pods I need is variable depending on how many users are on the platform. Each user can request a job that needs a pod to run.
I don't want users to have to wait for pods to scale up so I want to keep a number of extra pods ready and waiting to execute.
The application my pods are running can be in 3 states - { waiting for job, running job, completed job}
Scaling up is fine, as I can just use the scale API and always request enough replicas to keep a certain percentage of pods in the waiting for job state.
When scaling down I want to ensure that Kubernetes doesn't kill any pods that are in the running job state.
Should I implement a Custom Horizontal Pod Autoscaler?
Can I configure custom probes for my pod's application state?
I could also use pod priority or a preStop hook.

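You can configure horizontal Pod autoscaling to control how the Deployment scales. Note that the HPA only sets the replica count; it does not by itself choose which pods are removed on scale-down.
Steps for configuring horizontal pod autoscaling:
Create the Deployment by applying the nginx.yaml manifest. Run the following command: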
kubectl apply -f nginx.yaml
Autoscaling based on resource utilization:
1. Go to the Workloads page in Cloud Console.
2. Click the name of the nginx Deployment.
3. Click Actions > Autoscale.
4. Specify the following values:
   - Minimum number of replicas: 1
   - Maximum number of replicas: 10
   - Autoscaling metric: CPU
   - Target: 50
   - Unit: %
5. Click Done.
6. Click Autoscale.
To get a list of Horizontal Pod Autoscalers in the cluster, use the following command:
kubectl get hpa
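The console settings above correspond to a HorizontalPodAutoscaler object. As a rough sketch (not taken from the GKE guide), the same values can be written as an autoscaling/v2 manifest. The helper name build_hpa_manifest is made up for illustration, and the target name nginx is assumed from the nginx.yaml Deployment mentioned earlier.

```python
import json


def build_hpa_manifest(deployment, min_replicas, max_replicas, cpu_target_percent):
    """Return an autoscaling/v2 HorizontalPodAutoscaler manifest as a dict.

    build_hpa_manifest is a hypothetical helper, not part of any library.
    """
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment},
        "spec": {
            # The workload whose replica count the HPA adjusts.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Average CPU utilization target, as a percentage of CPU requests.
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_target_percent,
                        },
                    },
                }
            ],
        },
    }


# Same values as in the console steps: min 1, max 10, CPU target 50%.
manifest = build_hpa_manifest("nginx", 1, 10, 50)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to kubectl apply -f, since kubectl accepts JSON manifests as well as YAML.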
For details, see the GKE guide on configuring horizontal Pod autoscaling.
You can also refer to the documentation on autoscaling rules for a GKE Autopilot cluster using a custom metric in the Cloud Console.
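For the scale-up rule described in the question (always keep a certain percentage of pods waiting for a job), a minimal sketch of the arithmetic might look like this. The function name and both parameters (waiting_percent, min_waiting) are hypothetical, and the result would still have to be sent to the scale API by your own code. It does not address which pod is removed on scale-down.

```python
def desired_replicas(busy_pods, waiting_percent, min_waiting=1):
    """Replica count that keeps a pool of idle pods on top of the busy ones.

    waiting_percent is the share of all pods that should be idle, e.g. 20
    means roughly 20% of the replicas are waiting for a job. Both
    waiting_percent and min_waiting are hypothetical tuning knobs.
    """
    if not 0 <= waiting_percent < 100:
        raise ValueError("waiting_percent must be in [0, 100)")
    # Integer form of ceil(busy_pods * 100 / (100 - waiting_percent)).
    total = -(-busy_pods * 100 // (100 - waiting_percent))
    # Always keep at least min_waiting idle pods, even when nothing is busy.
    return max(total, busy_pods + min_waiting)


# 8 busy pods with a 20% idle pool needs 10 replicas in total.
print(desired_replicas(8, 20))
```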


Azure Kubernetes Service - can the Cluster Autoscaler get triggered even if I don't set autoscaling explicitly?

I am deploying a service to Azure Kubernetes Service.
The Horizontal Pod Autoscaler scales the number of pods, whereas the Cluster Autoscaler scales the number of nodes based on the number of pending pods. If my understanding is correct, if I don't set up autoscaling in my deployment file, the HPA won't get triggered, and only one pod will run; therefore, the CA won't get triggered either.
My question is - is there a scenario in AKS where the CA would get triggered, even without setting autoscaling in my deployment file?
Cluster autoscaler is typically used together with the horizontal pod autoscaler. The Horizontal Pod Autoscaler increases or decreases the number of pods based on application demand, and the cluster autoscaler adjusts the number of nodes as needed to run those additional pods accordingly.
If your deployment cannot scale up or down automatically via the HPA, and you do not manually increase the number of pods to the point where additional pods cannot run because your nodes lack resources, then the CA will not be triggered. So the answer is no.
You might also find the official Azure documentation on this helpful.
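To make the condition concrete, here is a simplified sketch of when pods end up pending. It assumes CPU is the only resource, that all nodes are identical and otherwise empty, and it uses made-up numbers; the real scheduler considers much more than this.

```python
def pending_pods(replicas, pod_cpu_millicores, node_allocatable_millicores, node_count):
    """Number of replicas that do not fit on the existing nodes.

    Simplification: only CPU requests are considered, and every node is
    identical and has nothing else running on it.
    """
    pods_per_node = node_allocatable_millicores // pod_cpu_millicores
    capacity = pods_per_node * node_count
    return max(0, replicas - capacity)


# One replica on two nodes: everything fits, so the CA has nothing to do.
print(pending_pods(1, 500, 1900, 2))   # 0
# Manually scaled to 10 replicas: 4 pods stay pending, which is what
# gives the CA a reason to add nodes.
print(pending_pods(10, 500, 1900, 2))  # 4
```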

Kubernetes : Cluster-Autoscaler: How to verify autoscaling is working

I am working on our EKS platform, where I have installed Cluster Autoscaler. I can see it running in Kube Dashboard. Yesterday for Load Testing, I triggered 20 replicas of a heavy app we have. The cpu usage per node climbed to 100%, but cluster auto-scaler didn't trigger any additional nodes. I was watching the logs and the logs kept on rotating in main loop, but no changes.
Here are the tags I have added to ASG, worker nodes : : true : owned
I can see the pod running in Dashboard :
Also, there are no scaling policies added in the ASG. Are they required for Cluster Autoscaler? How can I verify that Cluster Autoscaler is working properly? What am I missing?
Cluster Autoscaler checks for unschedulable pods every 10 seconds. If any pods are unschedulable, it checks the minimum and maximum size of the Auto Scaling group, and if the maximum has not been reached, it asks the AWS Auto Scaling group to add a node. Note that it reacts to pods that cannot be scheduled, not to node CPU usage. The "How does scale-up work?" section of the Cluster Autoscaler FAQ describes this in detail.
To answer your question: you can verify autoscaling by checking whether there are any unschedulable pods in your cluster. If there are, the autoscaler will try to add a node, and this will show up in the autoscaler log, provided the maximum size has not been reached.
For more details, see the FAQ. You can also look at the Vertical Pod Autoscaler for vertical pod scaling.
You can tail the logs and watch the events:
kubectl logs -f deployment/cluster-autoscaler -n kube-system --tail=10
It will show the scaling events.
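To check for unschedulable pods programmatically, one option is to inspect the PodScheduled condition in the pod list JSON. The sketch below assumes the JSON shape returned by kubectl get pods -o json; the sample data and the function name are made up for illustration.

```python
def unschedulable_pods(pod_list):
    """Names of pods the scheduler has marked as Unschedulable."""
    names = []
    for pod in pod_list.get("items", []):
        for cond in pod.get("status", {}).get("conditions", []):
            if (
                cond.get("type") == "PodScheduled"
                and cond.get("status") == "False"
                and cond.get("reason") == "Unschedulable"
            ):
                names.append(pod["metadata"]["name"])
    return names


# Hypothetical sample: one scheduled pod and one unschedulable pod.
sample = {
    "items": [
        {
            "metadata": {"name": "heavy-app-1"},
            "status": {"conditions": [{"type": "PodScheduled", "status": "True"}]},
        },
        {
            "metadata": {"name": "heavy-app-2"},
            "status": {
                "conditions": [
                    {
                        "type": "PodScheduled",
                        "status": "False",
                        "reason": "Unschedulable",
                    }
                ]
            },
        },
    ]
}

print(unschedulable_pods(sample))
```

In practice the input could come from kubectl get pods --all-namespaces --field-selector=status.phase=Pending -o json. If the result is empty, the autoscaler has no reason to add a node, even when the existing nodes are at 100% CPU.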

HPA could not get CPU metric during GKE node auto-scaling

Cluster information:
Kubernetes version: 1.12.8-gke.10
Cloud being used: GKE
Installation method: gcloud
Host OS: (machine type) n1-standard-1
CNI and version: default
CRI and version: default
During node scaling, HPA couldn't get CPU metric.
At the same time, kubectl top pod and kubectl top node output is:
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get
In more detail, this is the sequence in which the problem occurs:
Many requests suddenly arrive at the GKE server (generated by a testing tool).
HPA detects that current CPU usage is above the target CPU usage (50%) and tries to scale up the pods.
An Insufficient CPU warning occurs when creating pods, so GKE tries to scale up the nodes.
Soon the HPA fails to get the metric, and kubectl top node or kubectl top pod doesn't get a response.
- At this time, one or more OutOfcpu pods are found, and several pods are in ContainerCreating (from Pending state).
After node scale-up is complete and a few minutes have elapsed, HPA starts to fetch the CPU metric successfully again and tries to scale up/down based on it.
The same situation happens when nodes scale down.
This causes pod scaling to stop and raises some failures on responding to client’s requests. Is this normal?
I think HPA should get the CPU metric (or other metrics) for running pods even during node scaling, to keep track of the optimal number of pods at that moment. Then, when node scaling is done, HPA could create the necessary pods at once (rather than incrementally).
Can I make my cluster work like this?
Maybe your node is running out of a resource, either memory or CPU. There are config maps that describe how addons are scaled depending on the cluster size. You need to edit the metrics-server-config config map in the kube-system namespace:
kubectl edit cm/metrics-server-config -n kube-system
you should add
to NannyConfiguration, here you can find extensive manual:
Heapster also suffers from the same OOM issue: there are too many pods for it to handle all metrics within its assigned resources. Please modify heapster's config map accordingly:
kubectl edit cm/heapster-config -n kube-system
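As a rough illustration of scaling an addon with cluster size, the sketch below assumes a simple linear rule: a fixed base amount plus a per-node amount. The parameter names and all numbers are hypothetical and are not the actual keys or defaults of the config maps above.

```python
def addon_resources(base, per_node, node_count):
    """Resource amount for an addon under an assumed linear scaling rule.

    base and per_node are hypothetical parameters in the same unit
    (for example MiB of memory); they are not real config map keys.
    """
    return base + per_node * node_count


# Hypothetical values: 40 MiB base memory plus 4 MiB per node.
print(addon_resources(40, 4, 3))   # small cluster
print(addon_resources(40, 4, 50))  # after a large node scale-up
```

The point is only that, under such a rule, raising the base or per-node amount gives the metrics addon more headroom as nodes are added.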

How to setup auto scale Kubernetes cluster in hybrid mode

I need to create a K8s autoscaling setup for a Spark application that will run as Docker images both on premises and on AWS. By scaling, I mean scaling nodes up and down from on premises to the AWS cloud, using Cluster Autoscaler or some other means.
I have browsed many articles, such as how to set up a K8s cluster on AWS and HPA and CA scaling, but could not find concrete directions to follow.
I am looking for any direction that can help me understand where I should start and which steps to follow to set up such a K8s cluster.
Regarding Cluster Autoscaler:
Cluster Autoscaler is a tool that automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:
- there are pods that failed to run in the cluster due to insufficient resources,
- there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.
The cluster autoscaler on Azure dynamically scales Kubernetes worker nodes. It runs as a deployment in your cluster.
This README will help you get cluster autoscaler running on your Azure Kubernetes cluster.
Regarding HPA:
The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization or other custom metrics. HPA normally fetches metrics from a series of aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io and external.metrics.k8s.io).
metrics-server needs to be launched separately to provide the resource metrics (CPU and memory). If you wish to scale on something more than that, you also need a custom or external metrics adapter. More info can be found in the Kubernetes documentation.
How to make it work?
kubectl supports HPA by default:
kubectl create - creates a new autoscaler
kubectl get hpa - lists your autoscalers
kubectl describe hpa - gets a detailed description of autoscalers
kubectl delete - deletes an autoscaler
kubectl autoscale rs foo --min=2 --max=5 --cpu-percent=80 creates an autoscaler for replica set foo, with target CPU utilization set to 80% and the number of replicas between 2 and 5.
See the kubectl documentation for details on how to use the kubectl autoscale command.
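To show what the 80% target refers to: as far as I understand it, the HPA expresses CPU utilization as a percentage of the pods' CPU requests, not of node capacity. A minimal sketch with made-up per-pod numbers (the exact aggregation in the controller may differ in detail):

```python
def cpu_utilization_percent(usage_millicores, request_millicores):
    """Overall CPU utilization as a percentage of the requested CPU.

    Both arguments are per-pod lists in millicores. Summing usage and
    requests across pods is an assumption about the aggregation.
    """
    total_request = sum(request_millicores)
    if total_request == 0:
        raise ValueError("pods must have CPU requests for a utilization target")
    return 100 * sum(usage_millicores) // total_request


# Three hypothetical pods of replica set foo, each requesting 500m CPU.
utilization = cpu_utilization_percent([450, 480, 510], [500, 500, 500])
print(utilization)       # 96
print(utilization > 80)  # True: above the 80% target, so a scale-up is due
```

This is also why pods need CPU requests set for a --cpu-percent target to work.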

Autoscaling in Google Container Engine

I understand that Container Engine is currently in alpha and not yet complete.
From the docs I assume there is no auto-scaling of pods (e.g. depending on CPU load) yet, correct? I'd love to be able to configure a replication controller to automatically add pods (and VM instances) when the average CPU load reaches a defined threshold.
Is this somewhere on the near future roadmap?
Or is it possible to use the Compute Engine Autoscaler for this? (if so, how?)
As we work towards a Beta release, we're definitely looking at integrating the Google Compute Engine AutoScaler.
There are actually two different kinds of scaling:
- Scaling the number of worker nodes in the cluster up and down, depending on the number of containers in the cluster.
- Scaling pods up and down.
Since Kubernetes is an OSS project as well, we'd also like to add a Kubernetes native autoscaler that can scale replication controllers. It's definitely something that's on the roadmap. I expect we will actually have multiple autoscaler implementations, since it can be very application specific...
Kubernetes autoscaling:
kubectl command:
kubectl autoscale deployment foo --min=2 --max=5 --cpu-percent=80
You can autoscale your deployment by using kubectl autoscale.
Autoscaling means that the number of pods is adjusted automatically as demand changes.
kubectl autoscale deployment task2deploy1 --cpu-percent=50 --min=1 --max=10
kubectl get deployment task2deploy1
task2deploy1 1 1 1 1 49s
As resource consumption increases, the number of pods will increase and can exceed the number of pods you specified in your deployment.yaml file, but it will never exceed the maximum number of pods specified in the kubectl autoscale command.
kubectl get deployment task2deploy1
task2deploy1 7 7 7 3 4m
Similarly, as resource consumption decreases, the number of pods will go down, but never below the minimum number of pods specified in the kubectl autoscale command.
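The behaviour described above follows from the replica calculation in the Kubernetes HPA documentation: desired = ceil(current replicas * current metric / target metric), clamped to the min and max. Here is a minimal sketch with hypothetical utilization values. It leaves out the tolerance band and the stabilization window that the real controller also applies.

```python
import math


def hpa_desired_replicas(current_replicas, current_utilization,
                         target_utilization, min_replicas, max_replicas):
    """Desired replica count from the basic HPA ratio, clamped to [min, max]."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# Settings from the command above: --cpu-percent=50 --min=1 --max=10.
# The utilization figures are hypothetical.
print(hpa_desired_replicas(1, 350, 50, 1, 10))  # 7
print(hpa_desired_replicas(7, 120, 50, 1, 10))  # 10 (capped at the maximum)
print(hpa_desired_replicas(7, 5, 50, 1, 10))    # 1 (never below the minimum)
```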