Scale up condition keeps idle pods up - kubernetes

I have an HPA configured for 50% average CPU utilization:
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
The problem is that only one pod is receiving traffic, so its CPU usage is above 50% of its CPU request.
The HPA then scales up new pods, but sometimes those pods are not yet receiving any traffic, so their CPU consumption is very low.
I expected the pods that don't use any CPU to be scaled down at some point (how long should that take?), but it isn't happening. I believe the reason is that the one pod whose CPU usage is above 50% is forcing those pods to stay up.
What I need is to keep scaling those pods up and down until they start receiving traffic, which depends on the node they are deployed to.
Any suggestions on how to accomplish this?
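The Kubernetes docs describe the HPA's core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change made while the ratio is within a tolerance (0.1 by default). The sketch below applies that rule to per-pod CPU percentages. It assumes every pod has the same CPU request, so the mean of the percentages equals the HPA's total usage divided by total requests, and it ignores the controller's special handling of not-ready pods and missing metrics. The function name is hypothetical. It shows why idle pods can stay up: one pod at 200% of its request plus three idle pods still averages exactly 50%, so the replica count does not change. Separately, scale-down decisions are also delayed by the HPA's downscale stabilization window (5 minutes by default).

```python
import math


def desired_replicas(pod_cpu_percent, target_percent, tolerance=0.1):
    """Hypothetical sketch of the documented HPA replica rule.

    pod_cpu_percent: CPU usage of each pod as a percentage of its request.
    Assumes all pods have identical requests, so the mean of the
    percentages equals total usage / total requests.
    """
    current = len(pod_cpu_percent)
    average = sum(pod_cpu_percent) / current
    ratio = average / target_percent
    # Close enough to the target: the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current
    return math.ceil(current * ratio)


# One busy pod at 200% of its request and three idle pods average 50%,
# so with a 50% target all four replicas are kept.
print(desired_replicas([200, 0, 0, 0], 50))  # 4
# If the busy pod drops to 100%, the average is 25% and the count halves.
print(desired_replicas([100, 0, 0, 0], 50))  # 2
```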

Related

How does the Kubernetes cluster-autoscaler determine CPU utilization of nodes in EKS?

I have an EKS cluster running with cluster-autoscaler version 1.21.2 deployed. When I ran kubectl top nodes, I found a node using 5% CPU and 21% memory. But in the cluster-autoscaler pod log, I see the following message for the same node:
Node XXXX is not suitable for removal - cpu utilization too big (0.663130)
I'm now confused about how the cluster autoscaler calculates this value and why the node is not scaled down. BTW, I used the default config of --scale-down-utilization-threshold=0.5
We stumbled upon the same issue and realized that the CPU utilization value (in your case 66.31%) roughly matches the amount of CPU requested by the pods/containers running on the node.
Remember: the CPU (and other resources) requested by a pod/container is guaranteed to it.
This is why it sounds logical to us that the node may look idle when you check its actual CPU usage, while from a Kubernetes autoscaling perspective it uses 66% of its CPU.
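A simplified sketch of the request-based calculation described in the answer: utilization is the sum of the pods' CPU requests divided by the node's allocatable CPU, and a node is only a scale-down candidate when both CPU and memory are below --scale-down-utilization-threshold. The function names and millicore numbers are illustrative assumptions; the real cluster-autoscaler has additional rules (for example, about which pods it counts).

```python
def request_utilization(pod_requests_millicores, allocatable_millicores):
    """Fraction of a node's allocatable CPU claimed by pod requests (not live usage)."""
    return sum(pod_requests_millicores) / allocatable_millicores


def below_threshold(cpu_utilization, memory_utilization, threshold=0.5):
    """Simplified scale-down check: both resources must be under the threshold."""
    return cpu_utilization < threshold and memory_utilization < threshold


# Hypothetical node: 4000m allocatable, pods requesting 1000m + 1000m + 650m.
cpu = request_utilization([1000, 1000, 650], 4000)
print(cpu)                         # 0.6625
# Hypothetical memory requests at 20% of allocatable do not help: CPU is above 0.5.
print(below_threshold(cpu, 0.20))  # False
```

This is why kubectl top nodes (live usage) and the autoscaler log (requests) can disagree so much.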

Why doesn't the k8s deployment HPA count the istio sidecar's CPU request?

I set an HPA for my deployment/app with a CPU target of, for example, 80%.
My app deployment has two containers: one is the app serving traffic, the other is the automatically injected istio-proxy.
When I checked the HPA while traffic was running, I found something unexpected in the result.
The CPU request of istio-proxy is 2G.
The CPU request of app is 4G.
The CPU consumed by istio-proxy is 1G.
The CPU consumed by app is 2G.
So I expected the HPA utilization of this pod (including both containers) to be (1+2)/(2+4) = 50%.
But the actual result is close to (1+2)/4 = 75%.
It seems the istio-proxy's CPU request is excluded from the HPA's CPU utilization calculation.
As far as I know, k8s gets CPU requests from the deployment, but in this sidecar auto-injection case, the deployment YAML doesn't contain any istio-proxy container information.
I guess that's why the istio-proxy CPU request is excluded.
But is that the expected behavior or a bug?
I think that as of 1.19, the HPA works on the total usage and total requests of all containers in the pods. The exact logic is here: https://github.com/kubernetes/kubernetes/blob/v1.9.0/pkg/controller/podautoscaler/metrics/utilization.go#L49
currentUtilization = int32((metricsTotal * 100) / requestsTotal)
As per the above logic, the HPA calculates pod CPU utilization as the total CPU usage of all containers in the pod divided by their total CPU request.
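A Python transcription of the quoted line, for experimenting with the arithmetic. In the controller the totals are summed over every container of every pod the HPA targets; here a single pod is assumed, and the millicore values are illustrative, not taken from the question. Floor division stands in for Go integer division. The second call shows the effect the question reports: if a container's request is missing from requestsTotal, the same usage produces a higher percentage.

```python
def cpu_utilization_percent(usage_millicores, request_millicores):
    """Mirror of currentUtilization = int32((metricsTotal * 100) / requestsTotal).

    usage_millicores / request_millicores: per-container values, in millicores.
    """
    metrics_total = sum(usage_millicores)
    requests_total = sum(request_millicores)
    # Integer (floor) division, like the Go expression for non-negative values.
    return (metrics_total * 100) // requests_total


# Sidecar: request 200m, usage 100m; app: request 400m, usage 200m.
print(cpu_utilization_percent([100, 200], [200, 400]))  # 50
# Same usage, but only the app's 400m request counted.
print(cpu_utilization_percent([100, 200], [400]))       # 75
```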

GKE node pool with Autoscaling does not scale down

I have a GKE cluster with two node pools. I turned on autoscaling on one of my node pools, but it does not seem to scale down automatically.
I have enabled HPA and that works fine. It scales the pods down to 1 when there is no traffic.
The API is currently not getting any traffic, so I would expect the nodes to scale down as well.
But it still runs the maximum of 5 nodes, even though some nodes use less than 50% of their allocatable memory/CPU.
What did I miss here? I am planning to move these pods to bigger machines, but to do that I need node autoscaling to work to control the monthly cost.
There are many reasons that can keep CA from scaling down successfully. To summarize, under normal conditions it works like this:
Cluster autoscaler periodically (every 10 seconds) checks the utilization of the nodes.
If the utilization factor is less than 0.5, the node is considered underutilized.
The node is then marked for removal and monitored for the next 10 minutes to make sure the utilization factor stays below 0.5.
If it is still underutilized after 10 minutes, the node is removed by cluster autoscaler.
If the above is not happening, then something else is preventing your nodes from scaling down. In my experience, PDBs need to be applied to kube-system pods, and I would say that could be the reason; however, there are many reasons this can happen. Here are some that can cause scale-down issues:
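The four steps above can be modeled as a per-node timer. This is a hypothetical simulation, not cluster-autoscaler code: the 10-second scan, 0.5 threshold and 10-minute window are the values given in the answer, and "utilization" is whatever request-based fraction you feed in. A single scan above the threshold resets the timer, which is why a node that occasionally crosses 0.5 may never be removed.

```python
from dataclasses import dataclass
from typing import List, Optional

SCAN_INTERVAL_S = 10        # how often utilization is checked
THRESHOLD = 0.5             # underutilized below this fraction
UNNEEDED_TIME_S = 10 * 60   # must stay underutilized this long


@dataclass
class NodeTimer:
    unneeded_since: Optional[float] = None


def scan(timer: NodeTimer, utilization: float, now: float) -> bool:
    """One scan of one node; True once it has been underutilized long enough."""
    if utilization >= THRESHOLD:
        timer.unneeded_since = None   # back above the threshold: reset the timer
        return False
    if timer.unneeded_since is None:
        timer.unneeded_since = now    # first underutilized scan: mark for removal
    return now - timer.unneeded_since >= UNNEEDED_TIME_S


def removal_time(samples: List[float]) -> Optional[float]:
    """Feed one utilization sample per scan; return when (s) the node would be removed."""
    timer = NodeTimer()
    for i, utilization in enumerate(samples):
        now = i * SCAN_INTERVAL_S
        if scan(timer, utilization, now):
            return now
    return None


print(removal_time([0.3] * 61))                       # 600
print(removal_time([0.3] * 30 + [0.7] + [0.3] * 61))  # 910
```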
1. PDB is not applied to your kube-system pods. Kube-system pods prevent Cluster Autoscaler from removing the nodes they run on. You can manually add Pod Disruption Budgets (PDBs) for the kube-system pods that can be safely rescheduled elsewhere, using the following command:
`kubectl create poddisruptionbudget PDB-NAME --namespace=kube-system --selector app=APP-NAME --max-unavailable 1`
2. Containers using local storage (volumes), even emptyDir volumes. Cluster Autoscaler does not scale down nodes running pods that use local storage. Look for this kind of configuration, which prevents Cluster Autoscaler from scaling down nodes.
3. Pods annotated with cluster-autoscaler.kubernetes.io/safe-to-evict: "false". Look for pods with this annotation that can be preventing node scale-down.
4. Nodes annotated with cluster-autoscaler.kubernetes.io/scale-down-disabled: "true". Look for nodes with this annotation that can be preventing cluster autoscaling.
These are the configurations I suggest you check in order to get your cluster to scale down underutilized nodes.
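The checklist above can be sketched as a function over pod and node objects shaped like the JSON from kubectl get ... -o json. This is a simplified, hypothetical helper: pdb_covered is an input you would compute yourself, and the real Cluster Autoscaler also checks whether a PDB actually allows a disruption, pods without a controller, mirror pods and scheduling constraints; its exact local-storage rule also varies by version. It does model two behaviors from the Cluster Autoscaler FAQ: DaemonSet pods do not block scale-down, and safe-to-evict: "true" overrides the kube-system and local-storage rules.

```python
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"
SCALE_DOWN_DISABLED = "cluster-autoscaler.kubernetes.io/scale-down-disabled"


def scale_down_blockers(node, pods, pdb_covered):
    """Return reasons why a node would be kept (simplified checklist).

    node, pods: dicts shaped like `kubectl get ... -o json` objects.
    pdb_covered: set of (namespace, name) pairs for pods matched by some PDB.
    """
    reasons = []
    node_annotations = node.get("metadata", {}).get("annotations", {})
    if node_annotations.get(SCALE_DOWN_DISABLED) == "true":
        reasons.append("node: scale-down-disabled=true")
    for pod in pods:
        meta = pod.get("metadata", {})
        namespace, name = meta.get("namespace"), meta.get("name")
        owners = meta.get("ownerReferences", [])
        if any(o.get("kind") == "DaemonSet" for o in owners):
            continue  # DaemonSet pods do not block scale-down
        safe = meta.get("annotations", {}).get(SAFE_TO_EVICT)
        if safe == "false":
            reasons.append(f"{name}: safe-to-evict=false")
            continue
        if safe == "true":
            continue  # explicitly evictable: the rules below are overridden
        volumes = pod.get("spec", {}).get("volumes", [])
        if any("emptyDir" in v or "hostPath" in v for v in volumes):
            reasons.append(f"{name}: uses local storage")
        if namespace == "kube-system" and (namespace, name) not in pdb_covered:
            reasons.append(f"{name}: kube-system pod without a PDB")
    return reasons


# Hypothetical node and pods.
node = {"metadata": {"annotations": {SCALE_DOWN_DISABLED: "true"}}}
tmp_volume = {"name": "tmp", "emptyDir": {}}
pods = [
    {"metadata": {"namespace": "kube-system", "name": "kube-dns-abc"}},
    {"metadata": {"namespace": "default", "name": "web-1"},
     "spec": {"volumes": [tmp_volume]}},
    {"metadata": {"namespace": "default", "name": "web-2",
                  "annotations": {SAFE_TO_EVICT: "true"}},
     "spec": {"volumes": [tmp_volume]}},
    {"metadata": {"namespace": "kube-system", "name": "node-agent-xyz",
                  "ownerReferences": [{"kind": "DaemonSet"}]}},
]
for reason in scale_down_blockers(node, pods, pdb_covered=set()):
    print(reason)
```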
You can also see this page, which explains the configurations that prevent scale-down and may be what is happening to you.

Kubernetes - Set Pod replication criteria based on memory and cpu usage

I am a newbie to the Kubernetes world. Please excuse me if I get anything wrong.
I understand that pod replication is handled by k8s itself. We can also set CPU and memory usage for pods. But is it possible to change the replication criteria based on memory and CPU usage? For example, I want a pod to replicate when its memory/CPU usage reaches 70%.
Can we do it using metrics collected by Prometheus, etc.?
You can use the Horizontal Pod Autoscaler. From the docs:
The Horizontal Pod Autoscaler automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics). Note that Horizontal Pod Autoscaling does not apply to objects that can't be scaled, for example, DaemonSets.
The Horizontal Pod Autoscaler is implemented as a Kubernetes API resource and a controller. The resource determines the behavior of the controller. The controller periodically adjusts the number of replicas in a replication controller or deployment to match the observed average CPU utilization to the target specified by user.
An example from the docs:
The following command will create a Horizontal Pod Autoscaler that maintains between 1 and 10 replicas of the Pods. HPA will increase and decrease the number of replicas to maintain an average CPU utilization across all Pods of 50%.
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
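On the 70% memory/CPU part of the question: per the Kubernetes docs, an HPA with several metrics computes a replica proposal for each metric and uses the largest one, clamped to the min/max replica counts. Note that kubectl autoscale only sets a CPU target; a memory target needs an autoscaling/v2 HPA manifest, and Prometheus metrics additionally need a custom-metrics adapter. A minimal sketch of the multi-metric rule, with tolerance and stabilization omitted and hypothetical function names and numbers:

```python
import math


def proposal(current_replicas, current_utilization, target_utilization):
    """desiredReplicas = ceil(currentReplicas * current / target) for one metric."""
    return math.ceil(current_replicas * current_utilization / target_utilization)


def desired_replicas(current_replicas, metrics, min_replicas, max_replicas):
    """metrics: list of (current_utilization, target_utilization) pairs, in percent."""
    largest = max(proposal(current_replicas, cur, tgt) for cur, tgt in metrics)
    return max(min_replicas, min(max_replicas, largest))


# 3 replicas: CPU at 35% and memory at 90%, both against a 70% target.
# CPU alone would propose 2 replicas; memory proposes 4, and the larger wins.
print(desired_replicas(3, [(35, 70), (90, 70)], 1, 10))  # 4
```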

Kubernetes nodes with low CPU utilisation

I have a node pool for one deployment with 200-1000 pods. The deployment is scaled by a CPU-based HPA.
When the HPA scales down the deployment, it removes pods randomly, and eventually, I have an under-utilized node pool. The nodes aren't scaled down correctly because they still have at least one pod running.
I tried to find a solution and failed. Possible solutions, in my opinion:
An HPA that is aware of node utilization.
A PodDisruptionBudget for nodes?
Draining a node if its CPU utilization is under a threshold.
Any help will be much appreciated.
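A rough, hypothetical sketch of the third idea (draining nodes whose CPU utilization is under a threshold). It is deliberately naive: utilization is requested/allocatable CPU, and a node is only picked if the total requests of the whole pool would still fit in the allocatable CPU of the nodes that remain, which ignores bin-packing, memory and scheduling constraints. Actually draining the chosen nodes would still go through kubectl cordon/drain and respect any PDBs.

```python
def drain_candidates(nodes, threshold=0.5):
    """nodes: dict of name -> (requested_millicores, allocatable_millicores).

    Returns node names to drain, least-utilized first (hypothetical helper).
    """
    total_requested = sum(req for req, _ in nodes.values())
    util = {name: req / alloc for name, (req, alloc) in nodes.items()}
    drained = []
    for name in sorted(nodes, key=util.get):
        if util[name] >= threshold:
            break  # sorted ascending, so no later node is under the threshold
        remaining_alloc = sum(alloc for n, (_, alloc) in nodes.items()
                              if n != name and n not in drained)
        # Only drain if the whole pool's requests still fit (aggregate check).
        if total_requested <= remaining_alloc:
            drained.append(name)
    return drained


pool = {"a": (3000, 4000), "b": (500, 4000), "c": (200, 4000)}
print(drain_candidates(pool))  # ['c', 'b']
```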