HPA not scaling down - kubernetes

I hope you can shed some light on this.
I am facing the same issue as described here: Kubernetes deployment not scaling down even though usage is below threshold
My configuration is almost identical.
I have checked the HPA algorithm, but I cannot find an explanation for why I have only one replica of my-app3.
Any hints?
kubectl get hpa -A
NAMESPACE   NAME      REFERENCE            TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
my-ns1      my-app1   Deployment/my-app1   49%/75%, 2%/75%    1         10        2          20h
my-ns2      my-app2   Deployment/my-app2   50%/75%, 10%/75%   1         10        2          22h
my-ns2      my-app3   Deployment/my-app3   47%/75%, 10%/75%   1         10        1          22h
kubectl top po -A
NAMESPACE   NAME                        CPU(cores)   MEMORY(bytes)
my-ns1      pod-app1-8d694bc8f-mkbrh    1m           76Mi
my-ns1      pod-app1-8d694bc8f-qmlnw    1m           72Mi
my-ns2      pod-app2-59d895d96d-86fgm   1m           77Mi
my-ns2      pod-app2-59d895d96d-zr67g   1m           73Mi
my-ns2      pod-app3-6f8cbb68bf-vdhsd   1m           47Mi

Posting this answer as it could be beneficial for community members to see why exactly the Horizontal Pod Autoscaler decided not to change the number of replicas in this particular setup.
The formula for the number of replicas a workload will have is:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
Looking again at the HPA output:
NAMESPACE   NAME      REFERENCE            TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
my-ns1      my-app1   Deployment/my-app1   49%/75%, 2%/75%    1         10        2          20h
my-ns2      my-app2   Deployment/my-app2   50%/75%, 10%/75%   1         10        2          22h
my-ns2      my-app3   Deployment/my-app3   47%/75%, 10%/75%   1         10        1          22h
The HPA decides on the desired number of replicas based on the current number of replicas.
A side note: in a setup that uses multiple metrics (for example CPU and RAM), the HPA computes a desired replica count for each metric and acts on the higher result.
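For my-app1, for example, the two metrics give ceil[2 * (49 / 75)] = 2 and ceil[2 * (2 / 75)] = 1, and the higher result (2) is the one that counts.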
Also please consider that downscaling has a cooldown, so it does not happen immediately.
Calculation for each of the Deployments
ceil[] - round a number up:
ceil(4.55) = 5
ceil(4.01) = 5
app1:
Replicas = ceil[2 * (49 / 75)]
Replicas = ceil[2 * 0.6533..]
Replicas = ceil[1.3066..]
Replicas = 2
This example shows that there will be no change to the number of replicas.
The number of replicas would go:
Up when the currentMetricValue (49) exceeded the desiredMetricValue (75)
Down when the currentMetricValue dropped to half of the desiredMetricValue or below (worked out just below)
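The scale-down threshold for app1 can be checked with the same formula: with 2 replicas and a 75% target,
Replicas = ceil[2 * (37.5 / 75)]
Replicas = ceil[1.0]
Replicas = 1
so the average utilization has to fall to 37.5% or below before the Deployment drops to a single replica.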
app2 is in the same situation as app1 so it can be skipped
app3:
Replicas = ceil[1 * (47 / 75)]
Replicas = ceil[1 * 0.6266..]
Replicas = ceil[0.6266..]
Replicas = 1
This example also shows that there will be no change to the number of replicas.
The number of replicas would go:
Up when the currentMetricValue (47) exceeded the desiredMetricValue (75); it cannot go down, since the Deployment is already at minReplicas (1).
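For instance, if app3's utilization rose to 90%:
Replicas = ceil[1 * (90 / 75)]
Replicas = ceil[1.2]
Replicas = 2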
Additional resources:
Kubernetes.io: Docs: Tasks: Run application: Horizontal pod autoscale

Indeed from my research it seems that the HPA algorithm works in this way:
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details
I do not know why my-app3 ended up with one replica while the other two apps have two, but according to the algorithm no scale-out is needed at this time.

Related

how to read the CPU utilization in k8s LENS

It may sound like a naive question: I am running some load testing on one of the deployments on k8s. To get an idea of the CPU utilization, I opened the HPA view in Lens, and the CPU utilization is shown like this.
Can anyone please tell me how to understand this number? Earlier it was 380/50% for CPU.
I just want to get an idea of what this number means; if it is 380/50, is my CPU not big enough?
It probably means the same as the output from kubectl describe hpa {hpa-name}:
$ kubectl describe hpa php-apache
Name: php-apache
...
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 60% (120m) / 50%
It means that CPU consumption has increased to x% of the request - there is a good example and explanation in the Kubernetes docs:
Within a minute or so, you should see the higher CPU load; for example:
NAME         REFERENCE                     TARGET       MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%   1         10        1          3m
and then, more replicas. For example:
NAME         REFERENCE                     TARGET       MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%   1         10        7          3m
Here, CPU consumption has increased to 305% of the request.
So in your example (380%/50%), it means that you set up the HPA to maintain an average CPU utilization across pods of 50% (by increasing and decreasing the number of replicas, i.e. updating the deployment), and CPU consumption has increased to 380% of the request, so the deployment will be resized automatically.
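To put a number on it: if, hypothetically, each pod requests 500m CPU, then 380% of the request means roughly 3.8 * 500m = 1900m (about 1.9 cores) consumed per pod on average. That by itself does not mean your CPU is too small; it means usage is far above the requested amount, so the HPA keeps adding replicas until the average drops back toward the 50% target (or maxReplicas is reached).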
Also check:
Horizontal Pod Autoscaling
HorizontalPodAutoscaler Walkthrough

Calculating standardized cpu metrics requested by pods for an hour

I am trying to find the CPU requested by Kubernetes pods, rolled up to an hour.
Let's say we have an average cluster CPU allocatable of 10 CPU cores:
pod 1 requests 5 cpu for 20 mins.
pod 2 requests 5 cpu for 30 mins.
pod 3 requests 5 cpu for 1 hr.
Now, for that hour, I have to find how much CPU is occupied by pod 1, and likewise for pods 2 and 3.
Since pod 3 occupies 5 CPU throughout that hour, we can say pod 3 CPU occupied = 5.
How do I calculate the CPU requested by pods 1 and 2, normalized for that hour?
My initial thought for normalizing was:
For pod 1 => 5*(20/60) = 1.666667 (requests normalized for an hour)
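Extending the same idea to the other two pods gives pod 2 => 5*(30/60) = 2.5 and pod 3 => 5*(60/60) = 5, so the three example pods together account for about 1.67 + 2.5 + 5 = 9.17 normalized cores against the 10 allocatable.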
But when I use this for the metrics,
I see sum(pod1+pod2+....+podn) > total cluster CPU -- way higher (340 > 80).
Any thoughts on the logic here?
Does Kubernetes always allocate the pod requests?

k8s replicaSet status is confusing

I have two services:
bayonetta: backend ClusterIP service, replicas=2
hide: frontend NodePort service, replicas=1
I ran kubectl get all. I see that lines 3 and 4 of the ReplicaSet section have everything at 0; why do we have those two lines when nothing is available?
NAME                                              DESIRED   CURRENT   READY   AGE
replicaset.apps/bayonetta-deployment-5b75868d89   2         2         2       3h36m
replicaset.apps/bayonetta-deployment-5c65f74c8b   0         0         0       176m
replicaset.apps/hide-deployment-575b6bc68d        0         0         0       3h12m
replicaset.apps/hide-deployment-66d955986b        1         1         1       155m
You probably updated your Deployments, which results in scaling up new ReplicaSets and scaling down the existing ones. See the Kubernetes docs here, with the example:
Run kubectl get rs to see that the Deployment updated the Pods by creating a new ReplicaSet and scaling it up to 3 replicas, as well as scaling down the old ReplicaSet to 0 replicas.
kubectl get rs
The output is similar to this:
NAME                          DESIRED   CURRENT   READY   AGE
nginx-deployment-1564180365   3         3         3       6s
nginx-deployment-2035384211   0         0         0       36s
K8S keeps multiple old ReplicaSets around; this is what enables rolling a Deployment back because of a bug or some other reason. More about it here (1). K8S retains revisionHistoryLimit old ReplicaSets, which defaults to 10 (2).
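If keeping old ReplicaSets around bothers you, revisionHistoryLimit can be set explicitly on the Deployment; a minimal fragment (the value 2 is just an example):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hide-deployment
spec:
  revisionHistoryLimit: 2   # keep only the 2 most recent old ReplicaSets for rollback (default: 10)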

How Kubernetes Horizontal Pod Autoscaler calculates CPU percentage?

I set up my cluster and I want my deployments to scale up when the first pod uses 75% of one cpu (core). I did this with hpa and everything is working but I noticed that the hpa percentage is strange.
Based on what I know, 1 CPU = 1000 millicores (m). kubectl top pods shows pod-A using 9m, but kubectl get hpa shows pod-A at 9%/75%, which doesn't make sense: 9% of 1000 is 90, not 9.
I want to know how the HPA calculates the percentage, and how I should configure it so that it scales up when I reach 75% of one CPU.
To the Horizontal Pod Autoscaler, 100% of a metric (CPU or memory) is the amount set in the resource requests. So if your pod requests 100m CPU, 9m is 9%, and it would scale out at 75m.
Double-check whether you really requested 1 CPU (1000m) by issuing kubectl describe pod <pod-name>.
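For 75% to mean 750m of one core, the pod template has to request one full CPU; a sketch of the relevant part of the container spec (name and image are placeholders):
containers:
- name: app
  image: my-image
  resources:
    requests:
      cpu: "1"   # 1000m requested, so 750m of usage shows up as 75% in the HPA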

Kubernetes HPA patch command not working

I have a Kubernetes cluster hosted in Google Cloud.
I deployed my deployment and added an HPA rule for scaling:
kubectl autoscale deployment MY_DEP --max 10 --min 6 --cpu-percent 60
After waiting a minute, I ran the kubectl get hpa command to verify my scaling rule. As expected, I had 6 pods running (according to the min parameter).
$ kubectl get hpa
NAME     REFERENCE           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
MY_DEP   Deployment/MY_DEP   <unknown>/60%   6         10        6          1m
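(For reference, the autoscale command above creates an HPA object roughly equivalent to this manifest:)
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: MY_DEP
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: MY_DEP
  minReplicas: 6
  maxReplicas: 10
  targetCPUUtilizationPercentage: 60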
Now, I want to change the min parameter:
kubectl patch hpa MY_DEP -p '{"spec":{"minReplicas": 1}}'
I waited for 30 minutes and ran the command again:
$ kubectl get hpa
NAME     REFERENCE           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
MY_DEP   Deployment/MY_DEP   <unknown>/60%   1         10        6          30m
expected replicas: 1, actual replicas: 6
More information:
You can assume that the system is not computing anything (0% CPU utilization).
I waited for more than an hour. Nothing changed.
The same behavior was seen when I deleted the scaling rule and deployed it again; the replicas parameter did not change.
Question:
If I changed the MINPODS parameter to "1" - why I still have 6 pods? How to make Kubernetes to actually change the min pods in my deployment?
If I changed the MINPODS parameter to "1" - why I still have 6 pods?
I believe the answer is because of the <unknown>/60% present in the output. The fine manual states:
Please note that if some of the pod's containers do not have the relevant resource request set, CPU utilization for the pod will not be defined and the autoscaler will not take any action for that metric
and one can see an example of 0% / 50% in the walkthrough page. Thus, I believe that since Kubernetes cannot prove what percentage of CPU is being consumed -- neither above nor below the target -- it takes no action for fear of making whatever the situation is worse.
As for why there is a <unknown>, I would hazard a guess that it's the dreaded heapster-to-metrics-server cutover that might be obfuscating that information from the Kubernetes API. Regrettably, I don't have first-hand experience testing that theory, in order to offer you concrete steps beyond "see if your cluster is collecting metrics in a place that Kubernetes can see them."
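If missing requests turn out to be the cause, a sketch of the usual fix is to give every container in the target Deployment a CPU request (values below are placeholders), after which the TARGETS column should show a percentage instead of <unknown>:
spec:
  template:
    spec:
      containers:
      - name: my-container     # placeholder name
        image: my-image        # placeholder image
        resources:
          requests:
            cpu: 200m          # the HPA computes CPU utilization against this request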