How to ensure a scaled-down pod replica set succeeded - Kubernetes

I want to update a k8s deployment image from 22.41.70 to 22.41.73, as follows:
NewReplicaSet: hiroir-deployment-5b9f574565 (3/3 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 13m deployment-controller Scaled up replica set hiroir-deployment-7ff8845548 to 3
Normal ScalingReplicaSet 8m56s deployment-controller Scaled up replica set hiroir-deployment-5b9f574565 to 1
Normal ScalingReplicaSet 8m56s deployment-controller Scaled down replica set hiroir-deployment-7ff8845548 to 2
Normal ScalingReplicaSet 8m56s deployment-controller Scaled up replica set hiroir-deployment-5b9f574565 to 2
Normal ScalingReplicaSet 8m52s deployment-controller Scaled down replica set hiroir-deployment-7ff8845548 to 1
Normal ScalingReplicaSet 8m52s deployment-controller Scaled up replica set hiroir-deployment-5b9f574565 to 3
Normal ScalingReplicaSet 8m52s deployment-controller Scaled down replica set hiroir-deployment-7ff8845548 to 0
I want to know how to verify that the scale-down of the old replica set succeeded.

You can check using kubectl get deployment <name, e.g. hiroir> --namespace <namespace, if not default> -o wide. Look at the "AVAILABLE" column and check whether the count matches the last scaled replica count, and at the "IMAGES" column for the image you updated.
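For example (the deployment name, namespace, and image are illustrative; the exact output depends on your kubectl version):
$ kubectl get deployment hiroir-deployment -n default -o wide
NAME                READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                       SELECTOR
hiroir-deployment   3/3     3            3           15m   hiroir       myregistry/hiroir:22.41.73   app=hiroir
You can also block until the rollout has fully completed:
$ kubectl rollout status deployment/hiroir-deployment -n default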

Related

Azure Kubernetes pods showing high CPU usage when they get restarted or HPA scales?

We are using AKS version 1.19.11.
We have noticed that whenever a new rollout takes place for our deployments, a new pod is created by the HPA, or a pod is restarted, we get high CPU usage alerts.
For example, if a new pod is created as part of any of the above activities, will it take up more CPU than the allowed threshold (the "Maximum limit" of 1 core specified in the deployment spec; the apps are lightweight and don't need that much CPU anyway)? This in turn causes a sudden spike in Azure Monitor for a short time, after which it returns to normal.
Why are the pods taking more CPU during startup or creation?
If the pods are not using that much CPU otherwise, what is the reason for this recurring issue?
The HPA settings are as below:
Name: myapp
Namespace: myapp
Labels: app.kubernetes.io/managed-by=Helm
Annotations: meta.helm.sh/release-name: myapp
meta.helm.sh/release-namespace: myapp
CreationTimestamp: Mon, 26 Apr 2021 07:02:32 +0000
Reference: Deployment/myapp
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 5% (17m) / 75%
Min replicas: 5
Max replicas: 12
Deployment pods: 1 current / 1 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ReadyForNewScale recommended size matches current size
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Adding the events captured when a new rollout was placed.
As per the events captured from the "myapp" namespace, a new deployment was rolled out for myapp as below.
During the creation of the new pods we see CPU spikes, and we get an alert from Azure Monitor that CPU exceeds the threshold of 80% (the "Maximum limit" of 1 core specified in the deployment spec).
30m Normal SuccessfulDelete replicaset/myapp-1a2b3c4d5e Deleted pod: myapp-1a2b3c4d5e-9fmrk
30m Normal SuccessfulDelete replicaset/myapp-1a2b3c4d5e Deleted pod: myapp-1a2b3c4d5e-hfr8w
29m Normal SuccessfulDelete replicaset/myapp-1a2b3c4d5e Deleted pod: myapp-1a2b3c4d5e-l2pnd
31m Normal ScalingReplicaSet deployment/myapp Scaled up replica set myapp-5ddc98fb69 to 1
30m Normal ScalingReplicaSet deployment/myapp Scaled down replica set myapp-1a2b3c4d5e to 2
30m Normal ScalingReplicaSet deployment/myapp Scaled up replica set myapp-5ddc98fb69 to 2
30m Normal ScalingReplicaSet deployment/myapp Scaled down replica set myapp-1a2b3c4d5e to 1
30m Normal ScalingReplicaSet deployment/myapp Scaled up replica set myapp-5ddc98fb69 to 3
29m Normal ScalingReplicaSet deployment/myapp Scaled down replica set myapp-1a2b3c4d5e to 0
Alert settings
Period Over the last 15 mins
Value 100.274747
Operator GreaterThan
Threshold 80
I am not sure which metric you are looking at in AKS monitoring specifically, as you have not mentioned it, but it could be that when you deploy the pods, or the HPA scales the replicas, AKS is showing the total resource usage of all replicas.
During the deployment it is also possible that, at a certain stage, all pods (old and new) are in the Running phase and consuming resources at the same time.
Are you checking the resources of one single pod, and is that what is going above the threshold?
As you have mentioned, the application is lightweight; however, it may still consume extra resources initially to start the process. In that case you might have to check its resource usage with profiling.
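To check the actual usage of a single pod (the pod name is a placeholder; this requires metrics-server, which AKS typically ships by default), you can use kubectl top:
$ kubectl top pod -n myapp                           # per-pod CPU/memory usage in the namespace
$ kubectl top pod <pod-name> -n myapp --containers   # per-container breakdown for one pod
Comparing this against the 1-core limit during a rollout should tell you whether a single pod really crosses the threshold or whether the alert is aggregating over all replicas.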

Does apply work according to the rolling update policy?

I know about several ways to perform a rolling update of a deployment. But does either kubectl apply -f deployment.yaml or kubectl apply -k ... update the deployment according to the rolling update policy of the new version of the deployment, or the old one?
Yes, it will, with one note:
Note: A Deployment's rollout is triggered if and only if the
Deployment's Pod template (that is, .spec.template) is changed, for
example if the labels or container images of the template are updated.
Other updates, such as scaling the Deployment, do not trigger a
rollout.
Reference : https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment
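To illustrate which changes trigger a rollout (the deployment name and image tag are just examples):
$ kubectl set image deployment/nginx-deployment nginx=nginx:1.16.1   # changes .spec.template, so a rolling update is triggered
$ kubectl scale deployment/nginx-deployment --replicas=5             # scaling only, so no rollout is triggered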
For example, here is the events section of a deployment after updating the nginx image and running kubectl apply -f nginx-deploy.yml:
...
NewReplicaSet: nginx-deployment-559d658b74 (3/3 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 112s deployment-controller Scaled up replica set nginx-deployment-66b6c48dd5 to 3
Normal ScalingReplicaSet 44s deployment-controller Scaled up replica set nginx-deployment-559d658b74 to 1
Normal ScalingReplicaSet 20s deployment-controller Scaled down replica set nginx-deployment-66b6c48dd5 to 2
Normal ScalingReplicaSet 20s deployment-controller Scaled up replica set nginx-deployment-559d658b74 to 2
Normal ScalingReplicaSet 19s deployment-controller Scaled down replica set nginx-deployment-66b6c48dd5 to 1
Normal ScalingReplicaSet 19s deployment-controller Scaled up replica set nginx-deployment-559d658b74 to 3
Normal ScalingReplicaSet 18s deployment-controller Scaled down replica set nginx-deployment-66b6c48dd5 to 0
$ kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
nginx-deployment 3/3 3 3 114s
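The rolling update parameters that apply follows live in the Deployment spec itself (.spec.strategy); if you never set them, the defaults are maxSurge: 25% and maxUnavailable: 25%. A quick way to check what is currently in effect (output shape is approximate):
$ kubectl get deployment nginx-deployment -o jsonpath='{.spec.strategy}'
{"rollingUpdate":{"maxSurge":"25%","maxUnavailable":"25%"},"type":"RollingUpdate"}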

Is there a way in Kubernetes to check when HPA scaling happened?

I have an HPA configured for one of my deployments in Kubernetes.
Is there any way to check whether HPA scaling happened for the deployment, and when it happened?
I don't have Prometheus or any monitoring solutions deployed.
If you created an HPA, you can check its current status using the command:
$ kubectl get hpa
You can also use the -w (watch) flag to keep the view updated as the HPA changes:
$ kubectl get hpa -w
To check whether the HPA actually scaled, describe it:
$ kubectl describe hpa <yourHpaName>
The information will be in the Events: section.
Your deployment will also contain some information about scaling:
$ kubectl describe deploy <yourDeploymentName>
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 11m deployment-controller Scaled up replica set php-apache-b5f58cc5f to 1
Normal ScalingReplicaSet 9m45s deployment-controller Scaled up replica set php-apache-b5f58cc5f to 4
Normal ScalingReplicaSet 9m30s deployment-controller Scaled up replica set php-apache-b5f58cc5f to 8
Normal ScalingReplicaSet 9m15s deployment-controller Scaled up replica set php-apache-b5f58cc5f to 10
Another way is to use events:
$ kubectl get events | grep HorizontalPodAutoscaler
5m20s Normal SuccessfulRescale HorizontalPodAutoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
5m5s Normal SuccessfulRescale HorizontalPodAutoscaler New size: 8; reason: cpu resource utilization (percentage of request) above target
4m50s Normal SuccessfulRescale HorizontalPodAutoscaler New size: 10; reason:
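Keep in mind that events are only retained for a limited time (one hour by default), so this only covers recent scaling. Two variations that can help narrow the output (standard kubectl flags; the field selector assumes you only care about HPA events):
$ kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler
$ kubectl get events --sort-by=.lastTimestamp | grep HorizontalPodAutoscaler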

How to diagnose a stuck Kubernetes rollout / deployment?

It seems a deployment has gotten stuck. How can I diagnose this further?
kubectl rollout status deployment/wordpress
Waiting for rollout to finish: 2 out of 3 new replicas have been updated...
It's stuck on that for ages already. It is not terminating the two older pods:
kubectl get pods
NAME READY STATUS RESTARTS AGE
nfs-server-r6g6w 1/1 Running 0 2h
redis-679c597dd-67rgw 1/1 Running 0 2h
wordpress-64c944d9bd-dvnwh 4/4 Running 3 3h
wordpress-64c944d9bd-vmrdd 4/4 Running 3 3h
wordpress-f59c459fd-qkfrt 0/4 Pending 0 22m
wordpress-f59c459fd-w8c65 0/4 Pending 0 22m
And the events:
kubectl get events --all-namespaces
NAMESPACE LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
default 25m 2h 333 wordpress-686ccd47b4-4pbfk.153408cdba627f50 Pod Warning FailedScheduling default-scheduler No nodes are available that match all of the predicates: Insufficient cpu (1), Insufficient memory (2), MatchInterPodAffinity (1).
default 25m 2h 337 wordpress-686ccd47b4-vv9dk.153408cc8661c49d Pod Warning FailedScheduling default-scheduler No nodes are available that match all of the predicates: Insufficient cpu (1), Insufficient memory (2), MatchInterPodAffinity (1).
default 22m 22m 1 wordpress-686ccd47b4.15340e5036ef7d1c ReplicaSet Normal SuccessfulDelete replicaset-controller Deleted pod: wordpress-686ccd47b4-4pbfk
default 22m 22m 1 wordpress-686ccd47b4.15340e5036f2fec1 ReplicaSet Normal SuccessfulDelete replicaset-controller Deleted pod: wordpress-686ccd47b4-vv9dk
default 2m 22m 72 wordpress-f59c459fd-qkfrt.15340e503bd4988c Pod Warning FailedScheduling default-scheduler No nodes are available that match all of the predicates: Insufficient cpu (1), Insufficient memory (2), MatchInterPodAffinity (1).
default 2m 22m 72 wordpress-f59c459fd-w8c65.15340e50399a8a5a Pod Warning FailedScheduling default-scheduler No nodes are available that match all of the predicates: Insufficient cpu (1), Insufficient memory (2), MatchInterPodAffinity (1).
default 22m 22m 1 wordpress-f59c459fd.15340e5039d6c622 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: wordpress-f59c459fd-w8c65
default 22m 22m 1 wordpress-f59c459fd.15340e503bf844db ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: wordpress-f59c459fd-qkfrt
default 3m 23h 177 wordpress.1533c22c7bf657bd Ingress Normal Service loadbalancer-controller no user specified default backend, using system default
default 22m 22m 1 wordpress.15340e50356eaa6a Deployment Normal ScalingReplicaSet deployment-controller Scaled down replica set wordpress-686ccd47b4 to 0
default 22m 22m 1 wordpress.15340e5037c04da6 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set wordpress-f59c459fd to 2
You can use kubectl describe po wordpress-f59c459fd-qkfrt to inspect the pod, but from the messages above the pods cannot be scheduled on any of the nodes.
Provide more capacity, for example by adding a node, to allow the pods to be scheduled.
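To see how much headroom the nodes actually have (standard kubectl commands; the grep window is approximate, and kubectl top requires metrics-server):
$ kubectl describe nodes | grep -A 8 "Allocated resources"
$ kubectl top nodes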
The new deployment had a replica count of 3 while the previous one had 2. I assumed I could set a high value for the replica count and it would try to deploy as many replicas as it could before it reached its resource capacity. However, this does not seem to be the case...

deployment fails to recreate a successful replicaset

We are using Kubernetes 1.8 to deploy our software on a cloud provider. Frequently, when deploying a specific pod template, the deployment fails to create a successful replica set and no instance is created. I am not able to find a better description than the output of kubectl describe deploy:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing False ProgressDeadlineExceeded
OldReplicaSets: <none>
NewReplicaSet: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 21m (x3 over 2d) deployment-controller Scaled up replica set cbase-d-6bbfbdb5dc to 1
Normal ScalingReplicaSet 19m (x3 over 2d) deployment-controller Scaled down replica set cbase-d-6bbfbdb5dc to 0
You can also check the status of the replica set:
kubectl describe replicaset cbase-d-6bbfbdb5dc
Hopefully you will find the conditions there and the reason why the replica set could not be scaled up.
While this might not always be the case, a likely reason is the unavailability of resources. Try increasing the resources (CPU + memory) allocated to the cluster.
This was exactly the error I got, and increasing the allocated resources fixed the issue (on GKE).
I got a similar error yesterday and finally figured out that I could get the error message from the pod corresponding to the deployment by using the command kubectl get pod YOUR_POD_NAME -o yaml. You can check the status and error message there.
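A minimal sequence for that check (the label selector and pod name are placeholders):
$ kubectl get pods -l app=cbase-d                                 # find the pod(s) created by the replica set, if any exist
$ kubectl get pod YOUR_POD_NAME -o yaml | grep -A 20 '^status:'   # inspect conditions and containerStatuses for the error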