Kubernetes Rolling Update not obeying 'maxUnavailable' replicas when redeployed in autoscaled conditions - deployment

In a nutshell, most of our apps are configured with the following strategy in the Deployment -
strategy:
  rollingUpdate:
    maxSurge: 25%
    maxUnavailable: 25%
  type: RollingUpdate
The Horizontal Pod Autoscaler is configured as follows:
spec:
  maxReplicas: 10
  minReplicas: 2
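For reference, a complete HPA manifest with these limits would look roughly like this (the target Deployment name and CPU target are illustrative, not taken from our manifests):
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                           # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                         # illustrative target
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50     # illustrative target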
Now when our application was redeployed, instead of performing a rolling update, it instantly terminated 8 of our pods and dropped the number of pods to 2, which is the minimum number of replicas. This happened in a fraction of a second.
As maxUnavailable is 25%, shouldn't only about 2-3 pods go down at most? Why did so many pods crash at once? It seems as though rolling update is useless if it works this way.
What am I missing?

After looking at this question, I decided to try it in a test environment to check whether it really misbehaves.
I set up metrics-server to provide the resource metrics and created an HPA. I followed these steps to set up the HPA and deployment:
How to Enable KubeAPI server for HPA Autoscaling Metrics
Once I had a working HPA and the maximum of 10 pods running on the system, I updated the image using:
[root@ip-10-0-1-176 ~]# kubectl get hpa
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   49%/50%   1         10        10         87m
[root@ip-10-0-1-176 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
load-generator-557649ddcd-6jlnl 1/1 Running 0 61m
php-apache-75bf8f859d-22xvv 1/1 Running 0 91s
php-apache-75bf8f859d-dv5xg 1/1 Running 0 106s
php-apache-75bf8f859d-g4zgb 1/1 Running 0 106s
php-apache-75bf8f859d-hv2xk 1/1 Running 0 2m16s
php-apache-75bf8f859d-jkctt 1/1 Running 0 2m46s
php-apache-75bf8f859d-nlrzs 1/1 Running 0 2m46s
php-apache-75bf8f859d-ptg5k 1/1 Running 0 106s
php-apache-75bf8f859d-sbctw 1/1 Running 0 91s
php-apache-75bf8f859d-tkjhb 1/1 Running 0 55m
php-apache-75bf8f859d-wv5nc 1/1 Running 0 106s
[root@ip-10-0-1-176 ~]# kubectl set image deployment php-apache php-apache=hpa-example:v1 --record
deployment.extensions/php-apache image updated
[root@ip-10-0-1-176 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
load-generator-557649ddcd-6jlnl 1/1 Running 0 62m
php-apache-75bf8f859d-dv5xg 1/1 Terminating 0 2m40s
php-apache-75bf8f859d-g4zgb 1/1 Terminating 0 2m40s
php-apache-75bf8f859d-hv2xk 1/1 Terminating 0 3m10s
php-apache-75bf8f859d-jkctt 1/1 Running 0 3m40s
php-apache-75bf8f859d-nlrzs 1/1 Running 0 3m40s
php-apache-75bf8f859d-ptg5k 1/1 Terminating 0 2m40s
php-apache-75bf8f859d-sbctw 0/1 Terminating 0 2m25s
php-apache-75bf8f859d-tkjhb 1/1 Running 0 56m
php-apache-75bf8f859d-wv5nc 1/1 Terminating 0 2m40s
php-apache-847c8ff9f4-7cbds 1/1 Running 0 6s
php-apache-847c8ff9f4-7vh69 1/1 Running 0 6s
php-apache-847c8ff9f4-9hdz4 1/1 Running 0 6s
php-apache-847c8ff9f4-dlltb 0/1 ContainerCreating 0 3s
php-apache-847c8ff9f4-nwcn6 1/1 Running 0 6s
php-apache-847c8ff9f4-p8c54 1/1 Running 0 6s
php-apache-847c8ff9f4-pg8h8 0/1 Pending 0 3s
php-apache-847c8ff9f4-pqzjw 0/1 Pending 0 2s
php-apache-847c8ff9f4-q8j4d 0/1 ContainerCreating 0 4s
php-apache-847c8ff9f4-xpbzl 0/1 Pending 0 1s
Also, I kept a job running in the background that appended the kubectl get pods output to a file every second. At no time until all images were upgraded did the number of pods go below 8 - consistent with maxUnavailable: 25% of 10 replicas rounding down to 2 pods (while maxSurge: 25% rounds up to 3 extra pods).
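A minimal sketch of such a background logging job (the general idea, not the exact command I ran):
# Append the pod list to a log file once per second, in the background:
while true; do kubectl get pods >> /tmp/pods.log; sleep 1; done &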
I believe you need to check how you're setting up your rolling upgrade. Are you using a Deployment or a bare ReplicaSet? I kept the rolling update strategy the same as yours, maxUnavailable: 25% and maxSurge: 25%, with a Deployment, and it is working well for me.

I want to point out the minReadySeconds property.
The minReadySeconds property specifies how long a newly created pod should be ready before the pod is treated as available. Without the minReadySeconds property, the redeploy itself completed successfully in a very short time. But after a short while the readiness probe started failing for some reason, and the pods started to scale down.
The maxUnavailable property is only honored during the RollingUpdate itself. After the RollingUpdate has finished, this property is ignored.
Note from the Kubernetes in Action book: If you only define the readiness probe without setting minReadySeconds properly, new pods are considered available immediately when the first invocation of the readiness probe succeeds. If the readiness probe starts failing shortly after, the bad version is rolled out across all pods. Therefore, you should set minReadySeconds appropriately.
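A minimal sketch of a Deployment that sets both minReadySeconds and a readiness probe (all names and values are illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  minReadySeconds: 30          # a new pod must stay Ready for 30s before it counts as available
  selector:
    matchLabels:
      app: my-app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:v2       # illustrative image
        readinessProbe:
          httpGet:
            path: /healthz     # illustrative endpoint
            port: 8080
          periodSeconds: 5
With this in place, the rollout only proceeds past a new pod after it has stayed Ready for the full 30 seconds, so a version whose readiness probe fails shortly after startup blocks the rollout instead of replacing every pod.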

In our case, we added the replicas field a while ago and forgot to remove it when we added the HPA. The HPA does not play nicely with the replicas field during deployments, so if you have an HPA, remove the replicas field. See https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#migrating-deployments-and-statefulsets-to-horizontal-autoscaling
When an HPA is enabled, it is recommended that the value of spec.replicas of the Deployment and / or StatefulSet be removed from their manifest(s). If this isn't done, any time a change to that object is applied, for example via kubectl apply -f deployment.yaml, this will instruct Kubernetes to scale the current number of Pods to the value of the spec.replicas key. This may not be desired and could be troublesome when an HPA is active.
Keep in mind that the removal of spec.replicas may incur a one-time degradation of Pod counts as the default value of this key is 1 (reference Deployment Replicas). Upon the update, all Pods except 1 will begin their termination procedures. Any deployment application afterwards will behave as normal and respect a rolling update configuration as desired.
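The linked page describes roughly the following migration procedure, sketched here with a placeholder Deployment name; editing the last-applied configuration first prevents the removal of spec.replicas from scaling the workload down to the default of 1:
# 1. Remove spec.replicas from the saved last-applied configuration:
kubectl apply edit-last-applied deployment/<deployment-name>
#    (delete the spec.replicas line in the editor and save)
# 2. Remove spec.replicas from deployment.yaml itself.
# 3. From now on, kubectl apply -f deployment.yaml leaves the replica
#    count alone, and the HPA owns it.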

Related

Debug a pod stuck in pending state [duplicate]

This question already has an answer here: Error "pod has unbound immediate PersistentVolumeClaim" during statefulset deployment (1 answer)
Closed 2 years ago.
How can I debug a pod stuck in the Pending state? I am using k8ssandra (https://k8ssandra.io/docs/) to create a Cassandra cluster; it uses Helm charts. I created a 3-node cluster by changing the size value to 3 in the local values.yaml file - https://github.com/k8ssandra/k8ssandra/blob/main/charts/k8ssandra-cluster/values.yaml
no_reply@cloudshell:~ (k8ssandra-299315)$ kubectl get pods
NAME READY STATUS RESTARTS AGE
cass-operator-86d4dc45cd-588c8 1/1 Running 0 29h
grafana-deployment-66557855cc-j7476 1/1 Running 0 29h
k8ssandra-cluster-a-grafana-operator-k8ssandra-5b89b64f4f-8pbxk 1/1 Running 0 29h
k8ssandra-cluster-a-reaper-k8ssandra-847c99ccd8-dsnj4 1/1 Running 0 28h
k8ssandra-cluster-a-reaper-k8ssandra-schema-5fzpn 0/1 Completed 0 28h
k8ssandra-cluster-a-reaper-operator-k8ssandra-87d56d56f-wn8hw 1/1 Running 0 29h
k8ssandra-dc1-default-sts-0 2/2 Running 0 29h
**k8ssandra-dc1-default-sts-1 0/2 Pending 0 14m**
k8ssandra-dc1-default-sts-2 2/2 Running 0 14m
k8ssandra-tools-kube-prome-operator-6bcdf668d4-ndhw9 1/1 Running 0 29h
prometheus-k8ssandra-cluster-a-prometheus-k8ssandra-0 2/2 Running 1 29h
The best way, as described by Arghya, is to check the events of the pod:
kubectl describe pod k8ssandra-dc1-default-sts-1
You could also check the logs of the pod:
kubectl logs k8ssandra-dc1-default-sts-1
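If the describe output is too noisy, you can also list just the events that reference the pod (assuming the default namespace):
kubectl get events --field-selector involvedObject.name=k8ssandra-dc1-default-sts-1 --sort-by=.lastTimestamp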

What is 'AVAILABLE' column in kubernetes daemonsets

I may have a stupid question, but could someone explain what "Available" actually represents in DaemonSets? I checked the answer to What is the difference between current and available pod replicas in kubernetes deployment?, but there are no readiness errors here.
In the cluster I see the status below:
$ kubectl get ds -n kube-system
NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR
kube-proxy   6         6         5       6            5           beta.kubernetes.io/os=linux
Why is it showing 5 instead of 6?
All pods are running perfectly fine, without any readiness errors or restarts:
$ kubectl get pods -n kube-system | grep kube-proxy
kube-proxy-cv7vv 1/1 Running 0 20d
kube-proxy-kcd67 1/1 Running 0 20d
kube-proxy-l4nfk 1/1 Running 0 20d
kube-proxy-mkvjd 1/1 Running 0 87d
kube-proxy-qb7nz 1/1 Running 0 36d
kube-proxy-x8l87 1/1 Running 0 87d
Could someone tell me what else can be checked?
The Available field shows the number of replicas or pods that are ready to accept traffic and have passed all criteria, such as readiness and liveness probes or any other condition that verifies that your application is ready to serve requests coming from users.
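Beyond probes, a pod only counts as Available after it has been Ready for the DaemonSet's minReadySeconds, so a pod can show 1/1 Running while not yet being counted. A minimal sketch of where that field sits (names and values illustrative, not the real kube-proxy manifest):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-ds
spec:
  minReadySeconds: 15      # a pod must stay Ready this long before it counts as AVAILABLE
  selector:
    matchLabels:
      app: example-ds
  template:
    metadata:
      labels:
        app: example-ds
    spec:
      containers:
      - name: pause
        image: k8s.gcr.io/pause:3.1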

Prometheus operator alertmanager-main-0 goes Pending and restarts

What happened?
kubernetes version: 1.12
prometheus operator: release-0.1
I follow the README:
$ kubectl create -f manifests/
# It can take a few seconds for the above 'create manifests' command to fully create the following resources, so verify the resources are ready before proceeding.
$ until kubectl get customresourcedefinitions servicemonitors.monitoring.coreos.com ; do date; sleep 1; echo ""; done
$ until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
$ kubectl apply -f manifests/ # This command sometimes may need to be done twice (to workaround a race condition).
and then I run the command, and the output looks like:
[root@VM_8_3_centos /data/hansenwu/kube-prometheus/manifests]# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 66s
alertmanager-main-1 1/2 Running 0 47s
grafana-54f84fdf45-kt2j9 1/1 Running 0 72s
kube-state-metrics-65b8dbf498-h7d8g 4/4 Running 0 57s
node-exporter-7mpjw 2/2 Running 0 72s
node-exporter-crfgv 2/2 Running 0 72s
node-exporter-l7s9g 2/2 Running 0 72s
node-exporter-lqpns 2/2 Running 0 72s
prometheus-adapter-5b6f856dbc-ndfwl 1/1 Running 0 72s
prometheus-k8s-0 3/3 Running 1 59s
prometheus-k8s-1 3/3 Running 1 59s
prometheus-operator-5c64c8969-lqvkb 1/1 Running 0 72s
[root@VM_8_3_centos /data/hansenwu/kube-prometheus/manifests]# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 0/2 Pending 0 0s
grafana-54f84fdf45-kt2j9 1/1 Running 0 75s
kube-state-metrics-65b8dbf498-h7d8g 4/4 Running 0 60s
node-exporter-7mpjw 2/2 Running 0 75s
node-exporter-crfgv 2/2 Running 0 75s
node-exporter-l7s9g 2/2 Running 0 75s
node-exporter-lqpns 2/2 Running 0 75s
prometheus-adapter-5b6f856dbc-ndfwl 1/1 Running 0 75s
prometheus-k8s-0 3/3 Running 1 62s
prometheus-k8s-1 3/3 Running 1 62s
prometheus-operator-5c64c8969-lqvkb 1/1 Running 0 75s
I don't know why the pod alertmanager-main-0 goes Pending and then restarts.
And when I look at the events, they show:
72s Warning FailedCreate StatefulSet create Pod alertmanager-main-0 in StatefulSet alertmanager-main failed error: The POST operation against Pod could not be completed at this time, please try again.
72s Warning FailedCreate StatefulSet create Pod alertmanager-main-0 in StatefulSet alertmanager-main failed error: The POST operation against Pod could not be completed at this time, please try again.
72s Warning^Z FailedCreate StatefulSet
[10]+ Stopped kubectl get events -n monitoring
Most likely the alertmanager does not get enough time to start correctly.
Have a look at this answer: https://github.com/coreos/prometheus-operator/issues/965#issuecomment-460223268
You can set the paused field to true, and then modify the StatefulSet to test whether extending the liveness/readiness probes solves your issue.
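A sketch of what that could look like on the Alertmanager custom resource, assuming the standard kube-prometheus names (while paused, the operator stops reconciling, so manual StatefulSet edits stick):
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main
  namespace: monitoring
spec:
  paused: true    # operator stops updating the alertmanager-main StatefulSet
  replicas: 3
Then run kubectl edit statefulset alertmanager-main -n monitoring and extend the probe timings to test the theory.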

Kubernetes coredns pods stuck in Pending status. Cannot start the dashboard

I am building a Kubernetes cluster following this tutorial, and I am having trouble accessing the Kubernetes dashboard. I already created another question about it, which you can see here, but while digging into my cluster, I think the problem might be somewhere else, and that's why I am creating a new question.
I start my master by running the following commands:
> kubeadm reset
> kubeadm init --apiserver-advertise-address=[MASTER_IP] > file.txt
> tail -2 file.txt > join.sh # I keep this file for later
> kubectl apply -f https://git.io/weave-kube/
> kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 2m46s
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 2m46s
etcd-kubemaster 1/1 Running 0 93s
kube-apiserver-kubemaster 1/1 Running 0 93s
kube-controller-manager-kubemaster 1/1 Running 0 113s
kube-proxy-lxhvs 1/1 Running 0 2m46s
kube-scheduler-kubemaster 1/1 Running 0 93s
Here we can see that I have two coredns pods stuck in the Pending state forever, and when I run the command:
> kubectl -n kube-system describe pod coredns-fb8b8dccf-kb2zq
I can see in the Events part the following Warning:
FailedScheduling: 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Since it is a Warning and not an Error, and since, as a Kubernetes newbie, taints do not mean much to me, I tried to connect a node to the master (using the previously saved command):
> cat join.sh
kubeadm join [MASTER_IP]:6443 --token [TOKEN] \
--discovery-token-ca-cert-hash sha256:[ANOTHER_TOKEN]
> ssh [USER]@[WORKER_IP] 'bash' < join.sh
This node has joined the cluster.
On the master, I check that the node is connected:
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubemaster NotReady master 13m v1.14.1
kubeslave1 NotReady <none> 31s v1.14.1
And I check my pods:
> kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 14m
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 14m
etcd-kubemaster 1/1 Running 0 13m
kube-apiserver-kubemaster 1/1 Running 0 13m
kube-controller-manager-kubemaster 1/1 Running 0 13m
kube-proxy-lxhvs 1/1 Running 0 14m
kube-proxy-xllx4 0/1 ContainerCreating 0 2m16s
kube-scheduler-kubemaster 1/1 Running 0 13m
We can see that another kube-proxy pod has been created and is stuck in the ContainerCreating status.
And when I do a describe again:
kubectl -n kube-system describe pod kube-proxy-xllx4
I can see in the Events part multiple identical Warnings:
Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Get https://k8s.gcr.io/v1/_ping: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:43133->[::1]:53: read: connection refused
Here are my repositories :
docker image ls
REPOSITORY TAG
k8s.gcr.io/kube-proxy v1.14.1
k8s.gcr.io/kube-apiserver v1.14.1
k8s.gcr.io/kube-controller-manager v1.14.1
k8s.gcr.io/kube-scheduler v1.14.1
k8s.gcr.io/coredns 1.3.1
k8s.gcr.io/etcd 3.3.10
k8s.gcr.io/pause 3.1
And so, for the dashboard part, I tried to start it with the command:
> kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/recommended/kubernetes-dashboard.yaml
But the dashboard pod is stuck in Pending state.
kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 40m
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 40m
etcd-kubemaster 1/1 Running 0 38m
kube-apiserver-kubemaster 1/1 Running 0 38m
kube-controller-manager-kubemaster 1/1 Running 0 39m
kube-proxy-lxhvs 1/1 Running 0 40m
kube-proxy-xllx4 0/1 ContainerCreating 0 27m
kube-scheduler-kubemaster 1/1 Running 0 38m
kubernetes-dashboard-5f7b999d65-qn8qn 1/1 Pending 0 8s
So, even though my problem originally was that I cannot access my dashboard, I guess the real problem is deeper than that.
I know that I just put a lot of information here, but I am a k8s beginner and I am completely lost on this.
There is an issue I experienced with coredns pods stuck in Pending mode when setting up my own cluster, which I resolved by adding a pod network.
It looks like, because there is no network addon installed, the nodes are tainted as not-ready. Installing the addon removes the taints and the pods become schedulable. In my case, adding flannel fixed the issue.
EDIT: There is a note about this in the official k8s documentation - Create cluster with kubeadm:
The network must be deployed before any applications. Also, CoreDNS
will not start up before a network is installed. kubeadm only
supports Container Network Interface (CNI) based networks (and does
not support kubenet).
Actually, it is the opposite of a deep or serious issue; this is a trivial one. Whenever you see a pod stuck in the Pending state, it means the scheduler is having a hard time scheduling the pod, mostly because there are not enough resources on the node.
In your case, the node has a taint and your pod doesn't have the matching toleration. What you have to do is describe the node and get the taint:
kubectl describe node | grep -i taints
Note: you might have more than one taint. So you might want to do kubectl describe no NODE, since with grep you will only see one taint.
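For example, on a kubeadm master node the output typically looks like this (node name assumed; any additional taints would be hidden by the grep):
$ kubectl describe node kubemaster | grep -i taints
Taints:             node-role.kubernetes.io/master:NoSchedule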
Once you get the taint, it will be something like hello=world:NoSchedule, which means key=value:effect. You will have to add a tolerations section to your Deployment. This is an example Deployment so you can see how it should look:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        ports:
        - containerPort: 80
          name: http
      tolerations:
      - effect: NoExecute   # NoSchedule and PreferNoSchedule are the other effects; tolerationSeconds applies only to NoExecute
        key: node
        operator: Equal
        value: not-ready
        tolerationSeconds: 3600
As you can see, there is a tolerations section in the yaml. So, if I had a node with the node=not-ready:NoExecute taint, no pod would be able to be scheduled onto that node unless it had this toleration.
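For reference, such a taint would be created with a command like this (node name assumed; be careful, a NoExecute taint evicts running pods that don't tolerate it):
kubectl taint node kubemaster node=not-ready:NoExecute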
Also, you can remove the taint if you don't need it. To remove a taint, you describe the node, get the key of the taint, and do:
kubectl taint node NODE key-
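For the hello=world:NoSchedule example above, the key is hello, so the command would be:
kubectl taint node NODE hello-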
Hope it makes sense. Just add this section to your deployment, and it will work.
Set up the flannel network tool by running these commands:
$ sysctl net.bridge.bridge-nf-call-iptables=1
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml

Kubernetes - master node does not become Ready

I'm starting a Kubernetes cluster of 3 nodes (1 master, 2 workers).
I am trying to follow the steps described in this Ansible playbook - https://gitlab.com/LinarNadyrov/gcp/tree/master
I apply playbook steps 1, 2, 3 consecutively.
After that, I connect to the master to check the status:
NAME STATUS ROLES AGE VERSION
master NotReady master 17m v1.13.0
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-86c58d9df4-7jc4b 0/1 Pending 0 3h45m
coredns-86c58d9df4-929xf 0/1 Pending 0 3h45m
etcd-officemasterkub 1/1 Running 2 7h26m
kube-apiserver-officemasterkub 1/1 Running 2 7h26m
kube-controller-manager-officemasterkub 1/1 Running 2 7h26m
kube-flannel-ds-5jhbx 0/1 Pending 0 7h20m
kube-flannel-ds-wqfvs 0/1 Pending 0 7h20m
kube-proxy-gmngj 1/1 Running 2 7h27m
kube-proxy-ppbqp 1/1 Running 1 7h20m
kube-proxy-r2rn6 1/1 Running 1 7h20m
kube-scheduler-officemasterkub 1/1 Running 2 7h26m
The status is NotReady.
Could anyone help me with it? What's the problem? What should be done to fix it? Maybe I missed something?
Thanks in advance!
Линар Надыров, the problem here is with your flannel yaml file. You did not specify any resources in the DaemonSet, so there are no flannel pods spawning.
I did not check any further, as this was reason enough for the issue to occur. You can use this yaml if it is for testing purposes, or edit yours according to the provided example.
In your file, change line 43 to:
shell: kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml >> pod_network_setup.txt
You can find more about DaemonSets here.