Pods not replaced if maxUnavailable is set to 0 in a Kubernetes deployment

I want rolling deployments (with the ability to roll back) for my pods. I'm updating my pods using kubectl set image in a CI environment. When I set maxUnavailable on the Deployment/web file to 1, I get downtime, but when I set maxUnavailable to 0, the pods do not get replaced and the container/app is not restarted.
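For context, the CI update boils down to a command like this (deployment and container names are taken from the YAML below; the tag placeholder is an assumption):

kubectl set image deployment/dev-web web-container=gcr.io/myimagepath/web-node:<new-tag>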
I also have a single node in my Kubernetes cluster; here is its info:
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests   CPU Limits   Memory Requests   Memory Limits
  ------------   ----------   ---------------   -------------
  881m (93%)     396m (42%)   909712Ki (33%)    1524112Ki (56%)
Events: <none>
Here's the complete YAML file. I do have a readiness probe set.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "10"
    kompose.cmd: C:\ProgramData\chocolatey\lib\kubernetes-kompose\tools\kompose.exe convert
    kompose.version: 1.14.0 (fa706f2)
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"annotations":{"kompose.cmd":"C:\\ProgramData\\chocolatey\\lib\\kubernetes-kompose\\tools\\kompose.exe convert","kompose.version":"1.14.0 (fa706f2)"},"creationTimestamp":null,"labels":{"io.kompose.service":"dev-web"},"name":"dev-web","namespace":"default"},"spec":{"replicas":1,"strategy":{},"template":{"metadata":{"labels":{"io.kompose.service":"dev-web"}},"spec":{"containers":[{"env":[{"name":"JWT_KEY","value":"ABCD"},{"name":"PORT","value":"2000"},{"name":"GOOGLE_APPLICATION_CREDENTIALS","value":"serviceaccount/quick-pay.json"},{"name":"mongoCon","value":"mongodb://quickpayadmin:quickpay1234#ds121343.mlab.com:21343/quick-pay-db"},{"name":"PGHost","value":"173.255.206.177"},{"name":"PGUser","value":"postgres"},{"name":"PGDatabase","value":"quickpay"},{"name":"PGPassword","value":"z33shan"},{"name":"PGPort","value":"5432"}],"image":"gcr.io/quick-pay-208307/quickpay-dev-node:latest","imagePullPolicy":"Always","name":"dev-web-container","ports":[{"containerPort":2000}],"readinessProbe":{"failureThreshold":3,"httpGet":{"path":"/","port":2000,"scheme":"HTTP"},"initialDelaySeconds":5,"periodSeconds":5,"successThreshold":1,"timeoutSeconds":1},"resources":{"requests":{"cpu":"20m"}}}]}}}}
  creationTimestamp: 2018-12-24T12:13:48Z
  generation: 12
  labels:
    io.kompose.service: dev-web
  name: dev-web
  namespace: default
  resourceVersion: "9631122"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/web
  uid: 5e66f7b3-0775-11e9-9653-42010a80019d
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      io.kompose.service: web
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: web
    spec:
      containers:
      - env:
        - name: PORT
          value: "2000"
        image: gcr.io/myimagepath/web-node
        imagePullPolicy: Always
        name: web-container
        ports:
        - containerPort: 2000
          protocol: TCP
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /
            port: 2000
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 10
        resources:
          requests:
            cpu: 10m
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: 2019-01-03T05:49:46Z
    lastUpdateTime: 2019-01-03T05:49:46Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: 2018-12-24T12:13:48Z
    lastUpdateTime: 2019-01-03T06:04:24Z
    message: ReplicaSet "dev-web-7bd498fc74" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 12
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2
I've tried with 1 replica and it still does not work.

In the first scenario, Kubernetes deletes one pod (maxUnavailable: 1), starts a pod with the new image, and waits ~110 seconds (based on your readiness probe) to check whether the new pod is able to serve requests. The new pod isn't able to serve requests yet, but since it is in the Running state, Kubernetes deletes the second old pod and starts it with the new image, and again the second pod waits for the readiness probe to complete. This is why there is a window in which neither container is ready to serve requests, hence the downtime.
In the second scenario, where you have maxUnavailable: 0, Kubernetes first brings up a pod with the new image; it isn't able to serve requests within ~110 seconds (based on your readiness probe), so the rollout times out and the new pod is deleted. The same happens with the second pod. Hence neither of your pods gets updated.
So the reason is that you are not giving your application enough time to come up and start serving requests. Increase the value of failureThreshold in your readiness probe while keeping maxUnavailable: 0, and it will work.
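For illustration, here is a minimal sketch of that suggestion against the Deployment above (the new failureThreshold value is an assumption; size it to your app's real startup time):

spec:
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    spec:
      containers:
      - name: web-container
        readinessProbe:
          httpGet:
            path: /
            port: 2000
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 10
          failureThreshold: 30   # was 10 (~110s); 30 allows ~310s for startup

With maxUnavailable: 0, an old pod is only removed once its replacement passes the readiness probe, so a too-small failureThreshold is the usual culprit.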

Related

Why is GKE HPA not scaling down?

I have a Kubernetes deployment with a Go app on Kubernetes 1.17 on GKE. It has CPU and memory requests and limits, and 1 replica specified in the deployment.
Furthermore, I have this HPA (I have an autoscaling/v2beta2 resource defined in my Helm chart, but GKE apparently converts it to v2beta1):
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    meta.helm.sh/release-name: servicename
    meta.helm.sh/release-namespace: namespace
  creationTimestamp: "2021-02-15T11:30:18Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: servicename-service
  namespace: namespace
  resourceVersion: "123"
  selfLink: link
  uid: uid
spec:
  maxReplicas: 10
  metrics:
  - resource:
      name: memory
      targetAverageUtilization: 80
    type: Resource
  - resource:
      name: cpu
      targetAverageUtilization: 80
    type: Resource
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: servicename-service
status:
  conditions:
  - lastTransitionTime: "2021-02-15T11:30:33Z"
    message: recommended size matches current size
    reason: ReadyForNewScale
    status: "True"
    type: AbleToScale
  - lastTransitionTime: "2021-02-15T13:17:20Z"
    message: the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
    reason: ValidMetricFound
    status: "True"
    type: ScalingActive
  - lastTransitionTime: "2021-02-15T13:17:36Z"
    message: the desired count is within the acceptable range
    reason: DesiredWithinRange
    status: "False"
    type: ScalingLimited
  currentMetrics:
  - resource:
      currentAverageUtilization: 14
      currentAverageValue: "9396224"
      name: memory
    type: Resource
  - resource:
      currentAverageUtilization: 33
      currentAverageValue: 84m
      name: cpu
    type: Resource
  currentReplicas: 3
  desiredReplicas: 3
  lastScaleTime: "2021-02-15T13:40:11Z"
Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "456"
    meta.helm.sh/release-name: servicename-service
    meta.helm.sh/release-namespace: services
  creationTimestamp: "2021-02-11T10:00:45Z"
  generation: 129
  labels:
    app: servicename
    app.kubernetes.io/managed-by: Helm
    chart: servicename
    heritage: Helm
    release: servicename-service
  name: servicename-service-servicename
  namespace: namespace
  resourceVersion: "123"
  selfLink: /apis/apps/v1/namespaces/namespace/deployments/servicename-service-servicename
  uid: b1fcc8c6-f3e6-4bbf-92a1-d7ae1e2bb188
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: servicename
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: servicename
        release: servicename-service
    spec:
      containers:
      - envFrom:
        - configMapRef:
            name: servicename-service-servicename
        image: image
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health/liveness
            port: 8888
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: servicename
        ports:
        - containerPort: 8888
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health/readiness
            port: 8888
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 500m
            memory: 256Mi
          requests:
            cpu: 150m
            memory: 64Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 3
  conditions:
  - lastTransitionTime: "2021-02-11T10:00:45Z"
    lastUpdateTime: "2021-02-16T14:10:29Z"
    message: ReplicaSet "servicename-service-servicename-5b6445fcb" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2021-02-20T16:19:51Z"
    lastUpdateTime: "2021-02-20T16:19:51Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 129
  readyReplicas: 3
  replicas: 3
  updatedReplicas: 3
Output of kubectl get hpa --all-namespaces
NAMESPACE   NAME                  REFERENCE                         TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
namespace   servicename-service   Deployment/servicename-service   9%/80%, 1%/80%    1         10        2          6d
namespace   xyz-service           Deployment/xyz-service           18%/80%, 1%/80%   1         10        1          6d
I haven't changed any Kubernetes Controller default settings like --horizontal-pod-autoscaler-downscale-stabilization.
Question:
Why is it not scaling down to 1 replica when the currentAverageUtilization of the CPU is 33 and the target is 80? I waited for more than 1 hour.
Any ideas?
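For reference, the HPA controller computes a per-metric recommendation as desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue) and then takes the maximum across all configured metrics. With the status above, CPU gives ceil(3 × 33/80) = 2 and memory gives ceil(3 × 14/80) = 1, so the recommendation is 2 replicas rather than 1; any further scale-down is additionally delayed by the downscale stabilization window (5 minutes by default).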

Kubernetes: How to attain 0 downtime

I have 2 pods running, each with CPU: 0.2 core and memory: 1 Gi.
My node has a limit of 0.4 core and 2 Gi; I can't increase the node limits.
For zero downtime I have the following config:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: abc-deployment
spec:
  selector:
    matchLabels:
      app: abc
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: abc
        collect_logs_with_filebeat: "true"
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: "false"
    spec:
      containers:
      - name: abc
        image: abc-repository:latest
        ports:
        - containerPort: 8087
        readinessProbe:
          httpGet:
            path: /healthcheck
            port: 8087
          initialDelaySeconds: 540
          timeoutSeconds: 10
          periodSeconds: 10
          failureThreshold: 20
          successThreshold: 1
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 0.2
            memory: 1000Mi
          requests:
            cpu: 0.2
            memory: 1000Mi
On a new build deployment, two new pods get created on a new node (because node1 doesn't have enough memory and CPU to accommodate new pods), say node2. Once the new containers are in the Running state on node2, the old pods (running on node1) get destroyed, and node1 now has some free CPU and memory.
The issue I am facing is that, since node1 has free memory and CPU, Kubernetes destroys the newly created pods (running on node2), then creates pods on node1 and starts the app containers there, which causes downtime.
So basically, in my case, even after using the RollingUpdate strategy and a health-check endpoint, I am not able to achieve zero downtime. Please help!
You could look at the concept of a Pod Disruption Budget, which is used mostly to achieve zero downtime for an application.
You could also read a related answer of mine, which shows an example of how to achieve zero downtime for an application using PDBs.
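A minimal sketch of such a PDB for the abc Deployment above (the PDB name is an assumption; use the policy/v1 apiVersion on Kubernetes 1.21+):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: abc-pdb        # hypothetical name
spec:
  minAvailable: 1      # never evict below 1 ready pod
  selector:
    matchLabels:
      app: abc

A PDB only limits voluntary disruptions (evictions, drains), but that is exactly the kind of disruption that takes down healthy pods before their replacements are ready.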

Kubectl rollout restart for statefulset

As per the kubectl docs, kubectl rollout restart is applicable to deployments, daemonsets and statefulsets. It works as expected for deployments, but for statefulsets it restarts only one of the 2 pods.
✗ k rollout restart statefulset alertmanager-main (playground-fdp/monitoring)
statefulset.apps/alertmanager-main restarted
✗ k rollout status statefulset alertmanager-main (playground-fdp/monitoring)
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
statefulset rolling update complete 2 pods at revision alertmanager-main-59d7ccf598...
✗ kgp -l app=alertmanager (playground-fdp/monitoring)
NAME                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0   2/2     Running   0          21h
alertmanager-main-1   2/2     Running   0          20s
As you can see, the pod alertmanager-main-1 has been restarted and its age is 20s, whereas the other pod in the statefulset, alertmanager-main-0, has not been restarted and its age is 21h. Any idea how we can restart a statefulset after some configmap used by it has been updated?
[Update 1] Here is the statefulset configuration. As you can see, .spec.updateStrategy.rollingUpdate.partition is not set.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"Alertmanager","metadata":{"annotations":{},"labels":{"alertmanager":"main"},"name":"main","namespace":"monitoring"},"spec":{"baseImage":"10.47.2.76:80/alm/alertmanager","nodeSelector":{"kubernetes.io/os":"linux"},"replicas":2,"securityContext":{"fsGroup":2000,"runAsNonRoot":true,"runAsUser":1000},"serviceAccountName":"alertmanager-main","version":"v0.19.0"}}
  creationTimestamp: "2019-12-02T07:17:49Z"
  generation: 4
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
  ownerReferences:
  - apiVersion: monitoring.coreos.com/v1
    blockOwnerDeletion: true
    controller: true
    kind: Alertmanager
    name: main
    uid: 3e3bd062-6077-468e-ac51-909b0bce1c32
  resourceVersion: "521307"
  selfLink: /apis/apps/v1/namespaces/monitoring/statefulsets/alertmanager-main
  uid: ed4765bf-395f-4d91-8ec0-4ae23c812a42
spec:
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      alertmanager: main
      app: alertmanager
  serviceName: alertmanager-operated
  template:
    metadata:
      creationTimestamp: null
      labels:
        alertmanager: main
        app: alertmanager
    spec:
      containers:
      - args:
        - --config.file=/etc/alertmanager/config/alertmanager.yaml
        - --cluster.listen-address=[$(POD_IP)]:9094
        - --storage.path=/alertmanager
        - --data.retention=120h
        - --web.listen-address=:9093
        - --web.external-url=http://10.47.0.234
        - --web.route-prefix=/
        - --cluster.peer=alertmanager-main-0.alertmanager-operated.monitoring.svc:9094
        - --cluster.peer=alertmanager-main-1.alertmanager-operated.monitoring.svc:9094
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: 10.47.2.76:80/alm/alertmanager:v0.19.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 10
          httpGet:
            path: /-/healthy
            port: web
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        name: alertmanager
        ports:
        - containerPort: 9093
          name: web
          protocol: TCP
        - containerPort: 9094
          name: mesh-tcp
          protocol: TCP
        - containerPort: 9094
          name: mesh-udp
          protocol: UDP
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /-/ready
            port: web
            scheme: HTTP
          initialDelaySeconds: 3
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 3
        resources:
          requests:
            memory: 200Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager/config
          name: config-volume
        - mountPath: /alertmanager
          name: alertmanager-main-db
      - args:
        - -webhook-url=http://localhost:9093/-/reload
        - -volume-dir=/etc/alertmanager/config
        image: 10.47.2.76:80/alm/configmap-reload:v0.0.1
        imagePullPolicy: IfNotPresent
        name: config-reloader
        resources:
          limits:
            cpu: 100m
            memory: 25Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager/config
          name: config-volume
          readOnly: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      serviceAccount: alertmanager-main
      serviceAccountName: alertmanager-main
      terminationGracePeriodSeconds: 120
      volumes:
      - name: config-volume
        secret:
          defaultMode: 420
          secretName: alertmanager-main
      - emptyDir: {}
        name: alertmanager-main-db
  updateStrategy:
    type: RollingUpdate
status:
  collisionCount: 0
  currentReplicas: 2
  currentRevision: alertmanager-main-59d7ccf598
  observedGeneration: 4
  readyReplicas: 2
  replicas: 2
  updateRevision: alertmanager-main-59d7ccf598
  updatedReplicas: 2
You did not provide the whole scenario; it might depend on the readiness probe or on the update strategy.
A StatefulSet updates its pods in reverse ordinal order, from n-1 down to 0. Details can be found here.
Reason 1
StatefulSets have 4 update strategies:
On Delete
Rolling Updates
Partitions
Forced Rollback
In the Partitions section of the docs you can find this information:
If a partition is specified, all Pods with an ordinal that is greater
than or equal to the partition will be updated when the StatefulSet’s
.spec.template is updated. All Pods with an ordinal that is less
than the partition will not be updated, and, even if they are deleted,
they will be recreated at the previous version. If a StatefulSet’s
.spec.updateStrategy.rollingUpdate.partition is greater than its
.spec.replicas, updates to its .spec.template will not be
propagated to its Pods. In most cases you will not need to use a
partition, but they are useful if you want to stage an update, roll
out a canary, or perform a phased roll out.
So if somewhere in the StatefulSet you have set updateStrategy.rollingUpdate.partition: 1, it will restart only the pods with ordinal 1 or higher.
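For reference, that setting lives in the StatefulSet spec like this (a sketch; partition: 3 matches the example below):

updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 3   # only pods with ordinal >= 3 are updated on a template change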
Example of partition: 3
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          30m
web-1   1/1     Running   0          30m
web-2   1/1     Running   0          31m
web-3   1/1     Running   0          2m45s
web-4   1/1     Running   0          3m
web-5   1/1     Running   0          3m13s
Reason 2
Configuration of the readiness probe.
If your values of initialDelaySeconds and periodSeconds are high, it might take a while before the next pod is restarted. Details about those parameters can be found here.
In the example below, the pod waits 10 seconds after starting before the first probe, and the readiness probe then checks it every 2 seconds. Depending on the values, this might be the cause of the behavior you see.
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /
    port: 80
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 2
  successThreshold: 1
  timeoutSeconds: 1
Reason 3
I saw that you have 2 containers in each pod.
NAME                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0   2/2     Running   0          21h
alertmanager-main-1   2/2     Running   0          20s
As described in the docs:
Running - The Pod has been bound to a node, and all of the Containers have been created. At least one Container is still running, or is in the process of starting or restarting.
It would be good to check whether everything is OK with both containers (readinessProbe/livenessProbe, restarts, etc.).
You would need to delete it. StatefulSet pods are removed following their ordinal index, with the highest ordinal index first.
Also, you do not need to restart a pod to re-read an updated config map; this happens automatically (after some period of time).
This might be related to your ownerReferences definition. You can try it without any owner and do the rollout again.

How to find non-portable fields in existing Kubernetes resource configurations?

Cluster information:
Kubernetes version: v1.12.8-gke.10 on GCP
Question:
I'm doing an application migration. What I do is grab all the configurations of the related resources and then deploy them to a new cluster. After getting the information from the shell command kubectl get <resource> -o yaml, I noticed that there is a lot of information that my deploy YAMLs don't have.
I deleted .spec.clusterIP, .metadata.uid, .metadata.selfLink, .metadata.resourceVersion, .metadata.creationTimestamp, .metadata.generation, .status, .spec.template.spec.securityContext, .spec.template.spec.dnsPolicy, .spec.template.spec.terminationGracePeriodSeconds, .spec.template.spec.restartPolicy fields.
I'm not sure whether there are other fields that would influence the new deployment and that I need to delete.
Is there a way to find all non-portable fields that I can delete?
And another question: do all related resources matter? For now I just grab a list of resources from kubectl api-resources and then get their info one by one. Should I ignore some resources, like ReplicaSets, when migrating the whole application?
For example, the output configuration of an nginx deployment looks like this:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2019-07-16T21:55:39Z"
  generation: 1
  labels:
    app: nginx
  name: nginx-deployment
  namespace: default
  resourceVersion: "1482081"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/nginx-deployment
  uid: 732377ee-a814-11e9-bbe9-42010a8a001a
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.7.9
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2019-07-16T21:55:41Z"
    lastUpdateTime: "2019-07-16T21:55:41Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2019-07-16T21:55:39Z"
    lastUpdateTime: "2019-07-16T21:55:41Z"
    message: ReplicaSet "nginx-deployment-5c689d88bb" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2
Right off the bat, there is no way to automatically detect which fields are cluster-specific; kubectl get [resource] -o yaml outputs the current RESTful state of the resource. However, you can use some Linux shell scripting to manipulate the output of a cluster dump and keep only the fields you want. Take a look at this blog post on Medium.
As to "do all resources matter": the answer is no. If you have a deployment, you don't need the ReplicaSet or Pod resources, since the deployment will create and manage those once it is deployed. You just need the top-level controller resource (the same goes for daemonsets and statefulsets).
On another note, the fields in the spec section can mostly all be kept; the values you are removing are likely default values you never set explicitly, but there is no real benefit in removing them.
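As a sketch of that kind of manipulation (assuming the mikefarah yq v4 CLI is installed; extend the del list to taste):

kubectl get deployment nginx-deployment -o yaml \
  | yq eval 'del(.status)
      | del(.metadata.uid)
      | del(.metadata.selfLink)
      | del(.metadata.resourceVersion)
      | del(.metadata.creationTimestamp)
      | del(.metadata.generation)' -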

Kubernetes updating only 1 pod instead of all (2 replicas) on Rolling Update

I've set up 2 replicas of a deployment.
When I use
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 1
it only updates 1 pod when I update it via kubectl set image. The second pod does not get updated with the new code. This means I have 2 pods running different images.
When I set maxSurge to 25% and maxUnavailable to 25%, the pods don't get replaced at all.
Here's the complete YAML:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "89"
  creationTimestamp: 2018-11-26T09:40:48Z
  generation: 94
  labels:
    io.kompose.service: servicing
  name: servicing
  namespace: default
  resourceVersion: "6858872"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/servicing
  uid: 5adb98c8-f15f-11e8-8752-42010a800188
spec:
  replicas: 2
  selector:
    matchLabels:
      io.kompose.service: servicing
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: servicing
    spec:
      containers:
      - env:
        - name: JWT_KEY
          value: ABCD
        - name: PORT
          value: "3001"
        image: gcr.io/something/something
        imagePullPolicy: Always
        name: servicing-container
        ports:
        - containerPort: 3001
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 3001
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 25m
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: 2018-12-13T11:55:00Z
    lastUpdateTime: 2018-12-13T11:55:00Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 94
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2
You have set initialDelaySeconds to 5, periodSeconds to 5, and failureThreshold to 3. This means Kubernetes waits an initial 5 seconds before the first probe of whether your application is ready, then probes it every 5 seconds, allowing up to 3 failures. So your application is checked at roughly the 5-, 10-, and 15-second marks, and if the pod doesn't come up in that time, the rollout stalls without upgrading.
You might need to increase this failureThreshold so that your app has enough time to come up.
Also, I would suggest setting maxUnavailable to 0 so that an old pod is deleted only when the new pod comes up as its replacement.
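Put together, a sketch of those two suggestions for the Deployment above (the new failureThreshold value is an assumption; tune it to your app's startup time):

spec:
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    spec:
      containers:
      - name: servicing-container
        readinessProbe:
          httpGet:
            path: /
            port: 3001
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 12   # was 3 (~20s); 12 allows ~65s to become ready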
Check my answer here for better understanding:
Kubernetes 0 Downtime using Readiness Probe and RollBack strategy not working