K8S rolling update close pods after Readiness probe is healthy - kubernetes

Here is my Readiness Probe Configuration:
readinessProbe:
httpGet:
path: /devops/versioninfo/api
port: 9001
initialDelaySeconds: 300
timeoutSeconds: 3
periodSeconds: 10
failureThreshold: 60
Here is my rolling update strategy:
strategy:
rollingUpdate:
maxSurge: 2
maxUnavailable: 0
Because it will take a long time for my pods to be ready, but when the deployment is rolling update, old pods will be deleted when the new one's status is running whose ready health is not ok.
How to let the rolling update strategy be that the new one is ready and then delete the old one.

You can try increasing the minReadySeconds option in the Deployment spec. Basically, tell the deployment that you need to at least wait X number of seconds before you can say one particular pod is ready.
✌️

Related

kubectl rollout status - When the command complete?

Currently I am using this in my pipeline
kubectl apply -f deployment.yaml && kubectl rollout status -f deployment.yaml
With this in yaml
readinessProbe:
tcpSocket:
port: 90
initialDelaySeconds: 120
periodSeconds: 10
timeoutSeconds: 10
failureThreshold: 1
successThreshold: 1
livenessProbe:
tcpSocket:
port: 90
initialDelaySeconds: 120
periodSeconds: 20
timeoutSeconds: 2
failureThreshold: 1
successThreshold: 1
For me, kubectl rollout is running for a very long time, blocking the deployment pipeline. From the documentation
By default 'rollout status' will watch the status of the latest rollout until it's done
My question:
1/ Which actions are the parts that contribute to the deployment "until it's done" (resource creation, resource teardown?... )
2/ Does readinessProbe and livenessProbe contribute to the deployment time
The criteria for this are in the kubectl source. A deployment is "complete" if:
It hasn't timed out
Its updated-replica count is at least its desired-replica count (every new pod has been created)
Its current-replica count is at most its updated-replica count (every old pod has been destroyed)
Its available-replica count is at least its updated-replica count (every new pod is running)
You can use kubectl get deployment -w or kubectl get pod -w to watch a deployment actually happen in real time; the kubectl get -w option watches the given resources and prints out a new line whenever they change. You'll see the following sequence occur (with default Deployment settings, one at a time for "small" deployments):
A new pod is created
The new pod passes its probes and become ready
An old pod is terminated
The old pod actually exits and is deleted
So for kubectl rollout status deployment/... to finish, all of these steps must happen – new pods are created, new pods all pass their health checks, old pods are destroyed – for every replica in the deployment.

K8S Pod with startupProbe and initialDelaySeconds specified waits too long to become Ready

I have been trying to debug a very odd delay in my K8S deployments. I have tracked it down to the simple reproduction below. What it appears is that if I set an initialDelaySeconds on a startup probe or leave it 0 and have a single failure, then the probe doesn't get run again for a while and ends up with atleast a 1-1.5 minute delay getting into Ready:true state.
I am running locally with Ubutunu 18.04 and microk8s v1.19.3 with the following versions:
kubelet: v1.19.3-34+a56971609ff35a
kube-proxy: v1.19.3-34+a56971609ff35a
containerd://1.3.7
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: microbot
name: microbot
spec:
replicas: 1
selector:
matchLabels:
app: microbot
strategy: {}
template:
metadata:
labels:
app: microbot
spec:
containers:
- image: cdkbot/microbot-amd64
name: microbot
command: ["/bin/sh"]
args: ["-c", "sleep 3; /start_nginx.sh"]
#args: ["-c", "/start_nginx.sh"]
ports:
- containerPort: 80
startupProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 0 # 5 also has same issue
periodSeconds: 1
failureThreshold: 10
successThreshold: 1
##livenessProbe:
## httpGet:
## path: /
## port: 80
## initialDelaySeconds: 0
## periodSeconds: 10
## failureThreshold: 1
resources: {}
restartPolicy: Always
serviceAccountName: ""
status: {}
---
apiVersion: v1
kind: Service
metadata:
name: microbot
labels:
app: microbot
spec:
ports:
- port: 80
protocol: TCP
targetPort: 80
selector:
app: microbot
The issue is that if I have any delay in the startupProbe or if there is an initial failure, the pod gets into Initialized:true state but had Ready:False and ContainersReady:False. It will not change from this state for 1-1.5 minutes. I haven't found a pattern to the settings.
I left in the comment out settings as well so you can see what I am trying to get to here. What I have is a container starting up that has a service that will take a few seconds to get started. I want to tell the startupProbe to wait a little bit and then check every second to see if we are ready to go. The configuration seems to work, but there is a baked in delay that I can't track down. Even after the startup probe is passing, it does not transition the pod to Ready for more than a minute.
Is there some setting elsewhere in k8s that is delaying the amount of time before a Pod can move into Ready if it isn't Ready initially?
Any ideas are greatly appreciated.
Actually I made a mistake in comments, you can use initialDelaySeconds in startupProbe, but you should rather use failureThreshold and periodSeconds instead.
As mentioned here
Kubernetes Probes
Kubernetes supports readiness and liveness probes for versions ≤ 1.15. Startup probes were added in 1.16 as an alpha feature and graduated to beta in 1.18 (WARNING: 1.16 deprecated several Kubernetes APIs. Use this migration guide to check for compatibility).
All the probe have the following parameters:
initialDelaySeconds : number of seconds to wait before initiating
liveness or readiness probes
periodSeconds: how often to check the probe
timeoutSeconds: number of seconds before marking the probe as timing
out (failing the health check)
successThreshold : minimum number of consecutive successful checks
for the probe to pass
failureThreshold : number of retries before marking the probe as
failed. For liveness probes, this will lead to the pod restarting.
For readiness probes, this will mark the pod as unready.
So why should you use failureThreshold and periodSeconds?
consider an application where it occasionally needs to download large amounts of data or do an expensive operation at the start of the process. Since initialDelaySeconds is a static number, we are forced to always take the worst-case scenario (or extend the failureThreshold that may affect long-running behavior) and wait for a long time even when that application does not need to carry out long-running initialization steps. With startup probes, we can instead configure failureThreshold and periodSeconds to model this uncertainty better. For example, setting failureThreshold to 15 and periodSeconds to 5 means the application will get 15 (fifteen) x 5 (five) = 75s to startup before it fails.
Additionally if you need more informations take a look at this article on medium.
Quoted from kubernetes documentation about Protect slow starting containers with startup probes
Sometimes, you have to deal with legacy applications that might require an additional startup time on their first initialization. In such cases, it can be tricky to set up liveness probe parameters without compromising the fast response to deadlocks that motivated such a probe. The trick is to set up a startup probe with the same command, HTTP or TCP check, with a failureThreshold * periodSeconds long enough to cover the worse case startup time.
So, the previous example would become:
ports:
- name: liveness-port
containerPort: 8080
hostPort: 8080
livenessProbe:
httpGet:
path: /healthz
port: liveness-port
failureThreshold: 1
periodSeconds: 10
startupProbe:
httpGet:
path: /healthz
port: liveness-port
failureThreshold: 30
periodSeconds: 10
Thanks to the startup probe, the application will have a maximum of 5 minutes (30 * 10 = 300s) to finish its startup. Once the startup probe has succeeded once, the liveness probe takes over to provide a fast response to container deadlocks. If the startup probe never succeeds, the container is killed after 300s and subject to the pod's restartPolicy.

kubernetes pod restart takes time and downtime

I have a service and pod in node.js . .consider hello world ..
exposed port : 80 on http
I want to seamlessly restart my service/pod
pod/service restart is taking a lot of time, thus there is downtime.
Using : kubectl delete; then recreate it with kubectl.
How can i avoid delay and downtime ?
Considering continuous deployments, your previous Pods will be terminated & new Pods will be created. Therefore, downtime of service is possible.
To avoid this add strategy in your deployment spec
example:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: api
spec:
replicas: 4
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
where maxUnavailable: 0 defines that at any given time more than 1 pods should be available
Extra:
If you service takes some time to be live you can use readiness probe in spec to avoid traffic to be routed before the pods are ready .
example :
readinessProbe:
tcpSocket:
port: 80
initialDelaySeconds: 15
periodSeconds: 30

What does the following deployment strategy imply in kubernetes?

I'm using default deployment strategy for my load-balancer service in kubernetes, and when I describe my deployment the strategy looks like follows:
Replicas: 2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
So according to the description, there should not be any downtime. However, there's still a downtime in the service.
How can I make sure there's zero downtime?
From what I can see you are using the as fast as possible approach of Rolling Updates.
While this is a good approach it's better to use Replicas: 3, because you might end up with 2 pods down during update.
You should implement ReadinessProbe that might look like the following:
readinessProbe:
httpGet:
path: /
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
successThreshold: 1
initialDelaySeconds: Number of seconds after the container has started before readiness probes are initiated.
periodSeconds: How often to perform the probe. Default to 10 seconds.
successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1.
I also recommend reading Enable Rolling updates in Kubernetes with Zero downtime, as they nicely explains the use of Rolling updates.

How does minReadySeconds affect readiness probe?

Let's say I have a deployment template like this
spec:
minReadySeconds: 15
readinessProbe:
failureThreshold: 3
httpGet:
path: /
port: 80
scheme: HTTP
initialDelaySeconds: 20
periodSeconds: 20
successThreshold: 1
timeoutSeconds: 5
How will this affect the newly versions of my app? Will the minReadySeconds and initialDelaySeconds count at the same time? Will the initialDelaySeconds come first then minReadySeconds?
From Kubernetes Deployment documentation:
.spec.minReadySeconds is an optional field that specifies the minimum number of seconds for which a newly created Pod should be ready without any of its containers crashing, for it to be considered available. This defaults to 0 (the Pod will be considered available as soon as it is ready). To learn more about when a Pod is considered ready, see Container Probes
So your newly created app pod have to be ready for .spec.minReadySeconds seconds to be considered as available.
initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated.
So initialDelaySeconds comes before minReadySeconds.
Lets say, container in the pod has started at t seconds. Readiness probe will be initiated at t+initialDelaySeconds seconds. Assume Pod become ready at t1 seconds(t1 > t+initialDelaySeconds). So this pod will be available after t1+minReadySeconds seconds.